Last updated: 2019-08-01

We take a first try at the SuSiE-mixture model. Our model is

\[ \textbf y=\textbf X\beta_1+\textbf X\beta_2+\epsilon,\quad \beta_1\sim N(0,\sigma^2\sigma_1^2I),\quad \beta_2\sim \text{susie.prior}(\sigma_2^2,\pi),\quad \epsilon\sim N(0,\sigma^2I). \]

We focus on variable selection, i.e. our primary goal is to obtain a better estimate of \(\beta_2\). The \(\beta_1\) term is included to better describe the data. Estimating the prior variance \(\sigma_1^2\) may be of interest, but estimating \(\beta_1\) itself is less interesting to us.
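For concreteness, here is a hypothetical simulation from this generative model (the dimensions and all parameter values below are made up for illustration):

```r
# Simulate from the SuSiE-mixture model; n, p, the variances, and the number
# of sparse effects are arbitrary illustrative choices.
set.seed(1)
n <- 500; p <- 200
sigma_sq <- 1; sigma1_sq <- 0.1; sigma2_sq <- 1; n_effects <- 3
X <- matrix(rnorm(n * p), n, p)
beta1 <- rnorm(p, 0, sqrt(sigma_sq * sigma1_sq))   # dense effects
beta2 <- numeric(p)                                # sparse effects
beta2[sample(p, n_effects)] <- rnorm(n_effects, 0, sqrt(sigma_sq * sigma2_sq))
y <- drop(X %*% (beta1 + beta2) + rnorm(n, 0, sqrt(sigma_sq)))
```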

In the following discussion, we first show how to solve this model for fixed \(\sigma^2_1\), which is not difficult. Estimating \(\sigma^2_1\), however, can be a problem. We give a first try here by separating the SuSiE part from the ridge regression part: we fix the SuSiE output \(\hat\beta_2\) and estimate \(\sigma^2_1\) within the ridge regression part only. However, this method has some fatal issues. See the second attempt for a method that makes more sense, although it is more complicated.

Part I. Solving \(\beta_2\) for fixed \(\sigma_1^2\).

First, it turns out that, as long as we know \(\sigma_1^2\), adding the \(\textbf X\beta_1\) term to the SuSiE model does not introduce much difficulty. With the following derivation, we can easily turn the SuSiE-mixture model into a vanilla SuSiE model.

Assume the residual variance \(\sigma^2\) is known, and our goal is to estimate the sparse coefficients \(\beta_2\).

Since

\[ \textbf y=\textbf X\boldsymbol \beta_1+\textbf X\boldsymbol \beta_2+\epsilon, \]

where

\[ \textbf X\boldsymbol \beta_1\sim N(0,\sigma_1^2\sigma^2\textbf X\textbf X^T),\epsilon\sim N(0,\sigma^2I). \]

Thus

\[ \textbf y|\textbf X,\boldsymbol \beta_2,\sigma^2,\sigma_1^2\sim N\left(\textbf X\beta_2, \sigma^2(\sigma_1^2\textbf X\textbf X^T+I)\right). \]

For simplicity, let

\[ H=(\sigma_1^2\textbf X\textbf X^T+I). \]

Let \(L\) be a lower triangular matrix s.t. \(LL^T=H\), i.e. \((L^{-1})^TL^{-1}=H^{-1}\). Furthermore, let \(\tilde {\textbf y}=L^{-1}\textbf y\) and \(\tilde {\textbf X}=L^{-1}\textbf X\).

Then

\[ \tilde {\textbf y}|{\textbf X},\boldsymbol \beta_2,\sigma^2,\sigma_1^2\sim N(L^{-1}{\textbf X}\beta_2, \sigma^2 L^{-1}H(L^{-1})^T). \]

That is,

\[ \tilde {\textbf y}|\tilde{\textbf X},\boldsymbol \beta_2,\sigma^2,\sigma_1^2\sim N(\tilde{\textbf X}\boldsymbol \beta_2, \sigma^2 I). \]

This formula, together with the prior assumption on \(\beta_2\), is exactly the vanilla SuSiE model setting

\[ \tilde{\textbf y}=\tilde{\textbf X}\boldsymbol \beta_2+\epsilon,\epsilon\sim N(0, \sigma^2I), \]

and

\[ \boldsymbol \beta_2 = \sum_{l=1}^L\gamma_l\beta_l,\gamma_l\sim Mult(1,\pi),\beta_l\sim N(0,\sigma^2\sigma^2_2) \]

We can easily solve this with \(SuSiE(\tilde{\textbf X}, \tilde{\textbf y})\).
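Concretely, here is a minimal R sketch of this reduction using the susieR package (the wrapper name susie_mixture_fixed is ours, and \(\sigma_1^2\) is assumed given):

```r
# Whiten y and X by L^{-1}, where L L^T = H = sigma1^2 X X^T + I,
# then run vanilla SuSiE on the transformed data.
library(susieR)

susie_mixture_fixed <- function(X, y, sigma1_sq, L = 10) {
  n <- nrow(X)
  H <- sigma1_sq * tcrossprod(X) + diag(n)  # H = sigma1^2 X X^T + I
  Lc <- t(chol(H))                          # lower triangular, Lc Lc^T = H
  y_tilde <- forwardsolve(Lc, y)            # y~ = L^{-1} y
  X_tilde <- forwardsolve(Lc, X)            # X~ = L^{-1} X
  susie(X_tilde, y_tilde, L = L)
}
```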

Part II. Solving \(\beta_1\) with fixed \(\sigma^2_1\) and \(\hat\beta_2\)

Although we are normally not as interested in estimating \(\beta_1\) as \(\beta_2\), we still provide some details on estimating and doing inference for \(\beta_1\). Basically, it is just a ridge regression. In the following derivation, we fix our previous estimate \(\hat\beta_2\) and ignore its uncertainty.

We investigate the posterior distribution of the non-sparse coefficients \(\beta_1\)

\[ p(\boldsymbol \beta_1|\textbf X,\textbf y,\hat{\boldsymbol \beta}_2,\sigma^2,\sigma_1^2)\propto p(\textbf y|\textbf X,\boldsymbol \beta_1,\hat{\boldsymbol \beta}_2,\sigma^2,\sigma_1^2)p(\boldsymbol \beta_1), \]

where we have

\[ \textbf y|\textbf X,\boldsymbol \beta_1,\hat{\boldsymbol \beta}_2,\sigma^2,\sigma_1^2\sim N(\textbf X\boldsymbol \beta_1+\textbf X\hat{\boldsymbol \beta_2}, \sigma^2I_n), \]

and

\[ \boldsymbol \beta_1\sim N(0,\sigma^2\sigma_1^2I_p). \]

Using the fact that a Gaussian prior on \(\boldsymbol \beta_1\) is equivalent to ridge regression, we apply the standard ridge regression derivation to obtain

\[ \boldsymbol \beta_1|\textbf X,\textbf y,\hat{\boldsymbol \beta}_2,\sigma^2,\sigma_1^2\sim N\left(\hat{\boldsymbol \beta}_{ridge},\sigma^2(\textbf X^T\textbf X+I_p/\sigma_1^2)^{-1}\right), \]

where \(\hat{\boldsymbol \beta}_{ridge}=(\textbf X^T\textbf X+I_p/\sigma_1^2)^{-1}\textbf X^T(\textbf y-\textbf X\hat\beta_2)\).
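In code, a small sketch of this posterior computation (the helper name ridge_posterior is ours; beta2_hat, sigma_sq, and sigma1_sq are assumed given):

```r
# Posterior of beta1 given a fixed beta2_hat: ridge regression on the
# residual u = y - X beta2_hat.
ridge_posterior <- function(X, y, beta2_hat, sigma_sq, sigma1_sq) {
  p <- ncol(X)
  u <- y - X %*% beta2_hat
  A <- crossprod(X) + diag(p) / sigma1_sq       # X^T X + I / sigma1^2
  list(mean = solve(A, crossprod(X, u)),        # ridge estimate
       cov  = sigma_sq * solve(A))              # posterior covariance
}
```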

Part III. Estimating \(\sigma^2_1\)

From the discussion in Part I, we can see that as long as \(\sigma_1^2\) is fixed, we can easily solve the SuSiE-mixture model. The key question now is how to estimate \(\sigma^2_1\).

If we take solving SuSiE into account, things become more complicated. We therefore take a shortcut: assume we have already estimated \(\beta_2\) and fixed it.

  • By maximizing the ELBO in ridge regression

Assume we have already estimated \(\beta_2\) and fixed it, and let \(u=y-X\hat\beta_2\). Then we write down the ELBO for ridge regression, taking both \(q\) and \(\sigma^2_1\) into account:

\[ ELBO(q,\sigma^2_1)=const-\mathbb E_q\left[p\log\sigma_1+\frac1{2\sigma^2}\left(\|u-X\beta_1\|^2_2+\frac{\|\beta_1\|^2_2}{\sigma^2_1}\right)+\log q(\beta_1)\right]. \]

Under the mean field assumption

\[ q(\beta)=\prod_{k=1}^pq(\beta_k), \]

where we assume \(q(\beta_k)=N(\beta_k;\mu_{1k},\sigma^2_{1k})\). From this derivation, the coordinate update step for the mean-field method is

\[ \beta_k\sim N\left(\frac{\bar {\textbf r}_k^T\textbf X_k}{1/\sigma_1^2+\|\textbf X_k\|_2^2},\frac{\sigma^2}{1/\sigma_1^2+\|\textbf X_k\|_2^2}\right). \]

where \(\bar{\textbf r}_k=u-\sum_{j\neq k}\textbf X_j\mu_{1j}\) is the expected residual with the \(k\)-th effect removed. We can see that the variance is fixed. Also, the optimal mean of \(q(\beta)\) is the ridge solution \(\hat{\beta}_{ridge}=(X^TX+I/\sigma^2_1)^{-1}X^Tu\). Hence the optimal \(q(\beta)\) is

\[ N\left(\hat{\beta}_{ridge},diag\{\sigma_{11}^2,\dots,\sigma_{1p}^2\}\right), \]

where \(\sigma_{1k}^2=\frac{\sigma^2}{1/\sigma_1^2+\|\textbf X_k\|_2^2},\ 1\leq k\leq p\).

We plug this \(q(\beta)\) into the formula for \(ELBO(q,\sigma^2_1)\) and get

\[ \begin{aligned} ELBO(\sigma^2_1)=&-p\log\sigma_1-\frac1{2\sigma^2}\mathbb E_q\|u-X\beta_1\|_2^2-\frac1{2\sigma^2\sigma_1^2}\mathbb E_q\|\beta_1\|_2^2-\mathbb E_q\log q(\beta_1)+const\\ =&-p\log\sigma_1-\frac{\left(\|u-X\hat{\beta}_{ridge}\|_2^2+\sum_{k=1}^p\sigma_{1k}^2\|X_k\|^2_2\right)}{2\sigma^2}-\frac{\left(\|\hat{\beta}_{ridge}\|_2^2+\sum_{k=1}^p\sigma_{1k}^2\right)}{2\sigma^2\sigma_1^2}+\sum_{k=1}^p\log\sigma_{1k}+const \end{aligned} \]

This formula is complicated, and we might need to optimize it numerically. A fatal issue is that \(\sigma^2\) seems to inevitably play a role in the expression, but we do not have access to \(\sigma^2\). I tried making further assumptions to simplify it, e.g. assuming the columns of \(X\) are orthonormal, but it turned out not to be helpful at all.

One option is to introduce \(\sigma^2\) as a new variable and numerically optimize \(ELBO(\sigma^2,\sigma_1^2)\) by alternating updates, but that would be really hard. Another option is to fix \(\sigma^2\) at an appropriate value; see the Issue bullet below for which \(\sigma^2\) to choose.
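As a sketch of the second option: once some value of \(\sigma^2\) is plugged in, \(ELBO(\sigma_1^2)\) can be evaluated directly from the formula above and maximized with a one-dimensional optimizer (the helper name and search interval below are our choices):

```r
# ELBO(sigma1_sq) up to a constant, with sigma_sq treated as known.
elbo_sigma1_sq <- function(sigma1_sq, X, u, sigma_sq) {
  p <- ncol(X)
  xk2 <- colSums(X^2)                          # ||X_k||_2^2
  s1k_sq <- sigma_sq / (1 / sigma1_sq + xk2)   # optimal q variances
  A <- crossprod(X) + diag(p) / sigma1_sq
  b_ridge <- solve(A, crossprod(X, u))         # ridge solution
  r <- u - X %*% b_ridge
  -0.5 * p * log(sigma1_sq) -
    (sum(r^2) + sum(s1k_sq * xk2)) / (2 * sigma_sq) -
    (sum(b_ridge^2) + sum(s1k_sq)) / (2 * sigma_sq * sigma1_sq) +
    0.5 * sum(log(s1k_sq))
}

# One-dimensional numerical optimization over sigma1^2:
# optimize(elbo_sigma1_sq, c(1e-6, 1e3), X = X, u = u,
#          sigma_sq = sigma_sq, maximum = TRUE)
```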

  • By MLE in ridge regression

Again we fix the \(\hat\beta_2\) estimated by SuSiE, and let \(u=y-X\hat\beta_2\). We then compute the marginal likelihood:

\[ \begin{aligned} p(u|X,\sigma^2_1,\sigma^2)=&\int p(u,\beta|X,\sigma^2_1,\sigma^2)\,d\beta\\ =&\int p(u|X,\beta,\sigma^2)\,p(\beta|\sigma^2, \sigma^2_1)\,d\beta\\ =&\int N(u;X\beta,\sigma^2 I)\cdot N(\beta;0,\sigma^2\sigma_1^2 I)\,d\beta\\ \propto &\frac{\exp\left\{(\hat{\beta}_{ridge}^TX^Tu-\|u\|^2_2)/(2\sigma^2)\right\}}{\sigma^{n+p}\sigma^p_1}\cdot\sigma^p\left|X^TX+I/\sigma^2_1\right|^{-1/2}\int N(\beta;\hat\beta_{ridge},\sigma^2(\textbf X^T\textbf X+I_p/\sigma_1^2)^{-1})\,d\beta\\ \propto & \frac{\exp\left\{(\hat{\beta}_{ridge}^TX^Tu-\|u\|^2_2)/(2\sigma^2)\right\}}{\sigma^{n}\,\sigma^p_1\,\left|X^TX+I/\sigma^2_1\right|^{1/2}} \end{aligned} \]

This has the same issue as before: both approaches require the value of \(\sigma^2\) in order to optimize \(\sigma^2_1\). In the following Issue bullet, we propose an ad-hoc strategy to approximate \(\sigma^2\).
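For reference, a sketch of this log marginal likelihood as a function of \(\sigma_1^2\), again with \(\sigma^2\) plugged in (the helper name and interval are our choices):

```r
# Log marginal likelihood log p(u | sigma1_sq, sigma_sq), up to a constant.
loglik_sigma1_sq <- function(sigma1_sq, X, u, sigma_sq) {
  p <- ncol(X)
  A <- crossprod(X) + diag(p) / sigma1_sq
  b_ridge <- solve(A, crossprod(X, u))         # ridge solution
  as.numeric(
    (sum(b_ridge * crossprod(X, u)) - sum(u^2)) / (2 * sigma_sq) -
      0.5 * p * log(sigma1_sq) -
      0.5 * determinant(A, logarithm = TRUE)$modulus
  )
}

# optimize(loglik_sigma1_sq, c(1e-6, 1e3), X = X, u = u,
#          sigma_sq = sigma_sq, maximum = TRUE)
```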

  • Issue: shared \(\sigma^2\) in SuSiE and ridge regression

One notable fact is that the SuSiE part and the ridge regression part share the same residual variance \(\sigma^2\), as in Part I. The downside is that the method that truly makes sense would approximate \(\sigma^2\) by combining both parts (SuSiE and ridge regression) together; see the second attempt.

An ad-hoc strategy: since SuSiE also gives an approximation of \(\sigma^2\), we can simply plug it in here so that \(\sigma^2\) becomes fixed, as sketched below. This ignores the information in the ridge regression part and might lead to a bad approximation of \(\sigma^2\), but at least it makes the algorithm run. See the experiment results for more details.
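Putting the earlier sketches together, the resulting ad-hoc loop might look as follows (the fixed iteration count is our simplification; we assume susieR's coef() method and its fit$sigma2 residual variance estimate):

```r
# Alternate between (i) SuSiE on whitened data with the current sigma1^2 and
# (ii) re-estimating sigma1^2 by MLE with SuSiE's sigma^2 plugged in.
susie_mixture <- function(X, y, sigma1_sq = 0.1, n_iter = 10, L = 10) {
  for (it in seq_len(n_iter)) {
    fit <- susie_mixture_fixed(X, y, sigma1_sq, L = L)  # Part I sketch
    beta2_hat <- coef(fit)[-1]                          # drop intercept
    u <- y - X %*% beta2_hat
    sigma1_sq <- optimize(loglik_sigma1_sq, c(1e-6, 1e3), X = X, u = u,
                          sigma_sq = fit$sigma2, maximum = TRUE)$maximum
  }
  list(fit = fit, sigma1_sq = sigma1_sq)
}
```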