Last updated: 2019-08-01

We take a first try at the SuSiE-mixture model. Our model is

\[ \textbf y=\textbf X\beta_1+\textbf X\beta_2+\epsilon,\quad \beta_1\sim N(0,\sigma^2\sigma_1^2I),\quad \beta_2\sim \text{susie.prior}(\sigma_2^2,\pi),\quad \epsilon\sim N(0,\sigma^2I). \]

We focus on variable selection, i.e. our primary goal is to obtain a better estimate of \(\beta_2\). The \(\beta_1\) term is included to better describe the data. Estimating the prior variance \(\sigma_1^2\) may be of interest, but estimating \(\beta_1\) itself is less interesting to us.
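For concreteness, here is a hypothetical simulation from this generative model (the dimensions and all parameter values below are made up for illustration):

```r
# Simulate from the SuSiE-mixture model; n, p, the variances, and the number
# of sparse effects are arbitrary illustrative choices.
set.seed(1)
n <- 500; p <- 200
sigma_sq <- 1; sigma1_sq <- 0.1; sigma2_sq <- 1; n_effects <- 3
X <- matrix(rnorm(n * p), n, p)
beta1 <- rnorm(p, 0, sqrt(sigma_sq * sigma1_sq))   # dense effects
beta2 <- numeric(p)                                # sparse effects
beta2[sample(p, n_effects)] <- rnorm(n_effects, 0, sqrt(sigma_sq * sigma2_sq))
y <- drop(X %*% (beta1 + beta2) + rnorm(n, 0, sqrt(sigma_sq)))
```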

In the following discussion, we first show how to solve this model for fixed \(\sigma^2_1\), which is not difficult. Estimating \(\sigma^2_1\), however, can be a problem. We give a first try here by separating the SuSiE part from the ridge regression part: we fix the SuSiE output \(\hat\beta_2\) and estimate \(\sigma^2_1\) within the ridge regression part only. However, this method has some fatal issues. See the second attempt for a method that makes more sense, although it is more complicated.

Part I. Solving \(\beta_2\) for fixed \(\sigma_1^2\).

First, it turns out that, as long as we know \(\sigma_1^2\), adding the \(\textbf X\beta_1\) term to the SuSiE model does not introduce much difficulty. With the following derivation, we can easily turn the SuSiE-mixture model into a vanilla SuSiE model.

Assume the residual variance \(\sigma^2\) is known, and our goal is to estimate the sparse coefficients \(\beta_2\).

Since

\[ \textbf y=\textbf X\boldsymbol \beta_1+\textbf X\boldsymbol \beta_2+\epsilon, \]

where

\[ \textbf X\boldsymbol \beta_1\sim N(0,\sigma_1^2\sigma^2\textbf X\textbf X^T),\epsilon\sim N(0,\sigma^2I). \]

Thus

\[ \textbf y|\textbf X,\boldsymbol \beta_2,\sigma^2,\sigma_1^2\sim N\left(\textbf X\beta_2, \sigma^2(\sigma_1^2\textbf X\textbf X^T+I)\right). \]

For simplicity, let

\[ H=(\sigma_1^2\textbf X\textbf X^T+I). \]

Let \(L\) be a lower triangular matrix s.t. \(LL^T=H\), i.e. \((L^{-1})^TL^{-1}=H^{-1}\). Furthermore, let \(\tilde {\textbf y}=L^{-1}\textbf y\) and \(\tilde {\textbf X}=L^{-1}\textbf X\).

Then

\[ \tilde {\textbf y}|{\textbf X},\boldsymbol \beta_2,\sigma^2,\sigma_1^2\sim N(L^{-1}{\textbf X}\beta_2, \sigma^2 L^{-1}H(L^{-1})^T). \]

That is,

\[ \tilde {\textbf y}|\tilde{\textbf X},\boldsymbol \beta_2,\sigma^2,\sigma_1^2\sim N(\tilde{\textbf X}\boldsymbol \beta_2, \sigma^2 I). \]

This formula, together with the prior assumption on \(\beta_2\), is exactly the vanilla SuSiE model setting

\[ \tilde{\textbf y}=\tilde{\textbf X}\boldsymbol \beta_2+\epsilon,\epsilon\sim N(0, \sigma^2I), \]

and

\[ \boldsymbol \beta_2 = \sum_{l=1}^L\gamma_l\beta_l,\gamma_l\sim Mult(1,\pi),\beta_l\sim N(0,\sigma^2\sigma^2_2) \]

We can easily solve this with \(SuSiE(\tilde{\textbf X}, \tilde{\textbf y})\).
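Concretely, here is a minimal R sketch of this reduction using the susieR package (the wrapper name susie_mixture_fixed is ours, and \(\sigma_1^2\) is assumed given):

```r
# Whiten y and X by L^{-1}, where L L^T = H = sigma1^2 X X^T + I,
# then run vanilla SuSiE on the transformed data.
library(susieR)

susie_mixture_fixed <- function(X, y, sigma1_sq, L = 10) {
  n <- nrow(X)
  H <- sigma1_sq * tcrossprod(X) + diag(n)  # H = sigma1^2 X X^T + I
  Lc <- t(chol(H))                          # lower triangular, Lc Lc^T = H
  y_tilde <- forwardsolve(Lc, y)            # y~ = L^{-1} y
  X_tilde <- forwardsolve(Lc, X)            # X~ = L^{-1} X
  susie(X_tilde, y_tilde, L = L)
}
```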

Part II. Solving \(\beta_1\) with fixed \(\sigma^2_1\) and \(\hat\beta_2\)

Although we are normally not as interested in estimating \(\beta_1\) as \(\beta_2\), we still provide some details on estimating and doing inference for \(\beta_1\). Basically, it is just a ridge regression. In the following derivation, we fix our previous estimate \(\hat\beta_2\) and ignore its uncertainty.

We investigate the posterior distribution of the non-sparse coefficients \(\beta_1\)

\[ p(\boldsymbol \beta_1|\textbf X,\textbf y,\hat{\boldsymbol \beta}_2,\sigma^2,\sigma_1^2)\propto p(\textbf y|\textbf X,\boldsymbol \beta_1,\hat{\boldsymbol \beta}_2,\sigma^2,\sigma_1^2)p(\boldsymbol \beta_1), \]

where we have

\[ \textbf y|\textbf X,\boldsymbol \beta_1,\hat{\boldsymbol \beta}_2,\sigma^2,\sigma_1^2\sim N(\textbf X\boldsymbol \beta_1+\textbf X\hat{\boldsymbol \beta_2}, \sigma^2I_n), \]

and

\[ \boldsymbol \beta_1\sim N(0,\sigma^2\sigma_1^2I_p). \]

Using the fact that a Gaussian prior on \(\boldsymbol \beta_1\) is equivalent to ridge regression, we apply the standard ridge regression derivation to obtain

\[ \boldsymbol \beta_1|\textbf X,\textbf y,\hat{\boldsymbol \beta}_2,\sigma^2,\sigma_1^2\sim N\left(\hat{\boldsymbol \beta}_{ridge},\sigma^2(\textbf X^T\textbf X+I_p/\sigma_1^2)^{-1}\right), \]

where \(\hat{\boldsymbol \beta}_{ridge}=(\textbf X^T\textbf X+I_p/\sigma_1^2)^{-1}\textbf X^T(\textbf y-\textbf X\hat\beta_2)\).
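In code, a small sketch of this posterior computation (the helper name ridge_posterior is ours; beta2_hat, sigma_sq, and sigma1_sq are assumed given):

```r
# Posterior of beta1 given a fixed beta2_hat: ridge regression on the
# residual u = y - X beta2_hat.
ridge_posterior <- function(X, y, beta2_hat, sigma_sq, sigma1_sq) {
  p <- ncol(X)
  u <- y - X %*% beta2_hat
  A <- crossprod(X) + diag(p) / sigma1_sq       # X^T X + I / sigma1^2
  list(mean = solve(A, crossprod(X, u)),        # ridge estimate
       cov  = sigma_sq * solve(A))              # posterior covariance
}
```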

Part III. Estimating \(\sigma^2_1\)

From the discussion in Part I, we can see that as long as \(\sigma_1^2\) is fixed, we can easily solve the SuSiE-mixture model. The key question now is how to estimate \(\sigma^2_1\).

If we take solving SuSiE into account, things become more complicated. We therefore take a shortcut: assume we have already estimated \(\beta_2\) and fixed it.

  • By maximizing the ELBO in ridge regression

Assume we have already estimated \(\beta_2\) and fixed it, and let \(u=y-X\hat\beta_2\). Then we write down the ELBO for ridge regression, taking both \(q\) and \(\sigma^2_1\) into account:

\[ ELBO(q,\sigma^2_1)=const-\mathbb E_q\left[p\log\sigma_1+\frac1{2\sigma^2}\left(\|u-X\beta_1\|^2_2+\frac{\|\beta_1\|^2_2}{\sigma^2_1}\right)+\log q(\beta_1)\right]. \]

Under the mean field assumption

\[ q(\beta)=\prod_{k=1}^pq(\beta_k), \]

where we assume \(q(\beta_k)=N(\beta_k;\mu_{1k},\sigma^2_{1k})\). From this derivation, the coordinate update step for the mean-field method is

\[ \beta_k\sim N\left(\frac{\bar {\textbf r}_k^T\textbf X_k}{1/\sigma_1^2+\|\textbf X_k\|_2^2},\frac{\sigma^2}{1/\sigma_1^2+\|\textbf X_k\|_2^2}\right). \]

where \(\bar{\textbf r}_k=u-\sum_{j\neq k}\textbf X_j\mu_{1j}\) is the expected residual with the \(k\)-th effect removed. We can see that the variance is fixed. Also, the optimal mean of \(q(\beta)\) is the ridge solution \(\hat{\beta}_{ridge}=(X^TX+I/\sigma^2_1)^{-1}X^Tu\). Hence the optimal \(q(\beta)\) is

\[ N\left(\hat{\beta}_{ridge},diag\{\sigma_{11}^2,\dots,\sigma_{1p}^2\}\right), \]

where \(\sigma_{1k}^2=\frac{\sigma^2}{1/\sigma_1^2+\|\textbf X_k\|_2^2},\ 1\leq k\leq p\).

We plug this \(q(\beta)\) into the formula for \(ELBO(q,\sigma^2_1)\) and get

\[ \begin{aligned} ELBO(\sigma^2_1)=&-p\log\sigma_1-\frac1{2\sigma^2}\mathbb E_q\|u-X\beta_1\|_2^2-\frac1{2\sigma^2\sigma_1^2}\mathbb E_q\|\beta_1\|_2^2-\mathbb E_q\log q(\beta_1)+const\\ =&-p\log\sigma_1-\frac{\left(\|u-X\hat{\beta}_{ridge}\|_2^2+\sum_{k=1}^p\sigma_{1k}^2\|X_k\|^2_2\right)}{2\sigma^2}-\frac{\left(\|\hat{\beta}_{ridge}\|_2^2+\sum_{k=1}^p\sigma_{1k}^2\right)}{2\sigma^2\sigma_1^2}+\sum_{k=1}^p\log\sigma_{1k}+const \end{aligned} \]

This formula is complicated, and we might need to optimize it numerically. A fatal issue is that \(\sigma^2\) seems to inevitably play a role in the expression, but we do not have access to \(\sigma^2\). I tried making further assumptions to simplify it, e.g. assuming the columns of \(X\) are orthonormal, but it turned out not to be helpful at all.

One option is to introduce \(\sigma^2\) as a new variable and numerically optimize \(ELBO(\sigma^2,\sigma_1^2)\) by alternating updates, but that would be really hard. Another option is to fix \(\sigma^2\) at an appropriate value; see the Issue bullet below for which \(\sigma^2\) to choose.
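As a sketch of the second option: once some value of \(\sigma^2\) is plugged in, \(ELBO(\sigma_1^2)\) can be evaluated directly from the formula above and maximized with a one-dimensional optimizer (the helper name and search interval below are our choices):

```r
# ELBO(sigma1_sq) up to a constant, with sigma_sq treated as known.
elbo_sigma1_sq <- function(sigma1_sq, X, u, sigma_sq) {
  p <- ncol(X)
  xk2 <- colSums(X^2)                          # ||X_k||_2^2
  s1k_sq <- sigma_sq / (1 / sigma1_sq + xk2)   # optimal q variances
  A <- crossprod(X) + diag(p) / sigma1_sq
  b_ridge <- solve(A, crossprod(X, u))         # ridge solution
  r <- u - X %*% b_ridge
  -0.5 * p * log(sigma1_sq) -
    (sum(r^2) + sum(s1k_sq * xk2)) / (2 * sigma_sq) -
    (sum(b_ridge^2) + sum(s1k_sq)) / (2 * sigma_sq * sigma1_sq) +
    0.5 * sum(log(s1k_sq))
}

# One-dimensional numerical optimization over sigma1^2:
# optimize(elbo_sigma1_sq, c(1e-6, 1e3), X = X, u = u,
#          sigma_sq = sigma_sq, maximum = TRUE)
```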

  • By MLE in ridge regression

Again we fix the \(\hat\beta_2\) estimated by SuSiE, and let \(u=y-X\hat\beta_2\). We then compute the marginal likelihood:

\[ \begin{aligned} p(u|X,\sigma^2_1,\sigma^2)=&\int p(u,\beta|X,\sigma^2_1,\sigma^2)\,d\beta\\ =&\int p(u|X,\beta,\sigma^2)\,p(\beta|\sigma^2, \sigma^2_1)\,d\beta\\ =&\int N(u;X\beta,\sigma^2 I)\cdot N(\beta;0,\sigma^2\sigma_1^2 I)\,d\beta\\ \propto &\frac{\exp\left\{(\hat{\beta}_{ridge}^TX^Tu-\|u\|^2_2)/(2\sigma^2)\right\}}{\sigma^{n+p}\sigma^p_1}\cdot\sigma^p\left|X^TX+I/\sigma^2_1\right|^{-1/2}\int N(\beta;\hat\beta_{ridge},\sigma^2(\textbf X^T\textbf X+I_p/\sigma_1^2)^{-1})\,d\beta\\ \propto & \frac{\exp\left\{(\hat{\beta}_{ridge}^TX^Tu-\|u\|^2_2)/(2\sigma^2)\right\}}{\sigma^{n}\,\sigma^p_1\,\left|X^TX+I/\sigma^2_1\right|^{1/2}} \end{aligned} \]

This has the same issue as before: both approaches require the value of \(\sigma^2\) in order to optimize \(\sigma^2_1\). In the following Issue bullet, we propose an ad-hoc strategy to approximate \(\sigma^2\).
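For reference, a sketch of this log marginal likelihood as a function of \(\sigma_1^2\), again with \(\sigma^2\) plugged in (the helper name and interval are our choices):

```r
# Log marginal likelihood log p(u | sigma1_sq, sigma_sq), up to a constant.
loglik_sigma1_sq <- function(sigma1_sq, X, u, sigma_sq) {
  p <- ncol(X)
  A <- crossprod(X) + diag(p) / sigma1_sq
  b_ridge <- solve(A, crossprod(X, u))         # ridge solution
  as.numeric(
    (sum(b_ridge * crossprod(X, u)) - sum(u^2)) / (2 * sigma_sq) -
      0.5 * p * log(sigma1_sq) -
      0.5 * determinant(A, logarithm = TRUE)$modulus
  )
}

# optimize(loglik_sigma1_sq, c(1e-6, 1e3), X = X, u = u,
#          sigma_sq = sigma_sq, maximum = TRUE)
```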

  • Issue: shared \(\sigma^2\) in SuSiE and ridge regression

One notable fact is that the SuSiE part and the ridge regression part share the same residual variance \(\sigma^2\), as in Part I. The downside is that the method that truly makes sense would approximate \(\sigma^2\) by combining both parts (SuSiE and ridge regression) together; see the second attempt.

An ad-hoc strategy: since SuSiE also gives an approximation of \(\sigma^2\), we can simply plug it in here so that \(\sigma^2\) becomes fixed, as sketched below. This ignores the information in the ridge regression part and might lead to a bad approximation of \(\sigma^2\), but at least it makes the algorithm run. See the experiment results for more details.
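Putting the earlier sketches together, the resulting ad-hoc loop might look as follows (the fixed iteration count is our simplification; we assume susieR's coef() method and its fit$sigma2 residual variance estimate):

```r
# Alternate between (i) SuSiE on whitened data with the current sigma1^2 and
# (ii) re-estimating sigma1^2 by MLE with SuSiE's sigma^2 plugged in.
susie_mixture <- function(X, y, sigma1_sq = 0.1, n_iter = 10, L = 10) {
  for (it in seq_len(n_iter)) {
    fit <- susie_mixture_fixed(X, y, sigma1_sq, L = L)  # Part I sketch
    beta2_hat <- coef(fit)[-1]                          # drop intercept
    u <- y - X %*% beta2_hat
    sigma1_sq <- optimize(loglik_sigma1_sq, c(1e-6, 1e3), X = X, u = u,
                          sigma_sq = fit$sigma2, maximum = TRUE)$maximum
  }
  list(fit = fit, sigma1_sq = sigma1_sq)
}
```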