Last updated: 2019-08-01


Knit directory: susie-mixture/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0).




These are the previous versions of the R Markdown and HTML files.

File Version Author Date Message
html 034283b Zhengyang Fang 2019-07-22 Build site.
html 6d9cbaa Zhengyang Fang 2019-07-22 Build site.
html 2a474f9 Zhengyang Fang 2019-07-22 Build site.
html c72a707 Zhengyang Fang 2019-07-22 Build site.
html 8092e5a Zhengyang Fang 2019-07-19 Build site.
html 5d3e3ef Zhengyang Fang 2019-07-11 Build site.
html 3eee187 Zhengyang Fang 2019-07-03 Build site.
html f303e14 Zhengyang Fang 2019-07-01 Build site.
html def2b27 Zhengyang Fang 2019-06-28 Build site.
html 05f778b Zhengyang Fang 2019-06-28 Build site.
html 3f6dd45 Zhengyang Fang 2019-06-26 Build site.
Rmd fd0f282 Zhengyang Fang 2019-06-26 wflow_publish("SuSiE_summary.Rmd")
html 9f3f1a6 Zhengyang Fang 2019-06-23 Build site.
Rmd 485bef4 Zhengyang Fang 2019-06-23 wflow_publish("SuSiE_summary.Rmd")

Summary of SuSiE (Sum of Single Effects)

Reference: Wang et al., 2018

I. Model

\(\textbf y=\textbf X\textbf b+\textbf e,\ \textbf e\sim N(0,\sigma^2 I_n),\ \textbf y\in \mathbb R^n,\ \textbf X\in\mathbb R^{n\times p},\ \textbf b\in \mathbb R^p\).

\(\textbf b=\sum_{l=1}^L b_l\boldsymbol \gamma_l,\ \boldsymbol \gamma_l\sim Mult(1,\boldsymbol\pi),\ b_l\sim N(0,\sigma_{0l}^2)\).

Here \(L\) and \(\boldsymbol\pi\) are given and fixed.

We want to estimate the posterior inclusion probability (PIP)

\[ \alpha_j:=\mathbb P(b_j\neq 0|\textbf X,\textbf y), \]

and the posterior mean \(\mu_{1j}\) and variance \(\sigma_{1j}^2\) for all \(1\leq j\leq p\).
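As a concrete illustration of the generative model, the following numpy sketch simulates \((\textbf X,\textbf y)\). The dimensions, seed, and hyperparameter values are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, L = 100, 50, 3           # sample size, number of variables, number of effects
sigma2 = 1.0                   # residual variance sigma^2
prior_var = 0.5                # prior effect variance (shared across l here)
pi = np.full(p, 1.0 / p)       # uniform prior over which variable carries each effect

X = rng.standard_normal((n, p))

# b = sum_l b_l * gamma_l, with gamma_l ~ Mult(1, pi) one-hot and b_l Gaussian
b = np.zeros(p)
for l in range(L):
    gamma_l = rng.multinomial(1, pi)            # one-hot indicator vector
    b_l = rng.normal(0.0, np.sqrt(prior_var))   # scalar effect size
    b += b_l * gamma_l

# y = X b + e, e ~ N(0, sigma^2 I_n)
y = X @ b + rng.normal(0.0, np.sqrt(sigma2), size=n)
```

By construction \(\textbf b\) has at most \(L\) nonzero entries, which is the sparsity structure SuSiE exploits.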

II. Simple version: single effect regression model (SER)

Here we assume \(L=1\). We introduce the SER model because fitting it is a key step in fitting SuSiE.

Model

\(\textbf y=\textbf X\boldsymbol\beta+\textbf e,\ \textbf e\sim N(0,\sigma^2 I_n),\ \textbf y\in \mathbb R^n,\ \textbf X\in\mathbb R^{n\times p},\ \boldsymbol\beta\in\mathbb R^p\).

\(\boldsymbol\beta=b\boldsymbol \gamma,\ \boldsymbol\gamma\sim Mult(1,\boldsymbol\pi),\ b\sim N(0,\sigma_b^2)\).

Here \(\boldsymbol\pi\) is given and fixed.

Goal: find

\[ PIP_k=\mathbb P(\gamma_k=1|X,y). \]

Method

Assume the variances \(\sigma^2\) and \(\sigma_b^2\) are known. Calculate the Bayes factor

\[ BF(y,X;\sigma^2,\sigma_b^2)=... \]

Also, the posterior distribution

\[ \beta_k|X_k,y,\sigma^2,\sigma_b^2,\gamma_k=1\sim N(\mu_{1k},\sigma_{1k}^2). \]

The Bayes factor, the posterior mean \(\mu_{1k}\), and the variance \(\sigma_{1k}^2\) all have closed forms and are easy to compute.

Therefore, for given \(\sigma^2,\sigma_b^2\), we have

\[ \alpha_k=\mathbb P(\gamma_k=1|X,y,\sigma^2,\sigma_b^2)=\frac{BF(y,X_k;\sigma^2,\sigma_b^2)\cdot\pi_k}{\sum_{j=1}^p BF(y,X_j;\sigma^2,\sigma_b^2)\cdot\pi_j}. \]

This is also easy to compute. Putting everything together, we obtain a function SER with input \((X,y,\sigma^2,\sigma^2_b)\), which outputs the key parameters of the posterior distribution, \((\boldsymbol\alpha=(\alpha_1,\alpha_2,\dots,\alpha_p),\ \boldsymbol\mu_1=(\mu_{11},\mu_{12},\dots,\mu_{1p}),\ \boldsymbol\sigma_1^2=(\sigma^2_{11},\sigma^2_{12},\dots,\sigma^2_{1p}))\).
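In code, the whole SER computation fits in a few lines. The numpy sketch below is my own implementation: the function name and the uniform default prior are illustrative choices, and the log Bayes factor uses the standard closed form for this conjugate normal model, \(\log BF_j=\tfrac12\log(\sigma_{1j}^2/\sigma_b^2)+\mu_{1j}^2/(2\sigma_{1j}^2)\).

```python
import numpy as np

def ser(X, y, sigma2, sigma_b2, pi=None):
    """Single effect regression: returns (alpha, mu1, sigma1^2)."""
    p = X.shape[1]
    if pi is None:
        pi = np.full(p, 1.0 / p)                       # uniform prior by default
    xtx = (X ** 2).sum(axis=0)                         # x_j' x_j for each column j
    xty = X.T @ y                                      # x_j' y
    sigma1_2 = 1.0 / (xtx / sigma2 + 1.0 / sigma_b2)   # posterior variances sigma_{1j}^2
    mu1 = sigma1_2 * xty / sigma2                      # posterior means mu_{1j}
    # closed-form log Bayes factor for the model with gamma_j = 1
    log_bf = 0.5 * np.log(sigma1_2 / sigma_b2) + 0.5 * mu1 ** 2 / sigma1_2
    w = np.log(pi) + log_bf
    w -= w.max()                                       # subtract max for numerical stability
    alpha = np.exp(w) / np.exp(w).sum()                # posterior inclusion probabilities
    return alpha, mu1, sigma1_2

# usage: a single strong effect on column 7
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
y = 2.0 * X[:, 7] + rng.standard_normal(200)
alpha, mu1, sigma1_2 = ser(X, y, sigma2=1.0, sigma_b2=1.0)
```

With these inputs, \(\boldsymbol\alpha\) sums to one and concentrates on column 7, as expected.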

III. Fitting SuSiE: Iterative Bayesian stepwise selection (IBSS)

Algorithm

Inputs: data \(\textbf X,\textbf y\); hyperparameters \(\sigma^2\) and \(\boldsymbol \sigma_0^2=(\sigma_{01}^2,\dots,\sigma_{0L}^2)\); number of effects \(L\).

  • Initialize \(\bar{\textbf b}_l=\textbf 0\) for all \(1\leq l\leq L\) (or use any other initialization)
  • Repeat until convergence
    • For \(l\) in \(1,\dots,L\) do
      • \(\textbf r_l\leftarrow \textbf y-\sum_{l^\prime\neq l}\textbf X\bar{\textbf b}_{l^\prime}\)
      • \((\boldsymbol\alpha_l,\boldsymbol \mu_{1l},\boldsymbol \sigma_{1l}^2)\leftarrow SER(\textbf X,\textbf r_l;\sigma^2,\sigma_{0l}^2)\)
      • \(\bar{\textbf b}_l\leftarrow \boldsymbol{\alpha}_l\circ \boldsymbol{\mu}_{1l}\)
  • Return \((\boldsymbol\alpha_l,\boldsymbol \mu_{1l},\boldsymbol \sigma_{1l}^2)\) for all \(1\leq l\leq L\).
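The algorithm above can be sketched in numpy as follows. This is a self-contained illustration: the `ser` helper repeats the closed-form SER posterior of Section II with a uniform prior, and the convergence check on \(\bar{\textbf b}\) and all names are my own choices, not the paper's (which monitors the ELBO).

```python
import numpy as np

def ser(X, y, sigma2, sigma_b2):
    """Closed-form SER posterior with a uniform prior (Section II)."""
    xtx = (X ** 2).sum(axis=0)
    xty = X.T @ y
    sigma1_2 = 1.0 / (xtx / sigma2 + 1.0 / sigma_b2)
    mu1 = sigma1_2 * xty / sigma2
    log_bf = 0.5 * np.log(sigma1_2 / sigma_b2) + 0.5 * mu1 ** 2 / sigma1_2
    w = log_bf - log_bf.max()
    alpha = np.exp(w) / np.exp(w).sum()
    return alpha, mu1, sigma1_2

def ibss(X, y, sigma2, sigma0_2, L, max_iter=100, tol=1e-6):
    """Iterative Bayesian stepwise selection (sketch)."""
    p = X.shape[1]
    b_bar = np.zeros((L, p))                    # b_bar[l] = alpha_l * mu_1l
    fits = [None] * L
    for _ in range(max_iter):
        b_old = b_bar.copy()
        for l in range(L):
            # residual with every effect except l removed
            r_l = y - X @ (b_bar.sum(axis=0) - b_bar[l])
            alpha, mu1, sigma1_2 = ser(X, r_l, sigma2, sigma0_2[l])
            b_bar[l] = alpha * mu1
            fits[l] = (alpha, mu1, sigma1_2)
        if np.abs(b_bar - b_old).max() < tol:   # simple convergence check
            break
    return fits

# usage: two well-separated effects
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 30))
y = 1.5 * X[:, 3] - 1.5 * X[:, 11] + rng.standard_normal(300)
fits = ibss(X, y, sigma2=1.0, sigma0_2=[1.0, 1.0], L=2)
top = sorted(f[0].argmax() for f in fits)       # each effect's most probable variable
```

Each pass is a coordinate-ascent step: effect \(l\) is refit by SER against the residual left by all other effects.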

A variational inference explanation

Variational inference finds an approximation \(q(\textbf b_1,\dots,\textbf b_L)\) to the posterior distribution \(p_{post}:=p(\textbf b_1,\dots,\textbf b_L|\textbf y)\) by minimizing the KL divergence from \(q\) to \(p_{post}\), \(D_{KL}(q,p_{post})\).

This divergence is hard to compute directly, but it can be rewritten as

\[ D_{KL}(q,p_{post})=\log p(\textbf y|\sigma^2,\boldsymbol \sigma_0^2)-F(q;\sigma^2,\boldsymbol\sigma_0^2), \]

where \(F\) is known as the evidence lower bound (ELBO). Since the log evidence \(\log p(\textbf y|\sigma^2,\boldsymbol \sigma_0^2)\) does not depend on \(q\), minimizing \(D_{KL}(q,p_{post})\) is equivalent to maximizing \(F\), which is tractable.
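The identity follows from a short rearrangement, writing \(\textbf b\) for \((\textbf b_1,\dots,\textbf b_L)\) and suppressing the hyperparameters:

\[
\begin{aligned}
D_{KL}(q,p_{post})
&=\mathbb E_q\left[\log\frac{q(\textbf b)}{p(\textbf b|\textbf y)}\right]
=\mathbb E_q[\log q(\textbf b)]-\mathbb E_q[\log p(\textbf y,\textbf b)]+\log p(\textbf y)\\
&=\log p(\textbf y)-\underbrace{\mathbb E_q[\log p(\textbf y,\textbf b)-\log q(\textbf b)]}_{F(q)},
\end{aligned}
\]

using \(\log p(\textbf b|\textbf y)=\log p(\textbf y,\textbf b)-\log p(\textbf y)\) in the second equality.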