August 2, 2019

Acknowledgements

Table of Contents

  1. Background - introduce SuSiE model
  2. SuSiE-mixture model
  3. Algorithm
  4. Simulation result

1. Background - introduce SuSiE model

SUm of SIngle Effects (SuSiE) model

  • Motivation: fine-mapping
  • Bayesian linear regression model
  • Excellent performance with highly sparse effects and correlated variables
  • Posterior inclusion probability (PIP)

A motivating example

Linear regression

\[ \textbf y=\textbf X\textbf b+\textbf e,\textbf e\sim N(0,\sigma^2I_n). \]

The number of variables is \(p=4\); the number of observations \(n\) is large.

Variables are highly correlated, specifically \(\textbf x_1=\textbf x_2\) and \(\textbf x_3=\textbf x_4\).

Assume we know that exactly 2 of the 4 variables are relevant.
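A concrete version of this setup (the sizes and seed below are illustrative, not from the slides):

```python
import numpy as np

# Toy data for the motivating example: p = 4 variables with x1 = x2 and
# x3 = x4, and exactly two relevant variables (one from each identical pair).
rng = np.random.default_rng(0)
n = 1000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
X = np.column_stack([z1, z1, z2, z2])   # x1 = x2, x3 = x4
b = np.array([1.0, 0.0, 1.0, 0.0])      # true effects: x1 and x3
y = X @ b + rng.standard_normal(n)      # e ~ N(0, I), i.e. sigma^2 = 1
```

Because \(\textbf x_1=\textbf x_2\) and \(\textbf x_3=\textbf x_4\), no method can tell the members of a pair apart; a useful summary therefore spreads the inclusion probability across each pair instead of arbitrarily picking one column.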

Simple case: The Single Effect Regression (SER) Model

\[ \begin{aligned} &\textbf y= \textbf X\textbf b+\textbf e,\textbf e\sim N(0,\sigma^2I_n),\\ &\textbf b= b\boldsymbol \gamma,\\ &\boldsymbol\gamma\sim Mult(1,\boldsymbol \pi),\\ &b\sim N(0,\sigma_0^2). \end{aligned} \]
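The SER model is the building block because its posterior is available in closed form; a brief sketch (standard conjugate-Gaussian algebra, not spelled out on the slide): conditional on variable \(j\) being the selected one,

\[ \begin{aligned} &\sigma_{1j}^2=\left(\frac{\textbf x_j^T\textbf x_j}{\sigma^2}+\frac{1}{\sigma_0^2}\right)^{-1},\qquad \mu_{1j}=\frac{\sigma_{1j}^2}{\sigma^2}\textbf x_j^T\textbf y,\\ &\log BF_j=\frac{1}{2}\log\frac{\sigma_{1j}^2}{\sigma_0^2}+\frac{\mu_{1j}^2}{2\sigma_{1j}^2},\qquad \alpha_j=P(\gamma_j=1\mid\textbf y)=\frac{\pi_j BF_j}{\sum_k\pi_k BF_k}. \end{aligned} \]

So the posterior is a mixture over which single variable is selected, with weights \(\alpha_j\) and conditional effect \(b\mid\gamma_j=1,\textbf y\sim N(\mu_{1j},\sigma_{1j}^2)\).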

SuSiE model: SUm of SIngle Effects

\[ \begin{aligned} &\textbf y= \textbf X\textbf b+\textbf e,\textbf e\sim N(0,\sigma^2I_n),\\ &\textbf b= \sum_{l=1}^L\textbf b_l=\sum_{l=1}^Lb_l\boldsymbol \gamma_l,\\ &\boldsymbol\gamma_l\sim Mult(1,\boldsymbol \pi),\\ &b_l\sim N(0,\sigma_l^2). \end{aligned} \]

Marginal prior distribution of \(b_i\): spike-and-slab.
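For a single effect (\(L=1\)) this marginal can be written explicitly:

\[ b_i\sim(1-\pi_i)\,\delta_0+\pi_i\,N(0,\sigma_1^2), \]

a point mass at zero (the spike) plus a Gaussian slab; for general \(L\) the marginal is a convolution of \(L\) such terms, so under the prior most coordinates of \(\textbf b\) are still exactly zero.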

\(susie(\textbf X,\textbf y,L)\) returns the PIP of each variable.
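Each single effect \(l\) contributes an inclusion vector \(\boldsymbol\alpha_l\) from an SER fit, and the PIPs combine them as \(PIP_i\approx 1-\prod_{l=1}^L(1-\alpha_{li})\). As a rough illustration of how such a fit can be computed, below is a minimal numpy sketch of the iterative Bayesian stepwise selection idea from [1], with \(\sigma^2\) and the prior variances held fixed for simplicity; susie_sketch and its arguments are stand-ins for this note, not the susieR interface.

```python
import numpy as np

def single_effect_regression(X, r, sigma2, sigma0_2, prior=None):
    """Closed-form SER posterior, fit to the residual vector r."""
    p = X.shape[1]
    prior = np.full(p, 1.0 / p) if prior is None else prior
    xtx = np.sum(X ** 2, axis=0)
    s1_2 = 1.0 / (xtx / sigma2 + 1.0 / sigma0_2)        # Var(b | y, gamma_j = 1)
    mu1 = s1_2 * (X.T @ r) / sigma2                      # E(b | y, gamma_j = 1)
    log_bf = 0.5 * np.log(s1_2 / sigma0_2) + 0.5 * mu1 ** 2 / s1_2
    w = np.log(prior) + log_bf
    alpha = np.exp(w - w.max())
    alpha /= alpha.sum()                                 # P(gamma_j = 1 | y)
    return alpha, mu1

def susie_sketch(X, y, L=10, sigma2=1.0, sigma0_2=1.0, n_iter=100):
    """Crude IBSS loop: each effect is refit by an SER on its residual."""
    p = X.shape[1]
    alpha = np.full((L, p), 1.0 / p)
    b_bar = np.zeros((L, p))                             # posterior mean of each b_l
    for _ in range(n_iter):
        for l in range(L):
            r = y - X @ (b_bar.sum(axis=0) - b_bar[l])   # leave effect l out
            alpha[l], mu1 = single_effect_regression(X, r, sigma2, sigma0_2)
            b_bar[l] = alpha[l] * mu1
    return 1.0 - np.prod(1.0 - alpha, axis=0)            # PIP_i = 1 - prod_l (1 - alpha_li)
```

On the duplicated-columns toy data above, the symmetric prior forces identical columns to receive identical \(\boldsymbol\alpha\), so each member of a causal pair ends up with roughly half of the inclusion probability.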

Prior of regression coefficients

2. SuSiE-mixture model

\[ \begin{aligned} &\textbf y= \textbf X\textbf b+\textbf e,\textbf e\sim N(0,\sigma^2I_n),\\ &\textbf b= \sum_{l=0}^L\textbf b_l=\textbf b_0+\sum_{l=1}^Lb_l\boldsymbol \gamma_l,\\ &\boldsymbol\gamma_l\sim Mult(1,\boldsymbol \pi),b_l\sim N(0,\sigma_l^2),\forall l\geq 1,\\ &\textbf b_0\sim N(0,\sigma_0^2I_p). \end{aligned} \]

Marginal prior distribution of \(b_i\): a Gaussian mixture.
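Again taking \(L=1\) for illustration, the background term \(\textbf b_0\) replaces the spike with a narrow Gaussian, so each coefficient follows a two-component Gaussian mixture:

\[ b_i\sim(1-\pi_i)\,N(0,\sigma_0^2)+\pi_i\,N(0,\sigma_0^2+\sigma_1^2). \]

Every variable thus carries a small background effect, and a few variables carry an additional large effect.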

3. Algorithm

  • For known \(\sigma_0^2/\sigma^2\)
  • For unknown \(\sigma_0^2/\sigma^2\)

For known \(\sigma_0^2/\sigma^2\)

Let

\[ H=\frac{\sigma_0^2}{\sigma^2}\textbf X\textbf X^T+I_n,\qquad LL^T=H, \]

where \(L\) is a matrix square root of \(H\) (e.g. its Cholesky factor), not the number of single effects.

And transform the data

\[ \tilde {\textbf X}=L^{-1}\textbf X,\tilde {\textbf y}=L^{-1}\textbf y. \]
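The step the slides leave implicit: integrating \(\textbf b_0\) out of the SuSiE-mixture model turns it into a SuSiE model with correlated noise,

\[ \textbf y=\textbf X\sum_{l=1}^L\textbf b_l+\boldsymbol\varepsilon,\qquad\boldsymbol\varepsilon\sim N\!\left(0,\,\sigma_0^2\textbf X\textbf X^T+\sigma^2I_n\right)=N(0,\sigma^2H), \]

and multiplying both sides by \(L^{-1}\) whitens the noise back to \(N(0,\sigma^2I_n)\).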

Then \(susie(\tilde{\textbf X},\tilde{\textbf y}, L)\) yields the fit for the SuSiE-mixture model.
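A short numpy sketch of this recipe, assuming the ratio \(\sigma_0^2/\sigma^2\) is known; the last line reuses susie_sketch from the sketch above as a stand-in for an actual SuSiE fit (e.g. susieR's susie function in R).

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def susie_mixture_known_ratio(X, y, ratio, L_effects=10):
    """Whiten the data with the Cholesky factor of H, then fit ordinary SuSiE."""
    n = X.shape[0]
    H = ratio * (X @ X.T) + np.eye(n)              # H = (sigma0^2 / sigma^2) X X^T + I
    C = cholesky(H, lower=True)                    # C C^T = H  (the factor called L above)
    X_tilde = solve_triangular(C, X, lower=True)   # X~ = L^{-1} X
    y_tilde = solve_triangular(C, y, lower=True)   # y~ = L^{-1} y
    return susie_sketch(X_tilde, y_tilde, L=L_effects)
```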

For unknown \(\sigma_0^2/\sigma^2\)

Theoretical results

  • Coordinate ascent on the evidence lower bound (ELBO); the generic setup is sketched below
  • Converges to a stationary point
  • Some derivations from variational inference
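The slides do not spell these derivations out. The generic variational-inference setup behind the bullets (in the style of [2]) is to maximize a lower bound on the marginal likelihood,

\[ \log p(\textbf y;\theta)\;\geq\;F(q,\theta)=\mathbb E_{q}\!\left[\log p(\textbf y,\textbf b;\theta)\right]-\mathbb E_{q}\!\left[\log q(\textbf b)\right],\qquad\theta=(\sigma^2,\sigma_0^2,\dots). \]

Coordinate ascent alternates between updating the variational approximation \(q\) and the hyperparameters \(\theta\); each update cannot decrease \(F\), which is why the procedure converges to a stationary point of the ELBO rather than to a guaranteed global optimum.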

4. Simulation result

  • Reduced false discovery rate (FDR)

References

[1] Wang, Gao, Abhishek K. Sarkar, Peter Carbonetto, and Matthew Stephens. "A simple new approach to variable selection in regression, with application to genetic fine-mapping." bioRxiv (2018): 501114.

[2] Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. "Variational inference: A review for statisticians." Journal of the American Statistical Association 112, no. 518 (2017): 859-877.

[3] Carbonetto, Peter, and Matthew Stephens. "Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies." Bayesian Analysis 7, no. 1 (2012): 73-108.

[4] Guan, Yongtao, and Matthew Stephens. "Bayesian variable selection regression for genome-wide association studies and other large-scale problems." The Annals of Applied Statistics 5, no. 3 (2011): 1780-1815.

[5] Zhou, Xiang, Peter Carbonetto, and Matthew Stephens. "Polygenic modeling with Bayesian sparse linear mixed models." PLoS Genetics 9, no. 2 (2013): e1003264.