Let the observations be \(x=(x_1,\dots,x_n)\) and the latent variables be \(z=(z_1,\dots,z_m)\). We are interested in the posterior distribution \(p(z|x)\). We approximate it by a member of a family of distributions \(q\), chosen to minimize the KL-divergence:
\[ \begin{aligned} KL(q,p)&=\mathbb E_q\log\frac{q(z)}{p(z|x)}=\mathbb E_q\log q(z)-\mathbb E_q\log p(x,z)+\mathbb E_q\log p(x)\\ &=\log p(x)-[\mathbb E_q\log p(x,z)-\mathbb E_q\log q(z)]\\ &:=\log p(x)-ELBO(q), \end{aligned} \]
where we define the evidence lower bound
\[ ELBO(q)=\mathbb E_q\log p(x,z)-\mathbb E_q\log q(z). \]
As \(\log p(x)\) is constant with respect to \(q\), minimizing the KL-divergence is equivalent to maximizing \(ELBO(q)\) over \(q\).
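This identity is easy to verify numerically. Below is a small R check for a single binary latent variable; the prior and likelihood values are made up for illustration.

```r
# Check the identity KL(q, p) = log p(x) - ELBO(q) for one binary latent z.
prior <- c(0.7, 0.3)          # p(z = 0), p(z = 1)
lik   <- c(0.2, 0.7)          # p(x | z = 0), p(x | z = 1) at the observed x
joint <- prior * lik          # p(x, z)
px    <- sum(joint)           # the evidence p(x)
post  <- joint / px           # the posterior p(z | x)

q <- c(0.5, 0.5)              # an arbitrary variational distribution

kl   <- sum(q * log(q / post))
elbo <- sum(q * (log(joint) - log(q)))
all.equal(kl, log(px) - elbo) # TRUE
```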
Now we focus on the following setting: assume the family \(q\) factorizes as
\[ q(z_1,\dots,z_m)=\prod_{j=1}^mq(z_j), \] i.e. the latent variables are independent under \(q\) (the mean-field assumption).
We introduce a coordinate ascent inference framework to maximize the ELBO.
We have \[ p(z_{1:m},x)=p(x)\cdot\prod_{j=1}^m p(z_j|z_{1:(j-1)},x). \] Note that the ordering of the latent variables is arbitrary: we may relabel the indices of \(z\) freely.
Likewise, by the factorization of \(q\),
\[ \mathbb E_q \log q(z_{1:m})=\sum_{j=1}^m\mathbb E_{q_j}\log q(z_j). \]
Therefore,
\[ \begin{aligned} ELBO(q)&=\mathbb E_q\log p(x,z)-\mathbb E_q\log q(z)\\ &=\mathbb E_q\log\left( p(x)\cdot\prod_{j=1}^m p(z_j|z_{1:(j-1)},x)\right)-\sum_{j=1}^m\mathbb E_{q_j}\log q(z_j)\\ &=\log p(x)+\sum_{j=1}^m \mathbb E_q\log p(z_j|z_{1:(j-1)},x)-\sum_{j=1}^m\mathbb E_{q_j}\log q(z_j). \end{aligned} \] Since the ordering of the indices of \(z\) is arbitrary, we may swap them freely. Specifically, to investigate \(q(z_k)\), \(1\leq k\leq m\), we can swap \(z_k\) and \(z_m\) so that \(z_k\) appears in only one term of \(\sum_j\mathbb E_q\log p(z_j|z_{1:(j-1)},x)\), exactly as if \(k=m\). Then the only term containing \(z_k\) is \(\mathbb E_q\log p(z_k|z_{-k},x)\).
To maximize \(ELBO(q)\) with respect to \(z_k\) alone, i.e. holding the other factors \(q(z_{-k})\) fixed, we have \[ ELBO(q(z_k))=\mathbb E_q\log p(z_k|z_{-k},x)-\mathbb E_{q_k}\log q(z_k)+const. \]
Taking into account the normalization constraint
\[ \int q(z_k)dz_k=1, \]
the Lagrangian is
\[ \mathcal L_k=\int q(z_k)\mathbb E_{q(z_{-k})}\log p(z_k|z_{-k},x)dz_k-\int q(z_k)\log q(z_k)dz_k-\lambda\left(\int q(z_k)dz_k-1\right). \]
We then take the derivative with respect to \(q(z_k)\) (think of \(q(z_k)\) as a vector, albeit an infinite-dimensional one):
\[ \frac{d \mathcal L_k}{dq(z_k)}=\mathbb E_{q(z_{-k})}\log p(z_k|z_{-k},x)-\log q(z_k)-1-\lambda. \] Setting this derivative to zero gives \[ q(z_k)=\exp\{\mathbb E_{q(z_{-k})}\log p(z_k|z_{-k},x)\}\cdot\exp\{-(1+\lambda)\}. \]
Hence we obtain the coordinate ascent updating step
\[ q^\star(z_k)\propto \exp\{\mathbb E_{q(z_{-k})}\log p(z_k|z_{-k},x)\}\propto\exp\{\mathbb E_{q(z_{-k})}\log p(z_k,z_{-k},x)\}. \]
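In code, this update yields a simple nested loop. Below is a minimal R sketch for latent variables supported on a finite grid; `expected_log_joint` is a hypothetical, model-specific helper that returns \(\mathbb E_{q(z_{-k})}\log p(z_k,z_{-k},x)\) evaluated at each grid point of \(z_k\).

```r
# Schematic CAVI loop: repeatedly set q*(z_k) prop. to
# exp{E_{q(z_-k)} log p(z_k, z_-k, x)} for each coordinate k.
# `q` is a list of per-coordinate probability vectors over a grid;
# `expected_log_joint(k, q)` is an assumed model-specific helper.
cavi <- function(q, expected_log_joint, n_iter = 100) {
  for (iter in seq_len(n_iter)) {
    for (k in seq_along(q)) {
      log_q  <- expected_log_joint(k, q)
      log_q  <- log_q - max(log_q)            # stabilize before exponentiating
      q[[k]] <- exp(log_q) / sum(exp(log_q))  # normalize to a distribution
    }
  }
  q
}
```

Each inner update can only increase the ELBO, which is what makes this a coordinate ascent.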
We now apply this to the ridge regression model \(\textbf y=\textbf X\beta+\varepsilon\), with prior \(\beta_i\sim N(0,\sigma^2\sigma_b^2)\) and noise \(\varepsilon\sim N(0,\sigma^2 I)\). We use the mean-field assumption \(q(\beta_1,\dots,\beta_p)=\prod_{i=1}^p q(\beta_i)\), and we take each factor of \(q\) to be Gaussian, i.e. \(\beta_i\sim N(\mu_{1i},\sigma^2_{1i})\) under \(q\).
Assume \(\sigma^2,\sigma^2_b\) are known. For a specific \(1\leq k\leq p\), the updating step is
\[ q^\star(\beta_k)\propto\exp\{\mathbb E_{q(\beta_{-k})}\log p(\beta_k|\beta_{-k},\textbf X,\textbf y)\}. \]
Let \(\textbf r_k=\textbf y-\textbf X_{-k}\beta_{-k}\). Under \(q_{-k}\) we have \(\textbf r_k\sim N\left(\textbf y-\textbf X_{-k}\mu_{-k},\textbf X_{-k}\textbf D_{-k}\textbf X_{-k}^T\right)\), where \(\textbf D_{-k}=diag\left\{\sigma_{11}^2,\dots,\sigma^2_{1(k-1)},\sigma^2_{1(k+1)},\dots,\sigma_{1p}^2\right\}\). For simplicity, write \(\bar {\textbf r}_k=\textbf y-\textbf X_{-k}\mu_{-k}\).
Since \[ p(\beta_k|\beta_{-k},\textbf y,\textbf X)\propto p(\beta_k)\cdot L(\beta_k;\beta_{-k},\textbf X,\textbf y), \]
we have
\[ \begin{aligned} \log p(\beta_k|\beta_{-k},\textbf X,\textbf y)&=const+\log N(\beta_k;0,\sigma^2\sigma_b^2)+\log N(\textbf r_k-\textbf X_k\beta_k;0,\sigma^2I)\\ &=const-\frac{\beta_k^2}{2\sigma^2\sigma^2_b}-\frac1{2\sigma^2}\|\textbf r_k-\textbf X_k\beta_k\|^2_2. \end{aligned} \]
Under \(q_{-k}\), we have
\[ \textbf r_k-\textbf X_k\beta_k\sim N(\textbf y-\textbf X_{-k}\mu_{-k}-\textbf X_k\beta_k,\textbf X_{-k}\textbf D_{-k}\textbf X_{-k}^T) \]
Since \(\textbf X_{-k}\textbf D_{-k}\textbf X_{-k}^T\) is symmetric positive semi-definite, it admits the spectral decomposition
\[ \textbf X_{-k}\textbf D_{-k}\textbf X_{-k}^T=\textbf P^T\Lambda \textbf P, \] where \(\textbf P\) is orthogonal and \(\Lambda\) is diagonal with nonnegative entries. Then
\[ \|\textbf r_k-\textbf X_k\beta_k\|^2_2=\|\textbf P(\textbf r_k-\textbf X_k\beta_k)\|^2_2. \] And
\[ \textbf P(\textbf r_k-\textbf X_k\beta_k)\sim N(\textbf P(\bar {\textbf r}_k-\textbf X_k\beta_k),\Lambda). \]
Thus
\[ \mathbb E\|\textbf P(\textbf r_k-\textbf X_k\beta_k)\|^2_2=trace(\Lambda)+\|P(\bar{\textbf r}_k-\textbf X_k\beta_k)\|_2^2=trace(\Lambda)+\|\bar {\textbf r}_k-\textbf X_k\beta_k\|_2^2. \]
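This is the standard identity \(\mathbb E\|\textbf w\|_2^2=trace(\Sigma)+\|\mathbb E\textbf w\|_2^2\) for \(\textbf w\sim N(\textbf m,\Sigma)\), applied here with \(\Sigma=\Lambda\). A quick Monte Carlo check in R, with an arbitrary mean and covariance chosen for illustration:

```r
# Verify E||w||^2 = trace(Sigma) + ||m||^2 for w ~ N(m, Sigma).
set.seed(1)
d     <- 3
m     <- c(1, -2, 0.5)
A     <- matrix(rnorm(d * d), d, d)
Sigma <- crossprod(A)              # an arbitrary covariance matrix
L     <- t(chol(Sigma))            # Sigma = L %*% t(L)

n <- 1e5
W <- m + L %*% matrix(rnorm(d * n), d, n)  # one draw per column
mean(colSums(W^2))                 # Monte Carlo estimate
sum(diag(Sigma)) + sum(m^2)        # exact value
```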
Therefore
\[ \begin{aligned} q^\star(\beta_k)\propto&\exp\{\mathbb E_{q(\beta_{-k})}\log p(\beta_k|\beta_{-k},\textbf y,\textbf X)\}\\ =&\exp\left\{const-\frac{\beta_k^2}{2\sigma^2\sigma^2_b}-\frac1{2\sigma^2}\mathbb E\|\textbf r_k-\textbf X_k\beta_k\|^2_2\right\}\\ =&\exp\left\{const-\frac{\beta_k^2}{2\sigma^2\sigma^2_b}-\frac1{2\sigma^2}\left(trace(\Lambda)+\|\bar {\textbf r}_k-\textbf X_k\beta_k\|_2^2\right)\right\}\\ \propto &\exp\left\{-\frac1{2\sigma^2}\left[(1/\sigma_b^2+\|\textbf X_k\|_2^2)\beta_k^2-2\bar {\textbf r}_k^T\textbf X_k\beta_k\right]\right\}\\ \propto &\exp\left\{-\frac1{2\sigma^2/(1/\sigma_b^2+\|\textbf X_k\|_2^2)}\left(\beta_k-\frac{\bar {\textbf r}_k^T\textbf X_k}{1/\sigma_b^2+\|\textbf X_k\|_2^2}\right)^2\right\}. \end{aligned} \] Thus the updated factor \(q^\star (\beta_k)\) is Gaussian:
\[ \beta_k\sim N\left(\frac{\bar {\textbf r}_k^T\textbf X_k}{1/\sigma_b^2+\|\textbf X_k\|_2^2},\frac{\sigma^2}{1/\sigma_b^2+\|\textbf X_k\|_2^2}\right). \] That is, the updating step sets \(\hat\mu_{1k}=\frac{\bar {\textbf r}_k^T\textbf X_k}{1/\sigma_b^2+\|\textbf X_k\|_2^2}\) and \(\hat\sigma_{1k}^2=\frac{\sigma^2}{1/\sigma_b^2+\|\textbf X_k\|_2^2}\). Note that \(\hat\sigma_{1k}^2\) does not depend on the other coordinates, so it need only be computed once rather than updated at every iteration.
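Putting the updates together, here is a minimal R sketch of the full coordinate ascent for this model (function and variable names are our own; \(\sigma^2\) and \(\sigma_b^2\) are assumed known, as above):

```r
# CAVI for y = X beta + e, beta_k ~ N(0, sigma2 * sigma2_b), e ~ N(0, sigma2 * I).
# Returns the variational means mu_{1k} and variances sigma2_{1k}.
cavi_ridge <- function(X, y, sigma2, sigma2_b, n_iter = 100) {
  p   <- ncol(X)
  mu  <- rep(0, p)                      # variational means mu_{1k}
  xtx <- colSums(X^2)                   # ||X_k||_2^2 for each k
  s2  <- sigma2 / (1 / sigma2_b + xtx)  # sigma2_{1k}: constant, computed once
  for (iter in seq_len(n_iter)) {
    for (k in seq_len(p)) {
      rbar  <- y - X[, -k, drop = FALSE] %*% mu[-k]  # bar r_k
      mu[k] <- sum(rbar * X[, k]) / (1 / sigma2_b + xtx[k])
    }
  }
  list(mu = mu, sigma2_1 = s2)
}

# Example on simulated data:
set.seed(2)
n <- 50; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X %*% rnorm(p) + rnorm(n)
fit <- cavi_ridge(X, y, sigma2 = 1, sigma2_b = 1)
fit$mu
```

A fixed number of sweeps is used here for simplicity; in practice one would monitor the ELBO, or the change in \(\mu\) between sweeps, to decide convergence.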