Last updated: 2018-01-05
Code version: d801d06
# Load required packages
library(mashr); library(ExtremeDeconvolution); library(flashr2); library(mclust)
Loading required package: ashr
Package 'mclust' version 5.4
Type 'citation("mclust")' for citing this R package in publications.
Attaching package: 'mclust'
The following object is masked from 'package:ashr':
dens
# read data
data = readRDS('../data/ImmuneQTLSummary.4MASH.rds')
data$max$se = data$max$beta/data$max$z
data$null$se = data$null$beta / data$null$z
# set parameters
vhat = 1
We estimate the covariance using column-centered Z scores:
D.center = apply(as.matrix(data$max$z), 2, function(x) x - mean(x))
mash_data_center = mashr::set_mash_data(Bhat = as.matrix(D.center))
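As a quick check of what column-centering does, the sketch below applies the same `apply()` call to a small simulated matrix and compares it with base R's `scale()`. The matrix `z_toy` and the other names are hypothetical stand-ins, not the immune data.

```r
# Toy stand-in for data$max$z: 6 hypothetical genes by 7 columns.
set.seed(1)
z_toy <- matrix(rnorm(6 * 7, mean = 2), nrow = 6, ncol = 7)

# Centering as done above: subtract each column's mean from that column.
center_apply <- apply(z_toy, 2, function(x) x - mean(x))

# Base-R shortcut; scale() attaches a "scaled:center" attribute, so drop it
# before comparing.
center_scale <- scale(z_toy, center = TRUE, scale = FALSE)
attr(center_scale, "scaled:center") <- NULL

# Both approaches agree, and every column mean is numerically zero.
same_result <- isTRUE(all.equal(center_apply, center_scale))
col_means_zero <- all(abs(colMeans(center_apply)) < 1e-12)
```

Either form works; `apply()` returns a plain matrix, while `scale()` also records the column means it removed.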
From Flash, we have \[\tilde{Z} = LF' + E\] where F is \(p \times 5\) with \(p = 7\), L is \(n \times 5\), and E is \(n\times7\).
\[F = \left( \begin{array}{c c c c} f_{1} & f_{2} & \cdots & f_{5} \end{array}\right)_{p\times 5}\] For each gene i, the fitted vector is \[z_{i} = \sum_{k=1}^{5}l_{ik} f_{k}\] and the rank-one covariance matrix for gene i is \[U_{i} = z_{i}z_{i}'\] which captures the pattern of effects across the \(p\) columns for the ith gene.
FlashResult = readRDS('~/Documents/GitHub/mash-application-immune/output/Immune.flash2.center.greedy.K10.rds')
n = nrow(FlashResult$L_flash)
U = list()
Z = matrix(0, nrow=n, ncol=7)
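for(i in 1:n){
  # z_i = sum_k l_ik f_k: scale each factor by its loading for gene i, then sum over factors
  zi = apply(t(FlashResult$L_flash[i,] * t(FlashResult$F_flash)), 1, sum)
  Z[i,] = zi
  # rank-one covariance matrix for gene i
  U[[i]] = zi %*% t(zi)
}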
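The loop above computes each row of the fitted matrix \(LF'\) one gene at a time. A minimal sketch on simulated loadings and factors (the names `L_toy`, `F_toy`, and the sizes are hypothetical) suggests the same result can be obtained with a single matrix product:

```r
# Hypothetical small example: 4 genes, 7 columns, 5 factors.
set.seed(2)
n_toy <- 4; p <- 7; K <- 5
L_toy <- matrix(rnorm(n_toy * K), n_toy, K)  # loadings, n x K
F_toy <- matrix(rnorm(p * K), p, K)          # factors,  p x K

# Row-by-row computation, mirroring the loop above.
Z_loop <- matrix(0, nrow = n_toy, ncol = p)
for (i in seq_len(n_toy)) {
  Z_loop[i, ] <- apply(t(L_toy[i, ] * t(F_toy)), 1, sum)
}

# Vectorized equivalent: all fitted vectors at once.
Z_vec <- L_toy %*% t(F_toy)

# Rank-one matrices U_i = z_i z_i' built from the vectorized result.
U_toy <- lapply(seq_len(n_toy), function(i) outer(Z_vec[i, ], Z_vec[i, ]))

loop_matches_product <- isTRUE(all.equal(Z_loop, Z_vec))
```

For the full data this would avoid looping over all n genes, although the list of \(U_{i}\) matrices still has to be built one gene at a time.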
The covariance matrices for two genes will be similar if they have similar \(z_{i}\) vectors.
So we try to cluster \(z_{i}\) vectors.
mod <- Mclust(Z)
summary(mod$BIC)
Best BIC values:
EEE,1 EEV,1 EVE,1
BIC 2341099 2341099 2341099
BIC diff 0 0 0
summary(mod, parameters = TRUE)
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------
Mclust XXX (ellipsoidal multivariate normal) model with 1 component:
log.likelihood n df BIC ICL
1170724 21485 35 2341099 2341099
Clustering table:
1
21485
Mixing probabilities:
1
1
Means:
[,1]
[1,] -0.002812614
[2,] -0.001675162
[3,] -0.001880870
[4,] -0.001894170
[5,] -0.001462748
[6,] 0.007792370
[7,] -0.002153637
Variances:
[,,1]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 6.096131 4.666525 4.608853 4.461147 4.293118 4.251914 5.534549
[2,] 4.666525 4.322069 3.836683 3.903263 3.514890 3.664657 4.490756
[3,] 4.608853 3.836683 3.990491 3.504557 3.585947 3.368166 4.466553
[4,] 4.461147 3.903263 3.504557 3.916026 3.365035 3.509967 4.160460
[5,] 4.293118 3.514890 3.585947 3.365035 3.299126 3.200339 4.082921
[6,] 4.251914 3.664657 3.368166 3.509967 3.200339 4.287603 4.005367
[7,] 5.534549 4.490756 4.466553 4.160460 4.082921 4.005367 5.196203
There is only one cluster identified here. The estimated model is \[z_{i} \sim N_{7}(\hat{\mu}, \hat{\Sigma})\]
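With a single component, fitting the Gaussian reduces to estimating one mean and one covariance matrix. The sketch below uses simulated data and assumes maximum-likelihood estimates with divisor n (we have not checked which divisor mclust uses). It illustrates how \(\hat{\Sigma}\) relates to the per-gene matrices: the average of the \(U_{i}\) equals \(\hat{\Sigma} + \hat{\mu}\hat{\mu}'\). All names are hypothetical.

```r
# Hypothetical correlated data: 200 rows, 7 columns.
set.seed(3)
n_toy <- 200; p <- 7
Z_toy <- matrix(rnorm(n_toy * p), n_toy, p) %*% matrix(runif(p * p), p, p)

# Maximum-likelihood estimates for a single multivariate normal.
mu_hat <- colMeans(Z_toy)
Z_centered <- sweep(Z_toy, 2, mu_hat)         # subtract column means
Sigma_hat <- crossprod(Z_centered) / n_toy    # divisor n, not n - 1

# Average of the rank-one matrices U_i = z_i z_i'.
U_bar <- crossprod(Z_toy) / n_toy

# Identity: mean(U_i) = Sigma_hat + mu_hat mu_hat'.
identity_holds <- isTRUE(all.equal(U_bar, Sigma_hat + outer(mu_hat, mu_hat)))
```

Since the estimated means reported above are all close to zero, \(\hat{\Sigma}\) is then nearly the average of the \(U_{i}\) under this assumption.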
saveRDS(list(mod$parameters$variance$Sigma), '../output/Immune.flash.ind.reduce.cov.rds')
We can use this estimated \(\hat{\Sigma}\) to estimate the covariance matrix for all genes.
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mclust_5.4 flashr2_0.3-3
[3] ExtremeDeconvolution_1.3 mashr_0.2-4
[5] ashr_2.1-27
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 compiler_3.4.3 git2r_0.20.0
[4] plyr_1.8.4 iterators_1.0.9 tools_3.4.3
[7] digest_0.6.13 evaluate_0.10.1 tibble_1.3.4
[10] gtable_0.2.0 lattice_0.20-35 rlang_0.1.6
[13] Matrix_1.2-12 foreach_1.4.4 yaml_2.1.16
[16] parallel_3.4.3 mvtnorm_1.0-6 stringr_1.2.0
[19] knitr_1.17 rprojroot_1.2 grid_3.4.3
[22] rmarkdown_1.8 rmeta_2.16 ggplot2_2.2.1
[25] magrittr_1.5 backports_1.1.2 scales_0.5.0
[28] codetools_0.2-15 htmltools_0.3.6 MASS_7.3-47
[31] assertthat_0.2.0 colorspace_1.3-2 stringi_1.1.6
[34] lazyeval_0.2.1 pscl_1.5.2 doParallel_1.0.11
[37] munsell_0.4.3 truncnorm_1.0-7 SQUAREM_2017.10-1
This R Markdown site was created with workflowr