Last updated: 2018-01-05

Code version: d801d06

Set up the data

# Load required packages
library(mashr); library(ExtremeDeconvolution); library(flashr2); library(mclust)
Loading required package: ashr
Package 'mclust' version 5.4
Type 'citation("mclust")' for citing this R package in publications.

Attaching package: 'mclust'
The following object is masked from 'package:ashr':

    dens
# read data
data = readRDS('../data/ImmuneQTLSummary.4MASH.rds')
# standard errors implied by z = beta / se
data$max$se = data$max$beta / data$max$z
data$null$se = data$null$beta / data$null$z
# set parameters
vhat = 1
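The standard errors above are recovered from the identity z = beta / se. A minimal sketch on made-up numbers (not the ImmuneQTL data) shows the recovery and one caveat: when an effect is exactly zero, beta / z is 0/0, which R returns as NaN rather than a usable standard error.

```r
# Toy effect sizes and standard errors (illustrative values only)
beta <- c(0.5, -1.2, 0.0)
se   <- c(0.25, 0.40, 0.30)
z    <- beta / se

# Recover the standard errors the same way as above: se = beta / z
se_hat <- beta / z
print(se_hat)
# The first two entries match se; the third is NaN because beta = z = 0
```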

We estimate the covariance using the column-centered z-scores of the `max` set.

D.center = apply(as.matrix(data$max$z), 2, function(x) x - mean(x))
mash_data_center = mashr::set_mash_data(Bhat = as.matrix(D.center))
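To make the centering step concrete, here is a small sketch on a simulated matrix (a hypothetical stand-in for `data$max$z`). `apply(..., 2, ...)` returns an n × 7 matrix whose column means are zero, and base R's `scale(center = TRUE, scale = FALSE)` and `sweep()` produce the same values.

```r
set.seed(1)
# Hypothetical stand-in for data$max$z: 100 rows, 7 columns
Zmat <- matrix(rnorm(700, mean = 2), nrow = 100, ncol = 7)

# Column-centering as in the analysis: subtract each column's mean
Z_center <- apply(Zmat, 2, function(x) x - mean(x))

# Equivalent base-R alternatives
Z_center2 <- scale(Zmat, center = TRUE, scale = FALSE)  # adds a "scaled:center" attribute
Z_center3 <- sweep(Zmat, 2, colMeans(Zmat))             # default FUN is "-"

print(round(colMeans(Z_center), 12))  # numerically zero in every column
```

`scale()` also stores the removed means as an attribute, which can be handy if the centering ever needs to be undone.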

Generate covariance matrices for each row

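From Flash, applied to the column-centered z-scores, we have \[\tilde{Z} = LF' + E\] where \(\tilde{Z}\) is the \(n \times 7\) matrix of centered z-scores, F is \(7 \times 5\), L is \(n \times 5\), and E is \(n \times 7\).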

\[F = \left( \begin{array}{c c c c} f_{1} & f_{2} & \cdots & f_{5} \end{array}\right)_{7\times 5}\] For each gene i, \[z_{i} = \sum_{k=1}^{5}l_{ik} f_{k},\] that is, \(z_{i}'\) is the ith row of \(LF'\). The covariance matrix for gene i is \[U_{i} = z_{i}z_{i}',\] a rank-one matrix that captures the pattern of effects across the 7 columns for gene i.
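A toy check of the two formulas above, with small random matrices standing in for the Flash loadings L (n × 5) and factors F (7 × 5); the names `L`, `Fmat` and the sizes are illustrative assumptions. It confirms that the weighted sum of factor columns equals the ith row of LF', and that U_i = z_i z_i' is symmetric with rank one.

```r
set.seed(2)
n <- 4; p <- 7; K <- 5
# Hypothetical loadings (n x K) and factors (p x K); random toy values
L    <- matrix(rnorm(n * K), n, K)
Fmat <- matrix(rnorm(p * K), p, K)

i <- 2
# z_i as the weighted sum of factor columns: sum_k l_ik f_k
z_i <- numeric(p)
for (k in 1:K) z_i <- z_i + L[i, k] * Fmat[, k]

# The same vector, taken as the ith row of L F'
z_i_alt <- (L %*% t(Fmat))[i, ]

# Rank-one covariance for gene i: U_i = z_i z_i'
U_i <- tcrossprod(z_i)
qr(U_i)$rank
```

Computing the whole product `L %*% t(Fmat)` once gives every z_i at the same time, which avoids a per-row loop on large n.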

FlashResult = readRDS('../output/Immune.flash2.center.greedy.K10.rds')
n = nrow(FlashResult$L_flash)
# z_i = sum_k l_ik f_k is the ith row of L F' (F_flash is 7 x K)
Z = FlashResult$L_flash %*% t(FlashResult$F_flash)
# rank-one covariance U_i = z_i z_i' for each gene
U = lapply(1:n, function(i) tcrossprod(Z[i, ]))

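Since \(U_{i}\) is determined by \(z_{i}\), two genes have similar covariance matrices when their \(z_{i}\) vectors are similar (up to sign, because \(z_{i}\) and \(-z_{i}\) give the same \(U_{i}\)).

We therefore cluster the \(z_{i}\) vectors.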
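A small sketch with made-up vectors illustrates both points. Nearby z vectors give nearby U matrices (distance measured here with the Frobenius norm, our choice for illustration), while z and -z give exactly the same U even though they are far apart as points. A clustering of the z_i can therefore separate genes whose U_i are identical.

```r
z1 <- c(1, 2, 0.5, -1, 0, 1.5, 2)
z2 <- z1 + 0.05          # a nearby vector
z3 <- -z1                # the sign-flipped vector

frob <- function(A) sqrt(sum(A^2))   # Frobenius norm of a matrix
Umat <- function(z) tcrossprod(z)    # U = z z'

d_near      <- frob(Umat(z1) - Umat(z2))  # small: similar z, similar U
d_flip      <- frob(Umat(z1) - Umat(z3))  # zero: z and -z give the same U
dist_z_flip <- sqrt(sum((z1 - z3)^2))     # yet z1 and -z1 are far apart
c(d_near = d_near, d_flip = d_flip, dist_z_flip = dist_z_flip)
```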

mod <- Mclust(Z)
summary(mod$BIC)
Best BIC values:
           EEE,1   EEV,1   EVE,1
BIC      2341099 2341099 2341099
BIC diff       0       0       0
summary(mod, parameters = TRUE)
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm 
----------------------------------------------------

Mclust XXX (ellipsoidal multivariate normal) model with 1 component:

 log.likelihood     n df     BIC     ICL
        1170724 21485 35 2341099 2341099

Clustering table:
    1 
21485 

Mixing probabilities:
1 
1 

Means:
             [,1]
[1,] -0.002812614
[2,] -0.001675162
[3,] -0.001880870
[4,] -0.001894170
[5,] -0.001462748
[6,]  0.007792370
[7,] -0.002153637

Variances:
[,,1]
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
[1,] 6.096131 4.666525 4.608853 4.461147 4.293118 4.251914 5.534549
[2,] 4.666525 4.322069 3.836683 3.903263 3.514890 3.664657 4.490756
[3,] 4.608853 3.836683 3.990491 3.504557 3.585947 3.368166 4.466553
[4,] 4.461147 3.903263 3.504557 3.916026 3.365035 3.509967 4.160460
[5,] 4.293118 3.514890 3.585947 3.365035 3.299126 3.200339 4.082921
[6,] 4.251914 3.664657 3.368166 3.509967 3.200339 4.287603 4.005367
[7,] 5.534549 4.490756 4.466553 4.160460 4.082921 4.005367 5.196203

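Mclust selects a single component, so only one cluster is identified. (With one component, the EEE, EEV and EVE parameterizations all reduce to one unconstrained covariance matrix, which is why their BIC values are identical.) The estimated model is \[z_{i} \sim N_{7}(\hat{\mu}, \hat{\Sigma}),\] where the entries of \(\hat{\mu}\) are close to zero, consistent with the column centering, and \(\hat{\Sigma}\) is the \(7 \times 7\) matrix shown above.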
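For a single multivariate normal, the maximum-likelihood estimates have a closed form: the sample mean and the sample covariance divided by n (not n - 1). The sketch below computes them in base R on simulated data; it does not call mclust, so treating these as the quantities reported in `mod$parameters` is an assumption about a one-component fit.

```r
set.seed(3)
n <- 5000; p <- 7
# Simulated stand-in for the z_i vectors: correlated normal rows
A <- matrix(rnorm(p * p), p, p)
Sigma_true <- crossprod(A) + diag(p)               # positive-definite 7 x 7
Zsim <- matrix(rnorm(n * p), n, p) %*% chol(Sigma_true)

# Closed-form maximum-likelihood estimates for one multivariate normal
mu_hat    <- colMeans(Zsim)
Zc        <- sweep(Zsim, 2, mu_hat)
Sigma_hat <- crossprod(Zc) / n                     # divides by n, not n - 1

# Relation to the unbiased sample covariance returned by cov()
all.equal(Sigma_hat, cov(Zsim) * (n - 1) / n)
```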

saveRDS(list(mod$parameters$variance$Sigma), '../output/Immune.flash.ind.reduce.cov.rds')

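We can use this estimated \(\hat{\Sigma}\) as a single covariance matrix summarizing all genes: the average of the \(U_{i} = z_{i}z_{i}'\) equals \(\hat{\Sigma} + \hat{\mu}\hat{\mu}'\), which is approximately \(\hat{\Sigma}\) because \(\hat{\mu} \approx 0\).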
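The identity behind this statement, mean(U_i) = Σ̂ + μ̂μ̂' when μ̂ and Σ̂ are the single-Gaussian maximum-likelihood estimates, can be checked numerically. This sketch uses simulated z_i vectors with a small nonzero mean; the data are illustrative, not the Flash output.

```r
set.seed(4)
n <- 2000; p <- 7
# Simulated z_i vectors with a small nonzero mean (toy data)
Zsim <- matrix(rnorm(n * p, mean = 0.01), n, p)

# Gene-specific rank-one matrices U_i = z_i z_i' and their average
U_list <- lapply(seq_len(n), function(i) tcrossprod(Zsim[i, ]))
U_bar  <- Reduce(`+`, U_list) / n

# Single-Gaussian maximum-likelihood estimates
mu_hat    <- colMeans(Zsim)
Sigma_hat <- crossprod(sweep(Zsim, 2, mu_hat)) / n

# Identity: mean of U_i = Sigma_hat + mu_hat mu_hat'
all.equal(U_bar, Sigma_hat + tcrossprod(mu_hat))
```

With the means reported above (all below 0.01 in absolute value), the entries of \(\hat{\mu}\hat{\mu}'\) are below 1e-4, negligible next to the diagonal of \(\hat{\Sigma}\), which lies between about 3.3 and 6.1.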

Session information

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mclust_5.4               flashr2_0.3-3           
[3] ExtremeDeconvolution_1.3 mashr_0.2-4             
[5] ashr_2.1-27             

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14      compiler_3.4.3    git2r_0.20.0     
 [4] plyr_1.8.4        iterators_1.0.9   tools_3.4.3      
 [7] digest_0.6.13     evaluate_0.10.1   tibble_1.3.4     
[10] gtable_0.2.0      lattice_0.20-35   rlang_0.1.6      
[13] Matrix_1.2-12     foreach_1.4.4     yaml_2.1.16      
[16] parallel_3.4.3    mvtnorm_1.0-6     stringr_1.2.0    
[19] knitr_1.17        rprojroot_1.2     grid_3.4.3       
[22] rmarkdown_1.8     rmeta_2.16        ggplot2_2.2.1    
[25] magrittr_1.5      backports_1.1.2   scales_0.5.0     
[28] codetools_0.2-15  htmltools_0.3.6   MASS_7.3-47      
[31] assertthat_0.2.0  colorspace_1.3-2  stringi_1.1.6    
[34] lazyeval_0.2.1    pscl_1.5.2        doParallel_1.0.11
[37] munsell_0.4.3     truncnorm_1.0-7   SQUAREM_2017.10-1

This R Markdown site was created with workflowr