Last updated: 2018-02-08

Loading required package: ashr

Simulation Design

In the simplest case, there is no true deviation exists. \[c_{j} = \mu_{j} 1\] \[\hat{c}_{j} \sim N_{8}(c_{j}, \frac{1}{2}I)\] Let L be the contrast matrix. Therefore, \[\hat{\delta}_{j} = L\hat{c}_{j} \sim N_{7}(0, \frac{1}{2}LL')\]

We first generate the data:

set.seed(1)
data = sim.contrast1(nsamp = 10000, ncond = 8)

Set up the contrast matrix and the mash contrast data object

L = rbind(c(-1,1,0,0,0,0,0,0),
          c(-1,0,1,0,0,0,0,0),
          c(-1,0,0,1,0,0,0,0),
          c(-1,0,0,0,1,0,0,0),
          c(-1,0,0,0,0,1,0,0),
          c(-1,0,0,0,0,0,1,0),
          c(-1,0,0,0,0,0,0,1))
row.names(L) = c('2-1','3-1','4-1','5-1','6-1','7-1','8-1')
mash_data = mash_set_data(Bhat=data$Chat, Shat=data$Shat)
mash_data_L = mash_set_data_contrast(mash_data, L)

Set up the covariance matrices:

There are two types of covariance matrix you can use in mash: “canonical” and “data-driven”.

# canonical
U.c = cov_canonical(mash_data_L)
# data driven
mash_data_L.z = mash_data_L
mash_data_L.z$Bhat = mash_data_L$Bhat/mash_data_L$Shat
m.1by1 = mash_1by1(mash_data_L.z)
strong = get_significant_results(m.1by1,0.05)
# no strong
# no data driven

Fit mashcontrast model

mashcontrast.model = mash(mash_data_L, U.c, algorithm.version = 'R')

 - Computing 10000 x 169 likelihood matrix.
 - Likelihood calculations took 1.06 seconds.
 - Fitting model with 169 mixture components.
 - Model fitting took 1.55 seconds.
 - Computing posterior matrices.
 - Computation allocated took 0.47 seconds.

length(get_significant_results(mashcontrast.model))

[1] 0

There is no discovery, which is as we expected. The true deviations are zero for all samples.

barplot(get_estimated_pi(mashcontrast.model),las = 2, cex.names = 0.7)

Comparing discover result with \(\hat{\delta}_{j} \sim N_{7}(0,I)\):

Indep.data = mash_set_data(Bhat = data$Chat%*%t(L))
U.c = cov_canonical(Indep.data)
Indep.model = mash(Indep.data, U.c, algorithm.version = 'R')

 - Computing 10000 x 169 likelihood matrix.
 - Likelihood calculations took 0.70 seconds.
 - Fitting model with 169 mixture components.
 - Model fitting took 1.63 seconds.
 - Computing posterior matrices.
 - Computation allocated took 0.06 seconds.

length(get_significant_results(Indep.model))

[1] 2818

There are more false positives.

barplot(get_estimated_pi(Indep.model),las = 2,cex.names = 0.7)

Comparing discover result with \(\hat{z} \sim N_{7}(0,I)\):

In this case, the observed deviations are truly independent.

# simulate true independent delta
D = matrix(0, nrow=10000, ncol = 7)
Shat = matrix(1, nrow=10000, ncol = 7)
set.seed(1)
E = matrix(rnorm(length(Shat), mean=0, sd=Shat), nrow=10000, ncol = 7)
Dhat = D + E
row_ids = paste0("sample_", 1:nrow(D))
col_ids = paste0("condition_", 1:ncol(D))
rownames(D) = row_ids
colnames(D) = col_ids
rownames(Dhat) = row_ids
colnames(Dhat) = col_ids
rownames(Shat) = row_ids
colnames(Shat) = col_ids

mash_delta = mash_set_data(Bhat = Dhat, Shat = Shat)
U.c = cov_canonical(mash_delta)
mash_delta_model = mash(mash_delta, U.c, algorithm.version = 'R')

 - Computing 10000 x 169 likelihood matrix.
 - Likelihood calculations took 0.67 seconds.
 - Fitting model with 169 mixture components.
 - Model fitting took 1.72 seconds.
 - Computing posterior matrices.
 - Computation allocated took 0.04 seconds.

length(get_significant_results(mash_delta_model))

[1] 0

barplot(get_estimated_pi(mash_delta_model),las = 2,cex.names = 0.7)

From the above comparisons, we show that the wrong assumption \(\hat{\delta}\sim N_{7}(0,I)\) produces many more false positives.

Session information

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mashr_0.2-4 ashr_2.2-4 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15      knitr_1.17        magrittr_1.5     
 [4] REBayes_1.2       MASS_7.3-47       doParallel_1.0.11
 [7] pscl_1.5.2        SQUAREM_2017.10-1 lattice_0.20-35  
[10] foreach_1.4.4     plyr_1.8.4        stringr_1.2.0    
[13] tools_3.4.3       parallel_3.4.3    grid_3.4.3       
[16] rmeta_2.16        htmltools_0.3.6   iterators_1.0.9  
[19] assertthat_0.2.0  yaml_2.1.16       rprojroot_1.2    
[22] digest_0.6.13     Matrix_1.2-12     codetools_0.2-15 
[25] evaluate_0.10.1   rmarkdown_1.8     stringi_1.1.6    
[28] compiler_3.4.3    Rmosek_8.0.69     backports_1.1.2  
[31] mvtnorm_1.0-7     truncnorm_1.0-7

Simple Simulation Example

Yuxin Zou

2018-01-10

Simulation Design

Comparing discover result with \(\hat{\delta}_{j} \sim N_{7}(0,I)\):

Comparing discover result with \(\hat{z} \sim N_{7}(0,I)\):

Session information