Simulation for generic Cormotif model

Last updated: 2017-09-27

Code version: 2fa73e6

Simulation

We set unit number \(n = 5000\), and study number \(R = 4\). We set every observation has the same standard error, \(s_{jr}=1\). That is \(\hat\beta_{jr}|\beta_{jr}\sim N(\beta_{jr};0,1)\). The 5000 units come from 4 patterns (\(K=4\)): 4000 units have zero effects in all four studies, that is \(\beta_{jr}=0\), for \(r = 1,2,3,4\); 400 units have effect \(\beta_{jr}\sim N(0,4^2)\) for \(r =1, 2\) and \(\beta_{jr}=0\) for \(r =3, 4\); 400 units have effect \(\beta_{jr}\sim N(0,4^2)\) for \(r =2, 3\) and \(\beta_{jr}=0\) for \(r =1, 4\); 200 unitss have effect \(\beta_{jr}\sim N(0,4^2)\) for \(r =1, 2,3,4\).

Known K

First assume we know \(K=4\) and fit the model.

source('../code/function.R')
K = 4
D = 4
G = 5000
sigma2 = c(16,16,16,16)
pi0 = c(0.04,0.08,0.08,0.8)
q0 = matrix(0,nrow=K,ncol=D)
q0[1,] = c(1,1,1,1)
q0[2,] = c(1,1,0,0)
q0[3,] = c(0,1,1,0)
q0[4,] = c(0,0,0,0)
X = matrix(0,nrow=G,ncol=D)
rows = rep(1:K,times=pi0*G)
A = q0[rows,]
set.seed(111)
for(g in 1:G){
  for(d in 1:D){
    if(A[g,d]==1){
      X[g,d] = rnorm(1,0,sqrt(1+sigma2[d]))
    } else{
      X[g,d] = rnorm(1,0,1)
    }
  }
}
betahat=X
sebetahat=matrix(1,nrow=G,ncol=D)
# fit the model
fit1 = generic.cormotif(betahat,sebetahat,K=4,mess=FALSE)

After fit the generic cormotif model, the standard deviation \(\sigma_0,\sigma_1,\ldots,\sigma_L\) built by our method is

fit1$mixsd

 [1]  0.00000000  0.09227791  0.13050068  0.18455582  0.26100135
 [6]  0.36911165  0.52200270  0.73822330  1.04400540  1.47644660
[11]  2.08801080  2.95289319  4.17602160  5.90578638  8.35204320
[16] 11.81157277 16.70408640 23.62314554 33.40817280

The estimated \(\pi\) for \(K = 4\) is

fit1$bestmotif$pi

[1] 0.79871274 0.03963030 0.07958924 0.08206772

Then we could check the estimation for \(g\), i.e. the values of \(w\), which is a \(K\times L\) matrix.

The estimated \(w\) for studies 1 is

fit1$bestmotif$W[[1]]

             [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 9.998756e-01 8.067845e-05 3.112463e-05 3.994159e-06 2.636422e-08
[2,] 0.000000e+00 0.000000e+00 2.782219e-13 2.608497e-06 7.416840e-02
[3,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[4,] 3.932410e-13 1.539324e-11 5.520029e-13 4.203118e-08 8.245025e-01
             [,6]         [,7]         [,8]         [,9]        [,10]
[1,] 0.000000e+00 3.996803e-15 0.000000e+00 5.467503e-08 0.000000e+00
[2,] 6.217249e-15 5.329071e-15 3.330669e-15 0.000000e+00 4.440892e-15
[3,] 0.000000e+00 1.998401e-15 0.000000e+00 0.000000e+00 0.000000e+00
[4,] 2.384626e-05 0.000000e+00 0.000000e+00 1.749625e-01 0.000000e+00
         [,11]        [,12]        [,13]        [,14] [,15]        [,16]
[1,] 0.0000000 0.0000000000 8.477791e-06 0.000000e+00     0 2.220446e-16
[2,] 0.0000000 0.2448840449 6.809447e-01 6.661338e-16     0 0.000000e+00
[3,] 0.2989694 0.0000000000 5.619378e-01 1.390928e-01     0 0.000000e+00
[4,] 0.0000000 0.0005111344 0.000000e+00 2.220446e-15     0 0.000000e+00
     [,17] [,18]        [,19]
[1,]     0     0 0.000000e+00
[2,]     0     0 2.174175e-07
[3,]     0     0 0.000000e+00
[4,]     0     0 0.000000e+00

The estimated \(w\) for studies 2 is

fit1$bestmotif$W[[2]]

             [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 9.998313e-01 9.879179e-05 4.818813e-05 1.113090e-05 5.173824e-07
[2,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[3,] 0.000000e+00 0.000000e+00 0.000000e+00 6.661338e-16 0.000000e+00
[4,] 2.442491e-15 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
             [,6] [,7]         [,8]         [,9]     [,10]        [,11]
[1,] 4.659209e-10    0 3.552714e-15 1.998401e-15 0.0000000 8.881784e-16
[2,] 0.000000e+00    0 0.000000e+00 0.000000e+00 0.2493882 0.000000e+00
[3,] 2.442491e-15    0 0.000000e+00 0.000000e+00 0.1249867 0.000000e+00
[4,] 0.000000e+00    0 0.000000e+00 0.000000e+00 0.0000000 0.000000e+00
            [,12]        [,13]        [,14]        [,15] [,16]
[1,] 1.332268e-15 1.008336e-05 5.676348e-12 2.220446e-16     0
[2,] 0.000000e+00 7.506118e-01 4.440892e-15 0.000000e+00     0
[3,] 1.003605e-04 8.749130e-01 8.881784e-16 0.000000e+00     0
[4,] 0.000000e+00 9.375195e-01 6.248048e-02 0.000000e+00     0
            [,17]        [,18]        [,19]
[1,] 0.000000e+00 0.000000e+00 0.000000e+00
[2,] 0.000000e+00 0.000000e+00 0.000000e+00
[3,] 2.220446e-16 0.000000e+00 0.000000e+00
[4,] 0.000000e+00 2.220446e-16 2.403162e-08

The estimated \(w\) for studies 3 is

fit1$bestmotif$W[[3]]

             [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 9.999050e-01 6.847827e-05 2.172051e-05 1.551853e-06 4.012694e-09
[2,] 1.762806e-05 1.053651e-08 3.743262e-09 0.000000e+00 0.000000e+00
[3,] 9.999107e-01 8.764284e-05 2.604859e-10 8.726353e-14 4.507505e-14
[4,] 3.708948e-02 1.355611e-05 4.064208e-08 2.384759e-13 4.662937e-14
             [,6]         [,7]         [,8]         [,9]        [,10]
[1,] 2.881384e-11 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[2,] 0.000000e+00 3.552714e-15 0.000000e+00 0.000000e+00 0.000000e+00
[3,] 2.065015e-14 1.088019e-14 5.995204e-15 3.330669e-15 2.220446e-15
[4,] 6.705747e-14 2.464695e-14 2.065015e-14 0.000000e+00 0.000000e+00
     [,11]        [,12]        [,13]        [,14]        [,15] [,16] [,17]
[1,]     0 0.000000e+00 3.953720e-07 2.844806e-06 0.000000e+00     0     0
[2,]     0 0.000000e+00 9.999824e-01 0.000000e+00 0.000000e+00     0     0
[3,]     0 2.693303e-08 1.585607e-06 0.000000e+00 0.000000e+00     0     0
[4,]     0 3.353433e-01 6.275536e-01 0.000000e+00 6.661338e-16     0     0
     [,18]        [,19]
[1,]     0 0.000000e+00
[2,]     0 0.000000e+00
[3,]     0 0.000000e+00
[4,]     0 4.304066e-08

The estimated \(w\) for studies 4 is

fit1$bestmotif$W[[4]]

          [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 0.9998889 7.809257e-05 2.915675e-05 3.499558e-06 1.945535e-08
[2,] 0.0000000 0.000000e+00 0.000000e+00 0.000000e+00 1.110223e-15
[3,] 0.9571039 5.654011e-04 4.764297e-08 2.580158e-13 1.330047e-13
[4,] 0.5935001 1.403105e-02 3.167747e-05 7.184842e-08 7.521539e-12
             [,6]         [,7]         [,8]         [,9]        [,10]
[1,] 3.187450e-12 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[2,] 1.110223e-15 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[3,] 0.000000e+00 3.796963e-14 2.176037e-14 9.547918e-15 9.769963e-15
[4,] 3.732009e-01 0.000000e+00 4.996004e-14 0.000000e+00 0.000000e+00
            [,11]       [,12]        [,13]        [,14] [,15] [,16]
[1,] 2.220446e-16 0.00000e+00 8.340217e-12 3.613825e-07     0     0
[2,] 3.177552e-01 0.00000e+00 6.822448e-01 0.000000e+00     0     0
[3,] 0.000000e+00 0.00000e+00 4.233069e-02 0.000000e+00     0     0
[4,] 2.287059e-14 2.88658e-15 0.000000e+00 1.923619e-02     0     0
            [,17] [,18]        [,19]
[1,] 2.220446e-16     0 0.000000e+00
[2,] 0.000000e+00     0 1.132427e-14
[3,] 0.000000e+00     0 0.000000e+00
[4,] 0.000000e+00     0 2.165357e-11

All of them matches well with our setting, which verifies our algorithm.

Also, we could compute the lfdr:

head(fit1$lfdr)

             [,1]         [,2]         [,3]         [,4]
[1,] 1.088245e-04 1.088195e-04 1.569545e-01 1.789218e-16
[2,] 7.257377e-07 7.257067e-07 1.725544e-08 2.380520e-03
[3,] 1.700067e-02 1.704622e-02 8.539593e-01 8.262259e-01
[4,] 1.681810e-25 4.153149e-20 2.906425e-03 7.648126e-10
[5,] 7.439623e-04 7.439258e-04 7.115568e-04 4.926416e-01
[6,] 9.283226e-01 9.282822e-01 9.768985e-01 9.877510e-01

And lfsr:

head(fit1$lfsr)

             [,1]       [,2]         [,3]         [,4]
[1,] 2.392062e-01 0.10315753 2.465397e-01 3.330669e-16
[2,] 2.923302e-01 0.29868876 1.823938e-08 2.735329e-03
[3,] 1.734466e-02 0.04450630 8.894963e-01 8.369793e-01
[4,] 2.432347e-13 0.06869995 3.596712e-03 1.070624e-09
[5,] 4.454955e-01 0.07755810 7.126057e-04 6.108283e-01
[6,] 9.393170e-01 0.93460537 9.821318e-01 9.922992e-01

Unknown K

Then suppose we don’t know \(k\). Set \(K = 1:10\) and apply BIC and AIC.

fit2 = generic.cormotif(betahat,sebetahat,K=1:10,mess=FALSE)

we can check the BIC and AIC values obtained by all cluster numbers:

fit2$bic

       K      bic
 [1,]  1 69682.98
 [2,]  2 69257.00
 [3,]  3 69827.52
 [4,]  4 70506.85
 [5,]  5 71123.52
 [6,]  6 71690.41
 [7,]  7 72260.93
 [8,]  8 72936.01
 [9,]  9 73553.12
[10,] 10 74183.75

The AIC values:

fit2$aic

       K      aic
 [1,]  1 69213.74
 [2,]  2 68312.01
 [3,]  3 68406.77
 [4,]  4 68610.35
 [5,]  5 68751.26
 [6,]  6 68842.40
 [7,]  7 68937.16
 [8,]  8 69136.48
 [9,]  9 69277.84
[10,] 10 69432.71

\(K\) with the smallest BIC is

fit2$bestmotif$K

[1] 2

We can see that \(K=4\) doesn’t give the smallest BIC or AIC.

And the “best” estimated \(\pi\) is

fit2$bestmotif$pi

[1] 0.8096659 0.1903341

The loglikelihood values:

fit2$loglike

       K   loglike
 [1,]  1 -34534.87
 [2,]  2 -34011.01
 [3,]  3 -33985.39
 [4,]  4 -34014.18
 [5,]  5 -34011.63
 [6,]  6 -33984.20
 [7,]  7 -33958.58
 [8,]  8 -33985.24
 [9,]  9 -33982.92
[10,] 10 -33987.36

The estimated \(w\) is

fit2$bestmotif$W

[[1]]
             [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 9.863418e-01 0.0001490823 0.0001115726 6.127758e-05 1.691367e-05
[2,] 1.211253e-12 0.0000000000 0.0000000000 4.090950e-12 2.266273e-01
             [,6]         [,7]        [,8]         [,9] [,10]      [,11]
[1,] 7.805352e-07 1.288723e-09 0.00000e+00 0.000000e+00     0 0.00000000
[2,] 2.180048e-01 0.000000e+00 1.59206e-13 5.817569e-14     0 0.07371411
         [,12]      [,13]        [,14] [,15] [,16] [,17] [,18]
[1,] 0.0000000 0.01331855 1.332268e-15     0     0     0     0
[2,] 0.1462514 0.26524495 7.015742e-02     0     0     0     0
            [,19]
[1,] 0.000000e+00
[2,] 2.491928e-08

[[2]]
          [,1]         [,2]        [,3]         [,4]         [,5]
[1,] 0.9996975 1.442457e-04 0.000101021 4.599266e-05 7.254852e-06
[2,] 0.0000000 5.329071e-15 0.000000000 0.000000e+00 5.773160e-15
             [,6]         [,7]        [,8]         [,9]        [,10]
[1,] 7.227194e-08 5.673240e-13 1.84297e-14 9.103829e-15 4.662937e-15
[2,] 0.000000e+00 7.771561e-15 0.00000e+00 0.000000e+00 8.336625e-03
            [,11]        [,12]        [,13]       [,14] [,15] [,16] [,17]
[1,] 1.712708e-07 0.000000e+00 3.778936e-06 0.000000000     0     0     0
[2,] 2.486900e-14 2.264855e-14 9.883931e-01 0.003270298     0     0     0
     [,18]        [,19]
[1,]     0 0.000000e+00
[2,]     0 2.284735e-10

[[3]]
          [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 0.9944746 0.0001036397 5.348976e-05 1.399110e-05 8.877072e-07
[2,] 0.4026270 0.0001853133 2.280705e-06 5.096368e-12 1.969536e-13
             [,6]         [,7] [,8] [,9]        [,10]        [,11]
[1,] 2.569784e-09 0.000000e+00    0    0 0.000000e+00 0.000000e+00
[2,] 9.148238e-14 4.751755e-14    0    0 4.374279e-14 9.078076e-08
         [,12]       [,13]        [,14] [,15] [,16] [,17] [,18]
[1,] 0.0000000 0.005353401 3.330669e-15     0     0     0     0
[2,] 0.1414547 0.455730570 0.000000e+00     0     0     0     0
            [,19]
[1,] 0.000000e+00
[2,] 3.633456e-09

[[4]]
          [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 0.9999106 6.868422e-05 1.761913e-05 4.114410e-08 1.755149e-06
[2,] 0.7175851 1.106274e-02 6.520545e-06 1.995773e-09 1.721512e-12
             [,6]         [,7] [,8]         [,9] [,10]     [,11] [,12]
[1,] 1.069038e-06 1.109516e-08    0 0.000000e+00     0 0.0000000     0
[2,] 8.004708e-13 2.939871e-13    0 8.171241e-14     0 0.1197596     0
         [,13]      [,14] [,15]        [,16] [,17] [,18] [,19]
[1,] 0.0000000 0.00000000     0 2.396953e-07     0     0     0
[2,] 0.1361554 0.01543059     0 0.000000e+00     0     0     0

Also, we could compute the lfdr:

head(fit2$lfdr)

             [,1]         [,2]         [,3]         [,4]
[1,] 6.304282e-05 6.335460e-05 5.518482e-01 1.223416e-16
[2,] 1.298596e-03 1.303816e-03 6.353908e-08 5.107575e-03
[3,] 2.192443e-02 1.172386e-01 7.142204e-01 7.377571e-01
[4,] 4.820681e-18 1.183634e-08 1.688913e-02 4.961997e-08
[5,] 9.741883e-02 9.774740e-02 8.503613e-04 7.736216e-01
[6,] 9.301289e-01 9.385385e-01 9.778549e-01 9.914780e-01

And lfsr:

head(fit2$lfsr)

             [,1]       [,2]         [,3]         [,4]
[1,] 3.144659e-01 0.09302265 6.001150e-01 0.000000e+00
[2,] 3.671532e-01 0.28837164 6.472917e-08 5.174014e-03
[3,] 2.255837e-02 0.13849877 7.842441e-01 7.569143e-01
[4,] 9.552944e-13 0.05749769 1.762843e-02 5.024915e-08
[5,] 4.885600e-01 0.16582965 8.513886e-04 7.957534e-01
[6,] 9.437495e-01 0.94371483 9.828817e-01 9.939640e-01

Discussion

Our approach can conduct accurate estimation with known \(K\), but without any information about \(K\), neither AIC and BIC could give accurate estimation of \(K\).

Session information

sessionInfo()

R version 3.3.3 (2017-03-06)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  base     

other attached packages:
[1] SQUAREM_2016.8-2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12       codetools_0.2-15   lattice_0.20-34   
 [4] digest_0.6.12      rprojroot_1.2      MASS_7.3-45       
 [7] grid_3.3.3         MatrixModels_0.4-1 backports_1.1.1   
[10] git2r_0.19.0       magrittr_1.5       evaluate_0.10.1   
[13] coda_0.19-1        stringi_1.1.5      SparseM_1.77      
[16] Matrix_1.2-8       rmarkdown_1.6      tools_3.3.3       
[19] stringr_1.2.0      compiler_3.3.3     yaml_2.1.14       
[22] mcmc_0.9-5         htmltools_0.3.6    knitr_1.17        
[25] quantreg_5.33      MCMCpack_1.4-0     methods_3.3.3

This R Markdown site was created with workflowr