Last updated: 2017-09-27
Code version: 2fa73e6
We set unit number \(n = 5000\), and study number \(R = 4\). We set every observation has the same standard error, \(s_{jr}=1\). That is \(\hat\beta_{jr}|\beta_{jr}\sim N(\beta_{jr};0,1)\). The 5000 units come from 4 patterns (\(K=4\)): 4000 units have zero effects in all four studies, that is \(\beta_{jr}=0\), for \(r = 1,2,3,4\); 400 units have effect \(\beta_{jr}\sim N(0,4^2)\) for \(r =1, 2\) and \(\beta_{jr}=0\) for \(r =3, 4\); 400 units have effect \(\beta_{jr}\sim N(0,4^2)\) for \(r =2, 3\) and \(\beta_{jr}=0\) for \(r =1, 4\); 200 unitss have effect \(\beta_{jr}\sim N(0,4^2)\) for \(r =1, 2,3,4\).
First assume we know \(K=4\) and fit the model.
source('../code/function.R')
K = 4
D = 4
G = 5000
sigma2 = c(16,16,16,16)
pi0 = c(0.04,0.08,0.08,0.8)
q0 = matrix(0,nrow=K,ncol=D)
q0[1,] = c(1,1,1,1)
q0[2,] = c(1,1,0,0)
q0[3,] = c(0,1,1,0)
q0[4,] = c(0,0,0,0)
X = matrix(0,nrow=G,ncol=D)
rows = rep(1:K,times=pi0*G)
A = q0[rows,]
set.seed(111)
for(g in 1:G){
for(d in 1:D){
if(A[g,d]==1){
X[g,d] = rnorm(1,0,sqrt(1+sigma2[d]))
} else{
X[g,d] = rnorm(1,0,1)
}
}
}
betahat=X
sebetahat=matrix(1,nrow=G,ncol=D)
# fit the model
fit1 = generic.cormotif(betahat,sebetahat,K=4,mess=FALSE)
After fit the generic cormotif model, the standard deviation \(\sigma_0,\sigma_1,\ldots,\sigma_L\) built by our method is
fit1$mixsd
[1] 0.00000000 0.09227791 0.13050068 0.18455582 0.26100135
[6] 0.36911165 0.52200270 0.73822330 1.04400540 1.47644660
[11] 2.08801080 2.95289319 4.17602160 5.90578638 8.35204320
[16] 11.81157277 16.70408640 23.62314554 33.40817280
The estimated \(\pi\) for \(K = 4\) is
fit1$bestmotif$pi
[1] 0.79871274 0.03963030 0.07958924 0.08206772
Then we could check the estimation for \(g\), i.e. the values of \(w\), which is a \(K\times L\) matrix.
The estimated \(w\) for studies 1 is
fit1$bestmotif$W[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 9.998756e-01 8.067845e-05 3.112463e-05 3.994159e-06 2.636422e-08
[2,] 0.000000e+00 0.000000e+00 2.782219e-13 2.608497e-06 7.416840e-02
[3,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[4,] 3.932410e-13 1.539324e-11 5.520029e-13 4.203118e-08 8.245025e-01
[,6] [,7] [,8] [,9] [,10]
[1,] 0.000000e+00 3.996803e-15 0.000000e+00 5.467503e-08 0.000000e+00
[2,] 6.217249e-15 5.329071e-15 3.330669e-15 0.000000e+00 4.440892e-15
[3,] 0.000000e+00 1.998401e-15 0.000000e+00 0.000000e+00 0.000000e+00
[4,] 2.384626e-05 0.000000e+00 0.000000e+00 1.749625e-01 0.000000e+00
[,11] [,12] [,13] [,14] [,15] [,16]
[1,] 0.0000000 0.0000000000 8.477791e-06 0.000000e+00 0 2.220446e-16
[2,] 0.0000000 0.2448840449 6.809447e-01 6.661338e-16 0 0.000000e+00
[3,] 0.2989694 0.0000000000 5.619378e-01 1.390928e-01 0 0.000000e+00
[4,] 0.0000000 0.0005111344 0.000000e+00 2.220446e-15 0 0.000000e+00
[,17] [,18] [,19]
[1,] 0 0 0.000000e+00
[2,] 0 0 2.174175e-07
[3,] 0 0 0.000000e+00
[4,] 0 0 0.000000e+00
The estimated \(w\) for studies 2 is
fit1$bestmotif$W[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 9.998313e-01 9.879179e-05 4.818813e-05 1.113090e-05 5.173824e-07
[2,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[3,] 0.000000e+00 0.000000e+00 0.000000e+00 6.661338e-16 0.000000e+00
[4,] 2.442491e-15 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[,6] [,7] [,8] [,9] [,10] [,11]
[1,] 4.659209e-10 0 3.552714e-15 1.998401e-15 0.0000000 8.881784e-16
[2,] 0.000000e+00 0 0.000000e+00 0.000000e+00 0.2493882 0.000000e+00
[3,] 2.442491e-15 0 0.000000e+00 0.000000e+00 0.1249867 0.000000e+00
[4,] 0.000000e+00 0 0.000000e+00 0.000000e+00 0.0000000 0.000000e+00
[,12] [,13] [,14] [,15] [,16]
[1,] 1.332268e-15 1.008336e-05 5.676348e-12 2.220446e-16 0
[2,] 0.000000e+00 7.506118e-01 4.440892e-15 0.000000e+00 0
[3,] 1.003605e-04 8.749130e-01 8.881784e-16 0.000000e+00 0
[4,] 0.000000e+00 9.375195e-01 6.248048e-02 0.000000e+00 0
[,17] [,18] [,19]
[1,] 0.000000e+00 0.000000e+00 0.000000e+00
[2,] 0.000000e+00 0.000000e+00 0.000000e+00
[3,] 2.220446e-16 0.000000e+00 0.000000e+00
[4,] 0.000000e+00 2.220446e-16 2.403162e-08
The estimated \(w\) for studies 3 is
fit1$bestmotif$W[[3]]
[,1] [,2] [,3] [,4] [,5]
[1,] 9.999050e-01 6.847827e-05 2.172051e-05 1.551853e-06 4.012694e-09
[2,] 1.762806e-05 1.053651e-08 3.743262e-09 0.000000e+00 0.000000e+00
[3,] 9.999107e-01 8.764284e-05 2.604859e-10 8.726353e-14 4.507505e-14
[4,] 3.708948e-02 1.355611e-05 4.064208e-08 2.384759e-13 4.662937e-14
[,6] [,7] [,8] [,9] [,10]
[1,] 2.881384e-11 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[2,] 0.000000e+00 3.552714e-15 0.000000e+00 0.000000e+00 0.000000e+00
[3,] 2.065015e-14 1.088019e-14 5.995204e-15 3.330669e-15 2.220446e-15
[4,] 6.705747e-14 2.464695e-14 2.065015e-14 0.000000e+00 0.000000e+00
[,11] [,12] [,13] [,14] [,15] [,16] [,17]
[1,] 0 0.000000e+00 3.953720e-07 2.844806e-06 0.000000e+00 0 0
[2,] 0 0.000000e+00 9.999824e-01 0.000000e+00 0.000000e+00 0 0
[3,] 0 2.693303e-08 1.585607e-06 0.000000e+00 0.000000e+00 0 0
[4,] 0 3.353433e-01 6.275536e-01 0.000000e+00 6.661338e-16 0 0
[,18] [,19]
[1,] 0 0.000000e+00
[2,] 0 0.000000e+00
[3,] 0 0.000000e+00
[4,] 0 4.304066e-08
The estimated \(w\) for studies 4 is
fit1$bestmotif$W[[4]]
[,1] [,2] [,3] [,4] [,5]
[1,] 0.9998889 7.809257e-05 2.915675e-05 3.499558e-06 1.945535e-08
[2,] 0.0000000 0.000000e+00 0.000000e+00 0.000000e+00 1.110223e-15
[3,] 0.9571039 5.654011e-04 4.764297e-08 2.580158e-13 1.330047e-13
[4,] 0.5935001 1.403105e-02 3.167747e-05 7.184842e-08 7.521539e-12
[,6] [,7] [,8] [,9] [,10]
[1,] 3.187450e-12 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[2,] 1.110223e-15 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[3,] 0.000000e+00 3.796963e-14 2.176037e-14 9.547918e-15 9.769963e-15
[4,] 3.732009e-01 0.000000e+00 4.996004e-14 0.000000e+00 0.000000e+00
[,11] [,12] [,13] [,14] [,15] [,16]
[1,] 2.220446e-16 0.00000e+00 8.340217e-12 3.613825e-07 0 0
[2,] 3.177552e-01 0.00000e+00 6.822448e-01 0.000000e+00 0 0
[3,] 0.000000e+00 0.00000e+00 4.233069e-02 0.000000e+00 0 0
[4,] 2.287059e-14 2.88658e-15 0.000000e+00 1.923619e-02 0 0
[,17] [,18] [,19]
[1,] 2.220446e-16 0 0.000000e+00
[2,] 0.000000e+00 0 1.132427e-14
[3,] 0.000000e+00 0 0.000000e+00
[4,] 0.000000e+00 0 2.165357e-11
All of them matches well with our setting, which verifies our algorithm.
Also, we could compute the lfdr:
head(fit1$lfdr)
[,1] [,2] [,3] [,4]
[1,] 1.088245e-04 1.088195e-04 1.569545e-01 1.789218e-16
[2,] 7.257377e-07 7.257067e-07 1.725544e-08 2.380520e-03
[3,] 1.700067e-02 1.704622e-02 8.539593e-01 8.262259e-01
[4,] 1.681810e-25 4.153149e-20 2.906425e-03 7.648126e-10
[5,] 7.439623e-04 7.439258e-04 7.115568e-04 4.926416e-01
[6,] 9.283226e-01 9.282822e-01 9.768985e-01 9.877510e-01
And lfsr:
head(fit1$lfsr)
[,1] [,2] [,3] [,4]
[1,] 2.392062e-01 0.10315753 2.465397e-01 3.330669e-16
[2,] 2.923302e-01 0.29868876 1.823938e-08 2.735329e-03
[3,] 1.734466e-02 0.04450630 8.894963e-01 8.369793e-01
[4,] 2.432347e-13 0.06869995 3.596712e-03 1.070624e-09
[5,] 4.454955e-01 0.07755810 7.126057e-04 6.108283e-01
[6,] 9.393170e-01 0.93460537 9.821318e-01 9.922992e-01
Then suppose we don’t know \(k\). Set \(K = 1:10\) and apply BIC and AIC.
fit2 = generic.cormotif(betahat,sebetahat,K=1:10,mess=FALSE)
we can check the BIC and AIC values obtained by all cluster numbers:
fit2$bic
K bic
[1,] 1 69682.98
[2,] 2 69257.00
[3,] 3 69827.52
[4,] 4 70506.85
[5,] 5 71123.52
[6,] 6 71690.41
[7,] 7 72260.93
[8,] 8 72936.01
[9,] 9 73553.12
[10,] 10 74183.75
The AIC values:
fit2$aic
K aic
[1,] 1 69213.74
[2,] 2 68312.01
[3,] 3 68406.77
[4,] 4 68610.35
[5,] 5 68751.26
[6,] 6 68842.40
[7,] 7 68937.16
[8,] 8 69136.48
[9,] 9 69277.84
[10,] 10 69432.71
\(K\) with the smallest BIC is
fit2$bestmotif$K
[1] 2
We can see that \(K=4\) doesn’t give the smallest BIC or AIC.
And the “best” estimated \(\pi\) is
fit2$bestmotif$pi
[1] 0.8096659 0.1903341
The loglikelihood values:
fit2$loglike
K loglike
[1,] 1 -34534.87
[2,] 2 -34011.01
[3,] 3 -33985.39
[4,] 4 -34014.18
[5,] 5 -34011.63
[6,] 6 -33984.20
[7,] 7 -33958.58
[8,] 8 -33985.24
[9,] 9 -33982.92
[10,] 10 -33987.36
The estimated \(w\) is
fit2$bestmotif$W
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 9.863418e-01 0.0001490823 0.0001115726 6.127758e-05 1.691367e-05
[2,] 1.211253e-12 0.0000000000 0.0000000000 4.090950e-12 2.266273e-01
[,6] [,7] [,8] [,9] [,10] [,11]
[1,] 7.805352e-07 1.288723e-09 0.00000e+00 0.000000e+00 0 0.00000000
[2,] 2.180048e-01 0.000000e+00 1.59206e-13 5.817569e-14 0 0.07371411
[,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,] 0.0000000 0.01331855 1.332268e-15 0 0 0 0
[2,] 0.1462514 0.26524495 7.015742e-02 0 0 0 0
[,19]
[1,] 0.000000e+00
[2,] 2.491928e-08
[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 0.9996975 1.442457e-04 0.000101021 4.599266e-05 7.254852e-06
[2,] 0.0000000 5.329071e-15 0.000000000 0.000000e+00 5.773160e-15
[,6] [,7] [,8] [,9] [,10]
[1,] 7.227194e-08 5.673240e-13 1.84297e-14 9.103829e-15 4.662937e-15
[2,] 0.000000e+00 7.771561e-15 0.00000e+00 0.000000e+00 8.336625e-03
[,11] [,12] [,13] [,14] [,15] [,16] [,17]
[1,] 1.712708e-07 0.000000e+00 3.778936e-06 0.000000000 0 0 0
[2,] 2.486900e-14 2.264855e-14 9.883931e-01 0.003270298 0 0 0
[,18] [,19]
[1,] 0 0.000000e+00
[2,] 0 2.284735e-10
[[3]]
[,1] [,2] [,3] [,4] [,5]
[1,] 0.9944746 0.0001036397 5.348976e-05 1.399110e-05 8.877072e-07
[2,] 0.4026270 0.0001853133 2.280705e-06 5.096368e-12 1.969536e-13
[,6] [,7] [,8] [,9] [,10] [,11]
[1,] 2.569784e-09 0.000000e+00 0 0 0.000000e+00 0.000000e+00
[2,] 9.148238e-14 4.751755e-14 0 0 4.374279e-14 9.078076e-08
[,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,] 0.0000000 0.005353401 3.330669e-15 0 0 0 0
[2,] 0.1414547 0.455730570 0.000000e+00 0 0 0 0
[,19]
[1,] 0.000000e+00
[2,] 3.633456e-09
[[4]]
[,1] [,2] [,3] [,4] [,5]
[1,] 0.9999106 6.868422e-05 1.761913e-05 4.114410e-08 1.755149e-06
[2,] 0.7175851 1.106274e-02 6.520545e-06 1.995773e-09 1.721512e-12
[,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1.069038e-06 1.109516e-08 0 0.000000e+00 0 0.0000000 0
[2,] 8.004708e-13 2.939871e-13 0 8.171241e-14 0 0.1197596 0
[,13] [,14] [,15] [,16] [,17] [,18] [,19]
[1,] 0.0000000 0.00000000 0 2.396953e-07 0 0 0
[2,] 0.1361554 0.01543059 0 0.000000e+00 0 0 0
Also, we could compute the lfdr:
head(fit2$lfdr)
[,1] [,2] [,3] [,4]
[1,] 6.304282e-05 6.335460e-05 5.518482e-01 1.223416e-16
[2,] 1.298596e-03 1.303816e-03 6.353908e-08 5.107575e-03
[3,] 2.192443e-02 1.172386e-01 7.142204e-01 7.377571e-01
[4,] 4.820681e-18 1.183634e-08 1.688913e-02 4.961997e-08
[5,] 9.741883e-02 9.774740e-02 8.503613e-04 7.736216e-01
[6,] 9.301289e-01 9.385385e-01 9.778549e-01 9.914780e-01
And lfsr:
head(fit2$lfsr)
[,1] [,2] [,3] [,4]
[1,] 3.144659e-01 0.09302265 6.001150e-01 0.000000e+00
[2,] 3.671532e-01 0.28837164 6.472917e-08 5.174014e-03
[3,] 2.255837e-02 0.13849877 7.842441e-01 7.569143e-01
[4,] 9.552944e-13 0.05749769 1.762843e-02 5.024915e-08
[5,] 4.885600e-01 0.16582965 8.513886e-04 7.957534e-01
[6,] 9.437495e-01 0.94371483 9.828817e-01 9.939640e-01
Our approach can conduct accurate estimation with known \(K\), but without any information about \(K\), neither AIC and BIC could give accurate estimation of \(K\).
sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] SQUAREM_2016.8-2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 codetools_0.2-15 lattice_0.20-34
[4] digest_0.6.12 rprojroot_1.2 MASS_7.3-45
[7] grid_3.3.3 MatrixModels_0.4-1 backports_1.1.1
[10] git2r_0.19.0 magrittr_1.5 evaluate_0.10.1
[13] coda_0.19-1 stringi_1.1.5 SparseM_1.77
[16] Matrix_1.2-8 rmarkdown_1.6 tools_3.3.3
[19] stringr_1.2.0 compiler_3.3.3 yaml_2.1.14
[22] mcmc_0.9-5 htmltools_0.3.6 knitr_1.17
[25] quantreg_5.33 MCMCpack_1.4-0 methods_3.3.3
This R Markdown site was created with workflowr