Last updated: 2016-12-09

Code version: 263cb495f719be371368abbbfa59f066e8482fba

A referee described a simulation of 10,000 hypotheses in which a fraction 0.8 are null (\(\mu_i = 0\)) and a fraction 0.2 are alternative, with effects \(\mu_i \sim N(0,1)\). For each effect \(i\) there are \(n = 30\) observations \(x_{i,j} \sim N(\mu_i,1)\), and a one-sample t test is used to compute a \(p\) value. The referee correctly notes that these simulations i) obey the unimodal assumption (UA), and ii) have the property that “most” \(z\) scores near 0 are null (Efron’s “Zero Assumption”; ZA). The motivation is to demonstrate that the UA and the ZA are not contradictory.

I agree with this, provided we define the ZA as saying that “most” \(z\) scores near 0 are null. However, in practice existing methods effectively make a stronger assumption (or at least behave as if they do): that all \(z\) scores near 0 are null. Consequently they end up creating a hole at 0 in the distribution of the alternative \(z\) scores. Here we illustrate this by applying qvalue to this scenario.

First, we load the necessary libraries, and specify settings for displaying the plots in the rendered document.
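A minimal sketch of such a setup chunk, assuming the qvalue and ashr packages are installed, and that the histogram helpers nullalthist and altnullhist used below are defined in a local helper script (the filename here is a placeholder, not the original):

```r
library(qvalue)  # lfdr and pi0 estimation from p values
library(ashr)    # adaptive shrinkage (ash)
# nullalthist() and altnullhist() are assumed to be defined in a local
# helper script; "nullalthist.R" is a hypothetical name.
source("nullalthist.R")
```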


Next, we simulate the data set described above.

# 8,000 null effects (mu = 0) and 2,000 alternative effects (mu ~ N(0,1))
mu = c(rep(0,8000),rnorm(2000))
# 30 observations per effect: x[i,j] ~ N(mu[i], 1)
# (mu has length 10,000, so it recycles down the 10,000 rows)
x = matrix(rnorm(30*10000),nrow=10000,ncol=30)+mu
# one-sample t test p value for each row
ttest.pval = function(x){return(t.test(x)$p.value)}
x.p = apply(x,1,ttest.pval)

Here is how qvalue (arguably, implicitly) decomposes the distribution of \(p\) values into null and alternative components:

lfdr.qv = qvalue::qvalue(x.p)$lfdr
altnullhist(x.p,lfdr.qv,main="p values: qvalue",ncz=40,xlab="p value",cex.axis=0.8)

And here is the corresponding plot in z score space. Note the “hole” in the distribution of \(z\) scores at 0 under the alternative.

# estimated effect size (sample mean) for each row
ttest.est = function(x){return(t.test(x)$estimate)}
x.est = apply(x,1,ttest.est)
# convert the estimates to z scores using the nominal standard error 1/sqrt(30)
zscore = x.est/(1/sqrt(30))
nullalthist(zscore,lfdr.qv,main="qvalue's implicit partition of z \n into null and alternative",ncz=60,cex.axis=0.8,ylim=c(0,0.3),xlim=c(-6,6),xlab="z score")

For comparison, here is the corresponding decomposition, using ash, of the \(z\) scores into null and alternative components. (The lfdr values are extracted here with ashr::get_lfdr; the exact accessor depends on the ashr version.)

lfdr.ash = ashr::get_lfdr(ashr::ash(x.est, 1/sqrt(30)))  # fit ash, extract lfdr
nullalthist(zscore,lfdr.ash,main="ash partition of z \n into null and alternative",ncz=60,cex.axis=0.8,ylim=c(0,0.3),xlim=c(-6,6),xlab="z score")

For completeness, here is (roughly) the “true” decomposition of \(z\) scores into null and alternative components, obtained by fitting ash with \(g\) fixed at the true prior (via fixg=TRUE), so the lfdr is computed under the truth.

trueg = ashr::normalmix(c(0.8,0.2),c(0,0),c(0,1))
lfdr.true = ashr::get_lfdr(ashr::ash(x.est, 1/sqrt(30), g=trueg, fixg=TRUE))
nullalthist(zscore,lfdr.true,main="true partition of z \n into null and alternative",ncz=60,cex.axis=0.8,ylim=c(0,0.3),xlim=c(-6,6),xlab="z score")


In estimating \(\pi_0\), qvalue assumes that all \(p\) values near 1 are null, which is equivalent to assuming that all \(z\) scores near 0 are null. Thus, although qvalue does not explicitly model the distribution of \(p\) values or \(z\) scores under the alternative, its way of estimating \(\pi_0\) necessarily creates a hole at 0 in the alternative \(z\) score distribution.
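To make this concrete, here is a minimal sketch of the single-threshold version of the \(\pi_0\) estimator (qvalue itself smooths estimates over a grid of \(\lambda\) values rather than using one fixed threshold):

```r
# Treat all p values above a threshold lambda as null. Null p values are
# uniform, so a fraction (1 - lambda) of them land above lambda, giving
#   pi0_hat = #{p_i > lambda} / ((1 - lambda) * m).
pi0_lambda = function(p, lambda = 0.5) {
  mean(p > lambda) / (1 - lambda)
}
```

Any alternative hypotheses with \(p\) values above \(\lambda\) (equivalently, \(z\) scores near 0) are counted as null by this estimator, which is exactly the assumption that carves the hole at 0 in the alternative distribution.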

Session information

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.1 (Sierra)

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ashr_2.0.4 knitr_1.14

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8       magrittr_1.5      REBayes_0.63     
 [4] MASS_7.3-45       splines_3.3.1     pscl_1.4.9       
 [7] munsell_0.4.3     doParallel_1.0.10 lattice_0.20-33  
[10] SQUAREM_2016.8-2  colorspace_1.2-6  foreach_1.4.3    
[13] stringr_1.0.0     plyr_1.8.4        tools_3.3.1      
[16] parallel_3.3.1    grid_3.3.1        gtable_0.2.0     
[19] iterators_1.0.8   htmltools_0.3.5   assertthat_0.1   
[22] yaml_2.1.13       rprojroot_1.1     digest_0.6.10    
[25] Matrix_1.2-6      reshape2_1.4.1    ggplot2_2.1.0    
[28] formatR_1.4       codetools_0.2-14  qvalue_2.4.2     
[31] evaluate_0.9      rmarkdown_1.2     stringi_1.1.1    
[34] Rmosek_7.1.3      scales_0.4.0      backports_1.0.4  
[37] truncnorm_1.0-7