Last updated: 2017-01-15

Code version: 94a3347615361216b39b8a9333bfa8354794e0ff

First, we load the necessary libraries.

library(ashr)
library(qvalue)
library(locfdr)

Introduction

This example comes from Efron (2008) p16 when examining the false coverage rate (FCR). I selected this example because the distribution of the non-zero effects is highly assymetric and not at all unimodal at zero, both issues a referee asked me to elaborate on. Specifically, the distribution of the non-zero effects is N(-3,1). Here I simulate data, and apply ash (with the “halfuniform” option to allow for asymmetric g).

set.seed(10)
nsamp   = 1e4
altmean = -3
mu0     = rep(0,nsamp)
mu1     = rnorm(nsamp,altmean,1)
comp    = rbinom(nsamp,1,0.1)
mu      = ifelse(comp == 0,mu0,mu1)
z       = rnorm(nsamp,mu,1)

# Fit the model.
res.ash = ash(z,1,mixcompdist = "halfuniform")

We can also run ash with the “true” g, to allow us to compare the lfsr, lfdr, etc.

true.g       = normalmix(c(0.9,0.1),c(0,-3),c(0,1))
res.ash.true = ash(z,1,g = true.g,fixg = TRUE)

Here we can see how the partition of \(z\) scores compares with the truth. Note the effect of the unimodal assumption is to extend the inferred alternative distribution toward 0. Here, nullalthist plots a histogram of z scores, highlighting the alternative distribution of z scores that is implied by the localfdr values lfdr.

source("../R/nullalthist.R")
par(mfcol = c(2,1),cex.main = 1,font.main = 1)
nullalthist(z,lfdr = get_lfdr(res.ash.true),main = "true partition (res.ash.true)")
nullalthist(z,lfdr = get_lfdr(res.ash),main = "inferred partition (res.ash)")

Comparing the inferred posterior means, lfdr, and lfsr with the true values of these quantities, we find reassuringly good correspondence.

par(mfrow = c(1,3))
plot(get_pm(res.ash.true),get_pm(res.ash),xlab = "Truth",ylab = "ash.hu",
     main = "Posterior Mean (inferred vs truth)",pch = 20,cex = 0.6,
     xlim = c(-6,1),ylim = c(-6,1))
abline(a = 0,b = 1,col = "orangered",lwd = 2,lty = "dotted")

plot(get_lfdr(res.ash.true),get_lfdr(res.ash),xlab = "Truth", ylab = "ash.hu",
     main = "lfdr (inferred vs truth)",pch = 20,cex = 0.6,
     xlim = c(0,1),ylim = c(0,1))
abline(a = 0,b = 1,col = "orangered",lwd = 2,lty = "dotted")

plot(get_lfsr(res.ash.true),get_lfsr(res.ash),xlab = "Truth", ylab = "ash.hu",
     main = "lfsr (inferred vs truth)",pch = 20,cex = 0.6,
     xlim = c(0,1),ylim = c(0,1))
abline(a = 0,b = 1,col = "orangered",lwd = 2,lty = "dotted")

Comparison with qvalue and locfdr

We can also run qvalue and locfdr. We see that locfdr perhaps performs a bit better than ash for the decomposition here, but the estimated local fdrs are pretty similar. Here qvalue does less well because of the asymmetry which it didn’t take account of.

res.locfdr = locfdr(z,nulltype = 0)

res.qvalue = qvalue(p = pchisq(z^2,df = 1,lower.tail = FALSE))

par(mfrow = c(1,3))
plot(get_lfdr(res.ash.true),get_lfdr(res.ash),pch = 20,cex = 0.6,
     xlab = "Truth (lfdr)",ylab = "ash.hu",main = "ash.hu",
     xlim = c(0,1),ylim = c(0,1))
abline(a = 0,b = 1,col = "orangered",lty = "dotted",lwd = 2)

plot(get_lfdr(res.ash.true),res.locfdr$fdr,pch = 20,cex = 0.6,
     xlab = "Truth (lfdr)",ylab = "Estimate",main = "locfdr",
     xlim = c(0,1),ylim = c(0,1))
abline(a = 0,b = 1,lwd = 2,col = "orangered",lty = "dotted")

plot(get_lfdr(res.ash.true),res.qvalue$lfdr,pch = 20,cex = 0.6,
     xlab = "Truth (lfdr)",ylab = "Estimate",main = "qvalue",
     xlim = c(0,1),ylim = c(0,1))
abline(a = 0,b = 1,lwd = 2,col = "orangered",lty = "dotted")

Uniform tail curtails Credible Intervals

The following plot compares the (symmetric-tail) 95% CIs from ash (red) for the “significant” observations with Bayes rule (green), similar to Figure 8 from Efron. Note that the lower 97.5% point is pretty accurate, but the upper 97.5% point is curtailed - presumably due, at least in part, to the short tails of the uniform mixture.

CImatrix = ashci(res.ash,level = 0.95)
BayesComparePlot = function(CImatrix, altmean = -3,...) {
  plot(z,mu,xlim = c(-8,0),pch = 20,cex = 0.6,...)
  points(z,CImatrix[,1],col = "orangered",cex = 0.6,pch = 20)
  points(z,CImatrix[,2],col = "orangered",cex = 0.6,pch = 20)

  fdr   = 0.9*dnorm(z)/(0.9*dnorm(z) + 0.1*dnorm(z,altmean,sqrt(2)))
  o     = order(z)
  upper = ifelse(fdr[o] < 0.025,
                 (z[o] + altmean)/2 + qnorm(0.975 + fdr[o])/sqrt(2),0)
  lines(z[o],upper,col = "limegreen",lwd = 2)
  lines(z[o],(z[o] + altmean)/2 - qnorm(0.975)/sqrt(2),col = "limegreen",lwd = 2)
  abline(v = max(z[fdr < 0.05]))
}
par(mfrow = c(1,1))
BayesComparePlot(CImatrix,
  main = "CIs for highly asymmetric and non-unimodal-at-zero data")

Variational version

Although not a focus of the paper, ash does have an option to do variational inference for the mixture components (with a Dirichlet prior). In practice this approach usually ends up spreading the posterior mass up more among the mixture components. It seemed plausible that this might lead to slightly less extreme tail behaviour than above (because the model will put a little more weight on the uniforms with larger variance, which are essentially set to zero in the above).

res.ash.VB  = ash(z,1,mixcompdist = "halfuniform",optmethod = "mixVBEM")
CImatrix.VB = ashci(res.ash.VB,level = 0.95)

Again, we can compare results with Bayes rule:

BayesComparePlot(CImatrix.VB,
  main = paste("CIs for highly asymmetric and non-unimodal-at-zero data\n",
              "Variational Version"))

Session information

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.2

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.15.1 locfdr_1.1-8 qvalue_2.4.2 ashr_2.0.5  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8       magrittr_1.5      REBayes_0.63     
 [4] splines_3.3.2     MASS_7.3-45       munsell_0.4.3    
 [7] doParallel_1.0.10 pscl_1.4.9        colorspace_1.2-6 
[10] SQUAREM_2016.8-2  lattice_0.20-34   foreach_1.4.3    
[13] plyr_1.8.4        stringr_1.1.0     tools_3.3.2      
[16] parallel_3.3.2    grid_3.3.2        gtable_0.2.0     
[19] htmltools_0.3.5   iterators_1.0.8   assertthat_0.1   
[22] lazyeval_0.2.0    yaml_2.1.14       rprojroot_1.1    
[25] digest_0.6.10     tibble_1.2        Matrix_1.2-7.1   
[28] reshape2_1.4.2    ggplot2_2.2.0     codetools_0.2-15 
[31] evaluate_0.10     rmarkdown_1.3     stringi_1.1.2    
[34] Rmosek_7.1.3      scales_0.4.1      backports_1.0.4  
[37] truncnorm_1.0-7