Last updated: 2017-02-22

Code version: 95e9860305126112b4781bb4184483c350b30004

The goal here is to illustrate the “adaptive” nature of the adaptive shrinkage. The shrinkage is adaptive in two senses. First, the amount of shrinkage depends on the distribution \(g\) of the true effects, which is learned from the data: when \(g\) is very peaked about zero then ash learns this and deduces that signals should be more strongly shrunk towards zero than when \(g\) is less peaked about zero. Second, the amount of shrinkage of each observation depends on its standard error: the smaller the standard error, the more informative the data, and so the less shrinkage that occurs. From an Empirical Bayesian perspective both of these points are entirely natural: the posterior depends on both the prior and the likelihood; the prior, \(g\), is learned from the data, and the likelihood incorporates the standard error of each observation.
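To make this concrete, consider the simplest case where \(g\) is a single zero-centered normal, \(N(0,\sigma^2)\), and the observation satisfies \(\hat\beta \sim N(\beta,s^2)\). The posterior mean is then \(\hat\beta\) multiplied by the shrinkage factor \(\sigma^2/(\sigma^2+s^2)\), which shrinks more strongly when \(g\) is peaked (small \(\sigma\)) and when the observation is noisy (large \(s\)). A minimal sketch of this, not part of the analysis below:

# Minimal sketch: with prior beta ~ N(0,sigma^2) and likelihood
# betahat ~ N(beta,s^2), the posterior mean is betahat * sigma^2/(sigma^2 + s^2).
shrink_factor = function (sigma, s) sigma^2/(sigma^2 + s^2)
shrink_factor(sigma = 0.5,s = 1) # peaked g -> strong shrinkage (factor 0.2)
shrink_factor(sigma = 4,s = 1)   # dispersed g -> little shrinkage (factor ~0.94)
shrink_factor(sigma = 4,s = 4)   # noisier observation -> more shrinkage (factor 0.5)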

First, we load the necessary libraries.

library(REBayes)
## Loading required package: Matrix
library(ashr)
library(ggplot2)
library(scales)

We simulate from two scenarios: in the first, the effects are more peaked about zero (sim.spiky); in the second, they are less peaked about zero (sim.bignormal). A summary of the two simulated data sets is printed at the end of this chunk.

set.seed(100)
source("../code/dsc-shrink/datamakers/datamaker.R")

NSAMP = 1000
s     = 1/rgamma(NSAMP,5,5)  # standard errors, varying across observations

sim.spiky =
  rnormmix_datamaker(args = list(g = normalmix(c(0.4,0.2,0.2,0.2),
                                               c(0,0,0,0),
                                               c(0.25,0.5,1,2)),
                                  min_pi0   = 0,
                                  max_pi0   = 0,
                                  nsamp     = NSAMP,
                                  betahatsd = s))

sim.bignormal =
  rnormmix_datamaker(args = list(g         = normalmix(1,0,4),
                                 min_pi0   = 0,
                                 max_pi0   = 0,
                                 nsamp     = NSAMP,
                                 betahatsd = s))

cat("Summary of observed beta-hats:\n")
Summary of observed beta-hats:
print(rbind(spiky     = quantile(sim.spiky$input$betahat,seq(0,1,0.1)),
            bignormal = quantile(sim.bignormal$input$betahat,seq(0,1,0.1))),
      digits = 3)
              0%   10%   20%    30%    40%     50%   60%   70%  80%  90%
spiky      -6.86 -1.99 -1.19 -0.717 -0.326  0.0380 0.385 0.728 1.23 2.04
bignormal -14.26 -5.03 -3.48 -1.989 -0.984 -0.0941 1.026 2.141 3.63 5.61
          100%
spiky     15.2
bignormal 13.4

Now we run ash on both data sets.

beta.spiky.ash     = ash(sim.spiky$input$betahat,s)
beta.bignormal.ash = ash(sim.bignormal$input$betahat,s)
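The first sense of adaptivity can be checked directly by inspecting the fitted priors: for the spiky data, most of the mixture weight should fall on components concentrated near zero, whereas for the big-normal data it should fall on more dispersed components. (This quick check is not part of the original analysis; get_fitted_g is the ashr accessor for the estimated \(g\), equivalently the fitted_g element of the ash output.)

# Compare the estimated mixture priors: expect most weight near zero for the
# spiky fit, and on more dispersed components for the big-normal fit.
print(get_fitted_g(beta.spiky.ash))
print(get_fitted_g(beta.bignormal.ash))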

Next we plot the shrunken estimates against the observed values, colored by 1/s, the square root of the precision: precise estimates are shown in red, less precise estimates in blue. Two key features of the plots illustrate the ideas of adaptive shrinkage: (i) the estimates under the spiky scenario are shrunk more strongly, illustrating that shrinkage adapts to the underlying distribution of beta; (ii) in both cases, estimates with large standard error (blue) are shrunk more than estimates with small standard error (red), illustrating that shrinkage adapts to measurement precision.

make_df_for_ashplot =
  function (sim1, sim2, ash1, ash2, name1 = "spiky", name2 = "big-normal") {
    n = length(sim1$input$betahat)
    x = c(get_lfsr(ash1),get_lfsr(ash2))
    return(data.frame(betahat  = c(sim1$input$betahat,sim2$input$betahat),
                      beta_est = c(get_pm(ash1),get_pm(ash2)),
                      lfsr     = x,
                      s        = c(sim1$input$sebetahat,sim2$input$sebetahat),
                      scenario = c(rep(name1,n),rep(name2,n)),
                      signif   = x < 0.05))
  }

ashplot = function(df,xlab="Observed beta-hat",ylab="Shrunken beta estimate")
  ggplot(df,aes(x = betahat,y = beta_est,color = 1/s)) +
    xlab(xlab) + ylab(ylab) + geom_point() +
    facet_grid(.~scenario) +
    geom_abline(intercept = 0,slope = 1,linetype = "dotted") +
    scale_colour_gradient2(midpoint = median(1/s),low = "blue",
                           mid = "white",high = "red",space = "Lab") +
    coord_fixed(ratio = 1)

df = make_df_for_ashplot(sim.spiky,sim.bignormal,beta.spiky.ash,
                         beta.bignormal.ash)
print(ashplot(df))
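We can also check point (ii) numerically rather than visually. In the sketch below (not part of the original analysis), we compute the proportional shrinkage \(1 - |\text{posterior mean}|/|\hat\beta|\) for the spiky fit and average it within the low- and high-standard-error halves of the data; the noisier half should be shrunk more.

# Average proportional shrinkage for the spiky fit, split at the median
# standard error: expect stronger shrinkage for the noisier half.
prop_shrunk = 1 - abs(get_pm(beta.spiky.ash))/abs(sim.spiky$input$betahat)
se = sim.spiky$input$sebetahat
print(tapply(prop_shrunk,factor(se > median(se),labels = c("small s","large s")),
             mean))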

Now we plot the z scores against the lfsr, again colored by 1/s (the square root of the precision).

z_lfsr_plot = function(df,ylab = "Observed Z score",xlab = "lfsr")
  ggplot(df,aes(x = lfsr,y = betahat/s,color = 1/s)) +
    xlab(xlab) + ylab(ylab) + geom_point() + facet_grid(.~scenario) +
    scale_colour_gradient2(midpoint = median(1/s),low = "blue",
                           mid = "white",high = "red",space = "Lab")

print(z_lfsr_plot(df))
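The plot shows that the lfsr is not determined by the z score alone: observations with nearly identical z scores can have different lfsr values because their standard errors differ. For example (an illustrative check, not part of the original analysis), we can list a few spiky-scenario observations with z scores close to 2:

# Spiky-scenario observations whose z scores are closest to 2: the lfsr
# varies with s even though the z scores are nearly constant.
spiky.df = subset(df,scenario == "spiky")
rows = order(abs(spiky.df$betahat/spiky.df$s - 2))[1:5]
print(spiky.df[rows,c("betahat","s","lfsr")])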

A related consequence is that the significance (lfsr) of each observation is no longer a monotonic function of its \(p\) value: one observation can have a smaller \(p\) value than another and yet a larger lfsr.

pval_plot = function (df)
  ggplot(df,aes(x = pnorm(-abs(betahat/s)),y = lfsr,color = log(s))) +
  geom_point() + facet_grid(.~scenario) + xlim(c(0,0.025)) +
  xlab("p value") + ylab("lfsr") +
  scale_colour_gradient2(midpoint = 0,low = "red",
                         mid = "white",high = "blue")

print(pval_plot(df))
Warning: Removed 1274 rows containing missing values (geom_point).
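One way to quantify this non-monotonicity (a quick check, not part of the original analysis) is the Spearman rank correlation between lfsr and \(p\) value within each scenario; a value below 1 confirms that ranking observations by lfsr differs from ranking them by \(p\) value.

# Rank correlation between lfsr and (one-sided) p value within each scenario;
# values below 1 indicate the two orderings disagree.
df$pval = with(df,pnorm(-abs(betahat/s)))
print(by(df,df$scenario,function (x) cor(x$lfsr,x$pval,method = "spearman")))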

Let’s see how these results are affected by changing the modeling assumptions so that the standardized effects, beta/s, are exchangeable, rather than the effects beta themselves. This corresponds to setting alpha = 1 in ash; the plots below label this the ET model.

beta.bignormal.ash.ET =
  ash(sim.bignormal$input$betahat,s,alpha = 1,mixcompdist = "normal")
beta.spiky.ash.ET =
  ash(sim.spiky$input$betahat,s,alpha = 1,mixcompdist = "normal")
df.ET = make_df_for_ashplot(sim.spiky,sim.bignormal,beta.spiky.ash.ET,
                            beta.bignormal.ash.ET)
ashplot(df.ET,ylab = "Shrunken beta estimate (ET model)")

pval_plot(df.ET)
Warning: Removed 1274 rows containing missing values (geom_point).
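To see how much the change of assumptions matters for significance calls, we can cross-tabulate which observations have lfsr < 0.05 under the two fits (a quick comparison, not part of the original analysis):

# Cross-tabulate significance (lfsr < 0.05) under the original fit (df)
# and the ET fit (df.ET), separately for each scenario.
print(table(EE = df$signif,ET = df.ET$signif,scenario = df$scenario))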

This is a “volcano plot” showing effect size (betahat) against \(p\) value. The blue points are “significant” in the sense that they have lfsr < 0.05.

print(ggplot(df,aes(x = betahat,y = -log10(2*pnorm(-abs(betahat/s))),
                    col = signif)) +
  geom_point(alpha = 1,size = 1.75) + facet_grid(.~scenario) +
  theme(legend.position = "none") + xlim(c(-10,10)) + ylim(c(0,15)) +
  xlab("Effect (beta)") + ylab("-log10 p-value"))
Warning: Removed 89 rows containing missing values (geom_point).

In this case significance by lfsr is not quite equivalent to cutting off at a given \(p\) value (you can see that the decision boundary is not exactly a horizontal line), but it is also not that different, presumably because the standard errors, although they vary across observations, do not vary greatly.
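To make this concrete, we can compute the largest \(p\) value among the observations called significant (lfsr < 0.05) in each scenario (a quick check, not part of the original analysis); if the lfsr rule were exactly a \(p\)-value cutoff, every observation with a smaller \(p\) value would also be significant.

# Largest two-sided p value among observations with lfsr < 0.05, by scenario.
df$pval2 = with(df,2*pnorm(-abs(betahat/s)))
print(with(subset(df,signif),tapply(pval2,scenario,max)))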

hist(s,main = "histogram of standard errors")

print(summary(s))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.345   0.800   1.060   1.260   1.520   6.650 

Session information.

print(sessionInfo())
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.15.1   scales_0.4.1   ggplot2_2.2.1  ashr_2.1.3    
[5] REBayes_0.63   Matrix_1.2-7.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9       magrittr_1.5      MASS_7.3-45      
 [4] munsell_0.4.3     doParallel_1.0.10 pscl_1.4.9       
 [7] colorspace_1.2-6  SQUAREM_2016.8-2  lattice_0.20-34  
[10] foreach_1.4.3     plyr_1.8.4        stringr_1.1.0    
[13] tools_3.3.2       parallel_3.3.2    grid_3.3.2       
[16] gtable_0.2.0      htmltools_0.3.5   iterators_1.0.8  
[19] assertthat_0.1    lazyeval_0.2.0    yaml_2.1.14      
[22] rprojroot_1.2     digest_0.6.10     tibble_1.2       
[25] reshape2_1.4.2    codetools_0.2-15  evaluate_0.10    
[28] rmarkdown_1.3     labeling_0.3      stringi_1.1.2    
[31] Rmosek_7.1.3      backports_1.0.5   truncnorm_1.0-7