flash_mash.Rmd
This is a continuation of the eQTL analysis vignette. In that vignette we used PCA to compute data-driven covariances. Here we demonstrate the use of additional data-driven covariances obtained via flash decomposition.
As in the eQTL analysis vignette, we simulate a toy data set:
library(ashr)
library(mashr)
set.seed(1)
simdata = simple_sims(10000,5,1) # simulates data on 40k tests
# identify a subset of strong tests
m.1by1 = mash_1by1(mash_set_data(simdata$Bhat,simdata$Shat))
strong.subset = get_significant_results(m.1by1,0.05)
# identify a random subset of 5000 tests
random.subset = sample(1:nrow(simdata$Bhat),5000)
and create random and strong sets:
data.temp = mash_set_data(simdata$Bhat[random.subset,],simdata$Shat[random.subset,])
Vhat = estimate_null_correlation_simple(data.temp)
rm(data.temp)
data.random = mash_set_data(simdata$Bhat[random.subset,],simdata$Shat[random.subset,],V=Vhat)
data.strong = mash_set_data(simdata$Bhat[strong.subset,],simdata$Shat[strong.subset,], V=Vhat)
We first perform the empirical Bayes matrix factorization via flashr using its default settings:
U.f = cov_flash(data.strong)
U.f
Alternatively, as suggested by Jason Willwerscheid for multi-tissue QTL studies, we can constrain the factors to non-negative values. The tag parameter can be used to customize the names in the output list:
U.f = cov_flash(data.strong, factors="nonneg", tag="non_neg")
U.f
# combine the flash and PCA covariances, then refine them with
# extreme deconvolution (ED) on the strong tests
U.pca = cov_pca(data.strong, 5)
U.ed = cov_ed(data.strong, c(U.f, U.pca))
# canonical covariances are computed from the random tests
U.c = cov_canonical(data.random)
Now we fit mash to the random tests using both data-driven and canonical covariances.
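The fit itself is not shown above; a minimal sketch, assuming the same call as in the eQTL analysis vignette (outputlevel = 1 returns only the fitted mixture model and skips posterior computations for the random tests):
m = mash(data.random, Ulist = c(U.ed, U.c), outputlevel = 1)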
Now we can compute posterior summaries etc. for any subset of tests using the above mash fit. Here we do this for the strong tests:
# compute posteriors for the strong tests, re-using the mixture fitted on the random tests
m2 = mash(data.strong, g=get_fitted_g(m), fixg=TRUE)
head(get_lfsr(m2))
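Other posterior summaries are available from the same fit. A short illustrative sketch using mashr's accessor functions (output not shown here):
head(get_pm(m2))                      # posterior means
head(get_psd(m2))                     # posterior standard deviations
length(get_significant_results(m2))   # number of significant tests
print(get_loglik(m2), digits=10)      # log-likelihood of the strong tests under the fit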