The aim of this vignette is to introduce the basic steps of a mr.mash analysis through a toy example. To learn more about mr.mash, please see the paper.

First, we set the seed to make the results more easily reproducible, and we load the “mr.mash.alpha” package.
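(The particular seed value is arbitrary; 1 below is just an example.)

# Load the package and fix the seed (any value works) for reproducibility.
library(mr.mash.alpha)
set.seed(1)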

We illustrate the application of mr.mash to a data set simulated from a multivariate, multiple linear regression with 5 responses in which the coefficients are the same for all responses. In the target application considered in the paper—prediction of multi-tissue gene expression from genotypes—this would correspond to the situation in which we would like to predict expression of a single gene in 5 different tissues from genotype data at multiple SNPs, and the SNPs have the same effects on gene expression in all 5 tissues. (In multi-tissue gene expression we would normally like to predict expression of many genes, but to keep this vignette simple we illustrate the key ideas with a single gene.)

Although this simulation is not particularly realistic, it is meant to illustrate the benefits of mr.mash: by modeling the sharing of effects across tissues, mr.mash estimates the effects in multiple tissues more accurately, and therefore obtains better predictions.

We start by simulating 150 samples from a multivariate, multiple linear regression model in which 5 out of the 800 variables (SNPs) affect the 5 responses (expression levels).
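Concretely, the data are simulated from the model $Y = XB + E$, where $Y$ is the $n \times r$ matrix of responses, $X$ is the $n \times p$ matrix of genotypes, $B$ is the $p \times r$ matrix of effects (here with only 5 nonzero rows, shared across the 5 responses), and the rows of $E$ are i.i.d. multivariate normal with mean zero and covariance $V$.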

dat <- simulate_mr_mash_data(n = 150,p = 800,p_causal = 5,r = 5,pve = 0.5,
                             V_cor = 0.25)
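
The simulated data are returned as a list; the elements used in this vignette are the genotype matrix X, the expression matrix Y, the true effects B and the residual covariance V:

# Elements of the simulated data set that are used below.
names(dat)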

Next we split the samples into a training set (with 100 samples) and test set (with 50 samples).

ntest  <- 50 
Ytrain <- dat$Y[-(1:ntest),]
Xtrain <- dat$X[-(1:ntest),]
Ytest  <- dat$Y[1:ntest,]
Xtest  <- dat$X[1:ntest,]
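
A quick check confirms the dimensions of the split:

# Training set: 100 x 800; test set: 50 x 800.
dim(Xtrain)
dim(Xtest)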

Define the mr.mash prior

To run mr.mash, we need to first specify the covariances in the mixture-of-normals prior. The idea is that the chosen collection of covariance matrices should include a variety of potential effect-sharing patterns; in the model-fitting stage, the prior should then assign most weight to the sharing patterns that are present in the data, and little or no weight to patterns that are inconsistent with the data. In general, we recommend learning “data-driven” covariance matrices. But here, for simplicity, we instead use “canonical” covariances, which are not adaptive, but are nonetheless well suited to this toy example since the true effects are the same across responses/tissues.

S0 <- compute_canonical_covs(r = 5,singletons = TRUE,
                             hetgrid = seq(0,1,0.25))

This gives a mixture of 10 covariance matrices capturing a variety of “canonical” effect-sharing patterns:

names(S0)
#  [1] "singleton1"  "singleton2"  "singleton3"  "singleton4"  "singleton5" 
#  [6] "independent" "shared0.25"  "shared0.5"   "shared0.75"  "shared1"
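
For intuition, it can help to inspect one of these matrices. For example, the “shared1” matrix encodes effects that are equal across all 5 responses, whereas each “singleton” matrix encodes an effect in one response only:

# A 5 x 5 covariance matrix encoding fully shared (equal) effects.
S0$shared1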

To illustrate the benefits of modeling a variety of effect-sharing patterns, we also try out mr.mash with a simpler mixture of covariance matrices in which the effects are effectively independent across tissues. Although this may seem a very poor choice of prior, particularly for this example, several multivariate regression methods implicitly or explicitly assume such a prior.

S0_ind <- compute_canonical_covs(r = 5,singletons = FALSE,
                                 hetgrid = c(0,0.001,0.01))
names(S0_ind)
# [1] "independent" "shared0.001" "shared0.01"

Regardless of how the covariance matrices are chosen, it is recommended to also consider a variety of effect scales in the prior. This is normally achieved in mr.mash by expanding the mixture across a specified grid of scaling factors. Here we choose this grid adaptively based on the data:

univ_sumstats <- compute_univariate_sumstats(Xtrain,Ytrain,standardize = TRUE)
scaling_grid <- autoselect.mixsd(univ_sumstats,mult = sqrt(2))^2
# zeromat = TRUE adds an all-zeros "null" matrix to each mixture; this
# null component is used below when initializing the mixture weights.
S0 <- expand_covs(S0,scaling_grid,zeromat = TRUE)
S0_ind <- expand_covs(S0_ind,scaling_grid,zeromat = TRUE)
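
After this expansion, each sharing pattern appears once per scaling factor (plus the null matrix), so the mixtures are now considerably larger:

# Number of mixture components after expanding across the scaling grid.
length(S0)
length(S0_ind)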

Fit a mr.mash model to the data

Having specified the mr.mash prior, we are now ready to fit a mr.mash model to the training data (this may take a few minutes to run). Given that the majority of the SNPs have no effect, we initialize the mixture weights with 99% of the weight on the null component and the rest of the weight split equally across the remaining components.

null_weight <- 0.99
non_null_weight <- 1-null_weight
w0_init <- c(null_weight,rep(non_null_weight/(length(S0)-1),length(S0)-1))
fit <- mr.mash(X = Xtrain,Y = Ytrain,S0 = S0,w0 = w0_init,update_V = TRUE)

And for comparison we fit a second mr.mash model using the simpler and less flexible prior:

w0_init <- c(null_weight,
             rep(non_null_weight/(length(S0_ind)-1),length(S0_ind)-1))
fit_ind <- mr.mash(X = Xtrain,Y = Ytrain,S0 = S0_ind,w0 = w0_init,
                   update_V = TRUE)

(Notice that the less complex model also takes less time to fit.)

For prediction, the key output is the posterior mean estimates of the regression coefficients, stored in the “mu1” output. Let’s compare the estimates to the ground truth:

par(mfrow = c(1,2))
plot(dat$B,fit_ind$mu1,pch = 20,xlab = "true",ylab = "estimated",
     main = sprintf("cor = %0.3f",
                    cor(as.vector(dat$B),as.vector(fit_ind$mu1))))
abline(a = 0,b = 1,col = "royalblue",lty = "dotted")
plot(dat$B,fit$mu1,pch = 20,xlab = "true",ylab = "estimated",
     main = sprintf("cor = %0.3f",
                    cor(as.vector(dat$B),as.vector(fit$mu1))))
abline(a = 0,b = 1,col = "royalblue",lty = "dotted")

As expected, the coefficients on the left-hand side, obtained using the “independent effects” prior, are not as accurate as the coefficients estimated using the more flexible prior (right-hand side).

While perhaps not of primary interest, for diagnostic purposes it is often helpful to examine the estimated mixture weights in the prior as well as the estimated residual covariance matrix.

Inspecting the top prior mixture weights from the model with the more flexible prior, it is reassuring to see that the “null” and “shared1” components are among the components receiving the most weight. (The top component is the null component because the vast majority of the SNPs have no effect on gene expression.)

head(sort(fit$w0,decreasing = TRUE),n = 10)
#              null    shared1_grid12 singleton1_grid10  singleton5_grid8 
#      9.822578e-01      5.431954e-03      4.477378e-03      3.180946e-03 
#  singleton5_grid7  singleton5_grid6 singleton1_grid11  singleton5_grid5 
#      1.531493e-03      3.655315e-04      1.818372e-04      1.496145e-04 
#  singleton5_grid4  singleton5_grid3 
#      9.529390e-05      7.656325e-05
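
Since each sharing pattern appears at several scales (the “_grid” suffix indexes the scaling factor), it can also be informative to sum the weights over the grid. Here is one way to do this, a small sketch based on the component names shown above:

# Total prior weight on each sharing pattern, summed over scaling factors.
pattern <- sub("_grid[0-9]+$","",names(fit$w0))
head(sort(tapply(fit$w0,pattern,sum),decreasing = TRUE))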

Also, reassuringly, the estimated residual variance-covariance matrix is close to the matrix used to simulate the data:

dat$V
#          [,1]     [,2]     [,3]     [,4]     [,5]
# [1,] 6.622079 1.655520 1.655520 1.655520 1.655520
# [2,] 1.655520 6.622079 1.655520 1.655520 1.655520
# [3,] 1.655520 1.655520 6.622079 1.655520 1.655520
# [4,] 1.655520 1.655520 1.655520 6.622079 1.655520
# [5,] 1.655520 1.655520 1.655520 1.655520 6.622079
fit$V
#           [,1]     [,2]      [,3]      [,4]      [,5]
# [1,] 5.7356423 1.585484 1.3494127 0.8463714 0.9642602
# [2,] 1.5854844 6.578508 2.4138518 1.5273860 1.6646585
# [3,] 1.3494127 2.413852 6.0258742 0.4755238 1.5601590
# [4,] 0.8463714 1.527386 0.4755238 4.7464138 1.3982834
# [5,] 0.9642602 1.664659 1.5601590 1.3982834 5.0244255
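
One simple numeric summary of this agreement is the largest absolute difference between the entries of the two matrices:

# Largest entrywise discrepancy between the true and estimated V.
max(abs(dat$V - fit$V))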

Use the fitted mr.mash model to make predictions

We can use the fitted mr.mash model to predict gene expression from genotypes, including for samples that were not used to train the model. This is implemented by the “predict” method. Let’s compare the predictions from the two mr.mash models:

par(mfrow = c(1,2))
Ypred <- predict(fit,Xtest)
Ypred_ind <- predict(fit_ind,Xtest)
plot(Ytest,Ypred_ind,pch = 20,col = "darkblue",xlab = "true",
     ylab = "predicted",
     main = sprintf("cor = %0.3f",cor(as.vector(Ytest),as.vector(Ypred_ind))))
abline(a = 0,b = 1,col = "magenta",lty = "dotted")
plot(Ytest,Ypred,pch = 20,col = "darkblue",xlab = "true",
     ylab = "predicted",
     main = sprintf("cor = %0.3f",cor(as.vector(Ytest),as.vector(Ypred))))
abline(a = 0,b = 1,col = "magenta",lty = "dotted")

Indeed, mr.mash with the more flexible prior (right-hand plot) produces more accurate predictions than mr.mash with the “independent effects” prior.
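
To quantify this beyond the scatterplot titles, one could also compare the accuracy of the predictions in each tissue separately; here is a small sketch using root-mean-square error (the rmse helper below is defined just for illustration):

# Per-tissue root-mean-square error of the test-set predictions.
rmse <- function (Y, Yhat) sqrt(colMeans((Y - Yhat)^2))
rbind(flexible    = rmse(Ytest,Ypred),
      independent = rmse(Ytest,Ypred_ind))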