Model fitting algorithms for Multiple Regression with Adaptive Shrinkage ("Mr.ASH"). Mr.ASH is a variational empirical Bayes (VEB) method for multiple linear regression. The fitting algorithms (locally) maximize the approximate marginal likelihood (the "evidence lower bound", or ELBO) via coordinate-wise updates.
Usage
mr.ash(
X,
y,
Z = NULL,
sa2 = NULL,
method_q = c("sigma_dep_q", "sigma_indep_q"),
method_g = c("caisa", "accelerate", "block"),
max.iter = 1000,
min.iter = 1,
beta.init = NULL,
update.pi = TRUE,
pi = NULL,
update.sigma2 = TRUE,
sigma2 = NULL,
update.order = NULL,
standardize = FALSE,
intercept = TRUE,
tol = set_default_tolerance(),
verbose = TRUE
)
Arguments
- X
The input matrix, of dimension (n,p); each column is a single predictor, and each row is an observation vector. Here, n is the number of samples and p is the number of predictors. The matrix cannot be sparse.
- y
The observed continuously-valued responses, a vector of length n.
- Z
The covariate matrix, of dimension (n,k), where k is the number of covariates. This feature is not yet implemented; Z must be set to NULL.
- sa2
The vector of prior mixture component variances. The variances should be in increasing order, starting at zero; that is, sort(sa2) should be the same as sa2. When sa2 is NULL, the default setting is used, sa2[k] = (2^(0.05*(k-1)) - 1)^2, for k = 1:20. For this default setting, sa2[1] = 0, and sa2[20] is roughly 1; this default grid is written out in the sketch following the argument list.
- method_q
The algorithm used to update the variational approximation to the posterior distribution of the regression coefficients. The two settings, method_q = "sigma_dep_q" and method_q = "sigma_indep_q", take different approaches to updating the residual variance \(\sigma^2\).
- method_g
method = "caisa", an abbreviation of "Cooridinate Ascent Iterative Shinkage Algorithm", fits the model by approximate EM; it iteratively updates the variational approximation to the posterior distribution of the regression coefficients (the approximate E-step) and the model parameters (mixture weights and residual covariance) in an approximate M-step. Settingsmethod = "block"andmethod = "accelerate"are considered experimental.- max.iter
The maximum number of outer loop iterations allowed.
- min.iter
The minimum number of outer loop iterations allowed.
- beta.init
The initial estimate of the (approximate) posterior mean regression coefficients. This should be NULL, or a vector of length p. When beta.init is NULL, the posterior mean coefficients are all initially set to zero.
- update.pi
If update.pi = TRUE, the mixture proportions in the mixture-of-normals prior are estimated from the data. In the manuscript, update.pi = TRUE.
- pi
The initial estimate of the mixture proportions \(\pi_1, \ldots, \pi_K\). If pi is NULL, the mixture weights are initialized to rep(1/K,K), where K = length(sa2).
- update.sigma2
If update.sigma2 = TRUE, the residual variance \(\sigma^2\) is estimated from the data. In the manuscript, update.sigma2 = TRUE.
- sigma2
The initial estimate of the residual variance, \(\sigma^2\). If sigma2 = NULL, the residual variance is initialized to the empirical variance of the residuals based on the initial estimates of the regression coefficients, beta.init, after removing the linear effects of the intercept and any covariates.
- update.order
The order in which the coordinate ascent updates for estimating the posterior mean coefficients are performed. update.order can be NULL, "random", or any permutation of \((1,\ldots,p)\), where p is the number of columns in the input matrix X. When update.order is NULL, the coordinate ascent updates are performed in the order in which the predictors appear in X; this is equivalent to setting update.order = 1:p. When update.order = "random", the coordinate ascent updates are performed in a randomly generated order that is re-drawn at each outer-loop iteration.
- standardize
A logical flag indicating whether the columns of X should be standardized prior to model fitting. The coefficients are always returned on the original scale.
- intercept
When intercept = TRUE, an intercept is included in the regression model.
- tol
Additional settings controlling the behaviour of the model fitting algorithm. tol$convtol controls the termination criterion: the outer-loop updates stop when the relative L2 change in the estimates of the posterior mean coefficients is less than convtol, i.e., ||beta_new - beta_old||_2 / max(1, ||beta_old||_2) < convtol (see the sketch after this list). tol$epstol is a small, positive number added to the likelihoods to avoid taking logarithms of zero.
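As a quick illustration of the defaults described above, the sa2 grid and the stopping rule can be written out directly in R. This is a sketch of the quantities as documented, not code from the package:
## Default prior variance grid: sa2[1] = 0, sa2[20] is roughly 1.
k   = 1:20
sa2 = (2^(0.05*(k-1)) - 1)^2
## Relative L2 change compared against tol$convtol at each outer iteration:
rel.change = function (beta.new, beta.old)
  sqrt(sum((beta.new - beta.old)^2)) / max(1, sqrt(sum(beta.old^2)))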
Value
A list object with the following elements:
- intercept
The estimated intercept.
- beta
A vector containing posterior mean estimates of the regression coefficients for all predictors.
- sigma2
The estimated residual variance.
- pi
A vector containing the estimated mixture proportions.
- iter
The number of outer-loop iterations that were performed.
- update.order
The ordering used for performing the coordinate-wise updates. For update.order = "random", the orderings for the outer-loop iterations are provided in a vector of length p*max.iter, where p is the number of predictors.
- varobj
A vector of length numiter, containing the value of the variational objective (equal to the negative "evidence lower bound") attained at each (outer-loop) model fitting iteration. Note that the objective does not account for the intercept term, even when intercept = TRUE; therefore, this value should be interpreted as an approximation to the marginal likelihood conditional on the estimate of the intercept.
- data
The preprocessed data (X, Z, y) provided as input to the model fitting algorithm.
data$w is equal to diag(crossprod(X)), in which X is the preprocessed data matrix. Additionally, data$sa2 gives the prior variances used.
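The elements of the returned list can be inspected directly. A brief illustration, assuming a fitted object fit.mr.ash as created in the 'Examples' below:
fit.mr.ash$intercept          # estimated intercept
head(fit.mr.ash$beta)         # posterior mean coefficients
fit.mr.ash$sigma2             # estimated residual variance
fit.mr.ash$pi                 # estimated mixture proportions
tail(fit.mr.ash$varobj, 1)    # final value of the variational objective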
Details
Mr.ASH is a statistical inference method for the following multiple linear regression model: $$y | X, \beta, \sigma^2 ~ N(X \beta, \sigma^2 I_n),$$ in which the regression coefficients \(\beta\) admit a mixture-of-normals prior, $$\beta | \pi, \sigma ~ g = \sum_{k=1}^K \pi_k N(0, \sigma^2 \sigma_k^2).$$ Each mixture component in the prior, \(g\), is a normal density centered at zero, with variance \(\sigma^2 \sigma_k^2\).
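As an illustration, data can be simulated from this model in a few lines of R. The choices of n, p, sigma2 and the uniform mixture weights below are arbitrary for the sake of the example; sa2 follows the default grid documented above:
## Illustrative simulation from the Mr.ASH model (settings chosen arbitrarily).
set.seed(1)
n = 100; p = 50; sigma2 = 1
sa2 = (2^(0.05*(0:19)) - 1)^2                   # default grid of prior variances
pi  = rep(1/20,20)                              # uniform mixture weights
k   = sample(20,p,replace = TRUE,prob = pi)     # mixture component per coefficient
beta = rnorm(p,0,sqrt(sigma2 * sa2[k]))         # beta_j ~ N(0, sigma^2 * sa2[k])
X = matrix(rnorm(n*p),n,p)
y = drop(X %*% beta + rnorm(n,0,sqrt(sigma2)))  # y ~ N(X beta, sigma^2 I_n)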
The fitting algorithm, if run for a large enough number of iterations, will find an approximate posterior for the regression coefficients, denoted by \(q(\beta)\), a residual variance parameter \(\sigma^2\), and prior mixture weights \(\pi_1, \ldots, \pi_K\) maximizing the evidence lower bound, $$F(q, \pi, \sigma^2) = E_q \log p(y | X, \beta, \sigma^2) - \sum_{j=1}^p D_{KL}(q_j || g),$$ where \(D_{KL}(q_j || g)\) denotes the Kullback-Leibler (KL) divergence, a measure of the "distance" between the (approximate) posterior \(q_j(\beta_j)\) and the prior \(g(\beta_j)\). The fitting algorithm iteratively updates the approximate posteriors \(q_1, \ldots, q_p\), separately for each \(j = 1, \ldots, p\) (in an order determined by update.order), then separately updates the mixture weights \(\pi\) and the residual variance \(\sigma^2\). This coordinate-wise update scheme iterates until the convergence criterion is met, or until the algorithm hits the upper bound on the number of iterations (specified by max.iter). The coordinate-wise optimization algorithms are implemented in C++ for efficient handling of large-scale data.
See ‘References’ for more details about the model and algorithm.
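In outline, the scheme just described corresponds to the following R pseudocode. This is a schematic sketch only; update_qj, update_pi and update_sigma2 are hypothetical stand-ins for the package's internal C++ routines, not exported functions:
## Schematic pseudocode of the outer loop (helper functions are hypothetical).
for (iter in 1:max.iter) {
  beta.old = beta
  for (j in update.order)                # approximate E-step, coordinate-wise
    beta[j] = update_qj(X,y,beta,j,pi,sigma2,sa2)
  pi     = update_pi(X,y,beta,sa2)       # approximate M-step: mixture weights
  sigma2 = update_sigma2(X,y,beta)       # approximate M-step: residual variance
  d = sqrt(sum((beta - beta.old)^2)) / max(1, sqrt(sum(beta.old^2)))
  if (d < tol$convtol && iter >= min.iter)
    break                                # convergence criterion met
}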
References
Y. Kim (2020), Bayesian shrinkage methods for high dimensional regression. Ph.D. thesis, University of Chicago.
Examples
### generate synthetic data
set.seed(1)
n = 200
p = 300
X = matrix(rnorm(n*p),n,p)
beta = double(p)
beta[1:10] = 1:10
y = X %*% beta + rnorm(n)
### fit Mr.ASH
fit.mr.ash = mr.ash(X,y, method_q = "sigma_indep_q")
#> Mr.ASH terminated at iteration 24: max|beta|=1.0032e+01, sigma2=3.0606e+00, pi0=0.7758
fit.mr.ash = mr.ash(X,y, method_q = "sigma_scaled_beta")
#> Error in match.arg(method_q): 'arg' should be one of “sigma_dep_q”, “sigma_indep_q”
fit.mr.ash = mr.ash(X,y, method_q = "sigma_dep_q")
#> Mr.ASH terminated at iteration 25: max|beta|=1.0032e+01, sigma2=3.0601e+00, pi0=0.7703
### prediction routine
Xnew = matrix(rnorm(n*p),n,p)
ynew = Xnew %*% beta + rnorm(n)
ypred = predict(fit.mr.ash, Xnew)
### test error
rmse = norm(ynew - ypred, '2') / sqrt(n)
### coefficients
betahat = predict(fit.mr.ash, type = "coefficients")
# this equals c(fit.mr.ash$intercept, fit.mr.ash$beta)
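### illustrative variations using documented arguments (sketches; output not shown)
### random coordinate update order, re-drawn at each outer-loop iteration
fit.rand = mr.ash(X,y, update.order = "random")
### fix the residual variance at a given value rather than estimating it
fit.fixed = mr.ash(X,y, update.sigma2 = FALSE, sigma2 = 1)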