Model fitting algorithms for Multiple Regression with Adaptive Shrinkage ("Mr.ASH"). Mr.ASH is a variational empirical Bayes (VEB) method for multiple linear regression. The fitting algorithms (locally) maximize the approximate marginal likelihood (the "evidence lower bound", or ELBO) via coordinate-wise updates.

mr_ash(
  X,
  y,
  sa2 = NULL,
  beta.init = NULL,
  pi = NULL,
  sigma2 = NULL,
  standardize = FALSE,
  intercept = TRUE,
  control = list(),
  verbose = c("progress", "detailed", "none")
)

mr_ash_control_default()

Arguments

X

The input matrix, of dimension (n,p), where n is the number of samples and p is the number of predictors; each column is a single predictor, and each row is an observation. The matrix cannot be sparse.

y

The observed continuously-valued responses, a vector of length n.

sa2

The vector of prior mixture component variances. The variances should be in increasing order, starting at zero; that is, sort(sa2) should be the same as sa2. When sa2 is NULL, the default setting is used, sa2[k] = (2^(0.05*(k-1)) - 1)^2, for k = 1:20. For this default setting, sa2[1] = 0, and sa2[20] is roughly 1. (A short sketch reproducing this grid follows the argument descriptions below.)

beta.init

The initial estimate of the (approximate) posterior mean regression coefficients. This should be NULL, or a vector of length p. When beta.init is NULL, the posterior mean coefficients are all initially set to zero.

pi

The initial estimate of the mixture proportions \(\pi_1, \ldots, \pi_K\), where K = length(sa2) is the number of mixture components. If pi is NULL, the mixture weights are initialized to rep(1/K,K).

sigma2

The initial estimate of the residual variance, \(\sigma^2\). If sigma2 = NULL, the residual variance is initialized to the empirical variance of the residuals based on the initial estimates of the regression coefficients, beta.init, after removing the linear effects of the intercept and any covariates. (A sketch of this initialization also follows the argument descriptions below.)

standardize

A logical flag indicating whether to standardize the columns of X prior to model fitting. The coefficients are always returned on the original scale.

intercept

When intercept = TRUE, an intercept is included in the regression model.

control

A list of parameters controlling the behaviour of the optimization algorithm. See ‘Details’.

verbose

When verbose = "detailed", detailed information about the algorithm's progress is printed to the console at each iteration; when verbose = "progress" (the default), a plus sign ("+") is printed for each outer-loop iteration; and when verbose = "none", no progress information is printed.
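As a quick illustration of the sa2 default described above, the grid can be reproduced directly from its formula; all names here are local to the sketch:

# Reproduce the default prior variance grid:
# sa2[k] = (2^(0.05*(k-1)) - 1)^2 for k = 1:20.
k   <- 1:20
sa2 <- (2^(0.05*(k - 1)) - 1)^2
sa2[1]   # exactly 0
sa2[20]  # about 0.87, i.e., roughly 1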
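And for the sigma2 default, the following is a minimal sketch of the initialization in the simplest case (an intercept and no other covariates); the package's exact preprocessing may differ:

# Hedged sketch of the default sigma2 initialization: the empirical
# variance of the residuals at beta.init (var() centers the residuals,
# which accounts for the intercept).
r      <- drop(y - X %*% beta.init)
sigma2 <- var(r)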

Value

A list object with the following elements:

intercept

The estimated intercept.

beta

Posterior mean estimates of the regression coefficients.

sigma2

The estimated residual variance.

pi

A vector containing the estimated mixture proportions.

iter

The number of outer-loop iterations that were performed.

update.order

The ordering used for performing the coordinate-wise updates. For update.order = "random", the orderings for outer-loop iterations are provided in a vector of length p*max.iter, where p is the number of predictors.

varobj

A vector with one entry per outer-loop iteration performed (see iter), containing the value of the variational objective (equal to the negative "evidence lower bound") attained at each outer-loop model fitting iteration. Note that the objective does not account for the intercept term, even when intercept = TRUE; therefore, this value should be interpreted as approximating the marginal likelihood conditional on the estimate of the intercept. (A convergence-check sketch follows this list.)

data

The preprocessed data (X, Z, y) provided as input to the model fitting algorithm. data$w is equal to diag(crossprod(X)), in which X is the preprocessed data matrix. Additionally, data$sa2 gives the prior variances used. (A sketch verifying this relationship also follows this list.)
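As noted for varobj above, plotting the objective is a simple way to monitor convergence; here is a small usage sketch, with fit.mr.ash as in the Examples below:

# The objective should decrease from one outer-loop iteration to the next.
plot(fit.mr.ash$varobj, type = "l",
     xlab = "outer-loop iteration",
     ylab = "variational objective (negative ELBO)")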
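And since diag(crossprod(X)) is just the vector of column sums of squares of X, the stored weights can be checked cheaply; the structure of the returned data element is assumed from the description above:

# colSums(X^2) computes diag(crossprod(X)) without forming the
# p x p cross-product matrix.
all.equal(fit.mr.ash$data$w, colSums(fit.mr.ash$data$X^2))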

Details

Mr.ASH is a statistical inference method for the following multiple linear regression model: $$y | X, \beta, \sigma^2 \sim N(X \beta, \sigma^2 I_n),$$ in which the regression coefficients \(\beta_j\) admit a mixture-of-normals prior, $$\beta_j | \pi, \sigma^2 \sim g = \sum_{k=1}^K \pi_k N(0, \sigma^2 \sigma_k^2).$$ Each mixture component in the prior, \(g\), is a normal density centered at zero, with variance \(\sigma^2 \sigma_k^2\), and the mixture weights \(\pi_k\) give the prior probability of each component.
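To make the prior concrete, the following sketch draws coefficients from \(g\) using the default grid and uniform mixture weights, with \(\sigma^2 = 1\) for illustration; all names here are local to the sketch:

# Draw p coefficients from the mixture-of-normals prior g: first pick a
# mixture component for each coefficient, then sample from that normal.
p      <- 300
K      <- 20
sa2    <- (2^(0.05*(0:19)) - 1)^2
pi     <- rep(1/K, K)
sigma2 <- 1
comp   <- sample(K, p, replace = TRUE, prob = pi)
beta   <- rnorm(p, mean = 0, sd = sqrt(sigma2 * sa2[comp]))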

The fitting algorithm, if run for a large enough number of iterations, will find an approximate posterior for the regression coefficients, denoted by \(q(\beta)\), a residual variance parameter \(\sigma^2\), and prior mixture weights \(\pi_1, \ldots, \pi_K\) maximizing the evidence lower bound, $$F(q, \pi, \sigma^2) = E_q \log p(y | X, \beta, \sigma^2) - \sum_{j=1}^p D_{KL}(q_j || g),$$ where \(D_{KL}(q_j || g)\) denotes the Kullback-Leibler (KL) divergence, a measure of the "distance" between the (approximate) posterior \(q_j(\beta_j)\) and the prior \(g(\beta_j)\). The fitting algorithm iteratively updates the approximate posteriors \(q_1, \ldots, q_p\), separately for each \(j = 1, \ldots, p\) (in an order determined by update.order), then separately updates the mixture weights \(\pi\) and the residual variance \(\sigma^2\). This coordinate-wise update scheme iterates until the convergence criterion is met, or until the algorithm hits the upper bound on the number of iterations (specified by max.iter).
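To give a flavour of the coordinate-wise updates, below is a minimal, self-contained sketch of a single update for one coefficient, based on the standard normal-means calculation; this is an illustration only, not the package's internal code, and the function name update_coordinate is hypothetical:

# One coordinate-wise update of the posterior mean of beta[j], holding
# the other coefficients, the mixture weights pi, and sigma2 fixed.
update_coordinate <- function (X, y, beta, j, pi, sigma2, sa2) {
  xj <- X[,j]
  wj <- sum(xj^2)
  # Least-squares estimate of beta[j] from the partial residuals.
  r    <- drop(y - X %*% beta) + xj * beta[j]
  bhat <- sum(xj * r) / wj
  # Posterior mean of beta[j] under each mixture component.
  mu <- wj * sa2 / (1 + wj * sa2) * bhat
  # Posterior probability of each component, computed on the log scale;
  # the small constant guards against log(0), in the spirit of epstol.
  logphi <- log(pi + 1e-12) +
    dnorm(bhat, mean = 0, sd = sqrt(sigma2 * (1/wj + sa2)), log = TRUE)
  phi <- exp(logphi - max(logphi))
  phi <- phi / sum(phi)
  # The updated posterior mean averages the per-component means.
  sum(phi * mu)
}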

See ‘References’ for more details about the model and algorithm.

The control argument is a list in which any of the following named components will override the default algorithm settings (as defined by mr_ash_control_default); a short usage example follows this list:

min.iter

The minimum number of outer-loop iterations.

max.iter

The maximum number of outer-loop iterations.

convtol

When update.pi = TRUE, the outer-loop updates terminate when the largest change in the mixture weights is less than convtol*K; when update.pi = FALSE, the outer-loop updates stop when the largest change in the estimates of the posterior mean coefficients is less than convtol*K.

update.pi

If update.pi = TRUE, the mixture proportions in the mixture-of-normals prior are estimated from the data.

update.sigma2

If update.sigma2 = TRUE, the residual variance parameter \(\sigma^2\) is estimated from the data.

update.order

The order in which the co-ordinate ascent updates for estimating the posterior mean coefficients are performed. update.order can be NULL, "random", or any permutation of \((1,\ldots,p)\), where p is the number of columns in the input matrix X. When update.order is NULL, the co-ordinate ascent updates are performed in order in which they appear in X; this is equivalent to setting update.order = 1:p. When update.order = "random", the co-ordinate ascent updates are performed in a randomly generated order, and this random ordering is different at each outer-loop iteration.

epstol

A small, positive number added to the likelihoods to avoid logarithms of zero.
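As an example of overriding the defaults listed above:

# Run at most 500 outer-loop iterations, with a stricter convergence
# tolerance and a random update order.
fit <- mr_ash(X, y,
              control = list(max.iter     = 500,
                             convtol      = 1e-10,
                             update.order = "random"))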

References

Y. Kim (2020), Bayesian shrinkage methods for high dimensional regression. Ph.D. thesis, University of Chicago.

See also

Examples

# Simulate a data set.
set.seed(1)
n <- 200
p <- 300
X <- matrix(rnorm(n*p),n,p)
beta <- double(p)
beta[1:10] <- 1:10
y <- drop(X %*% beta + rnorm(n))

# Fit the Mr.ASH model.
fit.mr.ash <- mr_ash(X,y)
#> Fitting mr.ash model (mr.ash 0.1-61).
#> number of samples: 200
#> number of variables: 300
#> number of mixture components: 20
#> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
#> Mr.ASH terminated at iteration 109.

# Predict the responses in a new data set.
Xnew  <- matrix(rnorm(n*p),n,p)
ynew  <- Xnew %*% beta + rnorm(n)
ypred <- predict(fit.mr.ash, Xnew)

# Compute the test error (RMSE).
rmse <- norm(ynew - ypred, "2") / sqrt(n)

# Extract the posterior mean coefficients; this equals
# c(fit.mr.ash$intercept, fit.mr.ash$beta).
betahat <- predict(fit.mr.ash, type = "coefficients")