Skip to contents

Performs SuSiE regression using z-scores and correlation matrix. Supports both standard RSS (lambda = 0) and RSS with regularized LD matrix (lambda > 0).

Usage

susie_rss(
  z = NULL,
  R = NULL,
  n = NULL,
  X = NULL,
  bhat = NULL,
  shat = NULL,
  var_y = NULL,
  L = min(10, if (!is.null(R)) ncol(R) else ncol(X)),
  lambda = 0,
  maf = NULL,
  maf_thresh = 0,
  z_ld_weight = 0,
  prior_variance = 50,
  scaled_prior_variance = 0.2,
  residual_variance = NULL,
  prior_weights = NULL,
  null_weight = 0,
  standardize = TRUE,
  intercept_value = 0,
  estimate_residual_variance = FALSE,
  estimate_residual_method = c("MoM", "MLE", "Servin_Stephens"),
  estimate_prior_variance = TRUE,
  estimate_prior_method = c("optim", "EM", "simple"),
  unmappable_effects = c("none", "inf", "ash"),
  check_null_threshold = 0,
  prior_tol = 1e-09,
  residual_variance_lowerbound = 0,
  residual_variance_upperbound = Inf,
  model_init = NULL,
  s_init = NULL,
  coverage = 0.95,
  min_abs_corr = 0.5,
  max_iter = 100,
  tol = 0.001,
  convergence_method = c("elbo", "pip"),
  verbose = FALSE,
  track_fit = FALSE,
  check_input = FALSE,
  check_prior = TRUE,
  check_R = TRUE,
  check_z = FALSE,
  n_purity = 100,
  r_tol = 1e-08,
  refine = FALSE,
  stochastic_ld_sample = NULL,
  alpha0 = 0.1,
  beta0 = 0.1
)

Arguments

z

A p-vector of z-scores.

R

A p by p correlation matrix. Exactly one of R or X must be provided.

n

The sample size, not required but recommended.

X

A factor matrix (B x p) such that R = crossprod(X) / nrow(X) approximates the LD (correlation) matrix. When nrow(X) >= ncol(X), the correlation matrix R is formed explicitly and the standard path is used. When nrow(X) < ncol(X), a low-rank path is used that avoids forming the p x p matrix, reducing per-iteration cost from O(Lp^2) to O(LBp). Columns of X are standardized internally. If z_ld_weight > 0 or var_y with shat are provided, the full correlation matrix is formed from X and the standard path is used.

bhat

Alternative summary data giving the estimated effects (a vector of length p). This, together with shat, may be provided instead of z.

shat

Alternative summary data giving the standard errors of the estimated effects (a vector of length p). This, together with bhat, may be provided instead of z.

var_y

The sample variance of y, defined as \(y'y/(n-1)\). When the sample variance is not provided, the coefficients (returned from coef) are computed on the “standardized” X, y scale.

L

Maximum number of non-zero effects in the model. If L is larger than the number of covariates, p, L is set to p.

lambda

Regularization parameter for LD matrix. When lambda > 0, you cannot use unmappable_effects methods.

maf

A p-vector of minor allele frequencies; to be used along with maf_thresh to filter input summary statistics.

maf_thresh

Variants with MAF smaller than this threshold are not used.

z_ld_weight

This parameter is included for backwards compatibility with previous versions of the function, but it is no longer recommended to set this to a non-zero value. When z_ld_weight > 0, the matrix R is adjusted to be cov2cor((1-w)*R + w*tcrossprod(z)), where w = z_ld_weight.

prior_variance

This specifies the prior variance parameter for the SuSiE-RSS variant with the “regularized” LD matrix R. This is ignored when lambda = 0.

scaled_prior_variance

The prior variance, divided by var(y) (or by (1/(n-1))yty for susie_ss); that is, the prior variance of each non-zero element of b is var(y) * scaled_prior_variance. The value provided should be either a scalar or a vector of length L. If estimate_prior_variance = TRUE, this provides initial estimates of the prior variances.

residual_variance

Variance of the residual. If estimate_residual_variance = TRUE, this value provides the initial estimate of the residual variance. By default, it is set to var(y) in susie and (1/(n-1))yty in susie_ss.

prior_weights

A vector of length p, in which each entry gives the prior probability that corresponding column of X has a nonzero effect on the outcome, y.

null_weight

Prior probability of no effect (a number between 0 and 1, and cannot be exactly 1).

standardize

If standardize = TRUE, standardize the columns of X to unit variance prior to fitting (or equivalently standardize XtX and Xty to have the same effect). Note that scaled_prior_variance specifies the prior on the coefficients of X after standardization (if it is performed). If you do not standardize, you may need to think more carefully about specifying scaled_prior_variance. Whatever your choice, the coefficients returned by coef are given for X on the original input scale. Any column of X that has zero variance is not standardized.

intercept_value

Real number specifying the intercept. This is ignored when lambda = 0.

estimate_residual_variance

The default is FALSE, the residual variance is fixed to 1 or variance of y. If the in-sample LD matrix is provided, we recommend setting estimate_residual_variance = TRUE.

estimate_residual_method

The method used for estimating residual variance. For the original SuSiE model, "MLE" and "MoM" estimation is equivalent, but for the infinitesimal model, "MoM" is more stable. We recommend using "Servin_Stephens" when n < 80 for improved coverage, although it is currently only implemented for individual-level data.

estimate_prior_variance

If estimate_prior_variance = TRUE, the prior variance is estimated (this is a separate parameter for each of the L effects). If provided, scaled_prior_variance is then used as an initial value for the optimization. When estimate_prior_variance = FALSE, the prior variance for each of the L effects is determined by the value supplied to scaled_prior_variance.

estimate_prior_method

The method used for estimating prior variance. When estimate_prior_method = "simple" is used, the likelihood at the specified prior variance is compared to the likelihood at a variance of zero, and the setting with the larger likelihood is retained.

unmappable_effects

The method for modeling unmappable effects: "none", "inf", "ash".

check_null_threshold

When the prior variance is estimated, compare the estimate with the null, and set the prior variance to zero unless the log-likelihood using the estimate is larger by this threshold amount. For example, if you set check_null_threshold = 0.1, this will "nudge" the estimate towards zero when the difference in log-likelihoods is small. A note of caution that setting this to a value greater than zero may lead the IBSS fitting procedure to occasionally decrease the ELBO. This setting is disabled when using unmappable_effects = "inf" or unmappable_effects = "ash".

prior_tol

When the prior variance is estimated, compare the estimated value to prior_tol at the end of the computation, and exclude a single effect from PIP computation if the estimated prior variance is smaller than this tolerance value.

residual_variance_lowerbound

Lower limit on the estimated residual variance. It is only relevant when estimate_residual_variance = TRUE.

residual_variance_upperbound

Upper limit on the estimated residual variance. It is only relevant when estimate_residual_variance = TRUE.

model_init

A previous susie fit with which to initialize.

coverage

A number between 0 and 1 specifying the “coverage” of the estimated confidence sets.

min_abs_corr

Minimum absolute correlation allowed in a credible set. The default, 0.5, corresponds to a squared correlation of 0.25, which is a commonly used threshold for genotype data in genetic studies.

max_iter

Maximum number of IBSS iterations to perform.

tol

tol A small, non-negative number specifying the convergence tolerance for the IBSS fitting procedure.

convergence_method

When converge_method = "elbo" the fitting procedure halts when the difference in the variational lower bound, or “ELBO” (the objective function to be maximized), is less than tol. When converge_method = "pip" the fitting procedure halts when the maximum absolute difference in alpha is less than tol.

verbose

If verbose = TRUE, the algorithm's progress, a summary of the optimization settings, and refinement progress (if refine = TRUE) are printed to the console.

track_fit

If track_fit = TRUE, trace is also returned containing detailed information about the estimates at each iteration of the IBSS fitting procedure.

check_input

If check_input = TRUE, susie_ss performs additional checks on XtX and Xty. The checks are: (1) check that XtX is positive semidefinite; (2) check that Xty is in the space spanned by the non-zero eigenvectors of XtX.

check_prior

If check_prior = TRUE, it checks if the estimated prior variance becomes unreasonably large (comparing with 10 * max(abs(z))^2).

check_R

If TRUE, check that R is positive semidefinite.

check_z

If TRUE, check that z lies in column space of R.

n_purity

Passed as argument n_purity to susie_get_cs.

r_tol

Tolerance level for eigenvalue check of positive semidefinite matrix XtX.

refine

If refine = TRUE, then an additional iterative refinement procedure is used, after the IBSS algorithm, to check and escape from local optima (see details).

stochastic_ld_sample

When the LD matrix R is estimated from a stochastic sketch (or an internal subsample) with B random projections, set this to the value of B. This dynamically inflates the null variance of each variable's score statistic at every IBSS iteration to account for LD estimation uncertainty in the Single Effect Regression (SER), attenuating Bayes factors and posterior means for variables where LD uncertainty is large relative to active signals. Must be at least 1000. Not supported with lambda != 0. When provided, the output includes a stochastic_ld_diagnostics element with per-region and per-variable quality metrics.

alpha0

Numerical parameter for the NIG prior when using estimate_residual_method = "Servin_Stephens".

beta0

Numerical parameter for the NIG prior when using estimate_residual_method = "Servin_Stephens".

Value

In addition to the standard "susie" output (see susie), the returned object may contain:

stochastic_ld_diagnostics

A list of diagnostics for the stochastic LD correction (only present when stochastic_ld_sample is provided), containing: B (the sketch dimension); p (number of variables); effective_rank (debiased \(\tilde{r} = p^2 / \|R\|_F^2\)); r_over_B (\(\tilde{r}/B\), one number per region; values \(\le 0.2\) indicate the sketch is adequate); Rhat_diag_deviation (\(|\hat{R}_{jj} - 1|\), one number per variable; flags variables with poor random projections); per_variable_penalty (final-iteration \(v_j / \sigma^2 = \tau_j^2 / \sigma^2 - 1\), one number per variable; values \(\le 0.2\) indicate minimal power loss, values \(\gg 1\) flag variables where the correction is doing heavy lifting).