SuSiE using Sufficient Statistics — susie

Performs SuSiE regression using sufficient statistics (XtX, Xty, yty, n) instead of individual-level data (X, y).

Usage

susie_ss(
  XtX,
  Xty,
  yty,
  n,
  L = min(10, ncol(XtX)),
  X_colmeans = NA,
  y_mean = NA,
  maf = NULL,
  maf_thresh = 0,
  check_input = FALSE,
  r_tol = 1e-08,
  standardize = TRUE,
  scaled_prior_variance = 0.2,
  residual_variance = NULL,
  prior_weights = NULL,
  null_weight = 0,
  model_init = NULL,
  estimate_residual_variance = TRUE,
  estimate_residual_method = c("MoM", "MLE", "Servin_Stephens"),
  residual_variance_lowerbound = 0,
  residual_variance_upperbound = Inf,
  estimate_prior_variance = TRUE,
  estimate_prior_method = c("optim", "EM", "simple"),
  unmappable_effects = c("none", "inf"),
  check_null_threshold = 0,
  prior_tol = 1e-09,
  max_iter = 100,
  tol = 0.001,
  convergence_method = c("elbo", "pip"),
  coverage = 0.95,
  min_abs_corr = 0.5,
  n_purity = 100,
  verbose = FALSE,
  track_fit = FALSE,
  check_prior = FALSE,
  refine = FALSE
)

Arguments

XtX: A p by p matrix, X'X, with columns of X centered to have mean zero.
Xty: A p-vector, X'y, with y and columns of X centered to have mean zero.
yty: A scalar, y'y, with y centered to have mean zero.
n: The sample size.
L: Maximum number of non-zero effects in the model. If L is larger than the number of covariates, p, L is set to p.
X_colmeans: A p-vector of column means of X. If both X_colmeans and y_mean are provided, the intercept is estimated; otherwise, the intercept is NA.
y_mean: A scalar containing the mean of y. If both X_colmeans and y_mean are provided, the intercept is estimated; otherwise, the intercept is NA.
maf: A p-vector of minor allele frequencies; to be used along with maf_thresh to filter input summary statistics.
maf_thresh: Variants with MAF smaller than this threshold are not used.
check_input: If check_input = TRUE, susie_ss performs additional checks on XtX and Xty. The checks are: (1) check that XtX is positive semidefinite; (2) check that Xty is in the space spanned by the non-zero eigenvectors of XtX.
r_tol: Tolerance level for eigenvalue check of positive semidefinite matrix XtX.
standardize: If standardize = TRUE, standardize the columns of X to unit variance prior to fitting (or equivalently standardize XtX and Xty to have the same effect). Note that scaled_prior_variance specifies the prior on the coefficients of X after standardization (if it is performed). If you do not standardize, you may need to think more carefully about specifying scaled_prior_variance. Whatever your choice, the coefficients returned by coef are given for X on the original input scale. Any column of X that has zero variance is not standardized.
scaled_prior_variance: The prior variance, divided by var(y) (or by (1/(n-1))yty for susie_ss); that is, the prior variance of each non-zero element of b is var(y) * scaled_prior_variance. The value provided should be either a scalar or a vector of length L. If estimate_prior_variance = TRUE, this provides initial estimates of the prior variances.
residual_variance: Variance of the residual. If estimate_residual_variance = TRUE, this value provides the initial estimate of the residual variance. By default, it is set to var(y) in susie and (1/(n-1))yty in susie_ss.
prior_weights: A vector of length p, in which each entry gives the prior probability that corresponding column of X has a nonzero effect on the outcome, y.
null_weight: Prior probability of no effect (a number between 0 and 1, and cannot be exactly 1).
model_init: A previous susie fit with which to initialize.
estimate_residual_variance: If estimate_residual_variance = TRUE, the residual variance is estimated, using residual_variance as an initial value. If estimate_residual_variance = FALSE, the residual variance is fixed to the value supplied by residual_variance.
estimate_residual_method: The method used for estimating residual variance. For the original SuSiE model, "MLE" and "MoM" estimation is equivalent, but for the infinitesimal model, "MoM" is more stable. We recommend using "Servin_Stephens" when n < 80 for improved coverage, although it is currently only implemented for individual-level data.
residual_variance_lowerbound: Lower limit on the estimated residual variance. It is only relevant when estimate_residual_variance = TRUE.
residual_variance_upperbound: Upper limit on the estimated residual variance. It is only relevant when estimate_residual_variance = TRUE.
estimate_prior_variance: If estimate_prior_variance = TRUE, the prior variance is estimated (this is a separate parameter for each of the L effects). If provided, scaled_prior_variance is then used as an initial value for the optimization. When estimate_prior_variance = FALSE, the prior variance for each of the L effects is determined by the value supplied to scaled_prior_variance.
estimate_prior_method: The method used for estimating prior variance. When estimate_prior_method = "simple" is used, the likelihood at the specified prior variance is compared to the likelihood at a variance of zero, and the setting with the larger likelihood is retained.
unmappable_effects: The method for modeling unmappable effects: "none", "inf", "ash".
check_null_threshold: When the prior variance is estimated, compare the estimate with the null, and set the prior variance to zero unless the log-likelihood using the estimate is larger by this threshold amount. For example, if you set check_null_threshold = 0.1, this will "nudge" the estimate towards zero when the difference in log-likelihoods is small. A note of caution that setting this to a value greater than zero may lead the IBSS fitting procedure to occasionally decrease the ELBO. This setting is disabled when using unmappable_effects = "inf" or unmappable_effects = "ash".
prior_tol: When the prior variance is estimated, compare the estimated value to prior_tol at the end of the computation, and exclude a single effect from PIP computation if the estimated prior variance is smaller than this tolerance value.
max_iter: Maximum number of IBSS iterations to perform.
tol: tol A small, non-negative number specifying the convergence tolerance for the IBSS fitting procedure.
convergence_method: When converge_method = "elbo" the fitting procedure halts when the difference in the variational lower bound, or “ELBO” (the objective function to be maximized), is less than tol. When converge_method = "pip" the fitting procedure halts when the maximum absolute difference in alpha is less than tol.
coverage: A number between 0 and 1 specifying the “coverage” of the estimated confidence sets.
min_abs_corr: Minimum absolute correlation allowed in a credible set. The default, 0.5, corresponds to a squared correlation of 0.25, which is a commonly used threshold for genotype data in genetic studies.
n_purity: Passed as argument n_purity to susie_get_cs.
verbose: If verbose = TRUE, the algorithm's progress, a summary of the optimization settings, and refinement progress (if refine = TRUE) are printed to the console.
track_fit: If track_fit = TRUE, trace is also returned containing detailed information about the estimates at each iteration of the IBSS fitting procedure.
check_prior: If check_prior = TRUE, it checks if the estimated prior variance becomes unreasonably large (comparing with 10 * max(abs(z))^2).
refine: If refine = TRUE, then an additional iterative refinement procedure is used, after the IBSS algorithm, to check and escape from local optima (see details).