Post-hoc causal-configuration probabilities for one or more SuSiE-class fits — susie_post_outcome_configuration

Runs one of two complementary post-hoc analyses, selected by method. "susiex" (default) performs the SuSiEx $2^N$ combinatorial enumeration and reports the posterior probability of every binary causality pattern across the $N$ input traits. "coloc_pairwise" performs the coloc pairwise ABF analysis and reports the five colocalisation hypothesis posteriors (H0/H1/H2/H3/H4) for every pair of traits. To get both, call the function twice and combine the results.

Usage

susie_post_outcome_configuration(
  input,
  by = c("fit", "outcome"),
  method = c("susiex", "coloc_pairwise"),
  prob_thresh = 0.8,
  cs_only = TRUE,
  p1 = 1e-04,
  p2 = 1e-04,
  p12 = 5e-06,
  ...
)

Arguments

input: A single fit of class susie, mvsusie, or mfsusie, OR a list of such fits.
by: Either "fit" (one trait per input fit; default) or "outcome" (multi-output fits expand into per-outcome traits).
method: Character scalar; one of "susiex" (default) or "coloc_pairwise". Pick the analysis to run; for both, call the function twice.
prob_thresh: Per-trait marginal threshold for the convenience $active flags in the SuSiEx output. Default 0.8.
cs_only: Logical. If TRUE (default) only enumerate over CSs present in each fit's $sets$cs; if FALSE loop over all L rows of $alpha. Either way, effects whose entire alpha row is zero are skipped. When TRUE, every fit must carry a non-null $sets$cs or the function errors.
p1, p2, p12: Coloc per-SNP causal priors: p1 for trait 1 alone, p2 for trait 2 alone, p12 for a shared causal variant. Defaults match coloc::coloc.bf_bf: p1 = p2 = 1e-4, p12 = 5e-6. Only used when method = "coloc_pairwise".
...: Currently ignored.

Value

A list of class "susie_post_outcome_configuration" with exactly one of the following components, depending on method:

$susiex: (when method = "susiex") A list of length equal to the number of CS tuples considered. Each element has components cs_indices (length-N integer tuple), logBF_trait (length N), configs ($2^N \times N$ binary matrix), config_prob (length $2^N$), marginal_prob (length-N per-trait marginal posterior probability of being active across the configuration ensemble), and active (logical, marginal_prob >= prob_thresh).
$coloc_pairwise: (when method = "coloc_pairwise") A data.frame with one row per (trait1, trait2, l1, l2) combination, columns trait1, trait2, l1, l2, hit1, hit2, PP.H0, PP.H1, PP.H2, PP.H3, PP.H4.

Pretty-print with summary(out).
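
As an illustration of how the $coloc_pairwise table might be consumed, the sketch below builds a made-up stand-in data.frame with the documented PP.H0 to PP.H4 columns (hit1 and hit2 are omitted, and every value is invented), then filters pairs by PP.H4 and labels each row with its best-supported hypothesis. The 0.8 cut-off is an arbitrary choice for the example and is unrelated to prob_thresh, which applies only to the SuSiEx output.

```r
# Hypothetical stand-in for out$coloc_pairwise; all values are made up and
# only the columns used below are mocked.
coloc_tab <- data.frame(
  trait1 = c("A", "A", "B"),
  trait2 = c("B", "C", "C"),
  l1 = c(1L, 1L, 2L),
  l2 = c(1L, 2L, 1L),
  PP.H0 = c(0.00, 0.01, 0.05),
  PP.H1 = c(0.01, 0.04, 0.45),
  PP.H2 = c(0.01, 0.05, 0.35),
  PP.H3 = c(0.03, 0.80, 0.10),
  PP.H4 = c(0.95, 0.10, 0.05)
)
pp_cols <- paste0("PP.H", 0:4)

# The five posteriors in each row should sum to one.
row_totals <- rowSums(coloc_tab[pp_cols])

# CS pairs with strong support for a shared causal variant
# (0.8 is an illustrative cut-off, not a package default).
shared <- subset(coloc_tab, PP.H4 >= 0.8)

# Best-supported hypothesis for each row.
best_idx <- max.col(as.matrix(coloc_tab[pp_cols]), ties.method = "first")
coloc_tab$best <- pp_cols[best_idx]
```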

Details

Two grouping modes are supported through the by argument:

"fit": Each input fit contributes a single trait view. Multi-output fits (mvsusie, mfsusie) are kept whole: the trait's per-(CS, SNP) log Bayes factors are the joint composite stored on the fit as lbf_variable. Configuration enumeration loops over the cross-product $L_1 \times \dots \times L_N$ of CS indices.
"outcome": Multi-output fits fan out into per-outcome views, each with its own per-(CS, SNP) log Bayes factors read from fit$lbf_variable_outcome (an $L \times J \times R$ or $L \times J \times M$ array). All per-outcome views share the joint fit's PIP matrix and CS list, so the configuration enumeration reduces to a single index $l \in 1..L$. Single-output susie fits pass through unchanged. Requires $lbf_variable_outcome on the fit (set attach_lbf_variable_outcome = TRUE when fitting).

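The difference between the two grouping modes can be sketched in base R. This is an illustration only: the CS indices and trait names below are invented, and the package's internal enumeration may be organised differently.

```r
# by = "fit": hypothetical CS indices for three input fits.
cs_per_fit <- list(trait1 = 1:2, trait2 = 1:3, trait3 = 1L)

# Cross-product of CS indices: one row per tuple (l_1, ..., l_N).
tuples_fit <- expand.grid(cs_per_fit, KEEP.OUT.ATTRS = FALSE)
nrow(tuples_fit)  # 2 * 3 * 1 tuples

# by = "outcome": per-outcome views of one joint fit share the CS list,
# so every view carries the same index l and there is one tuple per CS.
shared_cs <- 1:3
outcomes <- c("outcome1", "outcome2")
tuples_outcome <- as.data.frame(
  matrix(rep(shared_cs, times = length(outcomes)),
         ncol = length(outcomes),
         dimnames = list(NULL, outcomes))
)
nrow(tuples_outcome)  # one tuple per shared CS
```
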
SuSiEx algorithm

For each credible-set tuple $(l_1, \dots, l_N)$:

Per-trait CS-level log BF (alpha-weighted SNP average): $$\log\mathrm{BF}^{(n)}_{l_n} = \sum_j \alpha_{n,l_n,j}\, \log\mathrm{BF}_{n,l_n,j}.$$
Enumerate the $2^N$ binary configurations $c \in \{0,1\}^N$.
Configuration log BF: $$\log\mathrm{BF}^{(c)} = \sum_{n: c_n = 1} \log\mathrm{BF}^{(n)}_{l_n}.$$
Normalise under a uniform prior over the $2^N$ configurations.
Per-trait marginal: $P(\mathrm{trait}\,n\,\mathrm{causal}) = \sum_{c: c_n = 1} P(c \mid \mathrm{tuple})$.
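
The five steps above can be sketched for a single CS tuple in base R. This is a minimal illustration, not the package's implementation: the function name susiex_tuple_sketch and its inputs (one alpha row and one per-SNP log-BF row per trait) are assumptions made for the example.

```r
# Sketch of the enumeration for one CS tuple; all names are illustrative.
# alpha_list[[n]], lbf_list[[n]]: length-J vectors for trait n's selected CS.
susiex_tuple_sketch <- function(alpha_list, lbf_list, prob_thresh = 0.8) {
  n_traits <- length(alpha_list)
  # Step 1: alpha-weighted CS-level log BF per trait.
  logBF_trait <- mapply(function(a, l) sum(a * l), alpha_list, lbf_list)
  # Step 2: all 2^N binary configurations, one per row.
  configs <- as.matrix(expand.grid(rep(list(0:1), n_traits)))
  dimnames(configs) <- NULL
  # Step 3: configuration log BF = sum of log BFs over the active traits.
  config_logBF <- as.vector(configs %*% logBF_trait)
  # Step 4: uniform prior, so the posterior is a softmax of the log BFs
  # (subtracting the maximum first avoids overflow).
  w <- exp(config_logBF - max(config_logBF))
  config_prob <- w / sum(w)
  # Step 5: marginal = total probability of configurations with c_n = 1.
  marginal_prob <- as.vector(t(configs) %*% config_prob)
  list(logBF_trait = logBF_trait, configs = configs,
       config_prob = config_prob, marginal_prob = marginal_prob,
       active = marginal_prob >= prob_thresh)
}

# Toy inputs: trait 1 has a strong signal, trait 2 has essentially none.
alpha_list <- list(c(0.9, 0.1, 0), c(0.5, 0.5, 0))
lbf_list <- list(c(6, 2, 0), c(0.2, -0.2, 0))
res <- susiex_tuple_sketch(alpha_list, lbf_list)
res$marginal_prob
res$active
```

Because the configuration log BF is additive and the prior is uniform, the per-trait marginal in this sketch reduces to the logistic function of that trait's CS-level log BF, which is a convenient sanity check.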

Coloc pairwise algorithm

For each unordered trait pair $(n, n')$ and each CS pair $(l_n, l_{n'})$, let $\ell_1 = \log\mathrm{BF}_{n,l_n,\cdot}$ and $\ell_2 = \log\mathrm{BF}_{n',l_{n'},\cdot}$ be the per-SNP log BFs (each of length $J$). The five hypothesis log BFs are

$$\log\mathrm{BF}_{H_0} = 0,\quad \log\mathrm{BF}_{H_1} = \log p_1 + \mathrm{LSE}(\ell_1),\quad \log\mathrm{BF}_{H_2} = \log p_2 + \mathrm{LSE}(\ell_2),$$

$$\log\mathrm{BF}_{H_3} = \log p_1 + \log p_2 + \mathrm{logdiff}(\mathrm{LSE}(\ell_1) + \mathrm{LSE}(\ell_2),\; \mathrm{LSE}(\ell_1 + \ell_2)),$$

$$\log\mathrm{BF}_{H_4} = \log p_{12} + \mathrm{LSE}(\ell_1 + \ell_2),$$

and the corresponding posteriors are $\mathrm{PP.H}_h = \exp(\log\mathrm{BF}_{H_h} - \mathrm{LSE}(\log\mathrm{BF}_{H_0:H_4}))$. Here $\mathrm{LSE}$ is the log-sum-exp, $\ell_1 + \ell_2$ is the elementwise sum, and $\mathrm{logdiff}(a, b) = \log(e^a - e^b)$. The subtraction in $H_3$ removes the same-SNP terms, so that hypothesis sums over pairs of distinct SNPs only.

H0: no causal variant in either CS.
H1: a causal variant for trait $n$ only.
H2: a causal variant for trait $n'$ only.
H3: distinct causal variants for the two traits.
H4: a single shared causal variant.
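
The formulas above can be sketched for a single CS pair in base R. This is a minimal illustration under stated assumptions: coloc_pair_sketch, lse, and logdiff are helper names introduced here, not functions exported by the package or by coloc, and the per-SNP log BFs are toy values.

```r
# Numerically stable log(sum(exp(x))).
lse <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}
# logdiff(a, b) = log(exp(a) - exp(b)), assuming a >= b.
logdiff <- function(a, b) a + log1p(-exp(b - a))

# l1, l2: per-SNP log BFs (same length J) for the two CSs being compared.
coloc_pair_sketch <- function(l1, l2, p1 = 1e-4, p2 = 1e-4, p12 = 5e-6) {
  s1 <- lse(l1)
  s2 <- lse(l2)
  s12 <- lse(l1 + l2)
  lbf <- c(
    H0 = 0,
    H1 = log(p1) + s1,
    H2 = log(p2) + s2,
    H3 = log(p1) + log(p2) + logdiff(s1 + s2, s12),
    H4 = log(p12) + s12
  )
  # Normalise the five hypothesis log BFs into posteriors.
  pp <- exp(lbf - lse(lbf))
  names(pp) <- paste0("PP.", names(lbf))
  pp
}

# Both traits peak at SNP 1: the shared-variant hypothesis should dominate.
pp_shared <- coloc_pair_sketch(c(12, 1, 0), c(11, 0, 1))
# The traits peak at different SNPs: H3 should be the best supported.
pp_distinct <- coloc_pair_sketch(c(12, 0, 0), c(0, 0, 11))
round(pp_shared, 4)
round(pp_distinct, 4)
```

Working on the log scale throughout keeps the computation stable when per-SNP log BFs are large, which is the reason for the LSE and logdiff helpers.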

References

SuSiEx, Nature Genetics 2024 (combinatorial $2^N$ enumeration). Wallace, PLoS Genetics 2020 (coloc pairwise H0/H1/H2/H3/H4 ABF).