SuSiE manuscript

Numerical Comparisons

This page contains the information needed to reproduce the numerical comparisons in the SuSiE manuscript.

Note that the DSC benchmark must already have been completed at this stage. Please follow these instructions to run the DSC benchmark for the SuSiE fine-mapping applications.

In [1]:
%revisions -s --tags --no-walk
Revision Author Date Message
afc0677 Gao Wang 2018-10-10 SuSiE v0.4 results
25944c8 Gao Wang 2018-08-21 SuSiE v0.3 results

The working directory on my computer is ~/GIT/github/mvarbvs/dsc/. I will navigate to that directory to perform the analysis and show the results.

In [1]:
%cd ~/GIT/github/mvarbvs/dsc
/project/mstephens/SuSiE/mvarbvs/dsc

PIP calibration

Command

sos run analysis/20180527_PIP_Workflow.ipynb cali_pip
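As background on what the calibration plots below measure: variants are grouped into bins by their reported PIP, and each bin's mean PIP is compared against the observed frequency of true causal variants in that bin. A minimal sketch of this idea in Python (function and variable names are illustrative, not from the workflow):

```python
import numpy as np

def calibration_table(pip, is_causal, n_bins=10):
    """Bin variants by PIP and compare each bin's mean PIP to the
    observed fraction of causal variants in that bin."""
    pip = np.asarray(pip, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    edges = np.linspace(0, 1, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # include the right edge only for the last bin
        mask = (pip >= lo) & (pip < hi) if hi < 1 else (pip >= lo) & (pip <= hi)
        if mask.sum() == 0:
            continue
        rows.append((pip[mask].mean(), is_causal[mask].mean(), int(mask.sum())))
    return rows  # (mean PIP, observed causal frequency, bin size) per bin
```

A well-calibrated method produces bins where the mean PIP tracks the observed causal frequency, which is what the diagonal reference line in the figures represents.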

Result

In [3]:
%preview susie_comparison/PIP_comparison_1008_calibrated_prior_0p1_null_0p0.png
> susie_comparison/PIP_comparison_1008_calibrated_prior_0p1_null_0p0.png (39.8 KiB):

With the null weight penalty,

In [4]:
%preview susie_comparison/PIP_comparison_1008_calibrated_prior_0p1_null_0p9.png
> susie_comparison/PIP_comparison_1008_calibrated_prior_0p1_null_0p9.png (40.0 KiB):

Using estimated prior,

In [5]:
%preview susie_comparison/PIP_comparison_1008_calibrated_prior_0p0_null_0p0.png
> susie_comparison/PIP_comparison_1008_calibrated_prior_0p0_null_0p0.png (40.0 KiB):

Variable level power comparison

Command

sos run analysis/20180527_PIP_Workflow.ipynb roc --null-weight 0.0 0.9 --susie-prior 0.1
sos run analysis/20180527_PIP_Workflow.ipynb roc --null-weight 0.0 0.9 --susie-prior 0.0
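The curves below are produced by ranking variants on PIP and sweeping a threshold; a minimal sketch of computing power and FDR along such a sweep (illustrative only, not the workflow code):

```python
import numpy as np

def power_fdr_curve(pip, is_causal):
    """Sweep PIP thresholds from high to low, returning
    (threshold, power, FDR) triples at each rank."""
    pip = np.asarray(pip, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    order = np.argsort(-pip)              # most confident calls first
    hits = np.cumsum(is_causal[order])    # true positives at each rank
    calls = np.arange(1, len(pip) + 1)    # total calls at each rank
    power = hits / max(is_causal.sum(), 1)
    fdr = (calls - hits) / calls
    return list(zip(pip[order], power, fdr))
```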

Result

In [6]:
%preview susie_comparison/ROC_1008_prior_0p1.png --width 90%
> susie_comparison/ROC_1008_prior_0p1.png (42.0 KiB):

Using estimated prior,

In [7]:
%preview susie_comparison/ROC_1008_prior_0p0.png --width 90%
> susie_comparison/ROC_1008_prior_0p0.png (42.8 KiB):

"Purity" distribution

Command

sos run analysis/20180620_Purity_Plot_Lite.ipynb purity
sos run analysis/20180620_Purity_Plot_Lite.ipynb purity \
    --outdir hard_case --name 1008_L10 --source full_data --dataset lm_less03 \
    --susie fit_susie10 --L 10 --pve 0.3 --maxL 15
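As a reminder of the quantity being plotted: the "purity" of a CS summarizes the absolute pairwise correlations among its member variables, with the minimum typically used as the filter statistic. A sketch of that computation from a correlation matrix (assumed names, not the workflow code):

```python
import numpy as np

def cs_purity(R, cs_indices):
    """Min / mean / median absolute correlation over all pairs of
    variables in a credible set, given correlation matrix R."""
    idx = list(cs_indices)
    if len(idx) == 1:
        return 1.0, 1.0, 1.0          # a singleton CS is perfectly pure
    sub = np.abs(np.asarray(R)[np.ix_(idx, idx)])
    vals = sub[np.triu_indices(len(idx), k=1)]  # off-diagonal pairs only
    return vals.min(), vals.mean(), float(np.median(vals))
```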

Result

In [8]:
%preview susie_comparison/hist_1008.png --width 90%
> susie_comparison/hist_1008.png (11.2 KiB):
In [9]:
%preview hard_case/hist_1008_L10.png --width 18%
> hard_case/hist_1008_L10.png (4.5 KiB):

Purity vs size illustration

In [10]:
%preview susie_comparison/purity_1007/37.png --width 80%
> susie_comparison/purity_1007/37.png (338.9 KiB):

Power comparisons of confidence sets

For the "simple case" simulations, the number of causal variants is 1~5 out of 1000 non-causal variants; for the "hard case", the number of causal variants is 10 out of 3000~12000 non-causal variants.

Command

sos run analysis/20180527_PIP_Workflow.ipynb power
sos run analysis/20180615_Power_DAP.ipynb power

Result

for i in 0.0 0.1; do
    for j in 0.0 0.5 0.9 0.95; do
        sos run ~/GIT/github/susie-paper/manuscript_results/numerical_results.ipynb cs_eval \
        --null-weight $j --susie-prior $i -s force
    done
done
In [11]:
readRDS('susie_comparison/Power_comparison_1008_cluster_prob_prior_0p1_null_0p0.rds')
| n_signal | expected_discoveries | susie_discoveries | dap_discoveries | susie_power | dap_power | susie_coverage | dap_coverage | susie_avg_size | dap_avg_size | susie_median_size | dap_median_size | susie_avg_ld | dap_avg_ld |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 300 | 302 | 270 | 0.9866667 | 0.8866667 | 0.9801325 | 0.9851852 | 11.44257 | 6.056391 | 3 | 3 | 0.9937229 | 0.9882739 |
| 2 | 600 | 423 | 391 | 0.6716667 | 0.6016667 | 0.9503546 | 0.9181586 | 18.62189 | 8.105850 | 4 | 5 | 0.9863083 | 0.9712244 |
| 3 | 900 | 493 | 468 | 0.5222222 | 0.4866667 | 0.9290061 | 0.9123932 | 18.05022 | 10.435597 | 6 | 7 | 0.9807993 | 0.9607073 |
| 4 | 1200 | 568 | 530 | 0.4516667 | 0.4033333 | 0.9207746 | 0.8943396 | 17.86424 | 11.149789 | 6 | 8 | 0.9786782 | 0.9485908 |
| 5 | 1500 | 569 | 533 | 0.3653333 | 0.3233333 | 0.9033392 | 0.8724203 | 22.06809 | 12.324731 | 7 | 9 | 0.9748869 | 0.9503371 |
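The power and coverage columns above are simple ratios with binomial standard errors (the same standard errors computed in the plotting code further down this page): power is the number of captured signals over expected discoveries, and coverage is the fraction of reported CS that contain a true signal. A hedged sketch:

```python
import math

def power_and_coverage(n_captured, n_expected, n_true_cs, n_cs):
    """Power = captured signals / expected discoveries;
    coverage = CS containing a true signal / total CS;
    binomial standard errors for both."""
    power = n_captured / n_expected
    coverage = n_true_cs / n_cs
    power_se = math.sqrt(power * (1 - power) / n_expected)
    coverage_se = math.sqrt(coverage * (1 - coverage) / n_cs)
    return power, power_se, coverage, coverage_se
```

For example, the first row above appears consistent with 296 of 300 expected signals captured across 302 reported CS.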

CS overlapping status:

In [12]:
readRDS('susie_comparison/Power_comparison_1008_cluster_prob_prior_0p1_null_0p0.overlap_status.rds')
$`4` =
  1. 623
  2. 640
  3. 623
In [13]:
readRDS('hard_case/Power_comparison_1008_cluster_prob_prior_0p1_null_0p0.rds')
| n_signal | expected_discoveries | susie_discoveries | dap_discoveries | susie_power | dap_power | susie_coverage | dap_coverage | susie_avg_size | dap_avg_size | susie_median_size | dap_median_size | susie_avg_ld | dap_avg_ld |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 3000 | 840 | 801 | 0.2633333 | 0.2323333 | 0.9166667 | 0.8614232 | 27.38052 | 12.5913 | 9 | 10 | 0.9728016 | 0.9451048 |

CS overlapping status (there are none):

In [14]:
readRDS('hard_case/Power_comparison_1008_cluster_prob_prior_0p1_null_0p0.overlap_status.rds')

Out of 1500 simulations under different scenarios, we observe overlapping CS in only one instance: the two CS in question have sizes 623 and 640 respectively (both with purity > 0.5), and the 623-variable CS is a subset of the 640-variable one. This is an edge case; overall, overlapping CS are not a concern here.
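For reference, the overlap/subset check underlying this summary reduces to elementary set operations; a trivial sketch (names illustrative):

```python
def overlap_status(cs1, cs2):
    """Classify the relation between two credible sets,
    each given as a collection of variable indices."""
    a, b = set(cs1), set(cs2)
    if not a & b:
        return "disjoint"
    if a <= b or b <= a:
        return "nested"          # one CS is a subset of the other
    return "partial overlap"
```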

In [15]:
[global]
def fmtP(x):
    return str(x).replace(".", "p")
date = '1008'
parameter: susie_prior = 0.0
parameter: null_weight = 0.0
parameter: cwd = path('./')

[cs_eval_1]
depends: R_library('ggplot2'), R_library('cowplot')
input: f'{cwd:a}/susie_comparison/Power_comparison_{date}_cluster_prob_prior_{fmtP(susie_prior)}_null_{fmtP(null_weight)}.rds', f'{cwd:a}/hard_case/Power_comparison_{date}_cluster_prob_prior_{fmtP(susie_prior)}_null_{fmtP(null_weight)}.rds'
output: f'{cwd:a}/susie_comparison/cs_comparison_prior_{fmtP(susie_prior)}_null_{fmtP(null_weight)}.pdf'
R: expand = "${ }", workdir = cwd, stderr = False, stdout = False
    library(ggplot2)
    library(cowplot)
    # Multiple plot function
    #
    # ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
    # - cols:   Number of columns in layout
    # - layout: A matrix specifying the layout. If present, 'cols' is ignored.
    #
    # If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
    # then plot 1 will go in the upper left, 2 will go in the upper right, and
    # 3 will go all the way across the bottom.
    #
    multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
      library(grid)

      # Make a list from the ... arguments and plotlist
      plots <- c(list(...), plotlist)

      numPlots = length(plots)

      # If layout is NULL, then use 'cols' to determine layout
      if (is.null(layout)) {
        # Make the panel
        # ncol: Number of columns of plots
        # nrow: Number of rows needed, calculated from # of cols
        layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                        ncol = cols, nrow = ceiling(numPlots/cols), byrow=TRUE)
      }

     if (numPlots==1) {
        print(plots[[1]])

      } else {
        # Set up the page
        grid.newpage()
        pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

        # Make each plot, in the correct location
        for (i in 1:numPlots) {
          # Get the i,j matrix positions of the regions that contain this subplot
          matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

          print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                          layout.pos.col = matchidx$col))
        }
      }
    }
    d1 = readRDS(${_input[0]:r})
    d2 = readRDS(${_input[1]:r})
    dat = rbind(d1,d2)
    #print(head(dat))
    dat$susie_power_se = sqrt(dat$susie_power * (1-dat$susie_power) / dat$expected_discoveries)
    dat$dap_power_se = sqrt(dat$dap_power * (1-dat$dap_power) / dat$expected_discoveries)
    dat$susie_coverage_se = sqrt(dat$susie_coverage * (1-dat$susie_coverage) / dat$susie_discoveries)
    dat$dap_coverage_se = sqrt(dat$dap_coverage * (1-dat$dap_coverage) / dat$dap_discoveries)
    susie = cbind(dat[,c("n_signal", "expected_discoveries", "susie_power", "susie_coverage", "susie_power_se", "susie_coverage_se", "susie_median_size", "susie_avg_ld")], "SuSiE")
    colnames(susie) = c("n_signal", "expected_discoveries", "power", "coverage", "power_se", "coverage_se", "median_size", "avg_ld", "Method")
    dap = cbind(dat[,c("n_signal", "expected_discoveries", "dap_power", "dap_coverage", "dap_power_se", "dap_coverage_se", "dap_median_size", "dap_avg_ld")], "DAP-G")
    colnames(dap) = c("n_signal", "expected_discoveries", "power", "coverage", "power_se", "coverage_se", "median_size", "avg_ld", "Method")
    dat = rbind(susie, dap)
    dat$n_signal = as.factor(dat$n_signal)
    plot_panel = function(dat, quantity, legend = TRUE) {
        p = ggplot(dat, aes_string(x="n_signal", y=quantity[1], fill="Method")) + 
            scale_color_manual("Method", values = c("DAP-G" = "#348ABD", "SuSiE" = "#A60628")) + 
            geom_point(aes(colour=Method), position=position_dodge(.25), size=2.5)
        if (quantity[1] == 'power') p = p + geom_errorbar(aes(ymin=power-power_se, ymax=power+power_se, color=Method), width=.2, position=position_dodge(.25))
        if (quantity[1] == 'coverage') p = p + geom_errorbar(aes(ymin=coverage-coverage_se, ymax=coverage+coverage_se, color=Method), width=.2, position=position_dodge(.25)) + geom_hline(yintercept = 0.95, colour = 'gray') 
        p = p + labs(x = "number of effect variables", y = "") + theme_cowplot() + background_grid(major = "x", minor = "none") + ggtitle(quantity[2])
        if (!legend) p = p + theme(legend.position="none")
        return(p)
    }
    p1 = plot_panel(dat, c('coverage', 'A. coverage'), legend = FALSE)
    p2 = plot_panel(dat, c('power', 'B. power'), legend = FALSE)
    p3 = plot_panel(dat, c('median_size', 'C. median number of variables'), legend = FALSE)
    p4 = plot_panel(dat, c('avg_ld', 'D. average r2'), legend = FALSE)
    pdf('${_output:n}.pdf', width=12, height=3)
    print(multiplot(p1, p2, p3, p4, cols=4))
    dev.off()
In [16]:
%preview susie_comparison/cs_comparison_prior_0p1_null_0p0.pdf -s png --dpi 120
> susie_comparison/cs_comparison_prior_0p1_null_0p0.pdf (6.9 KiB):

Using the estimated prior,

In [17]:
%preview susie_comparison/cs_comparison_prior_0p0_null_0p0.pdf -s png --dpi 120
> susie_comparison/cs_comparison_prior_0p0_null_0p0.pdf (6.8 KiB):

PIP direct comparison

Command

sos run analysis/20180527_PIP_Workflow.ipynb pip
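The per-pair summaries below report a correlation plus the fraction of variants whose PIPs differ by more than a cutoff; a sketch of that computation (illustrative, not the workflow code):

```python
import numpy as np

def compare_pips(pip_a, pip_b, cutoffs=(0.1, 0.15, 0.2)):
    """Correlation between two PIP vectors, plus counts of variants
    whose PIPs differ by more than each cutoff."""
    a, b = np.asarray(pip_a, float), np.asarray(pip_b, float)
    corr = np.corrcoef(a, b)[0, 1]
    diffs = np.abs(a - b)
    counts = {c: int((diffs > c).sum()) for c in cutoffs}
    return corr, counts
```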

Result

SuSiE vs DAP-G

In [18]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_dap.png --width 90%
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_dap.png (169.3 KiB):
In [19]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_dap.md
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_dap.md (491 B):

susie vs dap 1 causal

  • correlation 0.99
  • 10/150215 (6.66e-03%) differ by 0.1
  • 6/150215 (3.99e-03%) differ by 0.15
  • 5/150215 (3.33e-03%) differ by 0.2

susie vs dap 2 causal

  • correlation 0.96
  • 64/150215 (4.26e-02%) differ by 0.1
  • 42/150215 (2.80e-02%) differ by 0.15
  • 25/150215 (1.66e-02%) differ by 0.2

susie vs dap 3 ~ 5 causal

  • correlation 0.94
  • 439/450645 (9.74e-02%) differ by 0.1
  • 245/450645 (5.44e-02%) differ by 0.15
  • 153/450645 (3.40e-02%) differ by 0.2
In [20]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p9.susie_vs_dap.png --width 90%
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p9.susie_vs_dap.png (151.0 KiB):
In [21]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p9.susie_vs_dap.md
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p9.susie_vs_dap.md (490 B):

susie vs dap 1 causal

  • correlation 0.99
  • 10/150215 (6.66e-03%) differ by 0.1
  • 6/150215 (3.99e-03%) differ by 0.15
  • 5/150215 (3.33e-03%) differ by 0.2

susie vs dap 2 causal

  • correlation 0.95
  • 89/150215 (5.92e-02%) differ by 0.1
  • 53/150215 (3.53e-02%) differ by 0.15
  • 34/150215 (2.26e-02%) differ by 0.2

susie vs dap 3 ~ 5 causal

  • correlation 0.9
  • 526/450645 (1.17e-01%) differ by 0.1
  • 331/450645 (7.35e-02%) differ by 0.15
  • 226/450645 (5.02e-02%) differ by 0.2

When the prior is estimated,

In [21]:
%preview susie_comparison/PIP_comparison_1008_prior_0p0_null_0p0.susie_vs_dap.png --width 90%
> susie_comparison/PIP_comparison_1008_prior_0p0_null_0p0.susie_vs_dap.png (156.4 KiB):
In [23]:
%preview susie_comparison/PIP_comparison_1008_prior_0p0_null_0p0.susie_vs_dap.md
> susie_comparison/PIP_comparison_1008_prior_0p0_null_0p0.susie_vs_dap.md (491 B):

susie vs dap 1 causal

  • correlation 0.98
  • 12/150215 (7.99e-03%) differ by 0.1
  • 8/150215 (5.33e-03%) differ by 0.15
  • 7/150215 (4.66e-03%) differ by 0.2

susie vs dap 2 causal

  • correlation 0.94
  • 78/150215 (5.19e-02%) differ by 0.1
  • 46/150215 (3.06e-02%) differ by 0.15
  • 31/150215 (2.06e-02%) differ by 0.2

susie vs dap 3 ~ 5 causal

  • correlation 0.92
  • 433/450645 (9.61e-02%) differ by 0.1
  • 265/450645 (5.88e-02%) differ by 0.15
  • 170/450645 (3.77e-02%) differ by 0.2

SuSiE vs CAVIAR

In [24]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_caviar.png --width 90%
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_caviar.png (180.1 KiB):
In [25]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_caviar.md
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_caviar.md (496 B):

susie vs caviar 1 causal

  • correlation 0.99
  • 5/150215 (3.33e-03%) differ by 0.1
  • 0/150215 (0.00e+00%) differ by 0.15
  • 0/150215 (0.00e+00%) differ by 0.2

susie vs caviar 2 causal

  • correlation 0.96
  • 207/150215 (1.38e-01%) differ by 0.1
  • 112/150215 (7.46e-02%) differ by 0.15
  • 70/150215 (4.66e-02%) differ by 0.2

susie vs caviar 3 causal

  • correlation 0.94
  • 238/150215 (1.58e-01%) differ by 0.1
  • 115/150215 (7.66e-02%) differ by 0.15
  • 77/150215 (5.13e-02%) differ by 0.2

SuSiE vs FINEMAP

In [26]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_finemap.png --width 90%
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_finemap.png (176.7 KiB):
In [27]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_finemap.md
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_finemap.md (501 B):

susie vs finemap 1 causal

  • correlation 0.99
  • 17/150215 (1.13e-02%) differ by 0.1
  • 3/150215 (2.00e-03%) differ by 0.15
  • 1/150215 (6.66e-04%) differ by 0.2

susie vs finemap 2 causal

  • correlation 0.96
  • 242/150215 (1.61e-01%) differ by 0.1
  • 149/150215 (9.92e-02%) differ by 0.15
  • 97/150215 (6.46e-02%) differ by 0.2

susie vs finemap 3 causal

  • correlation 0.94
  • 311/150215 (2.07e-01%) differ by 0.1
  • 169/150215 (1.13e-01%) differ by 0.15
  • 110/150215 (7.32e-02%) differ by 0.2

DAP vs CAVIAR

In [28]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.dap_vs_caviar.png --width 90%
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.dap_vs_caviar.png (184.9 KiB):
In [29]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.dap_vs_caviar.md
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.dap_vs_caviar.md (492 B):

dap vs caviar 1 causal

  • correlation 0.99
  • 13/150215 (8.65e-03%) differ by 0.1
  • 7/150215 (4.66e-03%) differ by 0.15
  • 5/150215 (3.33e-03%) differ by 0.2

dap vs caviar 2 causal

  • correlation 0.96
  • 247/150215 (1.64e-01%) differ by 0.1
  • 134/150215 (8.92e-02%) differ by 0.15
  • 80/150215 (5.33e-02%) differ by 0.2

dap vs caviar 3 causal

  • correlation 0.94
  • 285/150215 (1.90e-01%) differ by 0.1
  • 139/150215 (9.25e-02%) differ by 0.15
  • 105/150215 (6.99e-02%) differ by 0.2

DAP vs FINEMAP

In [30]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.dap_vs_finemap.png --width 90%
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.dap_vs_finemap.png (181.0 KiB):
In [31]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.dap_vs_finemap.md
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.dap_vs_finemap.md (496 B):

dap vs finemap 1 causal

  • correlation 0.99
  • 22/150215 (1.46e-02%) differ by 0.1
  • 8/150215 (5.33e-03%) differ by 0.15
  • 7/150215 (4.66e-03%) differ by 0.2

dap vs finemap 2 causal

  • correlation 0.96
  • 274/150215 (1.82e-01%) differ by 0.1
  • 168/150215 (1.12e-01%) differ by 0.15
  • 107/150215 (7.12e-02%) differ by 0.2

dap vs finemap 3 causal

  • correlation 0.94
  • 366/150215 (2.44e-01%) differ by 0.1
  • 193/150215 (1.28e-01%) differ by 0.15
  • 127/150215 (8.45e-02%) differ by 0.2

To consolidate these figures:

In [30]:
d=1008
convert -append susie_comparison/PIP_comparison_"$d"_prior_0p1_null_0p0.susie_vs_dap.png susie_comparison/PIP_comparison_"$d"_prior_0p1_null_0p0.susie_vs_caviar.png susie_comparison/PIP_comparison_"$d"_prior_0p1_null_0p0.susie_vs_finemap.png susie_comparison/PIP_comparison_"$d"_susie_vs_others.png
convert -append susie_comparison/PIP_comparison_"$d"_prior_0p1_null_0p0.dap_vs_caviar.png susie_comparison/PIP_comparison_"$d"_prior_0p1_null_0p0.dap_vs_finemap.png susie_comparison/PIP_comparison_"$d"_prior_0p1_null_0p0.caviar_vs_finemap.png susie_comparison/PIP_comparison_"$d"_others.png

Robustness of purity threshold

Command

sos run analysis/20180620_Purity_Plot_Lite.ipynb ld --L 1

Result

In [32]:
%preview susie_comparison/ld_0722.png --width 90%
> susie_comparison/ld_0722.png (25.6 KiB):

Reliability of coverage

Command

Here coverage is evaluated both without the null weight penalty and with null weight 0.9.

for i in 3 4 5; do
    for j in 0 0.9; do
        sos run analysis/20180527_PIP_Workflow.ipynb coverage --n-signals $i --null-weight $j
    done
done
In [32]:
convert +append susie_comparison/Coverage_1008_null_weight_0p0_signals_3.png susie_comparison/Coverage_1008_null_weight_0p0_signals_4.png susie_comparison/Coverage_1008_null_weight_0p0_signals_5.png susie_comparison/Coverage_1008_null_weight_0p0.png
convert +append susie_comparison/Coverage_1008_null_weight_0p9_signals_3.png susie_comparison/Coverage_1008_null_weight_0p9_signals_4.png susie_comparison/Coverage_1008_null_weight_0p9_signals_5.png susie_comparison/Coverage_1008_null_weight_0p9.png

Result

It seems that neither estimating the prior nor applying the null weight penalty helps much with CS coverage, if at all.

In [33]:
%preview susie_comparison/Coverage_1008_null_weight_0p0.png --width 90%
> susie_comparison/Coverage_1008_null_weight_0p0.png (47.1 KiB):
In [34]:
%preview susie_comparison/Coverage_1008_null_weight_0p9.png --width 90%
> susie_comparison/Coverage_1008_null_weight_0p9.png (49.3 KiB):

Speed comparison

Command

sos run analysis/20180630_Runtime.ipynb speed

Result

See Appendix C of our manuscript for the hardware environment used in this benchmark.

In [35]:
dat = readRDS('susie_comparison/speed_comparison_1008_prior_0p1_null_0p0.rds')
t(apply(dat, 2, function(x) c(mean(x), min(x), max(x))))
                mean        min        max
SuSiE        0.63550    0.34050   2.275000
DAP          2.86735    2.22586   8.874461
FINEMAP     23.01671   10.99400  48.163000
CAVIAR    2907.50967 2637.33900 3018.519000
In [36]:
%preview susie_comparison/speed_comparison_1008_prior_0p1_null_0p0.png --width 60%
> susie_comparison/speed_comparison_1008_prior_0p1_null_0p0.png (5.6 KiB):

© 2017-2018 authored by Gao Wang at Stephens Lab, The University of Chicago

Exported from manuscript_results/numerical_results.ipynb committed by Gao Wang on Tue Dec 17 01:08:32 2019 revision 26, 8039165