SuSiE manuscript

Numerical Comparisons

This page contains information to reproduce the numerical comparisons in the SuSiE manuscript.

Note that the DSC benchmark must be completed before this stage. Please follow these instructions to run the DSC benchmark for SuSiE fine-mapping applications.

In [1]:
%revisions -s --tags --no-walk
Revision Author Date Message
afc0677 Gao Wang 2018-10-10 SuSiE v0.4 results
25944c8 Gao Wang 2018-08-21 SuSiE v0.3 results

The working directory on my computer is ~/GIT/github/mvarbvs/dsc/. I will navigate to that directory to perform the analysis and show the results.

In [1]:
%cd ~/GIT/github/mvarbvs/dsc
/project/mstephens/SuSiE/mvarbvs/dsc

PIP calibration

Command

sos run analysis/20180527_PIP_Workflow.ipynb cali_pip

Result

In [3]:
%preview susie_comparison/PIP_comparison_1008_calibrated_prior_0p1_null_0p0.png
> susie_comparison/PIP_comparison_1008_calibrated_prior_0p1_null_0p0.png (39.8 KiB):

With the null weight penalty,

In [4]:
%preview susie_comparison/PIP_comparison_1008_calibrated_prior_0p1_null_0p9.png
> susie_comparison/PIP_comparison_1008_calibrated_prior_0p1_null_0p9.png (40.0 KiB):

Using the estimated prior,

In [5]:
%preview susie_comparison/PIP_comparison_1008_calibrated_prior_0p0_null_0p0.png
> susie_comparison/PIP_comparison_1008_calibrated_prior_0p0_null_0p0.png (40.0 KiB):
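The calibration plots above compare, within bins of reported PIP, the mean PIP (expected frequency) against the observed frequency of true signals. A minimal Python sketch of that computation, on hypothetical `pips` and `truth` arrays (the actual figures are produced by the SoS workflow above):

```python
def calibration_table(pips, truth, n_bins=10):
    """Bin PIPs into equal-width bins and compare the mean PIP in each
    bin with the fraction of truly causal variants in that bin.
    A well-calibrated method has the two quantities close."""
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(pips, truth):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[i].append((p, t))
    table = []
    for contents in bins:
        if not contents:
            continue  # skip empty bins
        mean_pip = sum(p for p, _ in contents) / len(contents)
        obs_freq = sum(t for _, t in contents) / len(contents)
        table.append((mean_pip, obs_freq, len(contents)))
    return table

# Toy example: hypothetical PIPs with 0/1 causal status.
pips  = [0.05, 0.05, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95]
truth = [0,    0,    1,    1,    1,    1,    1,    1,    1,    0]
for mean_pip, obs_freq, n in calibration_table(pips, truth):
    print(round(mean_pip, 2), round(obs_freq, 2), n)
```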

Variable level power comparison

Command

sos run analysis/20180527_PIP_Workflow.ipynb roc --null-weight 0.0 0.9 --susie-prior 0.1
sos run analysis/20180527_PIP_Workflow.ipynb roc --null-weight 0.0 0.9 --susie-prior 0.0

Result

In [6]:
%preview susie_comparison/ROC_1008_prior_0p1.png --width 90%
> susie_comparison/ROC_1008_prior_0p1.png (42.0 KiB):

Using the estimated prior,

In [7]:
%preview susie_comparison/ROC_1008_prior_0p0.png --width 90%
> susie_comparison/ROC_1008_prior_0p0.png (42.8 KiB):
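The curves above trace power against FDR as the PIP threshold for calling a signal varies. A minimal Python sketch of that computation, on hypothetical inputs:

```python
def power_fdr_curve(pips, truth, thresholds):
    """For each PIP threshold t, call every variant with PIP >= t a
    signal, then record power (true calls / total true signals) and
    FDR (false calls / total calls)."""
    n_true = sum(truth)
    curve = []
    for t in thresholds:
        tp = sum(1 for p, s in zip(pips, truth) if p >= t and s)
        n_calls = sum(1 for p in pips if p >= t)
        if n_calls == 0:
            continue  # no calls at this threshold
        curve.append((t, tp / n_true, (n_calls - tp) / n_calls))
    return curve

# Toy example with 2 true signals among 5 variants.
pips  = [0.9, 0.8, 0.6, 0.3, 0.1]
truth = [1,   0,   1,   0,   0]
for t, power, fdr in power_fdr_curve(pips, truth, [0.5, 0.85]):
    print(t, power, fdr)
```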

"Purity" distribution

Command

sos run analysis/20180620_Purity_Plot_Lite.ipynb purity
sos run analysis/20180620_Purity_Plot_Lite.ipynb purity \
    --outdir hard_case --name 1008_L10 --source full_data --dataset lm_less03 \
    --susie fit_susie10 --L 10 --pve 0.3 --maxL 15

Result

In [8]:
%preview susie_comparison/hist_1008.png --width 90%
> susie_comparison/hist_1008.png (11.2 KiB):
In [9]:
%preview hard_case/hist_1008_L10.png --width 18%
> hard_case/hist_1008_L10.png (4.5 KiB):

Purity vs size illustration

In [10]:
%preview susie_comparison/purity_1007/37.png --width 80%
> susie_comparison/purity_1007/37.png (338.9 KiB):
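Purity here is defined, as in the manuscript, as the smallest absolute pairwise correlation among the variants in a CS. A minimal Python sketch, on a small hypothetical correlation matrix:

```python
def purity(cs, corr):
    """Purity of a credible set: the smallest absolute pairwise
    correlation among its variants (taken as 1.0 for a singleton set)."""
    pairs = [abs(corr[i][j]) for i in cs for j in cs if i < j]
    return min(pairs) if pairs else 1.0

# Hypothetical 3x3 correlation matrix for variants 0..2.
corr = [[1.0, 0.9, 0.2],
        [0.9, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
print(purity([0, 1], corr))     # a tightly correlated, high-purity CS
print(purity([0, 1, 2], corr))  # adding a weakly correlated variant lowers purity
```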

Power comparisons of confidence sets

For the "simple case" simulations there are 1~5 causal variants among 1000 non-causal variants; for the "hard case" there are 10 causal variants among 3000~12000 non-causal variants.

Command

sos run analysis/20180527_PIP_Workflow.ipynb power
sos run analysis/20180615_Power_DAP.ipynb power

Result

for i in 0.0 0.1; do
    for j in 0.0 0.5 0.9 0.95; do
        sos run ~/GIT/github/susie-paper/manuscript_results/numerical_results.ipynb cs_eval \
        --null-weight $j --susie-prior $i -s force
    done
done
In [11]:
readRDS('susie_comparison/Power_comparison_1008_cluster_prob_prior_0p1_null_0p0.rds')
| n_signal | expected_discoveries | susie_discoveries | dap_discoveries | susie_power | dap_power | susie_coverage | dap_coverage | susie_avg_size | dap_avg_size | susie_median_size | dap_median_size | susie_avg_ld | dap_avg_ld |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 300 | 302 | 270 | 0.9866667 | 0.8866667 | 0.9801325 | 0.9851852 | 11.44257 | 6.056391 | 3 | 3 | 0.9937229 | 0.9882739 |
| 2 | 600 | 423 | 391 | 0.6716667 | 0.6016667 | 0.9503546 | 0.9181586 | 18.62189 | 8.105850 | 4 | 5 | 0.9863083 | 0.9712244 |
| 3 | 900 | 493 | 468 | 0.5222222 | 0.4866667 | 0.9290061 | 0.9123932 | 18.05022 | 10.435597 | 6 | 7 | 0.9807993 | 0.9607073 |
| 4 | 1200 | 568 | 530 | 0.4516667 | 0.4033333 | 0.9207746 | 0.8943396 | 17.86424 | 11.149789 | 6 | 8 | 0.9786782 | 0.9485908 |
| 5 | 1500 | 569 | 533 | 0.3653333 | 0.3233333 | 0.9033392 | 0.8724203 | 22.06809 | 12.324731 | 7 | 9 | 0.9748869 | 0.9503371 |
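The power and coverage columns are proportions, so the error bars in the plotting workflow further below use binomial standard errors, sqrt(p(1-p)/n). A small Python sketch of the same quantity, applied to the first row of the table above:

```python
import math

def binomial_se(p, n):
    """Standard error of a proportion p estimated from n trials."""
    return math.sqrt(p * (1 - p) / n)

# First row of the simple-case table: SuSiE power 0.9866667 over 300
# expected discoveries; SuSiE coverage 0.9801325 over 302 reported CS.
print(binomial_se(0.9866667, 300))
print(binomial_se(0.9801325, 302))
```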

CS overlapping status:

In [12]:
readRDS('susie_comparison/Power_comparison_1008_cluster_prob_prior_0p1_null_0p0.overlap_status.rds')
$`4` =
  1. 623
  2. 640
  3. 623
In [13]:
readRDS('hard_case/Power_comparison_1008_cluster_prob_prior_0p1_null_0p0.rds')
| n_signal | expected_discoveries | susie_discoveries | dap_discoveries | susie_power | dap_power | susie_coverage | dap_coverage | susie_avg_size | dap_avg_size | susie_median_size | dap_median_size | susie_avg_ld | dap_avg_ld |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | 3000 | 840 | 801 | 0.2633333 | 0.2323333 | 0.9166667 | 0.8614232 | 27.38052 | 12.5913 | 9 | 10 | 0.9728016 | 0.9451048 |

CS overlapping status (there are none):

In [14]:
readRDS('hard_case/Power_comparison_1008_cluster_prob_prior_0p1_null_0p0.overlap_status.rds')

Out of 1500 simulations under different scenarios, we observe overlapping CS in only one instance: the two CS in question have sizes 623 and 640 respectively (both with purity > 0.5), and the CS with 623 variables is a subset of the one with 640. This is an edge case; overall, overlapping CS are not a concern here.
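The overlap check itself is a pairwise set comparison; a minimal Python sketch, with small hypothetical CS standing in for the real ones:

```python
def overlap_status(cs_list):
    """Report, for every pair of credible sets that share variants,
    the overlap size and whether one set is contained in the other."""
    sets = [set(cs) for cs in cs_list]
    report = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            shared = sets[i] & sets[j]
            if shared:
                nested = sets[i] <= sets[j] or sets[j] <= sets[i]
                report.append((i, j, len(shared), nested))
    return report

# Toy example: CS 0 is a subset of CS 1; CS 2 is disjoint from both.
cs_list = [[1, 2, 3], [1, 2, 3, 4], [7, 8]]
print(overlap_status(cs_list))
```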

In [15]:
[global]
def fmtP(x):
    return str(x).replace(".", "p")
date = '1008'
parameter: susie_prior = 0.0
parameter: null_weight = 0.0
parameter: cwd = path('./')

[cs_eval_1]
depends: R_library('ggplot2'), R_library('cowplot')
input: f'{cwd:a}/susie_comparison/Power_comparison_{date}_cluster_prob_prior_{fmtP(susie_prior)}_null_{fmtP(null_weight)}.rds', f'{cwd:a}/hard_case/Power_comparison_{date}_cluster_prob_prior_{fmtP(susie_prior)}_null_{fmtP(null_weight)}.rds'
output: f'{cwd:a}/susie_comparison/cs_comparison_prior_{fmtP(susie_prior)}_null_{fmtP(null_weight)}.pdf'
R: expand = "${ }", workdir = cwd, stderr = False, stdout = False
    library(ggplot2)
    library(cowplot)
    # Multiple plot function
    #
    # ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
    # - cols:   Number of columns in layout
    # - layout: A matrix specifying the layout. If present, 'cols' is ignored.
    #
    # If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
    # then plot 1 will go in the upper left, 2 will go in the upper right, and
    # 3 will go all the way across the bottom.
    #
    multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
      library(grid)

      # Make a list from the ... arguments and plotlist
      plots <- c(list(...), plotlist)

      numPlots = length(plots)

      # If layout is NULL, then use 'cols' to determine layout
      if (is.null(layout)) {
        # Make the panel
        # ncol: Number of columns of plots
        # nrow: Number of rows needed, calculated from # of cols
        layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                        ncol = cols, nrow = ceiling(numPlots/cols), byrow=TRUE)
      }

      if (numPlots == 1) {
        print(plots[[1]])

      } else {
        # Set up the page
        grid.newpage()
        pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

        # Make each plot, in the correct location
        for (i in 1:numPlots) {
          # Get the i,j matrix positions of the regions that contain this subplot
          matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

          print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                          layout.pos.col = matchidx$col))
        }
      }
    }
    d1 = readRDS(${_input[0]:r})
    d2 = readRDS(${_input[1]:r})
    dat = rbind(d1,d2)
    #print(head(dat))
    dat$susie_power_se = sqrt(dat$susie_power * (1-dat$susie_power) / dat$expected_discoveries)
    dat$dap_power_se = sqrt(dat$dap_power * (1-dat$dap_power) / dat$expected_discoveries)
    dat$susie_coverage_se = sqrt(dat$susie_coverage * (1-dat$susie_coverage) / dat$susie_discoveries)
    dat$dap_coverage_se = sqrt(dat$dap_coverage * (1-dat$dap_coverage) / dat$dap_discoveries)
    susie = cbind(dat[,c("n_signal", "expected_discoveries", "susie_power", "susie_coverage", "susie_power_se", "susie_coverage_se", "susie_median_size", "susie_avg_ld")], "SuSiE")
    colnames(susie) = c("n_signal", "expected_discoveries", "power", "coverage", "power_se", "coverage_se", "median_size", "avg_ld", "Method")
    dap = cbind(dat[,c("n_signal", "expected_discoveries", "dap_power", "dap_coverage", "dap_power_se", "dap_coverage_se", "dap_median_size", "dap_avg_ld")], "DAP-G")
    colnames(dap) = c("n_signal", "expected_discoveries", "power", "coverage", "power_se", "coverage_se", "median_size", "avg_ld", "Method")
    dat = rbind(susie, dap)
    dat$n_signal = as.factor(dat$n_signal)
    plot_panel = function(dat, quantity, legend = TRUE) {
        p = ggplot(dat, aes_string(x="n_signal", y=quantity[1], fill="Method")) + 
            scale_color_manual("Method", values = c("DAP-G" = "#348ABD", "SuSiE" = "#A60628")) + 
            geom_point(aes(colour=Method), position=position_dodge(.25), size=2.5)
        if (quantity[1] == 'power') p = p + geom_errorbar(aes(ymin=power-power_se, ymax=power+power_se, color=Method), width=.2, position=position_dodge(.25))
        if (quantity[1] == 'coverage') p = p + geom_errorbar(aes(ymin=coverage-coverage_se, ymax=coverage+coverage_se, color=Method), width=.2, position=position_dodge(.25)) + geom_hline(yintercept = 0.95, colour = 'gray') 
        p = p + labs(x = "number of effect variables", y = "") + theme_cowplot() + background_grid(major = "x", minor = "none") + ggtitle(quantity[2])
        if (!legend) p = p + theme(legend.position="none")
        return(p)
    }
    p1 = plot_panel(dat, c('coverage', 'A. coverage'), legend = FALSE)
    p2 = plot_panel(dat, c('power', 'B. power'), legend = FALSE)
    p3 = plot_panel(dat, c('median_size', 'C. median number of variables'), legend = FALSE)
    p4 = plot_panel(dat, c('avg_ld', 'D. average r2'), legend = FALSE)
    pdf('${_output:n}.pdf', width=12, height=3)
    print(multiplot(p1, p2, p3, p4, cols=4))
    dev.off()
In [16]:
%preview susie_comparison/cs_comparison_prior_0p1_null_0p0.pdf -s png --dpi 120
> susie_comparison/cs_comparison_prior_0p1_null_0p0.pdf (6.9 KiB):

Using the estimated prior,

In [17]:
%preview susie_comparison/cs_comparison_prior_0p0_null_0p0.pdf -s png --dpi 120
> susie_comparison/cs_comparison_prior_0p0_null_0p0.pdf (6.8 KiB):

PIP direct comparison

Command

sos run analysis/20180527_PIP_Workflow.ipynb pip

Result

SuSiE vs DAP-G

In [18]:
%preview susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_dap.png --width 90%
> susie_comparison/PIP_comparison_1008_prior_0p1_null_0p0.susie_vs_dap.png (169.3 KiB):