Please read our manuscript for simulation details.
Since the simulation is based on human genotypes from the GTEx project, which are available only to approved users via dbGaP, we cannot share the input genotype data publicly. However, in this repository we share the full benchmark implementation, as described below.
We implemented the numerical study of the manuscript in the DSC framework.
The benchmark is a single SoS Notebook file (a Jupyter notebook with the SoS kernel, in JSON format), finemapping.ipynb, which consolidates all scripts required for the benchmark and is distributed in this repository under the src folder.
To run the benchmark, first navigate to src, then export the scripts from the notebook:
./export.sos
You should see a number of scripts generated under src and src/modules, with extensions *.dsc, *.py and *.R. These scripts are essential to run the benchmark.
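As a quick sanity check (a sketch only; the exact file names depend on the notebook contents), you can list the exported scripts from inside the src folder:

ls *.dsc modules/*.py modules/*.R   # should include susie.dsc among the exported *.dsc files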
We provide a Dockerfile to run the benchmark. If you want to set up the computing environment yourself, please also take a look at this file to see how we configured it, including the versions of software we used for the paper.
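As a minimal sketch of one possible Docker workflow (the image tag finemapping-dsc and the /work mount point are our own placeholders, not names from the repository), building and entering the environment could look like:

docker build -t finemapping-dsc .   # use -f <path> if the Dockerfile is named or located differently
docker run --rm -it -v "$(pwd)":/work -w /work finemapping-dsc bash
# from inside the container, run the dsc commands shown below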
Here, using one data-set from the DAP-G paper, we demonstrate how the benchmark works:
dsc susie.dsc --target run_comparison -o toy_comparison -c 8
Screen output:
INFO: Load command line DSC sequence: run_comparison
INFO: DSC script exported to toy_comparison.html
INFO: Constructing DSC from susie.dsc ...
INFO: Building execution graph & running DSC ...
[#####################] 21 steps processed (183 jobs completed)
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 6456.271 seconds.
Results from this one toy data-set can be found under the toy_comparison folder.
DSC computations were performed on the University of Chicago RCC Midway cluster. The *.yml files contain the cluster settings that assign resources to the different computational modules involved.
To compare with other methods in the smaller simulation scenarios, with $S = 1,2,3,4,5$ effect variables on smaller data sets (1000 variables):
dsc susie.dsc --target run_comparison -o susie_comparison --host susie_comparison.yml -c 60
To compare with DAP for $S = 10$ on the full range of genotype data (~8K variables on average):
dsc susie.dsc --target hard_case -o hard_case --host hard_case.yml -c 60
It is also possible to run the benchmark on a local computer by dropping the --host option. The *.yml files under the src folder are still relevant, as they specify the resources required for each DSC module. The running time of the benchmark depends on the available computational resources; we estimate that the entire benchmark will take over a week on a single compute node with 28 Intel Xeon E5 CPU threads (some of the other fine-mapping methods are the speed bottleneck).
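For example, a local run of the smaller comparison is simply the same command without --host (a sketch; set -c to the number of CPU threads available on your machine):

dsc susie.dsc --target run_comparison -o susie_comparison -c 28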
Please check out this pipeline for information on how the input genotype data are prepared.
The benchmark data can be downloaded here: a 48GB tarball including all intermediate results generated by the benchmark (~8GB from the comparison with CAVIAR, DAP-G and FINEMAP on 1000 variables, ~40GB from the comparison with DAP-G on 3,000 to 12,000 variables). This dataset can be used to reproduce the numerical studies of the manuscript, as documented here.
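Once downloaded, the tarball can be unpacked in the usual way (the file name below is a placeholder; use whatever name the download link provides):

tar xvf susie_benchmark_results.tar   # extracts the intermediate results into the current directory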