In Li et al. (2016), the authors systematically analyzed the effects of genetic variants (SNPs) on various molecular phenotypes of gene regulation, from chromatin state through protein function. Here we re-analyze one of these molecular phenotypes, alternative splicing: we fine-map splicing QTLs (sQTLs) using the data provided by Li et al.
The data-set is available here. Summary statistics for the sQTL analysis via MatrixQTL were also provided on the website. In the original analysis, the authors used a permutation-based procedure to calibrate the p-values of the MatrixQTL results. They computed an empirical gene-level p-value for the most significant QTL of each gene; at 10% and 5% FDR, they identified 2,893 and 1,602 sQTLs, respectively.
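The calibration idea described above can be sketched in a few lines. This is a minimal illustration only, not the code used by Li et al.: the function names are hypothetical, the permutation p-values are made-up numbers, and Benjamini-Hochberg is used as a stand-in because the exact FDR procedure of the original analysis is not restated in this notebook.

```python
def empirical_gene_pvalue(observed_min_p, permuted_min_ps):
    """Empirical p-value for a gene's best association.

    permuted_min_ps holds the smallest p-value obtained for the same gene
    in each permutation of the phenotype. The +1 terms keep the estimate
    away from exactly zero.
    """
    hits = sum(1 for p in permuted_min_ps if p <= observed_min_p)
    return (hits + 1) / (len(permuted_min_ps) + 1)


def benjamini_hochberg(pvalues, fdr):
    """Indices of the tests rejected by the BH step-up rule at level `fdr`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Toy numbers (assumed, for illustration only)
print(empirical_gene_pvalue(1e-6, [0.01, 0.2, 1e-7, 0.05]))      # 0.4
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.10))    # [0, 1, 2]
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.05))    # [0]
```

A stricter FDR level rejects fewer genes, which is the same pattern as the smaller sQTL count reported at 5% FDR compared with 10% FDR.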
Here we perform two analyses: sQTL fine-mapping with SuSiE, and enrichment analysis in selected genomic regions and functional annotations. This notebook provides the commands to reproduce both analyses.
%revisions -s --tags --no-walk
%cd /project/compbio
Set the following shell variables in the terminal session that runs the analysis. To reproduce our results, change the paths to match your computing environment:
xdata=/project/compbio/jointLCLs/genotype/hg19/YRI/genotypesYRI.gen.txt.gz
ydata=/project/compbio/jointLCLs/phenotypes/fastqtl_qqnorm_ASintron_RNAseqGeuvadis_YangVCF.txt.gz
ncpu=16
trait=AS
cwd=/project/compbio/jointLCLs/results/SuSiE
binary_path=~/GIT/github/mvarbvs/dsc/modules/linux/
annotation_list="data/annotation.list"
sos run analysis/20180704_MolecularQTL_Workflow.ipynb preprocess \
--x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
--max-dist 100000 --num-pcs 3
sos run analysis/20180704_MolecularQTL_Workflow.ipynb SuSiE \
--x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
--max-dist 100000 --prior_var 0.096
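The commands above pass --max-dist 100000. As a rough sketch of what such a cis-window restriction amounts to, the snippet below keeps only the SNPs within 100 kb of an intron cluster. The function name is hypothetical, and measuring the distance from the cluster's start and end coordinates is an assumption; the workflow notebook remains the authority on how the window is actually defined.

```python
def snps_in_window(snp_positions, unit_start, unit_end, max_dist=100_000):
    """Return the SNP positions within max_dist bp of [unit_start, unit_end].

    Assumes all coordinates are on the same chromosome and use the same
    coordinate system.
    """
    lo = max(0, unit_start - max_dist)
    hi = unit_end + max_dist
    return [pos for pos in snp_positions if lo <= pos <= hi]


# Toy intron cluster spanning 200,000-300,000 bp
positions = [50_000, 100_000, 150_000, 400_000, 400_001]
print(snps_in_window(positions, 200_000, 300_000))  # [100000, 150000, 400000]
```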
We plot posterior inclusion probabilities (PIPs) for intron clusters with one or more identified confidence sets (CSs):
sos run analysis/20180704_MolecularQTL_Workflow.ipynb SuSiE_summary \
--x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
--max-dist 100000
The plots can be downloaded here.
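For readers unfamiliar with how a PIP is obtained from a SuSiE fit, here is a minimal sketch. In SuSiE each single effect l has a vector alpha_l of posterior probabilities over SNPs, and the PIP of SNP j is 1 - prod_l (1 - alpha_lj). The function name and the toy alpha matrix are assumptions for illustration; the actual fits are produced by the workflow above.

```python
def pip_from_alpha(alpha):
    """Posterior inclusion probabilities from a SuSiE alpha matrix.

    alpha[l][j] is the posterior probability that single effect l acts
    through SNP j. A SNP is excluded only if every single effect misses it.
    """
    n_snps = len(alpha[0])
    pips = []
    for j in range(n_snps):
        prob_excluded = 1.0
        for row in alpha:
            prob_excluded *= 1.0 - row[j]
        pips.append(1.0 - prob_excluded)
    return pips


# Two single effects over three SNPs (toy values)
alpha = [
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
]
print([round(p, 2) for p in pip_from_alpha(alpha)])  # [0.9, 0.55, 0.5]
```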
sos run analysis/20180704_MolecularQTL_Workflow.ipynb signal_summary \
--x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
--max-dist 100000
This command generates a file named SuSiE_CS_Summary.rds, which summarizes properties of the reported SuSiE confidence sets (CSs): the distribution of the number of CSs per intron unit, the size and purity of each CS, and the PIPs of the SNPs inside each CS.
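To make the terms size and purity concrete, the sketch below builds a set from one single-effect probability vector and computes its purity as the smallest absolute pairwise correlation among its SNPs, which is how SuSiE defines purity. The function names, the 95% coverage level, and the toy genotypes are assumptions; this is not the code that writes the summary file.

```python
from itertools import combinations
from math import sqrt


def credible_set(alpha_row, coverage=0.95):
    """Smallest set of SNPs whose probabilities sum to at least `coverage`."""
    order = sorted(range(len(alpha_row)), key=lambda j: alpha_row[j], reverse=True)
    chosen, total = [], 0.0
    for j in order:
        chosen.append(j)
        total += alpha_row[j]
        if total >= coverage:
            break
    return sorted(chosen)


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def purity(genotypes, cs):
    """Minimum absolute pairwise correlation among the SNPs in the set."""
    if len(cs) < 2:
        return 1.0
    return min(abs(pearson(genotypes[i], genotypes[j])) for i, j in combinations(cs, 2))


# genotypes[j] is the dosage vector of SNP j across six samples (toy values)
genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 1],
    [2, 0, 1, 0, 2, 1],
    [1, 1, 0, 2, 0, 1],
]
cs = credible_set([0.60, 0.36, 0.03, 0.01])
print(cs, len(cs))                      # [0, 1] 2
print(round(purity(genotypes, cs), 2))  # 0.89
```

A small set with high purity points to a well-localized signal, whereas a low-purity set mixes SNPs that are only weakly correlated with one another.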
We also run DAP-G on the dataset:
sos run analysis/20180704_MolecularQTL_Workflow.ipynb DAP \
--x-data $xdata \
--y-data $ydata \
--max-dist 100000 --num-pcs 3 --trait $trait \
--cwd $cwd -j $ncpu -b $binary_path
For intron clusters with one or more confidence sets, we follow up with CAVIAR, allowing for one possible additional (secondary) signal (CAVIAR's -c 2 option):
sos run analysis/20180704_MolecularQTL_Workflow.ipynb CAVIAR_follow_up \
--x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
--max-dist 100000 \
-b $binary_path
A comparison between the CAVIAR and SuSiE results for the analyzed intron clusters can be found here.
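CAVIAR's -c option sets the maximum number of causal variants, so with -c 2 it considers causal configurations containing zero, one, or two SNPs. The following sketch (a hypothetical helper, unrelated to CAVIAR's own implementation) enumerates those configurations to show how the search space grows with the number of SNPs in a region.

```python
from itertools import combinations


def causal_configurations(n_snps, max_causal=2):
    """All subsets of SNP indices with at most max_causal members."""
    configs = []
    for k in range(max_causal + 1):
        configs.extend(combinations(range(n_snps), k))
    return configs


configs = causal_configurations(4)
print(len(configs))  # 11 = 1 empty + 4 singletons + 6 pairs
print(len(causal_configurations(100)))  # 5051
```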
sos run analysis/20180704_MolecularQTL_Workflow.ipynb lm_follow_up \
--x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
--max-dist 100000
%preview JointLCL/AS_output/fastqtl_qqnorm_ASintron_RNAseqGeuvadis_YangVCF_100Kb/lm_follow_up/top_cond_pval.png
We consolidate the PIPs and CSs of the identified sQTLs for use in the downstream analyses:
sos run analysis/20180712_Enrichment_Workflow.ipynb extract_sumstats \
--trait $trait --y-data $ydata -j $ncpu --cwd $cwd
sos run analysis/20180712_Enrichment_Workflow.ipynb signal_overlap \
--y-data $ydata --cwd $cwd \
--trait $trait --fdr 0.01
sos run analysis/20180712_Enrichment_Workflow.ipynb signal_overlap \
--y-data $ydata --cwd $cwd \
--trait $trait --fdr 0.001
sos run analysis/20180712_Enrichment_Workflow.ipynb signal_overlap \
--y-data $ydata --cwd $cwd \
--trait $trait --fdr 0.0001
The pipelines below were used to obtain and pre-process the annotation data-sets. The processed annotation files are provided here for easier access.
sos run analysis/20180712_Enrichment_Workflow.ipynb download_histone_annotation --trait $trait \
--y-data $ydata
sos run analysis/20180712_Enrichment_Workflow.ipynb download_ctcf_annotation --trait $trait \
--y-data $ydata
sos run analysis/20180712_Enrichment_Workflow.ipynb download_polII_annotation --trait $trait \
--y-data $ydata
sos run analysis/20180712_Enrichment_Workflow.ipynb download_general_annotation --trait $trait \
--y-data $ydata
sos run analysis/20180712_Enrichment_Workflow.ipynb gene_regions --trait $trait \
--y-data $ydata
sos run analysis/20180712_Enrichment_Workflow.ipynb extended_splice_site --trait $trait \
--y-data $ydata
sos run analysis/20180712_Enrichment_Workflow.ipynb range2var_annotation \
--y-data $ydata \
--cwd $cwd --trait $trait -j $ncpu \
--single-annot $annotation_list
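As the step name suggests, range2var_annotation turns range-based annotations into per-variant annotations. A minimal sketch of that kind of conversion is given below, assuming BED-like half-open intervals [start, end) on a single chromosome and SNP positions in the same coordinate system. The function names are hypothetical and the workflow's actual implementation may differ.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def annotate_variants(positions, intervals):
    """Return 1 for each position inside any interval, else 0."""
    merged = merge_intervals(intervals)
    starts = [s for s, _ in merged]
    flags = []
    for pos in positions:
        k = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        flags.append(1 if k >= 0 and pos < merged[k][1] else 0)
    return flags


# Toy annotation with two overlapping peaks and one separate peak
peaks = [(100, 200), (150, 300), (500, 600)]
print(annotate_variants([99, 100, 299, 300, 550], peaks))  # [0, 1, 1, 0, 1]
```

Merging the intervals first lets each lookup use a single binary search, which matters when many SNPs are checked against a large annotation track.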
Instead of performing a matched analysis (matching on MAF, distance to TSS, and LD pattern) as for the other molecular phenotypes, we focus on SNPs within genes for sQTLs, as suggested by Li et al. (2016).
sos run analysis/20180712_Enrichment_Workflow.ipynb overlap_cluster \
--y-data $ydata \
--cwd $cwd --trait $trait -j $ncpu
sos run analysis/20180712_Enrichment_Workflow.ipynb cs_fisher_test \
--y-data $ydata \
--cwd $cwd --trait $trait -j $ncpu \
--single-annot $annotation_list
%preview jointLCLs/results/SuSiE/fastqtl_qqnorm_ASintron_RNAseqGeuvadis_YangVCF_100Kb/enrichment/SuSiE_loci.sumstats.cs_fisher_test.png
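The cs_fisher_test step tests for enrichment of annotations among CS variants. The snippet below sketches a one-sided Fisher's exact test on a 2x2 table of CS membership by annotation status. The table layout, the one-sided alternative, and the function name are assumptions, and the counts are invented; the workflow notebook defines the test actually used.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """Odds ratio and one-sided (enrichment) Fisher's exact p-value.

    a: in a CS and annotated        b: in a CS, not annotated
    c: not in a CS, annotated       d: not in a CS, not annotated
    """
    n_cs = a + b
    n_annot = a + c
    n = a + b + c + d
    # Hypergeometric tail: probability of at least `a` annotated SNPs
    # among n_cs draws when the margins are fixed.
    tail = sum(
        comb(n_annot, k) * comb(n - n_annot, n_cs - k)
        for k in range(a, min(n_cs, n_annot) + 1)
    )
    p_value = tail / comb(n, n_cs)
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, p_value


odds, p = fisher_enrichment(3, 1, 1, 3)
print(odds, round(p, 4))  # 9.0 0.2429
```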
Exported from manuscript_results/Li_2016_splice_QTL.ipynb
committed by Gao Wang on Mon Oct 22 20:55:41 2018 revision 10, 05fe86e