SuSiE manuscript

Splicing QTL analysis: application to Li et al 2016

In Li et al 2016, the authors systematically analyzed the effects of genetic variants (SNPs) on molecular phenotypes at successive stages of gene regulation, from chromatin state through protein function. Here we re-analyze one of these phenotypes, alternative splicing, and fine-map splicing QTLs (sQTLs) using the data provided by Li et al.

The data set is available here. Summary statistics from an sQTL analysis with MatrixQTL are also provided on the website. In the original analysis, the authors calibrated the MatrixQTL p-values with a permutation-based procedure, computing an empirical gene-level p-value for the most significant QTL of each gene. At 10% and 5% FDR they identified 2,893 and 1,602 sQTLs, respectively.
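The permutation calibration described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common (1 + count) / (1 + number of permutations) estimator for the empirical gene-level p-value, and it uses the Benjamini-Hochberg procedure as a stand-in for whatever FDR method Li et al 2016 actually applied. The function names are hypothetical.

```python
"""Sketch: permutation-calibrated gene-level p-values and FDR control.

Assumptions (not taken from Li et al 2016): the empirical p-value uses the
(1 + count) / (1 + n_perm) estimator, and FDR is controlled with the
Benjamini-Hochberg (BH) procedure as a stand-in for the authors' method.
"""
from typing import Dict, List


def empirical_pvalue(observed_min_p: float, permuted_min_p: List[float]) -> float:
    """Share of permutations whose best (smallest) p-value is at least as
    extreme as the observed best p-value, with a +1 correction."""
    hits = sum(1 for p in permuted_min_p if p <= observed_min_p)
    return (1 + hits) / (1 + len(permuted_min_p))


def benjamini_hochberg(pvals: Dict[str, float], fdr: float) -> List[str]:
    """Return the genes declared significant at the given FDR level (BH step-up)."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= fdr * rank / m:
            cutoff = rank  # keep the largest rank that passes the BH line
    return [gene for gene, _ in ranked[:cutoff]]


if __name__ == "__main__":
    # Toy best p-values from 10 permutations of one gene's phenotype.
    perms = [0.2, 0.05, 0.5, 0.01, 0.3, 0.7, 0.04, 0.9, 0.6, 0.15]
    print(empirical_pvalue(1e-4, perms))  # 1/11, about 0.0909
    print(benjamini_hochberg({"A": 0.001, "B": 0.02, "C": 0.04, "D": 0.5}, 0.05))  # ['A', 'B']
```

With only 10 permutations the smallest attainable empirical p-value is 1/11, which is why real analyses use many more permutations per gene.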

Here we perform two analyses: sQTL fine-mapping with SuSiE, and enrichment analysis of the fine-mapped signals in selected genomic regions and functional annotations. This notebook provides the commands to reproduce both.

%revisions -s --tags --no-walk
Revision Author Date Message
25944c8 Gao Wang 2018-08-21 SuSiE v0.3 results
%cd /project/compbio

Data paths

Set these variables in the shell session that runs the analysis. To reproduce the results, change the paths to match your computing environment:

xdata=/project/compbio/jointLCLs/genotype/hg19/YRI/genotypesYRI.gen.txt.gz
ydata=/project/compbio/jointLCLs/phenotypes/fastqtl_qqnorm_ASintron_RNAseqGeuvadis_YangVCF.txt.gz
ncpu=16
trait=AS
cwd=/project/compbio/jointLCLs/results/SuSiE
binary_path=~/GIT/github/mvarbvs/dsc/modules/linux/
annotation_list="data/annotation.list"

Fine-mapping with SuSiE

Preprocessing of alternative splicing (intron ratio) data

sos run analysis/20180704_MolecularQTL_Workflow.ipynb preprocess \
    --x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
    --max-dist 100000 --num-pcs 3
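The --max-dist 100000 option restricts each intron cluster to a 100 kb cis window (hence the _100Kb suffix in the output directories). A minimal sketch of that selection, assuming the window extends max_dist bases beyond the cluster's start and end; the workflow may anchor the window differently, and cis_snps is a hypothetical helper:

```python
"""Sketch: select cis SNPs within max_dist of an intron cluster.

Assumption: the window spans [cluster_start - max_dist, cluster_end + max_dist]
on the cluster's chromosome; the actual workflow may define it differently.
"""
from typing import List, Tuple


def cis_snps(
    snps: List[Tuple[str, str, int]],  # (snp_id, chromosome, position)
    chrom: str,
    cluster_start: int,
    cluster_end: int,
    max_dist: int = 100_000,
) -> List[str]:
    """Return IDs of SNPs on `chrom` inside the cis window, in input order."""
    lo = cluster_start - max_dist
    hi = cluster_end + max_dist
    return [sid for sid, c, pos in snps if c == chrom and lo <= pos <= hi]


if __name__ == "__main__":
    snps = [("rs1", "chr1", 50_000), ("rs2", "chr1", 250_000),
            ("rs3", "chr1", 420_000), ("rs4", "chr2", 260_000),
            ("rs5", "chr1", 100_000)]
    print(cis_snps(snps, "chr1", 200_000, 300_000))  # ['rs2', 'rs5']
```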

SuSiE analysis

sos run analysis/20180704_MolecularQTL_Workflow.ipynb SuSiE \
    --x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
    --max-dist 100000 --prior_var 0.096

Fine-mapping results plot

We plot posterior inclusion probabilities (PIPs) for intron clusters with one or more identified confidence sets (CSs):

sos run analysis/20180704_MolecularQTL_Workflow.ipynb SuSiE_summary \
    --x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
    --max-dist 100000

The plots can be downloaded here.

Confidence sets summary

sos run analysis/20180704_MolecularQTL_Workflow.ipynb signal_summary \
    --x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
    --max-dist 100000

This command generates a file named SuSiE_CS_Summary.rds, which summarizes properties of the reported SuSiE confidence sets: the distribution of the number of CSs per intron cluster, CS size, CS purity, and the PIPs of variants inside each CS.
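To make the CS properties above concrete, the sketch below builds one SuSiE-style confidence set from a single effect's posterior inclusion vector (alpha) and computes its purity. It assumes 95% coverage and defines purity as the minimum absolute pairwise correlation between the genotypes of variants in the CS; to our understanding susieR uses these definitions and, by default, drops CSs with purity below 0.5. The helper names are hypothetical and this is not the workflow's implementation.

```python
"""Sketch: one SuSiE-style confidence set (CS) and its purity.

Assumptions: `alpha` is the posterior inclusion vector of a single effect,
`genotypes[j]` is the list of dosages for variant j, coverage is 0.95, and
purity is the minimum absolute pairwise correlation within the CS.
"""
import math
from itertools import combinations
from typing import List, Sequence


def confidence_set(alpha: Sequence[float], coverage: float = 0.95) -> List[int]:
    """Smallest set of variant indices whose alpha mass reaches `coverage`."""
    order = sorted(range(len(alpha)), key=lambda j: alpha[j], reverse=True)
    cs, total = [], 0.0
    for j in order:
        cs.append(j)
        total += alpha[j]
        if total >= coverage:
            break
    return sorted(cs)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def purity(cs: List[int], genotypes: Sequence[Sequence[float]]) -> float:
    """Minimum absolute pairwise correlation among variants in the CS."""
    if len(cs) < 2:
        return 1.0  # a single-variant CS is trivially pure
    return min(abs(pearson(genotypes[i], genotypes[j])) for i, j in combinations(cs, 2))


if __name__ == "__main__":
    alpha = [0.02, 0.6, 0.37, 0.01]
    geno = [[1, 0, 1, 0, 1], [0, 1, 2, 1, 0], [2, 1, 0, 1, 2], [0, 0, 1, 1, 2]]
    cs = confidence_set(alpha)
    print(cs)  # [1, 2]
    print(purity(cs, geno))  # about 1.0: variants 1 and 2 are in perfect (negative) LD
```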

Fine-mapping with other methods

DAP-G

We also run DAP-G on the dataset:

sos run analysis/20180704_MolecularQTL_Workflow.ipynb DAP \
    --x-data $xdata \
    --y-data $ydata \
    --max-dist 100000 --num-pcs 3 --trait $trait \
    --cwd $cwd -j $ncpu -b $binary_path

Follow-up with CAVIAR

For intron clusters with one or more confidence sets, we follow up with CAVIAR, allowing for one possible secondary signal in addition to the primary one (option -c 2, i.e. at most two causal variants):

sos run analysis/20180704_MolecularQTL_Workflow.ipynb CAVIAR_follow_up \
    --x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
    --max-dist 100000 \
    -b $binary_path

A comparison of CAVIAR and SuSiE results for the analyzed intron clusters can be found here.

Follow-up with multiple regression

sos run analysis/20180704_MolecularQTL_Workflow.ipynb lm_follow_up \
    --x-data $xdata --y-data $ydata --trait $trait --cwd $cwd -j $ncpu \
    --max-dist 100000
%preview JointLCL/AS_output/fastqtl_qqnorm_ASintron_RNAseqGeuvadis_YangVCF_100Kb/lm_follow_up/top_cond_pval.png
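One plausible reading of the multiple regression follow-up (suggested by the top_cond_pval plot name) is a conditional test: does a second variant remain associated after adjusting for the top variant? A minimal sketch, assuming an ordinary least squares fit of y ~ x_top + x_second; for simplicity the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. conditional_test is a hypothetical helper, not the lm_follow_up code.

```python
"""Sketch: conditional association test of a second variant given the top one.

Assumptions: OLS fit of y ~ 1 + x_top + x_second; the two-sided p-value uses
a normal approximation to the t distribution (large-sample shortcut).
"""
import math
from typing import List, Sequence, Tuple


def conditional_test(
    y: Sequence[float], x_top: Sequence[float], x_second: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (beta, t, approximate two-sided p) for x_second given x_top."""
    n = len(y)

    def center(v: Sequence[float]) -> List[float]:
        m = sum(v) / n
        return [u - m for u in v]

    # Centering all variables absorbs the intercept.
    yc, a, b = center(y), center(x_top), center(x_second)
    saa = sum(u * u for u in a)
    sbb = sum(u * u for u in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * v for u, v in zip(a, yc))
    sby = sum(u * v for u, v in zip(b, yc))
    det = saa * sbb - sab * sab  # determinant of the 2x2 X'X matrix
    beta_top = (sbb * say - sab * sby) / det
    beta_second = (saa * sby - sab * say) / det
    rss = sum((r - beta_top * u - beta_second * v) ** 2 for r, u, v in zip(yc, a, b))
    sigma2 = rss / (n - 3)  # residual variance: intercept plus two slopes
    se = math.sqrt(sigma2 * saa / det)  # (X'X)^{-1} diagonal entry for x_second
    t = beta_second / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal approximation
    return beta_second, t, p
```

Solving the 2x2 normal equations directly keeps the sketch dependency-free; with real data one would use a regression library and exact t-distribution p-values.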

Extract SuSiE summary statistics

We consolidate the PIPs and CSs of identified sQTLs for use in downstream analyses:

sos run analysis/20180712_Enrichment_Workflow.ipynb extract_sumstats \
    --trait $trait --y-data $ydata -j $ncpu --cwd $cwd
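SuSiE reports, for each single effect l, a vector of posterior inclusion probabilities alpha_l; a variant's overall PIP combines them as PIP_j = 1 - prod_l (1 - alpha_lj). A minimal sketch of that combination, assuming alpha is given as an L x p list of rows; the extract_sumstats step may store or compute PIPs differently, and pips is a hypothetical helper.

```python
"""Sketch: combine per-effect inclusion probabilities into variant PIPs.

Assumption: alpha[l][j] is the probability that single effect l is variant j;
PIP_j = 1 - prod_l (1 - alpha[l][j]).
"""
from typing import List, Sequence


def pips(alpha: Sequence[Sequence[float]]) -> List[float]:
    """Return one PIP per variant from an L x p alpha matrix."""
    p = len(alpha[0])
    out = []
    for j in range(p):
        prob_excluded = 1.0  # probability that no single effect picks variant j
        for row in alpha:
            prob_excluded *= 1.0 - row[j]
        out.append(1.0 - prob_excluded)
    return out


if __name__ == "__main__":
    print(pips([[0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]))  # approximately [0.95, 0.55, 0.0]
```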

Comparison with MatrixQTL results (analysis performed by Li et al 2016)

sos run analysis/20180712_Enrichment_Workflow.ipynb signal_overlap \
    --y-data $ydata --cwd $cwd \
    --trait $trait --fdr 0.01

sos run analysis/20180712_Enrichment_Workflow.ipynb signal_overlap \
    --y-data $ydata --cwd $cwd \
    --trait $trait --fdr 0.001   

sos run analysis/20180712_Enrichment_Workflow.ipynb signal_overlap \
    --y-data $ydata --cwd $cwd \
    --trait $trait --fdr 0.0001
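The three commands above repeat the comparison at progressively stricter MatrixQTL FDR thresholds. As a hypothetical illustration of what such a comparison counts (not the signal_overlap implementation), the sketch below tallies intron clusters found only by MatrixQTL, by both methods, or only by SuSiE at a given threshold; the input dictionary and set are assumptions.

```python
"""Sketch: overlap between MatrixQTL hits and SuSiE findings at one FDR level.

Hypothetical inputs: `matrixqtl_fdr` maps an intron cluster ID to its MatrixQTL
FDR (q-value); `susie_clusters` holds clusters with at least one SuSiE CS.
"""
from typing import Dict, Set, Tuple


def overlap_at(
    matrixqtl_fdr: Dict[str, float], susie_clusters: Set[str], fdr: float
) -> Tuple[int, int, int]:
    """Return (MatrixQTL-only, shared, SuSiE-only) cluster counts."""
    mqtl = {c for c, q in matrixqtl_fdr.items() if q <= fdr}
    return len(mqtl - susie_clusters), len(mqtl & susie_clusters), len(susie_clusters - mqtl)


if __name__ == "__main__":
    q = {"c1": 1e-5, "c2": 5e-4, "c3": 0.005, "c4": 0.2}
    susie = {"c1", "c3", "c5"}
    for fdr in (0.01, 0.001, 0.0001):
        print(fdr, overlap_at(q, susie, fdr))
```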

Enrichment analysis

Download and pre-process annotation data

The pipelines below were used to obtain and preprocess the annotation data sets. For easier access, we also provide the processed annotation files here.

Histone marks and DNase I hypersensitive sites

sos run analysis/20180712_Enrichment_Workflow.ipynb download_histone_annotation --trait $trait \
    --y-data $ydata

CTCF binding

sos run analysis/20180712_Enrichment_Workflow.ipynb download_ctcf_annotation --trait $trait \
    --y-data $ydata

RNA polymerase II binding

sos run analysis/20180712_Enrichment_Workflow.ipynb download_polII_annotation --trait $trait \
    --y-data $ydata

General annotations

sos run analysis/20180712_Enrichment_Workflow.ipynb download_general_annotation --trait $trait \
    --y-data $ydata

Gene regions

sos run analysis/20180712_Enrichment_Workflow.ipynb gene_regions --trait $trait \
    --y-data $ydata

Extended splice sites

sos run analysis/20180712_Enrichment_Workflow.ipynb extended_splice_site --trait $trait \
    --y-data $ydata

Apply annotations

sos run analysis/20180712_Enrichment_Workflow.ipynb range2var_annotation \
    --y-data $ydata \
    --cwd $cwd --trait $trait -j $ncpu \
    --single-annot $annotation_list
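Applying a range annotation amounts to asking, for each variant, whether it falls inside any annotated interval. A minimal sketch, assuming BED-style 0-based half-open ranges on a single chromosome and 1-based variant positions; it merges overlapping ranges and uses binary search. The helpers are hypothetical and do not reproduce range2var_annotation.

```python
"""Sketch: binary annotation of variants by genomic ranges.

Assumptions: ranges are BED-style [start, end) in 0-based coordinates on one
chromosome; variant positions are 1-based. A variant is annotated (1) if it
falls inside any range, otherwise 0.
"""
from bisect import bisect_right
from typing import List, Sequence, Tuple


def merge_ranges(ranges: Sequence[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort ranges and merge those that overlap or touch."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def annotate(positions: Sequence[int], ranges: Sequence[Tuple[int, int]]) -> List[int]:
    """Return 1/0 per variant position depending on range membership."""
    merged = merge_ranges(ranges)
    starts = [s for s, _ in merged]
    out = []
    for pos in positions:
        zero_based = pos - 1  # convert 1-based variant position to BED coordinates
        i = bisect_right(starts, zero_based) - 1  # last merged range starting at or before it
        out.append(1 if i >= 0 and zero_based < merged[i][1] else 0)
    return out


if __name__ == "__main__":
    print(annotate([100, 101, 300, 301, 550, 700], [(100, 200), (150, 300), (500, 600)]))
    # [0, 1, 1, 0, 1, 0]
```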

Select candidate SNPs for sQTL enrichment analysis

Instead of performing a matched analysis (matching on MAF, distance to TSS, and LD pattern) as for other molecular phenotypes, for sQTLs we restrict the analysis to SNPs within genes, as suggested by Li et al 2016.

sos run analysis/20180712_Enrichment_Workflow.ipynb overlap_cluster \
    --y-data $ydata \
    --cwd $cwd --trait $trait -j $ncpu

Fisher's exact test for enrichment

sos run analysis/20180712_Enrichment_Workflow.ipynb cs_fisher_test \
    --y-data $ydata \
    --cwd $cwd --trait $trait -j $ncpu \
    --single-annot $annotation_list
%preview jointLCLs/results/SuSiE/fastqtl_qqnorm_ASintron_RNAseqGeuvadis_YangVCF_100Kb/enrichment/SuSiE_loci.sumstats.cs_fisher_test.png
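The enrichment test compares annotation rates between candidate SNPs inside SuSiE confidence sets and the remaining candidate SNPs. A minimal sketch of a one-sided Fisher's exact test, assuming the 2x2 layout shown in the docstring; cs_fisher_test may use a different layout, a two-sided test, or a different background set, and fisher_enrichment is a hypothetical helper.

```python
"""Sketch: one-sided Fisher's exact test for annotation enrichment in CSs.

Assumed 2x2 layout (counts of candidate SNPs):
                 annotated   not annotated
    in a CS          a             b
    not in a CS      c             d
The p-value is P(X >= a) under the hypergeometric null with fixed margins.
"""
from math import comb
from typing import Tuple


def fisher_enrichment(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (sample odds ratio, one-sided p-value for enrichment)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    numer = sum(comb(row1, k) * comb(n - row1, col1 - k)
                for k in range(a, min(row1, col1) + 1))
    odds = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds, numer / denom


if __name__ == "__main__":
    print(fisher_enrichment(3, 1, 1, 3))  # odds ratio 9.0, p = 17/70
```

For example, fisher_enrichment(3, 1, 1, 3) returns an odds ratio of 9 and p = 17/70 (about 0.243). Using exact integer binomial coefficients avoids floating-point underflow in the individual hypergeometric terms.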

© 2017-2018 authored by Gao Wang at Stephens Lab, The University of Chicago

Exported from manuscript_results/Li_2016_splice_QTL.ipynb committed by Gao Wang on Mon Oct 22 20:55:41 2018 revision 10, 05fe86e