Previously in GTEx V6 MASH analysis we have a list of 27 genes whose eQTL across brain tissues are in different directions compared to their effects in other tissues. We now run these genes through M&M.
We run M&M with different residual covariances,
We have to see if the 27 genes still present in GTEx V8 data. Their ENSG version might also have changed. Here we figure out their current data files:
%cd /project2/compbio/GTEx_eQTL/cis_eqtl_analysis_ready
genes = ['ENSG00000025772.7', 'ENSG00000056661.9', 'ENSG00000064012.17', 'ENSG00000089486.12', 'ENSG00000104472.5', 'ENSG00000108384.10', 'ENSG00000112977.11', 'ENSG00000120088.10', 'ENSG00000135744.7', 'ENSG00000136059.10', 'ENSG00000140265.8', 'ENSG00000145214.9', 'ENSG00000149054.10', 'ENSG00000160766.10', 'ENSG00000164124.6', 'ENSG00000177084.12', 'ENSG00000181240.9', 'ENSG00000187824.4', 'ENSG00000188732.6', 'ENSG00000189171.9', 'ENSG00000189316.3', 'ENSG00000198794.7', 'ENSG00000225439.2', 'ENSG00000249846.2', 'ENSG00000264247.1', 'ENSG00000267508.1', 'ENSG00000272030.1']
genes = [x.split('.')[0] for x in genes]
import glob
genes = sum([glob.glob(x + '*.Multi_Tissues.rds') for x in genes], [])
print(genes)
print(len(genes))
Now we have 26 left. We save them to file,
with open('../data/pleiotropy_genes.txt', 'w') as f:
f.write('\n'.join([x.replace('.Multi_Tissues.rds', '') for x in genes]))
The pipeline can be found here. We run two analysis:
sos run GTEx_V8_Association.ipynb mnm --cov_method flash \
--data-dir data --gene-id-file data/pleiotropy_genes.txt \
--mash-model data/FastQTLSumStats.mash.FL_PC3.mash_model_est_v.rds
sos run GTEx_V8_Association.ipynb mnm --cov_method diag \
--data-dir data --gene-id-file data/pleiotropy_genes.txt \
--mash-model data/FastQTLSumStats.mash.FL_PC3.mash_model_est_v.rds
sos run GTEx_V8_Association.ipynb mnm --cov_method znull \
--data-dir data --gene-id-file data/pleiotropy_genes.txt \
--mash-model data/FastQTLSumStats.mash.FL_PC3.mash_model_est_v.rds
The result is a PNG file figure containing 3 panels: PIP plot, effect size bubble plot from M&M and original univariate effect size bubble plot. Bubble plots focus on the CS identified. The "top" SNP (the SNP that has smallest p-value in the gene region, as we define the strong set in MASH) is in red shown on the 3rd panel.
Using flash
based residual covariance method,
%preview ~/ENSG00000177084.16.Multi_Tissues.mnm_flash.png
%preview ~/ENSG00000104472.9.Multi_Tissues.mnm_flash.png
The SNPs in cyan color have opposite effects in brain and non-brain tissues. M&M seems to resolve the problem. But I have reservations as there seems to be a lot CS discovered ...
%preview ~/ENSG00000135744.7.Multi_Tissues.mnm_flash.png
Here, M&M seems to "sharpen" the causal SNPs by shrinking them to a brain specific and a shared set.
%preview ~/ENSG00000160766.14.Multi_Tissues.mnm_flash.png
MASH shrinkage seems to help here, and below,
%preview ~/ENSG00000198794.11.Multi_Tissues.mnm_flash.png
%preview ~/ENSG00000136059.14.Multi_Tissues.mnm_flash.png
So maybe it is a case of pleiotropy?
%preview ~/ENSG00000272030.1.Multi_Tissues.mnm_flash.png
This does not happen too often (2 out of 26 analysis), but still not rare events:
%preview ~/ENSG00000120088.14.Multi_Tissues.mnm_flash.png
%preview ~/ENSG00000249846.6.Multi_Tissues.mnm_flash.png
But with diag
method for residual covariance we get this all the time, for example for the same data ENSG00000104472.9
, the first example in the "good" example section, if we use diag
method,
%preview ~/ENSG00000104472.9.Multi_Tissues.mnm_diag.png
Not very promising.
%cd ~/tmp/29-Oct-2019/output
%preview ENSG00000177084.16.Multi_Tissues.mnm_znull.png
%preview ENSG00000104472.9.Multi_Tissues.mnm_znull.png
%preview ENSG00000135744.7.Multi_Tissues.mnm_znull.png
%preview ENSG00000160766.14.Multi_Tissues.mnm_znull.png
%preview ENSG00000198794.11.Multi_Tissues.mnm_znull.png
%preview ENSG00000136059.14.Multi_Tissues.mnm_znull.png
%preview ENSG00000272030.1.Multi_Tissues.mnm_znull.png
%preview ENSG00000120088.14.Multi_Tissues.mnm_znull.png
%preview ENSG00000249846.6.Multi_Tissues.mnm_znull.png
In fact, as later shown in a simulation benchmark we are still suffering from issues with analysis with missing data. Something is not quite right there. Please see section "Next steps for this investigation" on that page.
Additionally, I want to
Exported from analysis/pleiotropy_linkage.ipynb
committed by Gao Wang on Tue Feb 2 19:11:23 2021 revision 1, c5fe213