Last updated: 2020-07-24

Checks: 6 passed, 1 warning

Knit directory: causal-TWAS/

This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20191103) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/

Untracked files:
    Untracked:  analysis/simulation-multi-ukbchr22-gtex.adipose.Rmd
    Untracked:  code/run_UKB_process.R
    Untracked:  code/workflow/
    Untracked:  code/wtccc/

Unstaged changes:
    Modified:   README.md
    Modified:   analysis/index.Rmd
    Deleted:    code/ctwas_polygenic_V1.R
    Deleted:    code/ctwas_spikeslab_V1.R
    Deleted:    code/gene_annotation.R
    Modified:   code/input_reformat.R
    Modified:   code/mr.ash2.R
    Modified:   code/mr.ash2_FBM.R
    Deleted:    code/run_WTCCC_data_process.R
    Modified:   code/run_gwas_snp.R
    Modified:   code/run_test_mr.ash2s.R
    Modified:   code/run_test_susie.R
    Deleted:    code/simulate-WTCCC-expr.R
    Deleted:    code/simulate-WTCCC-phenotype.R
    Modified:   code/simulate_phenotype.R
    Deleted:    code/train_expression.R
    Deleted:    code/workflow-WTCCC-polygenic-simulation.ipynb
    Deleted:    code/workflow-ashtest.ipynb
    Deleted:    code/workflow-ashtest2.ipynb
    Deleted:    code/workflow-ashtest3.ipynb
    Deleted:    code/workflow-data.ipynb

Staged changes:
    Deleted:    code/master_run3.sh

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


We ran the simulation 9 times using UK Biobank (UKB) chromosome 22.

library(mr.ash.alpha)
library(data.table)
suppressMessages({library(plotly)})
library(tidyr)
library(plyr)
simdatadir <- "~/causalTWAS/simulations/simulation_ashtest_20200616/"
outputdir <- "~/causalTWAS/simulations/simulation_ashtest_20200616/"
susiedir <- "~/causalTWAS/simulations/simulation_susietest_20200616/"
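# Build the paths of all result files for one simulation run.
#   tag:  simulation run ID, e.g. "20200616-7-1"
#   tag2: mr.ash2s initialization/update setting, e.g. "expr-snp"
get_files <- function(tag, tag2){
  # mr.ash2s estimated parameters and regional PIPs
  par <- paste0(outputdir, tag, "-mr.ash2s.", tag2, ".param.txt")
  rpip <- paste0(outputdir, tag, "-mr.ash2s.", tag2, ".rPIP.txt")

  # mr.ash2s per-gene (expr) and per-SNP results
  gmrash <- paste0(outputdir, tag, "-mr.ash2s.", tag2, ".expr.txt")
  smrash <- paste0(outputdir, tag, "-mr.ash2s.", tag2, ".snp.txt")

  # marginal association results for genes (used as TWAS below) and SNPs
  ggwas <- paste0(outputdir, tag, ".exprgwas.txt.gz")
  sgwas <- paste0(outputdir, tag, ".snpgwas.txt.gz")

  # SUSIE results for genes and SNPs (file names carry an "L3" tag)
  gsusie <- paste0(susiedir, tag, ".", tag2, ".L3.susieres.expr.txt")
  ssusie <- paste0(susiedir, tag, ".", tag2, ".L3.susieres.snp.txt")

  # named list, so entries can be pulled out with f[["par"]], f[["gsusie"]], ...
  return(tibble::lst(par, rpip, gmrash, ggwas, smrash, sgwas, gsusie, ssusie))
}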
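`get_files()` only builds paths; it does not check that the files are there. A minimal pre-flight check is sketched below. The helper name `missing_files` is hypothetical and not part of the original pipeline; it returns the names and paths of any expected inputs that do not exist, so a missing SUSIE or GWAS file shows up before the plotting functions fail inside `read.table()`.

```r
# Hypothetical helper (not part of the original pipeline): given the named list
# returned by get_files(), return the entries whose files do not exist.
missing_files <- function(flist) {
  paths <- unlist(flist)         # named character vector: par, rpip, gmrash, ...
  paths[!file.exists(paths)]     # keep only the paths that are absent
}
```

For example, `missing_files(get_files(tag = tags[1], tag2 = "expr-snp"))` returns an empty character vector when every file for run 1 is in place.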

Mr.ash2 parameter estimation

Results for the 9 simulation runs, comparing different initialization and update strategies.

tags <- paste0('20200616-7-', 1:9)
tag2s <- c('expr-snp', 'snp-expr', 'lassoexpr-snp','lassoSNPes-es','lassoes-se' )
show_param <- function(tag2){
  f <- lapply(tags, get_files, tag2 = tag2)
  parf <- lapply(f, '[[', "par")
  param <- do.call(rbind, lapply(parf, function(x) t(read.table(x))[2:1,]))
  knitr::kable(param)
}

NULL; expr-snp; expr-snp

show_param(tag2s[1])
gene.pi1 gene.pve snp.pi1 snp.pve
truth 0.0502092 0.0047262 0.0024979 0.0506180
estimated 0.0321949 0.0092713 0.0005093 0.0480143
truth 0.0502092 0.0134063 0.0024979 0.0567824
estimated 0.0586605 0.0169032 0.0005042 0.0479763
truth 0.0502092 0.0083281 0.0024979 0.0543350
estimated 0.0608350 0.0173011 0.0004527 0.0427801
truth 0.0502092 0.0089567 0.0024979 0.0586225
estimated 0.0918778 0.0259876 0.0006059 0.0566179
truth 0.0502092 0.0118538 0.0024979 0.0487240
estimated 0.0553670 0.0159207 0.0005120 0.0485317
truth 0.0502092 0.0054891 0.0024979 0.0465223
estimated 0.0924275 0.0257045 0.0002606 0.0248300
truth 0.0502092 0.0247506 0.0024979 0.0485317
estimated 0.1083247 0.0310019 0.0004298 0.0415368
truth 0.0502092 0.0029643 0.0024979 0.0519305
estimated 0.0296452 0.0086987 0.0005386 0.0515244
truth 0.0502092 0.0069086 0.0024979 0.0529274
estimated 0.0889689 0.0256536 0.0003692 0.0359862
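Because truth and estimate alternate row by row, bias is easier to read as a ratio. The sketch below assumes access to the `param` matrix that `show_param()` builds before passing it to `knitr::kable()` (the function itself only returns the formatted table); `param_ratio` is a hypothetical helper, and the example input is the first two runs of the table above.

```r
# Sketch: turn the interleaved truth/estimated table into estimated/truth
# ratios, one row per simulation run. Assumes rows alternate truth, estimated.
param_ratio <- function(param) {
  param <- as.matrix(param)
  truth <- param[seq(1, nrow(param), by = 2), , drop = FALSE]
  est   <- param[seq(2, nrow(param), by = 2), , drop = FALSE]
  ratio <- est / truth                      # element-wise ratio per parameter
  rownames(ratio) <- paste0("run", seq_len(nrow(ratio)))
  ratio
}

# First two runs of the NULL; expr-snp; expr-snp table, copied from above.
param <- rbind(
  truth     = c(0.0502092, 0.0047262, 0.0024979, 0.0506180),
  estimated = c(0.0321949, 0.0092713, 0.0005093, 0.0480143),
  truth     = c(0.0502092, 0.0134063, 0.0024979, 0.0567824),
  estimated = c(0.0586605, 0.0169032, 0.0005042, 0.0479763)
)
colnames(param) <- c("gene.pi1", "gene.pve", "snp.pi1", "snp.pve")
round(param_ratio(param), 2)
```

Applied to the full nine-run table above, every estimated snp.pi1 falls between roughly 10% and 25% of the true value (0.0024979).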

NULL; snp-expr; expr-snp

show_param(tag2s[2])
gene.pi1 gene.pve snp.pi1 snp.pve
truth 0.0502092 0.0047262 0.0024979 0.0506180
estimated 0.0321944 0.0092712 0.0005093 0.0480149
truth 0.0502092 0.0134063 0.0024979 0.0567824
estimated 0.0596023 0.0171553 0.0004935 0.0469762
truth 0.0502092 0.0083281 0.0024979 0.0543350
estimated 0.0608435 0.0173035 0.0004528 0.0427930
truth 0.0502092 0.0089567 0.0024979 0.0586225
estimated 0.0832465 0.0236055 0.0006322 0.0588887
truth 0.0502092 0.0118538 0.0024979 0.0487240
estimated 0.0553671 0.0159207 0.0005120 0.0485315
truth 0.0502092 0.0054891 0.0024979 0.0465223
estimated 0.0924275 0.0257045 0.0002606 0.0248300
truth 0.0502092 0.0247506 0.0024979 0.0485317
estimated 0.1083243 0.0310017 0.0004298 0.0415367
truth 0.0502092 0.0029643 0.0024979 0.0519305
estimated 0.0302425 0.0088718 0.0005388 0.0515448
truth 0.0502092 0.0069086 0.0024979 0.0529274
estimated 0.0889689 0.0256536 0.0003692 0.0359862

lasso; expr-snp; expr-snp

show_param(tag2s[3])
gene.pi1 gene.pve snp.pi1 snp.pve
truth 0.0502092 0.0047262 0.0024979 0.0506180
estimated 0.0228165 0.0066038 0.0005438 0.0512063
truth 0.0502092 0.0134063 0.0024979 0.0567824
estimated 0.0416075 0.0120936 0.0005595 0.0531555
truth 0.0502092 0.0083281 0.0024979 0.0543350
estimated 0.0612394 0.0174629 0.0004845 0.0457611
truth 0.0502092 0.0089567 0.0024979 0.0586225
estimated 0.0637453 0.0182558 0.0007030 0.0653106
truth 0.0502092 0.0118538 0.0024979 0.0487240
estimated 0.0452139 0.0130858 0.0005611 0.0531032
truth 0.0502092 0.0054891 0.0024979 0.0465223
estimated 0.0680865 0.0191110 0.0003529 0.0333717
truth 0.0502092 0.0247506 0.0024979 0.0485317
estimated 0.0666294 0.0194348 0.0006197 0.0591515
truth 0.0502092 0.0029643 0.0024979 0.0519305
estimated 0.0100500 0.0029766 0.0005783 0.0552552
truth 0.0502092 0.0069086 0.0024979 0.0529274
estimated 0.0498103 0.0145714 0.0004834 0.0466705

lasso; expr-snp; snp-expr

show_param(tag2s[4])
gene.pi1 gene.pve snp.pi1 snp.pve
truth 0.0502092 0.0047262 0.0024979 0.0506180
estimated 0.0000000 0.0000000 0.0005895 0.0552205
truth 0.0502092 0.0134063 0.0024979 0.0567824
estimated 0.0000000 0.0000000 0.0006964 0.0651481
truth 0.0502092 0.0083281 0.0024979 0.0543350
estimated 0.0488315 0.0139788 0.0005269 0.0495375
truth 0.0502092 0.0089567 0.0024979 0.0586225
estimated 0.0180747 0.0052461 0.0007962 0.0732005
truth 0.0502092 0.0118538 0.0024979 0.0487240
estimated 0.0399764 0.0115843 0.0005800 0.0547575
truth 0.0502092 0.0054891 0.0024979 0.0465223
estimated 0.0595675 0.0167656 0.0003795 0.0358011
truth 0.0502092 0.0247506 0.0024979 0.0485317
estimated 0.0496988 0.0145644 0.0006602 0.0626893
truth 0.0502092 0.0029643 0.0024979 0.0519305
estimated 0.0100559 0.0029783 0.0005783 0.0552540
truth 0.0502092 0.0069086 0.0024979 0.0529274
estimated 0.0422545 0.0123905 0.0005078 0.0488983

lassosnp; expr-snp; expr-snp

show_param(tag2s[5])
gene.pi1 gene.pve snp.pi1 snp.pve
truth 0.0502092 0.0047262 0.0024979 0.0506180
estimated 0.0204267 0.0059169 0.0005478 0.0515797
truth 0.0502092 0.0134063 0.0024979 0.0567824
estimated 0.0416351 0.0121015 0.0005595 0.0531559
truth 0.0502092 0.0083281 0.0024979 0.0543350
estimated 0.0612418 0.0174636 0.0004845 0.0457608
truth 0.0502092 0.0089567 0.0024979 0.0586225
estimated 0.0637456 0.0182559 0.0007030 0.0653100
truth 0.0502092 0.0118538 0.0024979 0.0487240
estimated 0.0452191 0.0130873 0.0005611 0.0531039
truth 0.0502092 0.0054891 0.0024979 0.0465223
estimated 0.0680900 0.0191120 0.0003529 0.0333716
truth 0.0502092 0.0247506 0.0024979 0.0485317
estimated 0.0666303 0.0194351 0.0006197 0.0591511
truth 0.0502092 0.0029643 0.0024979 0.0519305
estimated 0.0100752 0.0029840 0.0005783 0.0552548
truth 0.0502092 0.0069086 0.0024979 0.0529274
estimated 0.0498103 0.0145714 0.0004834 0.0466705

Regional mr.ash2s PIP overview

We take simulation 1 (NULL; expr-snp; expr-snp) as an example. We use a region size of 500 kb and a PIP cutoff of 0.5 for SUSIE.

f <- get_files(tag= tags[1], tag2 = tag2s[1])
a <- read.table(f[["rpip"]], header = T)
plot(a$p0, a$rPIP, pch =19, col ='salmon', xlab = "position", ylab= "Sum of PIP")
grid()
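The `rPIP` file is precomputed by the pipeline. To make the quantity concrete, here is a sketch of a regional PIP as the sum of per-variable PIPs within a region, assuming fixed, non-overlapping 500 kb windows starting at position 0; the pipeline's actual region boundaries may differ, and `regional_pip` and the `p1` column are hypothetical.

```r
# Sketch: sum per-variable PIPs within fixed, non-overlapping windows.
# Assumption: window k covers [k * region_size, (k + 1) * region_size);
# the causal-TWAS pipeline may define its regions differently.
regional_pip <- function(pos, pip, region_size = 5e5) {
  region <- floor(pos / region_size)
  out <- aggregate(pip, by = list(region = region), FUN = sum)
  data.frame(p0   = out$region * region_size,        # window start
             p1   = (out$region + 1) * region_size,  # window end
             rPIP = out$x)                            # summed PIP
}

# Toy example: three variables in the first window, one in the third.
r <- regional_pip(pos = c(1e4, 2e5, 4.9e5, 1.2e6), pip = c(0.2, 0.5, 0.1, 0.9))
```

With real per-variable PIPs, `plot(r$p0, r$rPIP)` gives the same kind of plot as above; empty windows simply do not appear.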

PIP scatter plot

mr.ash2s PIP vs. SUSIE PIP (left panel) and vs. SUSIE `pip.null` (right panel), colored by whether the gene is causal.

scatter_plot_PIP<- function(tag2){
  f <- lapply(tags, get_files, tag2 = tag2)
  mrashf <- lapply(f, '[[', "gmrash")
  names(mrashf) <- tags
  
  susief <- lapply(f, '[[', "gsusie")
  names(susief) <- tags

  .tagname <- function(x, flist){
    a <- read.table(flist[[x]], header =T)
    a[, "name"] <- paste0(x, ":", a[, "name"])
    a
  }
  mrashres <- do.call(rbind, lapply(tags, .tagname, flist = mrashf))
  susieres <- do.call(rbind, lapply(tags, .tagname, flist = susief))
 
  res <- merge(mrashres, susieres, by = "name", all = T)
  
  res <- res[complete.cases(res),]
  res <- rename(res, c("PIP" = "mr.ash_PIP", "pip" = "SUSIE_PIP", "pip.null" = "SUSIE_PIP_null") )
  res$ifcausal <- mapvalues(res$ifcausal, 
          from=c(0,1), 
          to=c("Non causal", "Causal"))
  
  fig1 <- plot_ly(data = res, x = ~ mr.ash_PIP, y = ~ SUSIE_PIP, color = ~ ifcausal, 
                 colors = c( "salmon", "darkgreen"))
  
  fig2 <- plot_ly(data = res, x = ~ mr.ash_PIP, y = ~ SUSIE_PIP_null, color = ~ ifcausal, 
                 colors = c( "salmon", "darkgreen"))
  
  fig <- subplot(fig1, fig2, titleX = TRUE, titleY = T, margin = 0.1)
  fig
}
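The scatter plots can also be summarized with the 0.5 PIP cutoff used above. The sketch below cross-tabulates which genes each method calls at that cutoff, split by causal status. It assumes access to the merged data frame `res` built inside `scatter_plot_PIP()` (which currently only returns the figure, so the function would need to return `res` as well); `pip_concordance` is a hypothetical helper and the input is toy data.

```r
# Sketch: count genes called (PIP > cutoff) by mr.ash2s and/or SUSIE,
# separately for causal and non-causal genes. Expects the columns
# mr.ash_PIP, SUSIE_PIP and ifcausal, as in scatter_plot_PIP().
pip_concordance <- function(res, cutoff = 0.5) {
  table(mr.ash = res$mr.ash_PIP > cutoff,
        SUSIE  = res$SUSIE_PIP > cutoff,
        causal = res$ifcausal)
}

# Toy data, not simulation output.
toy <- data.frame(mr.ash_PIP = c(0.9, 0.7, 0.2, 0.1),
                  SUSIE_PIP  = c(0.8, 0.3, 0.6, 0.0),
                  ifcausal   = c("Causal", "Causal", "Non causal", "Non causal"))
tab <- pip_concordance(toy)
```

Off-diagonal counts in the `Causal` slice are causal genes picked up by only one method.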

NULL; expr-snp; expr-snp

scatter_plot_PIP(tag2s[1])
Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
Please use `arrange()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.

NULL; snp-expr; expr-snp

scatter_plot_PIP(tag2s[2])

lasso; expr-snp; expr-snp

scatter_plot_PIP(tag2s[3])

lasso; expr-snp; snp-expr

scatter_plot_PIP(tag2s[4])

lassosnp; expr-snp; expr-snp

scatter_plot_PIP(tag2s[5])

ROC curve

ROC_plot<- function(tag2){
  f <- lapply(tags, get_files, tag2 = tag2)
  mrashf <- lapply(f, '[[', "gmrash")
  names(mrashf) <- tags
  
  susief <- lapply(f, '[[', "gsusie")
  names(susief) <- tags
  
  gwasf <- lapply(f, '[[', "ggwas")
  names(gwasf) <- tags

  .tagname <- function(x, flist, colnames = NULL){
    a <- read.table(flist[[x]], header =T)
    if (!is.null(colnames)){
      colnames(a) <- colnames
    }
    a[, "name"] <- paste0(x, ":", a[, "name"])
    a
  }
  mrashres <- do.call(rbind, lapply(tags, .tagname, flist = mrashf))
  susieres <- do.call(rbind, lapply(tags, .tagname, flist = susief))
  gwasres <- do.call(rbind, lapply(tags, .tagname, flist = gwasf, 
                                   colnames =  c("chr", "p0",   "p1", "name", "Estimate", "Std.Error", "t-value", "PVALUE")))

  res <- merge(mrashres, susieres, by = "name", all = T)
  res <- merge(res, gwasres, by = "name", all = T)
  
  res <- res[complete.cases(res),]
  res <- rename(res, c("PIP" = "mr.ash", "pip" = "SUSIE", "PVALUE" = "TWAS") )
  res[,"TWAS"] <- -log10(res[, "TWAS"])
  
  roccolors <- c("red", "green", "blue")
  methods <- c("mr.ash", "SUSIE", "TWAS")
  # empty canvas; the dashed diagonal marks a random classifier
  plot(0, xlim = c(0, 1), ylim = c(0, 1), type = "n", xlab = "FPR", ylab = "TPR")
  abline(a = 0, b = 1, lty = 2)
  for (i in seq_along(methods)){
    method <- methods[i]
    # sort ascending by score; each cut point calls everything above it causal
    bordered <- res[order(res[, method]), ]
    actuals <- bordered$ifcausal == 1
    sens <- (sum(actuals) - cumsum(actuals)) / sum(actuals)  # TPR
    spec <- cumsum(!actuals) / sum(!actuals)                 # 1 - FPR
    lines(1 - spec, sens, col = roccolors[i])
    # AUC: average, over causal genes, of the fraction of non-causal genes sorted before it
    auc <- sum(spec * diff(c(0, 1 - sens)))
    cat(sprintf("AUC for %s: %.7g\n", method, auc))
  }
  legend(0.6, 0.3, legend = methods, col = roccolors, lty = 1, cex = 0.8)
  grid()
}
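As a cross-check on the cumulative-sum AUC in `ROC_plot()`, the same area can be computed from ranks (the Mann-Whitney statistic). This is a swapped-in standard formula, not the author's code, and `rank_auc` is a hypothetical helper. One difference: here tied scores get average ranks and count as 1/2, whereas the cumulative-sum version breaks ties by row order, which can matter for PIPs that are exactly 0. Both versions depend only on the ordering of scores, so using `-log10(p)` for TWAS gives the same AUC as any other decreasing transform of the p-value.

```r
# Sketch: AUC = P(score of a random causal gene > score of a random
# non-causal gene), via ranks; ties receive average ranks (counted as 1/2).
rank_auc <- function(score, is_causal) {
  is_causal <- as.logical(is_causal)
  n1 <- sum(is_causal)
  n0 <- sum(!is_causal)
  r <- rank(score)  # default ties.method = "average"
  (sum(r[is_causal]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
```

Inside `ROC_plot()` it could be called as `rank_auc(res[, method], res$ifcausal == 1)`.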

NULL; expr-snp; expr-snp

ROC_plot(tag2s[1])

AUC for mr.ash: 0.7517444
AUC for SUSIE: 0.7683526
AUC for TWAS: 0.7935694

NULL; snp-expr; expr-snp

ROC_plot(tag2s[2])

AUC for mr.ash: 0.7463235
AUC for SUSIE: 0.7708886
AUC for TWAS: 0.7943248

lasso; expr-snp; expr-snp

ROC_plot(tag2s[3])

AUC for mr.ash: 0.7250546
AUC for SUSIE: 0.7637822
AUC for TWAS: 0.799127

lasso; expr-snp; snp-expr

ROC_plot(tag2s[4])

AUC for mr.ash: 0.6203323
AUC for SUSIE: 0.717332
AUC for TWAS: 0.7999876

lassosnp; expr-snp; expr-snp

ROC_plot(tag2s[5])

AUC for mr.ash: 0.7248623
AUC for SUSIE: 0.7642336
AUC for TWAS: 0.7984251

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plyr_1.8.6          tidyr_0.8.3         plotly_4.9.2.9000  
[4] ggplot2_3.3.1       data.table_1.12.7   mr.ash.alpha_0.1-34

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6      highr_0.7         compiler_3.5.1   
 [4] pillar_1.4.4      later_0.7.5       git2r_0.26.1     
 [7] workflowr_1.6.0   tools_3.5.1       digest_0.6.25    
[10] viridisLite_0.3.0 jsonlite_1.6.1    evaluate_0.12    
[13] tibble_3.0.1      lifecycle_0.2.0   gtable_0.2.0     
[16] lattice_0.20-38   pkgconfig_2.0.2   rlang_0.4.6      
[19] Matrix_1.2-15     shiny_1.2.0       crosstalk_1.0.0  
[22] yaml_2.2.0        httr_1.4.1        withr_2.1.2      
[25] stringr_1.4.0     dplyr_1.0.0       knitr_1.20       
[28] htmlwidgets_1.3   generics_0.0.2    fs_1.3.1         
[31] vctrs_0.3.1       tidyselect_1.1.0  rprojroot_1.3-2  
[34] grid_3.5.1        glue_1.4.1        R6_2.3.0         
[37] rmarkdown_1.10    purrr_0.3.4       magrittr_1.5     
[40] backports_1.1.2   scales_1.0.0      promises_1.0.1   
[43] htmltools_0.3.6   ellipsis_0.3.1    xtable_1.8-3     
[46] mime_0.6          colorspace_1.3-2  httpuv_1.4.5     
[49] stringi_1.3.1     lazyeval_0.2.1    munsell_0.5.0    
[52] crayon_1.3.4