vignettes/Limitation_SuSiE.Rmd
Limitation_SuSiE.Rmd
This vignette illustrates some of the limitations of SuSiE when analyzing molecular traits or traits/responses variables that are functions. We use a simple example where the effect of a SNP spans over multiple CpG. We start by fine-mapping the effect of the genotype on each CpG in a sequential fashion. Then, we show that SuSiE can output inconsistent results. Finally we show that fSuSiE can handle this problem and output much more interpretable results.
Let’s simulate a simple effect of a SNP on a set of CpG. The effect of the SNP is a cosine function that spans over 40 CpG. The effect of the SNP is 0 for the first 40 CpG. We simulate 100 observations of the effect of the SNP on the CpG. We then run SuSiE sequentially on each CpG.
effect <- 1.2*cos( (1:128)/128 * 3*pi )
effect[which(effect<0)]<- 0
effect[1:40]<- 0
plot( effect)
library(fsusieR)
library(susieR)
library(wavethresh)
#> Loading required package: MASS
#> WaveThresh: R wavelet software, release 4.7.2, installed
#> Copyright Guy Nason and others 1993-2022
#> Note: nlevels has been renamed to nlevelsWT
library(ggplot2)
set.seed(2)
data(N3finemapping)
X <- N3finemapping$X[1:100,]
set.seed(1)
obs <- list()
true_pos <- 700
for (i in 1:100 ){
obs[[i]] <- X[i,true_pos]*effect+ rnorm(length(effect))
}
y <- do.call(rbind, obs)
which(effect>0)
#> [1] 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
#> [20] 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
#> [39] 103 104 105 106
Once the data is simulated, we can run SuSiE on each CpG. For the sake of simplicity, we only run SuSiE with L=1 as we are aware that there is only one effect in this simulation. In the same line, we also limit our fine mapping to the region where the effect is non-zero (which we don’t know in practice).
susie_est <- list()
for (i in which( effect>0)){
susie_est [[i]]<- susie( X=X,y=y[,i], L=1)$sets
if(!is.null(susie_est[[i]]$cs$L1)){
print( i)
print( susie_est [[i]])
}
}
#> [1] 88
#> $cs
#> $cs$L1
#> [1] 835 850 852 853 856 859 862 867 869 870 872 875 876
#>
#>
#> $purity
#> min.abs.corr mean.abs.corr median.abs.corr
#> L1 0.8096749 0.9672284 1
#>
#> $cs_index
#> [1] 1
#>
#> $coverage
#> [1] 0.9857122
#>
#> $requested_coverage
#> [1] 0.95
#>
#> [1] 91
#> $cs
#> $cs$L1
#> [1] 700 848
#>
#>
#> $purity
#> min.abs.corr mean.abs.corr median.abs.corr
#> L1 0.7230199 0.7230199 0.7230199
#>
#> $cs_index
#> [1] 1
#>
#> $coverage
#> [1] 0.9535302
#>
#> $requested_coverage
#> [1] 0.95
SuSiE only detected effect in at CpGs number 88 and 91. Whereas the effect of the causak SNP (SNP number 700) spans between CpG 65 to CpG 106.Thus showing the lack of power of SuSiE in this scenario.
#> Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
#> ℹ Please use the `linewidth` argument instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
tt <- susiF(y, X, L=1)
#> Warning in susiF(y, X, L = 1): Some of the columns of X are constants, we
#> removed 10 columns
#> [1] "Starting initialization"
#> [1] "Data transform"
#> [1] "Discarding 0 wavelet coefficients out of 128"
#> [1] "Data transform done"
#> [1] "Initializing prior"
#> [1] "Initialization done"
#> [1] "Fine mapping done, refining effect estimates using cylce spinning wavelet transform"
tt$cs
#> [[1]]
#> [1] 691
#ten column where removed so we need to remake the plot for the explanation (other t)
tt$pip <- rep( 0,length(tt$pip))
tt$pip[700] <-1
df_pip5 <-data.frame(x=1:length(tt$pip),
y=tt$pip)
df_est_f<- data.frame(y=c( tt$fitted_func[[1]],
tt$cred_band[[1]][1,],
tt$cred_band[[1]][2,]
),
x= rep( 1:128, 3),
type=factor(rep(1:3, each=128)))
df_est_f<- data.frame(y=c( tt$fitted_func[[1]],
tt$cred_band[[1]][1,],
tt$cred_band[[1]][2,]
),
x= rep( 1:128, 3),
type=factor(rep(1:3, each=128)))
P05 <- ggplot( )+
geom_point(df_effect, mapping=aes(x=x, y=y))+
geom_line( df_est_f[which(df_est_f$type==1),],
mapping=aes(x=x, y=y,linetype="longdash"),
col="lightblue3",
size=1.3)+
geom_line( df_est_f[which(df_est_f$type==2),],
mapping=aes(x=x, y=y,linetype="solid"),
col="lightblue3",
size=1.3)+
geom_line( df_est_f[which(df_est_f$type==3),],
mapping=aes(x=x, y=y,linetype="solid"),
col="lightblue3",
size=1.3)+
xlab("CpG")+
ylab("Effect ")+
theme_classic()+
geom_hline(yintercept = 0)+
theme(legend.position = "none",
panel.border = element_rect(colour = "black", fill=NA, size=1.2),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
P51 <- ggplot( df_pip5, aes(x=x, y=y))+
geom_point()+
xlab("SNP index")+
ylab("PIP")+
theme_classic()+
geom_point(x= 700,y=tt$pip[700], col="lightblue3",size=3)+
theme( panel.border = element_rect(colour = "black", fill=NA, size=1.2),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
geom_point(x= 700,y=tt$pip[700], col="red", shape=21,size=3)
grid.arrange(
P05,P51, ncol=2)