Last updated: 2017-01-15
Code version: 94a3347615361216b39b8a9333bfa8354794e0ff
Here we compare the likelihoods achieved by the newly implemented interior point method against the EM algorithm.
Note: the last code chunk (“compare_lfsr”) may take several minutes to render in R or RStudio because it uses ggplot to plot more than a million data points. Also, it it may not be possible to run the last code chunk interactively in RStudio.
First, we load the necessary libraries.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2)
The ‘dsc-opt’ script simply runs both methods (IP and EM) on the same data sets, and compares the difference in the log-likelihoods. We can see that in >90% of cases the difference is negligible. Then there is a tail of cases where the IP log-likelihood is higher - presumably where the EM doesn’t really converge so well.
load("../output/dsc-opt-files/dsc_opt.RData")
plot(ecdf(dsc_opt$res$diff1),
main = "ECDF of log-likelihood difference (IP - EM)")
However, what we really care about is inference quantities, such as the lfsr, rather than the log-likelihood. Here we compare the lfsr for the simulations where we ran both the IP method and EM method (nocxx) in dsc-shrink.
load("../output/dsc-shrink-files/res.RData")
# Select out the lfsr estimates for each method.
df1 = res$lfsr %>% filter(method %in% c("ash.hu.nocxx")) %>%
select(-user.self,-sys.self,-elapsed,-user.child,-sys.child) %>%
melt(id.vars = c("method","scenario","seed",".id"),
value.name = "lfsr.nocxx")
df2 = res$lfsr %>% filter(method %in% c("ash.hu")) %>%
select(-user.self,-sys.self,-elapsed,-user.child,-sys.child) %>%
melt(id.vars = c("method","scenario","seed",".id"),
value.name = "lfsr.IP")
df = inner_join(df1,df2,by = c("scenario","seed","variable"))
df = transform(df,scenario = factor(scenario))
print(ggplot(df,aes(lfsr.nocxx,lfsr.IP)) +
geom_point(shape = 1) +
facet_wrap(~ scenario,nrow = 2) +
geom_abline(colour = "black") +
xlab("EM algorithm (no cxx)") +
ylab("IP method"))
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.2
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.15.1 reshape2_1.4.2 dplyr_0.5.0 ggplot2_2.2.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.8 magrittr_1.5 munsell_0.4.3 xtable_1.8-2
[5] colorspace_1.2-6 R6_2.2.0 stringr_1.1.0 plyr_1.8.4
[9] tools_3.3.2 grid_3.3.2 gtable_0.2.0 DBI_0.5-1
[13] dscr_0.1.1 htmltools_0.3.5 yaml_2.1.14 lazyeval_0.2.0
[17] rprojroot_1.1 digest_0.6.10 assertthat_0.1 tibble_1.2
[21] shiny_0.14.2 mime_0.5 evaluate_0.10 rmarkdown_1.3
[25] labeling_0.3 stringi_1.1.2 scales_0.4.1 backports_1.0.4
[29] httpuv_1.3.3