Last updated: 2017-01-15

Code version: 94a3347615361216b39b8a9333bfa8354794e0ff

Here we compare the likelihoods achieved by the newly implemented interior point method against the EM algorithm.

Note: the last code chunk (“compare_lfsr”) may take several minutes to render in R or RStudio because it uses ggplot to plot more than a million data points. Also, it it may not be possible to run the last code chunk interactively in RStudio.

First, we load the necessary libraries.

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(reshape2)

Log-likelihood comparison

The ‘dsc-opt’ script simply runs both methods (IP and EM) on the same data sets, and compares the difference in the log-likelihoods. We can see that in >90% of cases the difference is negligible. Then there is a tail of cases where the IP log-likelihood is higher - presumably where the EM doesn’t really converge so well.

load("../output/dsc-opt-files/dsc_opt.RData")
plot(ecdf(dsc_opt$res$diff1),
     main = "ECDF of log-likelihood difference (IP - EM)")

lfsr comparison

However, what we really care about is inference quantities, such as the lfsr, rather than the log-likelihood. Here we compare the lfsr for the simulations where we ran both the IP method and EM method (nocxx) in dsc-shrink.

load("../output/dsc-shrink-files/res.RData")

# Select out the lfsr estimates for each method.
df1 = res$lfsr %>% filter(method %in% c("ash.hu.nocxx")) %>% 
        select(-user.self,-sys.self,-elapsed,-user.child,-sys.child) %>%
        melt(id.vars = c("method","scenario","seed",".id"),
             value.name = "lfsr.nocxx")
df2 = res$lfsr %>% filter(method %in% c("ash.hu")) %>%
        select(-user.self,-sys.self,-elapsed,-user.child,-sys.child) %>%
        melt(id.vars = c("method","scenario","seed",".id"),
             value.name = "lfsr.IP")
df = inner_join(df1,df2,by = c("scenario","seed","variable"))
df = transform(df,scenario = factor(scenario))
print(ggplot(df,aes(lfsr.nocxx,lfsr.IP)) + 
      geom_point(shape = 1) +
      facet_wrap(~ scenario,nrow = 2) +
      geom_abline(colour = "black") +
      xlab("EM algorithm (no cxx)") +
      ylab("IP method"))

Session information

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.2

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.15.1   reshape2_1.4.2 dplyr_0.5.0    ggplot2_2.2.0 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8      magrittr_1.5     munsell_0.4.3    xtable_1.8-2    
 [5] colorspace_1.2-6 R6_2.2.0         stringr_1.1.0    plyr_1.8.4      
 [9] tools_3.3.2      grid_3.3.2       gtable_0.2.0     DBI_0.5-1       
[13] dscr_0.1.1       htmltools_0.3.5  yaml_2.1.14      lazyeval_0.2.0  
[17] rprojroot_1.1    digest_0.6.10    assertthat_0.1   tibble_1.2      
[21] shiny_0.14.2     mime_0.5         evaluate_0.10    rmarkdown_1.3   
[25] labeling_0.3     stringi_1.1.2    scales_0.4.1     backports_1.0.4 
[29] httpuv_1.3.3