mix-SQP experiments

Profiling adaptive shrinkage computations with mix-SQP and IP solvers

An initial motivation for this work was our interest in applying a nonparametric Empirical Bayes method, “adaptive shrinkage,” to very large data sets. These Empirical Bayes computations involve three steps:

  1. likelihood computation,
  2. maximum-likelihood estimation of the mixture proportions, and
  3. posterior computation.

Here we profile the runtime of each of these steps, in which the second step (maximum-likelihood estimation) is solved using either an interior point method (MOSEK) or the SQP algorithm we have developed. Our initial solution used the commercial interior point solver MOSEK (called via the KWDual function in the REBayes R package), and here we show that the mix-SQP solver yields a large improvement in performance, to the point that the model fitting step is no longer the predominant step in terms of computational effort.

The adaptive shrinkage calculations from the ashr package are reproduced here in Julia.

Analysis setup

Before attempting to run this Julia code, make sure your computer is properly set up to run this code by following the setup instructions in the README of the git repository.

We begin by loading the Distributions, LowRankApprox, RCall and other packages, as well as some function definitions used in the code chunks below.

In [1]:
using Pkg
using Random
using Printf
using Distributions
using LowRankApprox
using LinearAlgebra
using SparseArrays
using RCall
include("../code/datasim.jl");
include("../code/likelihood.jl");
include("../code/mixSQP.jl");
include("../code/REBayes.jl");
include("../code/ash.jl");

Next, initialize the sequence of pseudorandom numbers.

In [2]:
Random.seed!(1);

Generate a small data set

Let's begin with a smaller example with 50,000 samples.

In [3]:
n = round(Int,5e4);
z = normtmixdatasim(n);
s = ones(n);

In this example, the standard errors (s) of the provided estimates (z) are assumed to all be 1.

Run adaptive shrinkage

Run the adaptive shrinkage method with model fitting implemented using the mix-SQP (method = "mixSQP") and MOSEK (method = "REBayes") algorithms. This is a trial run intended to first precompile the relevant functions.

In [4]:
gridmult    = 1.2;
out_rebayes = ash(z,s,gridmult = gridmult,method = "REBayes");
out_mixsqp  = ash(z,s,gridmult = gridmult,method = "mixSQP",lowrank = "qr");
┌ Warning: RCall.jl: Warning: package ‘REBayes’ was built under R version 3.4.4
│ Loading required package: Matrix
└ @ RCall /Users/pcarbo/.julia/packages/RCall/iojZI/src/io.jl:113

Now we re-run the adaptive shrinkage computations, this time recording the runtimes for comparison.

In [5]:
out_rebayes = ash(z,s,gridmult = gridmult,method = "REBayes");
out_mixsqp  = ash(z,s,gridmult = gridmult,method = "mixSQP",lowrank = "qr");

Now let's summarize the computational effort of adaptive shrinkage in this example data set:

In [6]:
@printf " solver likelihood fitting posterior\n"
@printf("  MOSEK %10.3f %7.3f %9.3f\n",out_rebayes["timing-likelihood"],
        out_rebayes["timing-fit"],out_rebayes["timing-posterior"])
@printf("mix-SQP %10.3f %7.3f %9.3f\n",out_mixsqp["timing-likelihood"],
        out_mixsqp["timing-fit"],out_mixsqp["timing-posterior"])
 solver likelihood fitting posterior
  MOSEK      0.201   2.198     0.065
mix-SQP      0.170   0.211     0.017

The likelihood and posterior computations are roughly the same with both optimization algorithms, which is expected because these steps are unchanged.

As for the model fitting step, we observe it is the slowest step in both cases. Still, the SQP approach is substantially faster than the interior point method (MOSEK), to the point that the model fitting step is comparable in runtime to the likelihood computation step.

Performance on a larger data set

Next, let's profile the same adaptive shrinkage computations in a larger data set with more samples, and with a finer-scale grid of normal densities.

In [7]:
n = round(Int,1e5)
z = normtmixdatasim(n);
s = ones(n);

Run adaptive shrinkage using mix-SQP (method = "mixSQP") and MOSEK (method = "REBayes") in the model-fitting step.

In [8]:
gridmult    = 1.05;
out_rebayes = ash(z,s,gridmult = gridmult,method = "REBayes");
out_mixsqp  = ash(z,s,gridmult = gridmult,method = "mixSQP",lowrank = "qr");

Next, summarize the computational effort of adaptive shrinkage on the larger data set.

In [9]:
@printf " solver likelihood fitting posterior\n"
@printf("  MOSEK %10.3f %7.3f %9.3f\n",out_rebayes["timing-likelihood"],
        out_rebayes["timing-fit"],out_rebayes["timing-posterior"])
@printf("mix-SQP %10.3f %7.3f %9.3f\n",out_mixsqp["timing-likelihood"],
        out_mixsqp["timing-fit"],out_mixsqp["timing-posterior"])
 solver likelihood fitting posterior
  MOSEK      1.031  19.527     0.684
mix-SQP      0.763   0.695     0.036

The result is similar to above, but more dramatic—in this example, the effort of model fitting with mix-SQP is comparable to the effort required to compute the likelihood matrix, whereas the model fitting using the interior point method (MOSEK) dominates the effort of the other steps.

In summary, this illustrates the benefit of the SQP method for implementing adaptive shrinkage, particularly for large data sets.

Session information

The section gives information about the computing environment used to generate the results contained in this notebook, including the version of Julia and the Julia packages.

In [10]:
Pkg.status("Distributions")
Pkg.status("LowRankApprox")
Pkg.status("RCall")
versioninfo()
    Status `~/.julia/environments/v1.1/Project.toml`
  [31c24e10] Distributions v0.21.1
    Status `~/.julia/environments/v1.1/Project.toml`
  [898213cb] LowRankApprox v0.2.3
    Status `~/.julia/environments/v1.1/Project.toml`
  [6f49c342] RCall v0.13.3
Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

© 2017-2018 Youngseok Kim, Peter Carbonetto, Matthew Stephens & Mihai Anitescu.