mix-SQP experiments

Comparing performance of active-set and interior-point methods for solving quadratic subproblem inside SQP

Here we explore the use of active-set and interior-point methods (the latter implemented by the commercial software MOSEK) for solving the quadratic subproblem inside SQP.

Analysis setup

Before attempting to run this Julia code, make sure your computer is properly set up to run this code by following the setup instructions in the README of the git repository.

We begin by loading the Distributions, Mosek and JuMP Julia packages, as well as some function definitions used in the code chunks below.

In [1]:
using Pkg
using Random
using Printf
using Distributions
using SparseArrays
using LinearAlgebra
using Mosek
using JuMP
include("../code/datasim.jl");
include("../code/likelihood.jl");
include("../code/mixSQP.jl");

Next, initialize the sequence of pseudorandom numbers.

In [2]:
Random.seed!(1);

Generate a small data set

Let's begin with a smaller example with 50,000 samples.

In [3]:
n = round(Int,5e4);
z = normtmixdatasim(n);

Compute the likelihood matrix

Compute the $n \times k$ likelihood matrix for a mixture of zero-centered normals, with $k = 20$. Note that the rows of the likelihood matrix are normalized by default.

In [4]:
sd = autoselectmixsd(z,nv = 20);
L  = normlikmatrix(z,sd = sd);
size(L)
Out[4]:
(50000, 20)

Fit mixture model using SQP algorithm

First we run the mix-SQP algorithm a couple of times to precompile the relevant functions.

In [5]:
out = mixSQP(L,qpsubprob = "activeset",lowrank = "none",maxiter = 10,verbose = false);
out = mixSQP(L,qpsubprob = "mosek",lowrank = "none",maxiter = 10,verbose = false);

Fit the model using the SQP algorithm, with an active-set method to find the solution to the quadratic program at each SQP iteration.

In [6]:
@time out1 = mixSQP(L,qpsubprob = "activeset",lowrank = "none");
Running SQP algorithm with the following settings:
- 50000 x 20 data matrix
- convergence tolerance  = 1.00e-08
- zero threshold         = 1.00e-06
- Exact derivative computation (partial QR not used).
iter      objective -min(g+1)  #nz #qp #ls
   1 3.03733620e+04 +6.30e-01   20
   2 2.09533189e+04 +5.80e+04    1  27   1
   3 1.28079423e+04 +2.01e+04    3  54   1
   4 1.11142170e+04 +8.72e+03    3  34   1
   5 1.09365390e+04 +4.16e+03    3   9   1
   6 1.07220696e+04 +2.01e+03    3  18   1
   7 1.05949242e+04 +1.03e+03    4  12   1
   8 1.05173539e+04 +5.08e+02    4   2   1
   9 1.03017484e+04 +2.50e+02    4  20   1
  10 1.01824445e+04 +1.28e+02    4   2   1
  11 1.01286239e+04 +6.46e+01    4   4   1
  12 1.00404507e+04 +3.20e+01    4   7   1
  13 9.89744142e+03 +1.61e+01    4  16   1
  14 9.85084743e+03 +8.00e+00    4   5   1
  15 9.81505659e+03 +3.85e+00    5   4   1
  16 9.77438543e+03 +1.81e+00    5  11   1
  17 9.75247900e+03 +8.28e-01    5   2   1
  18 9.74083776e+03 +3.51e-01    6   4   1
  19 9.73161458e+03 +1.06e-01    6  11   1
  20 9.72785163e+03 +2.17e-02    6   2   1
  21 9.72698842e+03 +1.59e-03    6   2   1
  22 9.72691639e+03 +1.04e-06    6   2   1
  23 9.72691593e+03 -3.18e-09    6   2   1
  0.389242 seconds (248.85 k allocations: 443.474 MiB, 9.25% gc time)

Next fit the model again using the same SQP algorithm, with the active-set method replaced by MOSEK.

In [7]:
@time out2 = mixSQP(L,qpsubprob = "mosek",lowrank = "none");
Running SQP algorithm with the following settings:
- 50000 x 20 data matrix
- convergence tolerance  = 1.00e-08
- zero threshold         = 1.00e-06
- Exact derivative computation (partial QR not used).
iter      objective -min(g+1)  #nz #qp #ls
   1 3.03733620e+04 +6.30e-01   20
   2 2.35424547e+04 +3.74e-01   20   0   2
   3 1.61319423e+04 +6.27e+04    7   0   1
   4 1.16898684e+04 +1.99e+04    9   0   1
   5 1.10954042e+04 +8.33e+03   10   0   1
   6 1.09384616e+04 +3.96e+03    9   0   1
   7 1.07182611e+04 +1.91e+03   10   0   1
   8 1.05921969e+04 +9.72e+02    9   0   1
   9 1.05108500e+04 +4.85e+02   10   0   1
  10 1.02904495e+04 +2.40e+02   11   0   1
  11 1.01780677e+04 +1.23e+02   11   0   1
  12 1.01226704e+04 +6.19e+01   12   0   1
  13 1.00302472e+04 +3.06e+01   12   0   1
  14 9.89082213e+03 +1.54e+01   13   0   1
  15 9.84835543e+03 +7.66e+00   12   0   1
  16 9.81161447e+03 +3.68e+00    6   0   1
  17 9.77177429e+03 +1.73e+00   13   0   1
  18 9.75149444e+03 +7.87e-01   14   0   1
  19 9.74004693e+03 +3.29e-01   14   0   1
  20 9.73116124e+03 +9.73e-02    7   0   1
  21 9.72773184e+03 +1.90e-02    6   0   1
  22 9.72697209e+03 +1.23e-03    6   0   1
  23 9.72691620e+03 +2.68e-07    6   0   1
  24 9.72691593e+03 -2.98e-09    6   0   1
  0.334137 seconds (16.08 k allocations: 449.188 MiB, 11.10% gc time)

Both runs converged to a solution in a small number of iterations. The solutions are very similar:

In [8]:
maximum(abs.(out1["x"] - out2["x"]))
Out[8]:
1.3978790775089067e-7

Solving the quadratic programs is only a small fraction of the total effort. Nonetheless, the effort of the active-set computations is much less than solving the quadratic subproblems with MOSEK.

In [9]:
@printf "Total runtime of active set method:     %0.3f s.\n" sum(out1["qptiming"])
@printf "Total runtime of interior point method: %0.3f s.\n" sum(out2["qptiming"])
Total runtime of active set method:     0.002 s.
Total runtime of interior point method: 0.047 s.

Comparison with a larger data set

Let's now explore the accuracy and runtime of the active-set and MOSEK solvers in a larger data set.

In [10]:
z = normtmixdatasim(round(Int,1e5));

As before, we compute the $n \times k$ conditional likelihood matrix for a mixture of zero-centered normals. This time, we use a finer grid of $k = 40$ normal densities to compute this matrix.

In [11]:
k  = 40;
sd = autoselectmixsd(z,nv = k);
L  = normlikmatrix(z,sd = sd);
size(L)
Out[11]:
(100000, 40)

Now we fit the model using the two variants of the SQP algorithm.

In [12]:
@time out1 = mixSQP(L,qpsubprob = "activeset",lowrank = "none",verbose = false);
@time out2 = mixSQP(L,qpsubprob = "mosek",lowrank = "none",verbose = false);
  1.000902 seconds (50.91 k allocations: 1.532 GiB, 22.63% gc time)
  0.950645 seconds (2.97 k allocations: 1.525 GiB, 14.49% gc time)

The first SQP run with the active-set method is only slightly faster. And, as before, the solutions are very similar:

In [13]:
maximum(abs.(out1["x"] - out2["x"]))
Out[13]:
4.531836922726025e-5

The amount of time spent solving the quadratic programs is again only a small proportion of the total:

In [14]:
@printf "Total runtime of active set method:     %0.3f s.\n" sum(out1["qptiming"])
@printf "Total runtime of interior point method: %0.3f s.\n" sum(out2["qptiming"])
Total runtime of active set method:     0.005 s.
Total runtime of interior point method: 0.060 s.

Therefore, although the active-set method is much faster than MOSEK, the overall impact on performance is relatively small.

Session information

The section gives information about the computing environment used to generate the results contained in this notebook, including the version of Julia, and the versions of the Julia packages used here.

In [15]:
Pkg.status("Distributions");
Pkg.status("Mosek");
Pkg.status("JuMP");
versioninfo()
    Status `~/.julia/environments/v1.1/Project.toml`
  [31c24e10] Distributions v0.21.1
    Status `~/.julia/environments/v1.1/Project.toml`
  [6405355b] Mosek v1.0.4
    Status `~/.julia/environments/v1.1/Project.toml`
  [4076af6c] JuMP v0.19.2
Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

© 2017-2018 Youngseok Kim, Peter Carbonetto, Matthew Stephens & Mihai Anitescu.