mix-SQP experiments

Illustration of the mix-SQP solver applied to a small data set and a large one.

Analysis setup

Before attempting to run this Julia code, make sure your computer is properly set up by following the setup instructions in the README of the git repository.

We begin by loading the Distributions, LowRankApprox, and a few other packages, as well as some function definitions used in the code chunks below.

In [1]:
using Pkg
using Printf
using Random
using Distributions
using LinearAlgebra
using SparseArrays
using LowRankApprox
include("../code/datasim.jl");
include("../code/likelihood.jl");
include("../code/mixSQP.jl");

Next, initialize the sequence of pseudorandom numbers.

In [2]:
Random.seed!(1);

Generate a small data set

Let's start with a small example with 50,000 samples.

In [3]:
z = normtmixdatasim(round(Int,5e4));

Compute the likelihood matrix

Compute the $n \times k$ likelihood matrix for a mixture of zero-centered normals, with $k = 20$. Note that the rows of the likelihood matrix are normalized by default.

In [4]:
sd = autoselectmixsd(z,nv = 20);
L  = normlikmatrix(z,sd = sd);
size(L)
Out[4]:
(50000, 20)
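
Each row of L should be proportional to the zero-centered normal densities evaluated at the corresponding sample. As a quick sanity check, the sketch below recomputes the densities directly with Distributions and compares the two matrices after rescaling each row by its largest entry, so that the comparison does not depend on the (assumed) row-normalization convention:

L0 = [pdf(Normal(0,s),x) for x in z, s in sd];  # unnormalized n x k likelihoods
R  = L ./ maximum(L,dims = 2);    # rescale rows of both matrices the same way,
R0 = L0 ./ maximum(L0,dims = 2);  # so any per-row normalization cancels out
println(maximum(abs.(R - R0)))    # should be near zero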

Fit mixture model using SQP algorithm

First we run the mix-SQP algorithm once to precompile the function; Julia compiles a function on its first call, which would otherwise inflate the timing reported below.

In [5]:
out = mixSQP(L,maxiter = 10,verbose = false);

Observe that only a small number of iterations is needed to converge to the solution of the constrained optimization problem.

In [6]:
k   = size(L,2);
x0  = ones(k)/k;
@time out = mixSQP(L,x = x0);
Running SQP algorithm with the following settings:
- 50000 x 20 data matrix
- convergence tolerance  = 1.00e-08
- zero threshold         = 1.00e-06
- partial SVD tolerance  = 1.00e-08
- partial SVD max. error = 3.30e-07
iter      objective -min(g+1)  #nz #qp #ls
   1 3.03733620e+04 +6.30e-01   20
   2 2.09533189e+04 +5.80e+04    1   0   0
   3 1.28027913e+04 +2.02e+04    4   0   0
   4 1.11155343e+04 +8.75e+03    4   0   0
   5 1.09390485e+04 +4.16e+03    4   0   0
   6 1.07197711e+04 +2.05e+03    3   0   0
   7 1.05963767e+04 +1.05e+03    3   0   0
   8 1.05212428e+04 +5.21e+02    4   0   0
   9 1.03089069e+04 +2.57e+02    4   0   0
  10 1.01851327e+04 +1.31e+02    4   0   0
  11 1.01318618e+04 +6.64e+01    4   0   0
  12 1.00461045e+04 +3.29e+01    4   0   0
  13 9.90166640e+03 +1.65e+01    5   0   0
  14 9.85230672e+03 +8.21e+00    4   0   0
  15 9.81701206e+03 +3.95e+00    5   0   0
  16 9.77596268e+03 +1.86e+00    5   0   0
  17 9.75307635e+03 +8.53e-01    5   0   0
  18 9.74130102e+03 +3.62e-01    6   0   0
  19 9.73189243e+03 +1.11e-01    6   0   0
  20 9.72792839e+03 +2.34e-02    6   0   0
  21 9.72699979e+03 +1.84e-03    6   0   0
  22 9.72691654e+03 +1.68e-06    6   0   0
  23 9.72691593e+03 -3.18e-09    6   0   0
  0.861777 seconds (1.12 M allocations: 335.503 MiB, 6.32% gc time)
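
The "#nz" column above reports the number of nonzero coordinates at each iteration, ending at 6. Assuming the returned object is a Dict that stores the fitted mixture weights under the key "x" (an assumption about this version of the code, not confirmed by this notebook), we can check that the solution lies on the simplex and is sparse:

x = out["x"];                                 # fitted weights (assumed key "x")
println("sum of weights:  ",sum(x))           # should be 1
println("smallest weight: ",minimum(x))       # should be non-negative
println("nonzero weights: ",sum(x .> 1e-6))   # should match the "#nz" column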

Generate a larger data set

Next, let's see what happens when we use the SQP algorithm to fit a mixture model to a much larger data set.

In [7]:
Random.seed!(1);
z = normtmixdatasim(round(Int,1e5));

Compute the likelihood matrix

As before, we compute the $n \times k$ likelihood matrix for a mixture of zero-centered normals. This time, we use a finer grid of $k = 40$ normal densities.

In [8]:
sd = autoselectmixsd(z,nv = 40);
L  = normlikmatrix(z,sd = sd);
size(L)
Out[8]:
(100000, 40)

Fit mixture model using SQP algorithm

Even on this much larger data set, only a small number of iterations is needed to compute the solution.

In [9]:
k   = size(L,2);
x0  = ones(k)/k;
@time out = mixSQP(L,x = x0);
Running SQP algorithm with the following settings:
- 100000 x 40 data matrix
- convergence tolerance  = 1.00e-08
- zero threshold         = 1.00e-06
- partial SVD tolerance  = 1.00e-08
- partial SVD max. error = 1.35e-06
iter      objective -min(g+1)  #nz #qp #ls
   1 6.21694207e+04 +6.60e-01   40
   2 4.83207933e+04 +3.93e-01   40   0   0
   3 3.75596771e+04 +2.25e-01   40   0   0
   4 2.99843226e+04 +1.23e-01   40   0   0
   5 2.13835273e+04 +7.20e+03    3   0   0
   6 1.98674398e+04 +2.53e+03    3   0   0
   7 1.97771147e+04 +1.12e+03    3   0   0
   8 1.97191264e+04 +5.21e+02    3   0   0
   9 1.96310713e+04 +2.54e+02    3   0   0
  10 1.95995661e+04 +1.24e+02    4   0   0
  11 1.95641269e+04 +6.15e+01    5   0   0
  12 1.95418491e+04 +3.09e+01    6   0   0
  13 1.95188457e+04 +1.53e+01    6   0   0
  14 1.95042314e+04 +7.72e+00    5   0   0
  15 1.94888368e+04 +3.80e+00    6   0   0
  16 1.94788367e+04 +1.87e+00    6   0   0
  17 1.94701494e+04 +8.56e-01    6   0   0
  18 1.94655306e+04 +3.33e-01    6   0   0
  19 1.94621375e+04 +9.23e-02    7   0   0
  20 1.94610863e+04 +1.37e-02    6   0   0
  21 1.94608951e+04 +5.45e-04    6   0   0
  22 1.94608878e+04 -3.13e-09    6   0   0
  0.549351 seconds (27.77 k allocations: 634.877 MiB, 13.71% gc time)

Even with no low-rank approximation (lowrank = "none") and a very small correction factor eps = 1e-12, the algorithm still converges, although it takes more iterations, and more time, to reach the solution.

In [10]:
@time out = mixSQP(L,x = x0,lowrank = "none",eps = 1e-12);
Running SQP algorithm with the following settings:
- 100000 x 40 data matrix
- convergence tolerance  = 1.00e-08
- zero threshold         = 1.00e-06
- Exact derivative computation (partial QR not used).
iter      objective -min(g+1)  #nz #qp #ls
   1 6.21694226e+04 +6.60e-01   40
   2 4.35976361e+04 +2.74e+08    2   0   0
   3 2.63754248e+04 +9.42e+07    3   0   0
   4 2.26716550e+04 +4.11e+07    3   0   0
   5 2.22369707e+04 +1.93e+07    3   0   0
   6 2.20822493e+04 +9.86e+06    3   0   0
   7 2.17850756e+04 +4.96e+06    3   0   0
   8 2.15686849e+04 +2.48e+06    2   0   0
   9 2.13181589e+04 +1.28e+06    3   0   0
  10 2.11397368e+04 +6.40e+05    2   0   0
  11 2.08812795e+04 +3.39e+05    3   0   0
  12 2.07941248e+04 +1.75e+05    3   0   0
  13 2.04915356e+04 +8.97e+04    3   0   0
  14 2.03990501e+04 +4.57e+04    3   0   0
  15 2.01797507e+04 +2.27e+04    3   0   0
  16 2.00663424e+04 +1.17e+04    3   0   0
  17 2.00068966e+04 +5.91e+03    3   0   0
  18 1.98269276e+04 +3.06e+03    3   0   0
  19 1.97740158e+04 +1.56e+03    3   0   0
  20 1.97130109e+04 +7.79e+02    3   0   0
  21 1.96283745e+04 +3.99e+02    3   0   0
  22 1.96010421e+04 +2.00e+02    4   0   0
  23 1.95654206e+04 +9.97e+01    4   0   0
  24 1.95444434e+04 +5.02e+01    6   0   0
  25 1.95216090e+04 +2.51e+01    6   0   0
  26 1.95072601e+04 +1.28e+01    5   0   0
  27 1.94925787e+04 +6.33e+00    5   0   0
  28 1.94817429e+04 +3.17e+00    6   0   0
  29 1.94728348e+04 +1.50e+00    6   0   0
  30 1.94676911e+04 +6.57e-01    6   0   0
  31 1.94635123e+04 +2.27e-01    6   0   0
  32 1.94615349e+04 +5.42e-02    6   0   0
  33 1.94609587e+04 +5.31e-03    6   0   0
  34 1.94608908e+04 +1.44e-04    6   0   0
  35 1.94608893e+04 +1.36e-07    6   0   0
  36 1.94608893e+04 -3.13e-13    6   0   0
  1.736786 seconds (81.75 k allocations: 2.396 GiB, 17.75% gc time)
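
Both runs arrive at essentially the same final objective value, roughly 1.9461e+04. Since mix-SQP minimizes the negative log-likelihood $-\sum_{i=1}^n \log \,(Lx)_i$ over the simplex, this value can be reproduced directly from L and the fitted weights; as before, the key "x" is an assumption about the return value:

x = out["x"];             # fitted weights (assumed key "x")
println(-sum(log.(L*x)))  # should be close to the final objective above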

Session information

This section gives information about the computing environment used to generate the results contained in this notebook, including the version of Julia and the versions of the Julia packages used here.

In [11]:
Pkg.status("Distributions");
Pkg.status("LowRankApprox");
versioninfo();
    Status `~/.julia/environments/v1.1/Project.toml`
  [31c24e10] Distributions v0.21.1
    Status `~/.julia/environments/v1.1/Project.toml`
  [898213cb] LowRankApprox v0.2.3
Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

© 2017-2018 Youngseok Kim, Peter Carbonetto, Matthew Stephens & Mihai Anitescu.