Fits a multinomial topic model to the count data, hiding most of the complexities of model fitting. The default optimization settings used here are intended to work well in a wide range of data sets, although some fine-tuning may be needed for more difficult cases. For full control, use fit_poisson_nmf.

fit_topic_model(
  X,
  k,
  numiter.main = 100,
  numiter.refine = 100,
  method.main = "em",
  method.refine = "scd",
  init.method = c("topicscore", "random"),
  control.init = list(),
  control.main = list(numiter = 4),
  control.refine = list(numiter = 4, extrapolate = TRUE),
  verbose = c("progressbar", "detailed", "none")
)

Arguments

X

The n x m matrix of counts; all entries of X should be non-negative. It can be a sparse matrix (class "dgCMatrix") or dense matrix (class "matrix").

k

The number of topics. Must be 2 or greater.

numiter.main

Number of updates of the factors and loadings to perform in the main model fitting step. Should be 1 or more.

numiter.refine

Number of updates of the factors and loadings to perform in the model refinement step.

method.main

The method to use for updating the factors and loadings in the main model fitting step. Passed as argument "method" to fit_poisson_nmf.

method.refine

The method to use for updating the factors in evthe model refinement step. Passed as argument "method" to fit_poisson_nmf.

init.method

The method used to initialize the factors and loadings. See init_poisson_nmf for details.

control.init

A list of parameters controlling the behaviour of the optimization and Topic SCORE method in the call to init_poisson_nmf. This is passed as argument "control" to init_poisson_nmf.

control.main

A list of parameters controlling the behaviour of the optimization in the main model fitting step. This is passed as argument "control" to fit_poisson_nmf.

control.refine

A list of parameters controlling the behaviour of the of the optimization algorithm in the model refinement step. This is passed as argument "control" to fit_poisson_nmf.

verbose

When verbose = "progressbar" or verbose = "detailed", information about the progress of the model fitting is printed to the console. See fit_poisson_nmf for more information.

Value

A multinomial topic model fit; see

poisson2multinom and fit_poisson_nmf

for details. Note that outputted likelihoods and deviances in

progress are evaluated with respect to the equivalent Poisson NMF model.

Details

With the default settings, the model fitting is accomplished in four steps: (1) initialize the Poisson NMF model fit (init_poisson_nmf); (2) perform the main model fitting step by running 100 EM updates using fit_poisson_nmf; (3) refine the fit by running 100 extrapolated SCD updates, again using fit_poisson_nmf; and (4) recover the multinomial topic model by calling poisson2multinom.

This two-stage fitting approach is based on our findings that the EM algorithm initially makes rapid progress toward a solution, but its convergence slows considerably as the iterates approach a solution. Close to a solution, we have found that other algorithms make much more rapid progress than EM; in particularly, we found that the extrapolated SCD updates usually performed best). For larger data sets, more updates in the main model fitting and refinement steps may be needed to obtain a good fit.

For larger data sets, more than 200 updates may be needed to obtain a good fit.

References

Dey, K. K., Hsiao, C. J. and Stephens, M. (2017). Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genetics 13, e1006599.

Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the 22nd International ACM SIGIR Conference, 50-57. doi:10.1145/312624.312649

Examples

library(Matrix)
set.seed(1)
X <- simulate_count_data(80,100,k = 3,sparse = TRUE)$X
fit <- fit_topic_model(X,k = 3)
#> Initializing factors using Topic SCORE algorithm.
#> Initializing loadings by running 10 SCD updates.
#> Fitting rank-3 Poisson NMF to 80 x 100 sparse matrix.
#> Running at most 100 EM updates, without extrapolation (fastTopics 0.6-175).
#> 

#> [=======================>-----------------------------------------------]  34%
#> 

#> [========================>----------------------------------------------]  35%
#> 

#> [=========================>---------------------------------------------]  36%
#> 

#> [=========================>---------------------------------------------]  37%
#> 

#> [==========================>--------------------------------------------]  38%
#> 

#> [===========================>-------------------------------------------]  39%
#> 

#> [===========================>-------------------------------------------]  40%
#> 

#> [============================>------------------------------------------]  41%
#> 

#> [=============================>-----------------------------------------]  42%
#> 

#> [==============================>----------------------------------------]  43%
#> 

#> [==============================>----------------------------------------]  44%
#> 

#> [===============================>---------------------------------------]  45%
#> 

#> [================================>--------------------------------------]  46%
#> 

#> [================================>--------------------------------------]  47%
#> 

#> [=================================>-------------------------------------]  48%
#> 

#> [==================================>------------------------------------]  49%
#> 

#> [===================================>-----------------------------------]  50%
#> 

#> [===================================>-----------------------------------]  51%
#> 

#> [====================================>----------------------------------]  52%
#> 

#> [=====================================>---------------------------------]  53%
#> 

#> [=====================================>---------------------------------]  54%
#> 

#> [======================================>--------------------------------]  55%
#> 

#> [=======================================>-------------------------------]  56%
#> 

#> [=======================================>-------------------------------]  57%
#> 

#> [========================================>------------------------------]  58%
#> 

#> [=========================================>-----------------------------]  59%
#> 

#> [==========================================>----------------------------]  60%
#> 

#> [==========================================>----------------------------]  61%
#> 

#> [===========================================>---------------------------]  62%
#> 

#> [============================================>--------------------------]  63%
#> 

#> [============================================>--------------------------]  64%
#> 

#> [=============================================>-------------------------]  65%
#> 

#> [==============================================>------------------------]  66%
#> 

#> [===============================================>-----------------------]  67%
#> 

#> [===============================================>-----------------------]  68%
#> 

#> [================================================>----------------------]  69%
#> 

#> [=================================================>---------------------]  70%
#> 

#> [=================================================>---------------------]  71%
#> 

#> [==================================================>--------------------]  72%
#> 

#> [===================================================>-------------------]  73%
#> 

#> [====================================================>------------------]  74%
#> 

#> [====================================================>------------------]  75%
#> 

#> [=====================================================>-----------------]  76%
#> 

#> [======================================================>----------------]  77%
#> 

#> [======================================================>----------------]  78%
#> 

#> [=======================================================>---------------]  79%
#> 

#> [========================================================>--------------]  80%
#> 

#> [=========================================================>-------------]  81%
#> 

#> [=========================================================>-------------]  82%
#> 

#> [==========================================================>------------]  83%
#> 

#> [===========================================================>-----------]  84%
#> 

#> [===========================================================>-----------]  85%
#> 

#> [============================================================>----------]  86%
#> 

#> [=============================================================>---------]  87%
#> 

#> [=============================================================>---------]  88%
#> 

#> [==============================================================>--------]  89%
#> 

#> [===============================================================>-------]  90%
#> 

#> [================================================================>------]  91%
#> 

#> [================================================================>------]  92%
#> 

#> [=================================================================>-----]  93%
#> 

#> [==================================================================>----]  94%
#> 

#> [==================================================================>----]  95%
#> 

#> [===================================================================>---]  96%
#> 

#> [====================================================================>--]  97%
#> 

#> [=====================================================================>-]  98%
#> 

#> [=====================================================================>-]  99%
#> 

#> [=======================================================================] 100%
#> 
                                                                              
#> 

#> Refining model fit.
#> Fitting rank-3 Poisson NMF to 80 x 100 sparse matrix.
#> Running at most 100 SCD updates, with extrapolation (fastTopics 0.6-175).
#> 

#> [===================>---------------------------------------------------]  28%
#> 

#> [====================>--------------------------------------------------]  29%
#> 

#> [====================>--------------------------------------------------]  30%
#> 

#> [=====================>-------------------------------------------------]  31%
#> 

#> [======================>------------------------------------------------]  32%
#> 

#> [======================>------------------------------------------------]  33%
#> 

#> [=======================>-----------------------------------------------]  34%
#> 

#> [========================>----------------------------------------------]  35%
#> 

#> [=========================>---------------------------------------------]  36%
#> 

#> [=========================>---------------------------------------------]  37%
#> 

#> [==========================>--------------------------------------------]  38%
#> 

#> [===========================>-------------------------------------------]  39%
#> 

#> [===========================>-------------------------------------------]  40%
#> 

#> [============================>------------------------------------------]  41%
#> 

#> [=============================>-----------------------------------------]  42%
#> 

#> [==============================>----------------------------------------]  43%
#> 

#> [==============================>----------------------------------------]  44%
#> 

#> [===============================>---------------------------------------]  45%
#> 

#> [================================>--------------------------------------]  46%
#> 

#> [================================>--------------------------------------]  47%
#> 

#> [=================================>-------------------------------------]  48%
#> 

#> [==================================>------------------------------------]  49%
#> 

#> [===================================>-----------------------------------]  50%
#> 

#> [===================================>-----------------------------------]  51%
#> 

#> [====================================>----------------------------------]  52%
#> 

#> [=====================================>---------------------------------]  53%
#> 

#> [=====================================>---------------------------------]  54%
#> 

#> [======================================>--------------------------------]  55%
#> 

#> [=======================================>-------------------------------]  56%
#> 

#> [=======================================>-------------------------------]  57%
#> 

#> [========================================>------------------------------]  58%
#> 

#> [=========================================>-----------------------------]  59%
#> 

#> [==========================================>----------------------------]  60%
#> 

#> [==========================================>----------------------------]  61%
#> 

#> [===========================================>---------------------------]  62%
#> 

#> [============================================>--------------------------]  63%
#> 

#> [============================================>--------------------------]  64%
#> 

#> [=============================================>-------------------------]  65%
#> 

#> [==============================================>------------------------]  66%
#> 

#> [===============================================>-----------------------]  67%
#> 

#> [===============================================>-----------------------]  68%
#> 

#> [================================================>----------------------]  69%
#> 

#> [=================================================>---------------------]  70%
#> 

#> [=================================================>---------------------]  71%
#> 

#> [==================================================>--------------------]  72%
#> 

#> [===================================================>-------------------]  73%
#> 

#> [====================================================>------------------]  74%
#> 

#> [====================================================>------------------]  75%
#> 

#> [=====================================================>-----------------]  76%
#> 

#> [======================================================>----------------]  77%
#> 

#> [======================================================>----------------]  78%
#> 

#> [=======================================================>---------------]  79%
#> 

#> [========================================================>--------------]  80%
#> 

#> [=========================================================>-------------]  81%
#> 

#> [=========================================================>-------------]  82%
#> 

#> [==========================================================>------------]  83%
#> 

#> [===========================================================>-----------]  84%
#> 

#> [===========================================================>-----------]  85%
#> 

#> [============================================================>----------]  86%
#> 

#> [=============================================================>---------]  87%
#> 

#> [=============================================================>---------]  88%
#> 

#> [==============================================================>--------]  89%
#> 

#> [===============================================================>-------]  90%
#> 

#> [================================================================>------]  91%
#> 

#> [================================================================>------]  92%
#> 

#> [=================================================================>-----]  93%
#> 

#> [==================================================================>----]  94%
#> 

#> [==================================================================>----]  95%
#> 

#> [===================================================================>---]  96%
#> 

#> [====================================================================>--]  97%
#> 

#> [=====================================================================>-]  98%
#> 

#> [=====================================================================>-]  99%
#> 

#> [=======================================================================] 100%
#> 
                                                                              
#> 

print(summary(fit))
#> Model overview:
#>   Number of data rows, n: 80
#>   Number of data cols, m: 100
#>   Rank/number of topics, k: 3
#> Evaluation of model fit (200 updates performed):
#>   Poisson NMF log-likelihood: -8.311490191230e+03
#>   Multinomial topic model log-likelihood: -8.068320081877e+03
#>   Poisson NMF deviance: +7.710268688987e+03
#>   Max KKT residual: +2.984697e-07
#> Set show.size.factors = TRUE, show.mixprops = TRUE and/or show.topic.reps = TRUE in print(...) for more information