Implements methods for differential expression analysis using a topic model. These methods are motivated by gene expression studies, but could have other uses, such as identifying “key words” for topics.
An object of class “poisson_nmf_fit” or
“multinom_topic_model_fit”, or an n x k matrix of topic
proportions, where k is the number of topics. (The elements in each
row of this matrix should sum to 1.) If a Poisson NMF fit is
provided as input, the corresponding multinomial topic model fit is
automatically recovered using poisson2multinom.
The n x m counts matrix. It can be a sparse matrix (class
"dgCMatrix") or dense matrix (class "matrix").
A numeric vector of length n determining how the rates are
scaled in the Poisson models. See “Details” for guidance on
the choice of s.
Observations with this value are added to the counts matrix to stabilize maximum-likelihood estimation.
Method used to fit the Poisson models. Note that
fit.method = "glm" is the slowest method, and is mainly used
for testing.
Method used to stabilize the posterior mean
LFC estimates.  When shrink.method = "ash", the "adaptive
shrinkage" method implemented in the ‘ashr’ package is used to
compute posterior. When shrink.method = "none", no
stabilization is performed, and the “raw” posterior mean LFC
estimates are returned.
The log-fold change statistics returned:
lfc.stat = "vsnull", the log-fold change relative to the
null; lfc.stat = "le", the “least extreme” LFC; or a
topic name or number, in which case the LFC is defined relative to
the selected topic. See “Details” for more detailed
explanations of these choices.
A list of parameters controlling behaviour of the optimization and Monte Carlo algorithms. See ‘Details’.
When verbose = TRUE, progress information is
printed to the console.
When shrink.method = "ash", these are
additional arguments passed to ash.
A list with the following elements:
The log-fold change maximum-likelihood estimates.
Posterior mean LFC estimates.
Lower limits of estimated HPD intervals. Note that these are not updated by the shrinkage step.
Upper limits of estimated HPD intervals. Note that these are not updated by the shrinkage step.
z-scores for posterior mean LFC estimates.
-log10 two-tailed p-values obtained from the
  z-scores. When shrink.method = "ash", this is NA, and
  the s-values are returned instead (see below).
s-values returned by
  ash. s-values are analogous to q-values, but
  based on the local false sign rate; see Stephens (2016).
When shrink.method = "ash" only, this output
  contains the estimated local false sign rates.
When shrink.method = "ash" only, this output
  contains the ash return value (after removing
  the "data", "result" and "call" list
  elements).
Maximum-likelihood estimates of the Poisson model parameters.
Maximum-likelihood estimates of the null model parameters.
A vector containing the Metropolis acceptance ratios from each MCMC run.
The methods are based on the Poisson model
$$x_i ~ Poisson(\lambda_i),$$ in which the Poisson rates are
$$\lambda_i = \sum_{j=1}^k s_i l_{ij} f_j,$$ the \(l_{ik}\)
are the topic proportions and the \(f_j\) are the unknowns to be
estimated. This model is applied separately to each column of
X. When \(s_i\) (specified by input argument s) is
equal the total count in row i (this is the default), the Poisson
model will closely approximate a binomial model of the count data,
and the unknowns \(f_j\) will approximate binomial
probabilities. (The Poisson approximation to the binomial is most
accurate when the total counts rowSums(X) are large and the
unknowns \(f_j\) are small.) Other choices for s are
possible, and implement different normalization schemes.
To allow for some flexibility, de_analysis allows for the
log-fold change to be measured in several ways.
One option is to compare against the probability under the null
model: \(LFC(j) = log2(f_j/f_0)\), where \(f_0\) is the single
parameter in the Poisson model \(x_i ~ Poisson(\lambda_i)\) with
rates \(\lambda_i = s_i f_0\). This LFC definition is chosen with
lfc.stat = "vsnull".
Another option is to compare against a chosen topic, k: \(LFC(j)
= log2(f_j/f_k)\). By definition, \(LFC(k)\) is zero, and
statistics such as z-scores and p-values for topic k are set to
NA. This LFC definition is selected by setting
lfc.stat = k.
A final option (which is the default) computes the “least
extreme” LFC, defined as \(LFC(j) = log2(f_j/f_k)\) such that
\(k\) is the topic other than \(j\) that gives the ratio
\(f_j/f_k\) closest to 1. This option is chosen with
lfc.stat = "le".
We recommend setting shrink.method = "ash", which uses the
“adaptive shrinkage” method (Stephens, 2016) to improve
accuracy of the posterior mean estimates and z-scores. We follow
the settings used in lfcShrink from the ‘DESeq2’
package, with type = "ashr".
Note that all LFC statistics are defined using the base-2 logarithm following the conventioned used in differential expression analysis.
The control argument is a list in which any of the
following named components will override the default optimization
algorithm settings (as they are defined by
de_analysis_control_default):
numiterMaximum number of iterations performed in
  fitting the Poisson models. When fit.method = "glm", this is
  passed as argument maxit to the glm function.
minvalA small, positive number. All topic
  proportions less than this value and greater than 1 - minval
  are set to this value.
tolControls the convergence tolerance for fitting
  the Poisson models. When fit.method = "glm", this is passed
  as argument epsilon to function glm.
conf.levelThe size of the highest posterior density (HPD) intervals. Should be a number greater than 0 and less than 1.
nsNumber of Monte Carlo samples simulated by random-walk MCMC for estimating posterior LFC quantities.
rwThe standard deviation of the normal density used to propose new states in the random-walk MCMC.
epsA small, non-negative number added to the terms inside the logarithms to avoid computing logarithms of zero.
ncNumber of threads used in the multithreaded
  computations. This controls both (a) the number of RcppParallel
  threads used to fit the factors in the Poisson models, and (b)
  the number of cores used in mclapply for
  the MCMC simulation step. Note that mclapply relies on forking
  hence is not available on Windows; will return an error on
  Windows unless nc = 1. Also note that setting nc >
  1 copies the contents of memory nc times, which can lead
  to poor performance if the total resident memory required exceeds
  available physical memory.
nc.blasNumber of threads used in the numerical linear algebra library (e.g., OpenBLAS), if available. For best performance, we recommend setting this to 1 (i.e., no multithreading).
nsplitThe number of data splits used in the
  multithreaded computations (only relevant when nc > 1). More
  splits increase the granularity of the progress bar, but can also
  slow down the mutithreaded computations by introducing more
  overhead in the call to pblapply.
Stephens, M. (2016). False discovery rates: a new deal. Biostatistics 18(2), kxw041. doi:10.1093/biostatistics/kxw041
Zhu, A., Ibrahim, J. G. and Love, M. I. (2019). Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics 35(12), 2084–2092.
# Perform a differential expression (DE) analysis using the previously
# fitted multinomial topic model. Note that the de_analysis call could
# take several minutes to complete.
# \donttest{
set.seed(1)
data(pbmc_facs)
de <- de_analysis(pbmc_facs$fit,pbmc_facs$counts)
#> Fitting 16791 Poisson models with k=6 using method="scd".
#> Computing log-fold change statistics from 16791 Poisson models with k=6.
#> 
#> [-----------------------------------------------------------------------]   0%
#> 
#> [-----------------------------------------------------------------------]   1%
#> 
#> [>----------------------------------------------------------------------]   1%
#> 
#> [>----------------------------------------------------------------------]   2%
#> 
#> [=>---------------------------------------------------------------------]   2%
#> 
#> [=>---------------------------------------------------------------------]   3%
#> 
#> [=>---------------------------------------------------------------------]   4%
#> 
#> [==>--------------------------------------------------------------------]   4%
#> 
#> [==>--------------------------------------------------------------------]   5%
#> 
#> [===>-------------------------------------------------------------------]   5%
#> 
#> [===>-------------------------------------------------------------------]   6%
#> 
#> [====>------------------------------------------------------------------]   6%
#> 
#> [====>------------------------------------------------------------------]   7%
#> 
#> [====>------------------------------------------------------------------]   8%
#> 
#> [=====>-----------------------------------------------------------------]   8%
#> 
#> [=====>-----------------------------------------------------------------]   9%
#> 
#> [======>----------------------------------------------------------------]   9%
#> 
#> [======>----------------------------------------------------------------]  10%
#> 
#> [======>----------------------------------------------------------------]  11%
#> 
#> [=======>---------------------------------------------------------------]  11%
#> 
#> [=======>---------------------------------------------------------------]  12%
#> 
#> [========>--------------------------------------------------------------]  12%
#> 
#> [========>--------------------------------------------------------------]  13%
#> 
#> [=========>-------------------------------------------------------------]  13%
#> 
#> [=========>-------------------------------------------------------------]  14%
#> 
#> [=========>-------------------------------------------------------------]  15%
#> 
#> [==========>------------------------------------------------------------]  15%
#> 
#> [==========>------------------------------------------------------------]  16%
#> 
#> [===========>-----------------------------------------------------------]  16%
#> 
#> [===========>-----------------------------------------------------------]  17%
#> 
#> [===========>-----------------------------------------------------------]  18%
#> 
#> [============>----------------------------------------------------------]  18%
#> 
#> [============>----------------------------------------------------------]  19%
#> 
#> [=============>---------------------------------------------------------]  19%
#> 
#> [=============>---------------------------------------------------------]  20%
#> 
#> [==============>--------------------------------------------------------]  20%
#> 
#> [==============>--------------------------------------------------------]  21%
#> 
#> [==============>--------------------------------------------------------]  22%
#> 
#> [===============>-------------------------------------------------------]  22%
#> 
#> [===============>-------------------------------------------------------]  23%
#> 
#> [================>------------------------------------------------------]  23%
#> 
#> [================>------------------------------------------------------]  24%
#> 
#> [================>------------------------------------------------------]  25%
#> 
#> [=================>-----------------------------------------------------]  25%
#> 
#> [=================>-----------------------------------------------------]  26%
#> 
#> [==================>----------------------------------------------------]  26%
#> 
#> [==================>----------------------------------------------------]  27%
#> 
#> [===================>---------------------------------------------------]  27%
#> 
#> [===================>---------------------------------------------------]  28%
#> 
#> [===================>---------------------------------------------------]  29%
#> 
#> [====================>--------------------------------------------------]  29%
#> 
#> [====================>--------------------------------------------------]  30%
#> 
#> [=====================>-------------------------------------------------]  30%
#> 
#> [=====================>-------------------------------------------------]  31%
#> 
#> [=====================>-------------------------------------------------]  32%
#> 
#> [======================>------------------------------------------------]  32%
#> 
#> [======================>------------------------------------------------]  33%
#> 
#> [=======================>-----------------------------------------------]  33%
#> 
#> [=======================>-----------------------------------------------]  34%
#> 
#> [=======================>-----------------------------------------------]  35%
#> 
#> [========================>----------------------------------------------]  35%
#> 
#> [========================>----------------------------------------------]  36%
#> 
#> [=========================>---------------------------------------------]  36%
#> 
#> [=========================>---------------------------------------------]  37%
#> 
#> [==========================>--------------------------------------------]  37%
#> 
#> [==========================>--------------------------------------------]  38%
#> 
#> [==========================>--------------------------------------------]  39%
#> 
#> [===========================>-------------------------------------------]  39%
#> 
#> [===========================>-------------------------------------------]  40%
#> 
#> [============================>------------------------------------------]  40%
#> 
#> [============================>------------------------------------------]  41%
#> 
#> [============================>------------------------------------------]  42%
#> 
#> [=============================>-----------------------------------------]  42%
#> 
#> [=============================>-----------------------------------------]  43%
#> 
#> [==============================>----------------------------------------]  43%
#> 
#> [==============================>----------------------------------------]  44%
#> 
#> [===============================>---------------------------------------]  44%
#> 
#> [===============================>---------------------------------------]  45%
#> 
#> [===============================>---------------------------------------]  46%
#> 
#> [================================>--------------------------------------]  46%
#> 
#> [================================>--------------------------------------]  47%
#> 
#> [=================================>-------------------------------------]  47%
#> 
#> [=================================>-------------------------------------]  48%
#> 
#> [=================================>-------------------------------------]  49%
#> 
#> [==================================>------------------------------------]  49%
#> 
#> [==================================>------------------------------------]  50%
#> 
#> [===================================>-----------------------------------]  50%
#> 
#> [===================================>-----------------------------------]  51%
#> 
#> [====================================>----------------------------------]  51%
#> 
#> [====================================>----------------------------------]  52%
#> 
#> [====================================>----------------------------------]  53%
#> 
#> [=====================================>---------------------------------]  53%
#> 
#> [=====================================>---------------------------------]  54%
#> 
#> [======================================>--------------------------------]  54%
#> 
#> [======================================>--------------------------------]  55%
#> 
#> [======================================>--------------------------------]  56%
#> 
#> [=======================================>-------------------------------]  56%
#> 
#> [=======================================>-------------------------------]  57%
#> 
#> [========================================>------------------------------]  57%
#> 
#> [========================================>------------------------------]  58%
#> 
#> [=========================================>-----------------------------]  58%
#> 
#> [=========================================>-----------------------------]  59%
#> 
#> [=========================================>-----------------------------]  60%
#> 
#> [==========================================>----------------------------]  60%
#> 
#> [==========================================>----------------------------]  61%
#> 
#> [===========================================>---------------------------]  61%
#> 
#> [===========================================>---------------------------]  62%
#> 
#> [===========================================>---------------------------]  63%
#> 
#> [============================================>--------------------------]  63%
#> 
#> [============================================>--------------------------]  64%
#> 
#> [=============================================>-------------------------]  64%
#> 
#> [=============================================>-------------------------]  65%
#> 
#> [==============================================>------------------------]  65%
#> 
#> [==============================================>------------------------]  66%
#> 
#> [==============================================>------------------------]  67%
#> 
#> [===============================================>-----------------------]  67%
#> 
#> [===============================================>-----------------------]  68%
#> 
#> [================================================>----------------------]  68%
#> 
#> [================================================>----------------------]  69%
#> 
#> [================================================>----------------------]  70%
#> 
#> [=================================================>---------------------]  70%
#> 
#> [=================================================>---------------------]  71%
#> 
#> [==================================================>--------------------]  71%
#> 
#> [==================================================>--------------------]  72%
#> 
#> [==================================================>--------------------]  73%
#> 
#> [===================================================>-------------------]  73%
#> 
#> [===================================================>-------------------]  74%
#> 
#> [====================================================>------------------]  74%
#> 
#> [====================================================>------------------]  75%
#> 
#> [=====================================================>-----------------]  75%
#> 
#> [=====================================================>-----------------]  76%
#> 
#> [=====================================================>-----------------]  77%
#> 
#> [======================================================>----------------]  77%
#> 
#> [======================================================>----------------]  78%
#> 
#> [=======================================================>---------------]  78%
#> 
#> [=======================================================>---------------]  79%
#> 
#> [=======================================================>---------------]  80%
#> 
#> [========================================================>--------------]  80%
#> 
#> [========================================================>--------------]  81%
#> 
#> [=========================================================>-------------]  81%
#> 
#> [=========================================================>-------------]  82%
#> 
#> [==========================================================>------------]  82%
#> 
#> [==========================================================>------------]  83%
#> 
#> [==========================================================>------------]  84%
#> 
#> [===========================================================>-----------]  84%
#> 
#> [===========================================================>-----------]  85%
#> 
#> [============================================================>----------]  85%
#> 
#> [============================================================>----------]  86%
#> 
#> [============================================================>----------]  87%
#> 
#> [=============================================================>---------]  87%
#> 
#> [=============================================================>---------]  88%
#> 
#> [==============================================================>--------]  88%
#> 
#> [==============================================================>--------]  89%
#> 
#> [===============================================================>-------]  89%
#> 
#> [===============================================================>-------]  90%
#> 
#> [===============================================================>-------]  91%
#> 
#> [================================================================>------]  91%
#> 
#> [================================================================>------]  92%
#> 
#> [=================================================================>-----]  92%
#> 
#> [=================================================================>-----]  93%
#> 
#> [=================================================================>-----]  94%
#> 
#> [==================================================================>----]  94%
#> 
#> [==================================================================>----]  95%
#> 
#> [===================================================================>---]  95%
#> 
#> [===================================================================>---]  96%
#> 
#> [====================================================================>--]  96%
#> 
#> [====================================================================>--]  97%
#> 
#> [====================================================================>--]  98%
#> 
#> [=====================================================================>-]  98%
#> 
#> [=====================================================================>-]  99%
#> 
#> [======================================================================>]  99%
#> 
#> [======================================================================>] 100%
#> 
#> [=======================================================================] 100%
#> 
                                                                              
#> 
#> Stabilizing posterior log-fold change estimates using adaptive shrinkage.
# Compile the DE analysis results for topic 4 into a table, and
# rank the genes by the posterior mean log-fold change, limiting to
# DE genes identified with low lfsr ("local false sign rate").
k <- 4
dat <- data.frame(postmean = de$postmean[,k],
                  z        = de$z[,k],
                  lfsr     = de$lfsr[,k])
rownames(dat) <- with(pbmc_facs$genes,paste(symbol,ensembl,sep = "_"))
dat <- subset(dat,lfsr < 0.01)
dat <- dat[order(dat$postmean,decreasing = TRUE),]
# Genes at the top of this ranking are genes that are much more
# highly expressed in the topic compared to other topics.
head(dat,n = 10)
#>                           postmean      z      lfsr
#> LINC00926_ENSG00000247982    6.476  8.406 7.977e-15
#> HLA-DOB_ENSG00000241106      5.701  4.479 2.617e-04
#> BANK1_ENSG00000153064        5.210  6.491 6.919e-09
#> BLK_ENSG00000136573          5.160  9.992 2.424e-21
#> CD40_ENSG00000101017         3.996  7.486 9.637e-12
#> P2RX5_ENSG00000083454        3.823 15.002 4.223e-48
#> RALGPS2_ENSG00000116191      3.668 12.199 1.110e-31
#> CD79B_ENSG00000007312        3.544 15.986 7.436e-55
#> TNFRSF13C_ENSG00000159958    3.425  3.009 8.031e-03
#> CD72_ENSG00000137101         3.224  6.465 2.382e-09
# The genes at the bottom of the ranking are genes that are much less
# expressed in the topic.
tail(dat,n = 10)
#>                          postmean       z      lfsr
#> KIAA0125_ENSG00000226777   -1.369 -10.313 1.110e-16
#> LILRA4_ENSG00000239961     -1.405 -16.907 1.110e-16
#> VAMP5_ENSG00000168899      -1.430  -5.527 3.007e-06
#> GSTP1_ENSG00000084207      -1.478 -13.814 0.000e+00
#> FGR_ENSG00000000938        -1.698 -17.268 0.000e+00
#> ARL4C_ENSG00000188042      -1.773  -4.868 3.511e-05
#> IGJ_ENSG00000132465        -1.781 -14.424 0.000e+00
#> LST1_ENSG00000204482       -1.798  -7.559 2.200e-12
#> ABRACL_ENSG00000146386     -1.853  -5.407 1.858e-06
#> HCK_ENSG00000101336        -2.153 -44.484 2.220e-16
# Create a volcano plot from the DE results for topic 4.
volcano_plot(de,k = k,ymax = 50,labels = pbmc_facs$genes$symbol)
# }