Implements methods for differential expression analysis using a topic model. These methods are motivated by gene expression studies, but could have other uses, such as identifying “key words” for topics.
An object of class “poisson_nmf_fit” or “multinom_topic_model_fit”, or an n x k matrix of topic proportions, where k is the number of topics. (The elements in each row of this matrix should sum to 1.) If a Poisson NMF fit is provided as input, the corresponding multinomial topic model fit is automatically recovered using poisson2multinom.
The n x m counts matrix. It can be a sparse matrix (class "dgCMatrix") or a dense matrix (class "matrix").
A numeric vector of length n determining how the rates are scaled in the Poisson models. See “Details” for guidance on the choice of s.
Observations with this value are added to the counts matrix to stabilize maximum-likelihood estimation.
Method used to fit the Poisson models. Note that fit.method = "glm" is the slowest method, and is mainly used for testing.
Method used to stabilize the posterior mean LFC estimates. When shrink.method = "ash", the “adaptive shrinkage” method implemented in the ‘ashr’ package is used to compute stabilized posterior mean LFC estimates. When shrink.method = "none", no stabilization is performed, and the “raw” posterior mean LFC estimates are returned.
The log-fold change statistics returned: lfc.stat = "vsnull", the log-fold change relative to the null; lfc.stat = "le", the “least extreme” LFC; or a topic name or number, in which case the LFC is defined relative to the selected topic. See “Details” for more detailed explanations of these choices.
A list of parameters controlling behaviour of the optimization and Monte Carlo algorithms. See ‘Details’.
When verbose = TRUE, progress information is printed to the console.
When shrink.method = "ash"
, these are
additional arguments passed to ash
.
A list with the following elements:
The log-fold change maximum-likelihood estimates.
Posterior mean LFC estimates.
Lower limits of estimated HPD intervals. Note that these are not updated by the shrinkage step.
Upper limits of estimated HPD intervals. Note that these are not updated by the shrinkage step.
z-scores for posterior mean LFC estimates.
-log10 two-tailed p-values obtained from the z-scores. When shrink.method = "ash", this is NA, and the s-values are returned instead (see below).
s-values returned by ash. s-values are analogous to q-values, but are based on the local false sign rate; see Stephens (2016).
When shrink.method = "ash"
only, this output
contains the estimated local false sign rates.
When shrink.method = "ash"
only, this output
contains the ash
return value (after removing
the "data"
, "result"
and "call"
list
elements).
Maximum-likelihood estimates of the Poisson model parameters.
Maximum-likelihood estimates of the null model parameters.
A vector containing the Metropolis acceptance ratios from each MCMC run.
The methods are based on the Poisson model $$x_i \sim \mathrm{Poisson}(\lambda_i),$$ in which the Poisson rates are $$\lambda_i = \sum_{j=1}^k s_i l_{ij} f_j,$$ the \(l_{ij}\) are the topic proportions and the \(f_j\) are the unknowns to be estimated. This model is applied separately to each column of X. When \(s_i\) (specified by input argument s) is equal to the total count in row i (this is the default), the Poisson model closely approximates a binomial model of the count data, and the unknowns \(f_j\) approximate binomial probabilities. (The Poisson approximation to the binomial is most accurate when the total counts rowSums(X) are large and the unknowns \(f_j\) are small.) Other choices of s are possible, and implement different normalization schemes.
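For example, here is a minimal sketch of two possible choices of s, using the pbmc_facs data from the Examples below. The first choice is the default; the second is only an illustration of an alternative normalization, not a package recommendation.

library(Matrix)                 # for rowSums on sparse count matrices
data(pbmc_facs)
fit <- pbmc_facs$fit
X   <- pbmc_facs$counts
# Default: s is the total count in each row of X, so the estimated f_j
# approximate binomial probabilities (relative expression levels).
s_default <- rowSums(X)
# Illustrative alternative (an assumption, not a package default):
# rescale each row by the median total count.
s_scaled  <- rowSums(X)/median(rowSums(X))
de <- de_analysis(fit, X, s = s_default)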
To allow for some flexibility, de_analysis allows the log-fold change to be measured in several ways.
One option is to compare against the probability under the null model: \(LFC(j) = \log_2(f_j/f_0)\), where \(f_0\) is the single parameter in the Poisson model \(x_i \sim \mathrm{Poisson}(\lambda_i)\) with rates \(\lambda_i = s_i f_0\). This LFC definition is chosen with lfc.stat = "vsnull".
Another option is to compare against a chosen topic, k: \(LFC(j) = \log_2(f_j/f_k)\). By definition, \(LFC(k)\) is zero, and statistics such as z-scores and p-values for topic k are set to NA. This LFC definition is selected by setting lfc.stat = k.
A final option (which is the default) computes the “least extreme” LFC, defined as \(LFC(j) = \log_2(f_j/f_k)\) in which \(k\) is the topic, other than \(j\), giving the ratio \(f_j/f_k\) closest to 1. This option is chosen with lfc.stat = "le".
We recommend setting shrink.method = "ash", which uses the “adaptive shrinkage” method (Stephens, 2016) to improve the accuracy of the posterior mean estimates and z-scores. We follow the settings used in lfcShrink from the ‘DESeq2’ package, with type = "ashr".
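For example, continuing with fit and X as in the sketch above, additional arguments can be passed through to ash; the mixcompdist = "normal" setting below is shown only for illustration and is not a package default.

# Adaptive shrinkage (the default), passing one extra argument through
# to ashr::ash; mixcompdist = "normal" is illustrative only.
de_shrunk <- de_analysis(fit, X, shrink.method = "ash",
                         mixcompdist = "normal")
# "Raw" (unstabilized) posterior mean LFC estimates, for comparison.
de_raw    <- de_analysis(fit, X, shrink.method = "none")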
Note that all LFC statistics are defined using the base-2 logarithm, following the convention used in differential expression analysis.
The control argument is a list in which any of the following named components will override the default optimization algorithm settings (as they are defined by de_analysis_control_default); a short example follows the list:
numiter
Maximum number of iterations performed in fitting the Poisson models. When fit.method = "glm", this is passed as argument maxit to the glm function.
minval
A small, positive number. All topic proportions less than this value and greater than 1 - minval are set to this value.
tol
Controls the convergence tolerance for fitting the Poisson models. When fit.method = "glm", this is passed as argument epsilon to function glm.
conf.level
The size of the highest posterior density (HPD) intervals. Should be a number greater than 0 and less than 1.
ns
Number of Monte Carlo samples simulated by random-walk MCMC for estimating posterior LFC quantities.
rw
The standard deviation of the normal density used to propose new states in the random-walk MCMC.
eps
A small, non-negative number added to the terms inside the logarithms to avoid computing logarithms of zero.
nc
Number of threads used in the multithreaded computations. This controls both (a) the number of RcppParallel threads used to fit the factors in the Poisson models, and (b) the number of cores used in mclapply for the MCMC simulation step. Note that mclapply relies on forking and is therefore not available on Windows; on Windows, an error is returned unless nc = 1. Also note that setting nc > 1 copies the contents of memory nc times, which can lead to poor performance if the total resident memory required exceeds available physical memory.
nc.blas
Number of threads used in the numerical linear algebra library (e.g., OpenBLAS), if available. For best performance, we recommend setting this to 1 (i.e., no multithreading).
nsplit
The number of data splits used in the multithreaded computations (only relevant when nc > 1). More splits increase the granularity of the progress bar, but can also slow down the multithreaded computations by introducing more overhead in the call to pblapply.
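For example, a few of these defaults could be overridden as follows, with fit and X as in the sketches above; the values shown are purely illustrative, not recommendations.

# Illustrative settings: more MCMC samples, a modest random-walk step
# size, and 4 threads. Use nc = 1 on Windows, where mclapply forking
# is unavailable.
ctrl <- list(ns = 2000, rw = 0.3, nc = 4)
de   <- de_analysis(fit, X, control = ctrl)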
Stephens, M. (2016). False discovery rates: a new deal. Biostatistics 18(2), kxw041. doi:10.1093/biostatistics/kxw041
Zhu, A., Ibrahim, J. G. and Love, M. I. (2019). Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics 35(12), 2084–2092.
# Perform a differential expression (DE) analysis using the previously
# fitted multinomial topic model. Note that the de_analysis call could
# take several minutes to complete.
# \donttest{
set.seed(1)
data(pbmc_facs)
de <- de_analysis(pbmc_facs$fit,pbmc_facs$counts)
#> Fitting 16791 Poisson models with k=6 using method="scd".
#> Computing log-fold change statistics from 16791 Poisson models with k=6.
#> (progress bar output omitted)
#> Stabilizing posterior log-fold change estimates using adaptive shrinkage.
# Compile the DE analysis results for topic 4 into a table, and
# rank the genes by the posterior mean log-fold change, limiting to
# DE genes identified with low lfsr ("local false sign rate").
k <- 4
dat <- data.frame(postmean = de$postmean[,k],
                  z = de$z[,k],
                  lfsr = de$lfsr[,k])
rownames(dat) <- with(pbmc_facs$genes,paste(symbol,ensembl,sep = "_"))
dat <- subset(dat,lfsr < 0.01)
dat <- dat[order(dat$postmean,decreasing = TRUE),]
# Genes at the top of this ranking are genes that are much more
# highly expressed in the topic compared to other topics.
head(dat,n = 10)
#> postmean z lfsr
#> LINC00926_ENSG00000247982 6.476 8.406 7.977e-15
#> HLA-DOB_ENSG00000241106 5.701 4.479 2.617e-04
#> BANK1_ENSG00000153064 5.210 6.491 6.919e-09
#> BLK_ENSG00000136573 5.160 9.992 2.424e-21
#> CD40_ENSG00000101017 3.996 7.486 9.637e-12
#> P2RX5_ENSG00000083454 3.823 15.002 4.223e-48
#> RALGPS2_ENSG00000116191 3.668 12.199 1.110e-31
#> CD79B_ENSG00000007312 3.544 15.986 7.436e-55
#> TNFRSF13C_ENSG00000159958 3.425 3.009 8.031e-03
#> CD72_ENSG00000137101 3.224 6.465 2.382e-09
# The genes at the bottom of the ranking are genes that are much less
# expressed in the topic.
tail(dat,n = 10)
#> postmean z lfsr
#> KIAA0125_ENSG00000226777 -1.369 -10.313 1.110e-16
#> LILRA4_ENSG00000239961 -1.405 -16.907 1.110e-16
#> VAMP5_ENSG00000168899 -1.430 -5.527 3.007e-06
#> GSTP1_ENSG00000084207 -1.478 -13.814 0.000e+00
#> FGR_ENSG00000000938 -1.698 -17.268 0.000e+00
#> ARL4C_ENSG00000188042 -1.773 -4.868 3.511e-05
#> IGJ_ENSG00000132465 -1.781 -14.424 0.000e+00
#> LST1_ENSG00000204482 -1.798 -7.559 2.200e-12
#> ABRACL_ENSG00000146386 -1.853 -5.407 1.858e-06
#> HCK_ENSG00000101336 -2.153 -44.484 2.220e-16
# Create a volcano plot from the DE results for topic 4.
volcano_plot(de,k = k,ymax = 50,labels = pbmc_facs$genes$symbol)
# }