t-SNE from Poisson NMF or Multinomial Topic Model

Computes a low-dimensional nonlinear embedding of the data from the estimated loadings or mixture proportions using the t-SNE nonlinear dimensionality reduction method.

tsne_from_topics(
  fit,
  dims = 2,
  n = 5000,
  scaling = NULL,
  pca = FALSE,
  normalize = FALSE,
  perplexity = 100,
  theta = 0.1,
  max_iter = 1000,
  eta = 200,
  check_duplicates = FALSE,
  verbose = TRUE,
  ...
)

Arguments

fit	An object of class “poisson_nmf_fit” or “multinom_topic_model_fit”.
dims	The number of dimensions in the t-SNE embedding; passed as argument “dims” to `Rtsne`.
n	The maximum number of rows in the loadings matrix `fit$L` to use; when the loadings matrix has more than `n` rows, the t-SNE embedding is computed on a random selection of `n` rows. An upper limit on the number of rows is used because the runtime of `Rtsne` increases rapidly with the number of rows in the input matrix.
scaling	A numeric vector of length equal to the number of topics specifying a scaling of the columns of `fit$L`; this re-scaling is performed prior to running t-SNE. The vector should contain non-negative numbers only. A larger value will increase the importance, or “weight”, of the respective topic in computing the embedding. When `scaling` is `NULL`, no re-scaling is performed. Note that this scaling will have no effect if `normalize = TRUE`.
pca	Whether to perform a PCA processing step in t-SNE; passed as argument “pca” to `Rtsne`.
normalize	Whether to normalize the data prior to running t-SNE; passed as argument “normalize” to `Rtsne`.
perplexity	t-SNE perplexity parameter, passed as argument “perplexity” to `Rtsne`. The perplexity is automatically revised if it is too large; see `Rtsne` for more information.
theta	t-SNE speed/accuracy trade-off parameter; passed as argument “theta” to `Rtsne`.
max_iter	Maximum number of t-SNE iterations; passed as argument “max_iter” to `Rtsne`.
eta	t-SNE learning rate parameter; passed as argument “eta” to `Rtsne`.
check_duplicates	When `check_duplicates = TRUE`, checks whether there are duplicate rows in `fit$L`; passed as argument “check_duplicates” to `Rtsne`.
verbose	If `verbose = TRUE`, progress updates are printed; passed as argument “verbose” to `Rtsne`.
...	Additional arguments passed to `Rtsne`.
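
The effect of the `n` and `scaling` arguments can be illustrated on a toy matrix. The following is a minimal base-R sketch of what the argument descriptions say, not the package's internal code; the matrix `L` stands in for `fit$L`, and all variable names and values are made up for illustration.

```r
# Toy loadings matrix standing in for fit$L: 8 rows (samples), 3 topics.
# All names here (L, n, scaling, rows, L_scaled) are illustrative only.
set.seed(1)
L <- matrix(runif(24), nrow = 8, ncol = 3)
L <- L / rowSums(L)  # make each row sum to 1, like topic proportions

# Cap the number of rows at n by drawing a random subset of row indices.
n <- 5
rows <- if (nrow(L) > n) sort(sample(nrow(L), n)) else seq_len(nrow(L))

# Re-scale the columns: topic 2 gets twice the weight of topics 1 and 3.
scaling <- c(1, 2, 1)
L_scaled <- sweep(L[rows, , drop = FALSE], 2, scaling, `*`)

dim(L_scaled)
```

In this sketch, `L_scaled` is the kind of matrix that would then be handed to t-SNE: at most `n` rows, with the second topic weighted more heavily when distances are computed.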

Value

A list with two elements: Y, an n x d matrix containing the embedding Y returned by Rtsne, where n is the number of rows of the loadings matrix used in the embedding, and d = dims; rows, the rows of the loadings matrix included in the t-SNE embedding.
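
Because the embedding may cover only a subset of the rows, any per-sample annotation needs to be subset with `rows` before it is combined with `Y`. The sketch below uses a mock list with the documented structure rather than a real call; `labels`, `out` and `emb` are hypothetical names, and the numbers are random.

```r
# Mock return value with the documented structure (Y and rows); in practice
# this list would come from tsne_from_topics. All values are made up.
set.seed(1)
labels <- factor(rep(c("a", "b"), each = 5))  # hypothetical per-row labels
rows <- c(2, 3, 7, 9)                         # rows kept in the embedding
out <- list(Y = matrix(rnorm(8), nrow = 4, ncol = 2), rows = rows)

# Subset the labels with out$rows so they line up with the rows of out$Y.
emb <- data.frame(d1 = out$Y[, 1], d2 = out$Y[, 2], label = labels[out$rows])
nrow(emb) == nrow(out$Y)
```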

Details

This is a lightweight interface for rapidly producing t-SNE embeddings from matrix factorizations or multinomial topic models; in particular, tsne_from_topics replaces the t-SNE defaults with settings that are more suitable for visualizing the structure of a matrix factorization or topic model (e.g., the PCA step in Rtsne is activated by default, but disabled in tsne_from_topics). See Kobak and Berens (2019) for guidance on choosing t-SNE settings such as the "perplexity" and learning rate (eta).

Note that because tsne_plot uses a nonlinear transformation of the data, distances between points are less interpretable than in a linear projection such as the one visualized by pca_plot.
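
A call might look like the following sketch. It assumes the fastTopics and Rtsne packages are installed, and that `simulate_count_data` and `fit_poisson_nmf` are available with the arguments shown, which may differ between package versions; the data are simulated, and the choice of `perplexity = 10` is only for this small toy example.

```r
# Sketch only: runs the embedding if fastTopics and Rtsne are installed.
# simulate_count_data() and fit_poisson_nmf() are assumed to behave as in
# recent fastTopics releases; the data set here is simulated, not real.
out <- NULL
if (requireNamespace("fastTopics", quietly = TRUE) &&
    requireNamespace("Rtsne", quietly = TRUE)) {
  library(fastTopics)
  set.seed(1)

  # Simulate an 80 x 100 count matrix and fit a Poisson NMF with 3 topics.
  X <- simulate_count_data(80, 100, k = 3)$X
  fit <- fit_poisson_nmf(X, k = 3, numiter = 50)

  # Compute a 2-d embedding from the loadings; a small perplexity is used
  # because there are only 80 rows.
  out <- tsne_from_topics(fit, dims = 2, perplexity = 10, verbose = FALSE)
  str(out)
}
```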

References

Kobak, D. and Berens, P. (2019). The art of using t-SNE for single-cell transcriptomics. Nature Communications 10, 5416. https://doi.org/10.1038/s41467-019-13056-x
