Computes a low-dimensional embedding of the data from the estimated loadings or mixture proportions using t-SNE, a nonlinear dimensionality reduction method.

tsne_from_topics(
  fit,
  dims = 2,
  n = 5000,
  scaling = NULL,
  pca = FALSE,
  normalize = FALSE,
  perplexity = 100,
  theta = 0.1,
  max_iter = 1000,
  eta = 200,
  check_duplicates = FALSE,
  verbose = TRUE,
  ...
)

Arguments

fit

An object of class “poisson_nmf_fit” or “multinom_topic_model_fit”.

dims

The number of dimensions in the t-SNE embedding; passed as argument “dims” to Rtsne.

n

The maximum number of rows in the loadings matrix fit$L to use; when the loadings matrix has more than n rows, the t-SNE embedding is computed on a random selection of n rows. An upper limit on the number of rows is used because the runtime of Rtsne increases rapidly with the number of rows in the input matrix.

scaling

A numeric vector of length equal to the number of topics specifying a scaling of the columns of fit$L; this re-scaling is performed prior to running t-SNE. The vector should contain non-negative numbers only. A larger value will increase the importance, or “weight”, of the respective topic in computing the embedding. When scaling is NULL, no re-scaling is performed. Note that this scaling will have no effect if normalize = TRUE.

pca

Whether to perform a PCA processing step in t-SNE; passed as argument “pca” to Rtsne.

normalize

Whether to normalize the data prior to running t-SNE; passed as argument “normalize” to Rtsne.

perplexity

t-SNE perplexity parameter, passed as argument “perplexity” to Rtsne. The perplexity is automatically revised if it is too large; see Rtsne for more information.

theta

t-SNE speed/accuracy trade-off parameter; passed as argument “theta” to Rtsne.

max_iter

Maximum number of t-SNE iterations; passed as argument “max_iter” to Rtsne.

eta

t-SNE learning rate parameter; passed as argument “eta” to Rtsne.

check_duplicates

When check_duplicates = TRUE, checks whether there are duplicate rows in fit$L; passed as argument “check_duplicates” to Rtsne.

verbose

If verbose = TRUE, progress updates are printed; passed as argument “verbose” to Rtsne.

...

Additional arguments passed to Rtsne.

Value

A list with two elements: Y, an n x d matrix containing the embedding returned by Rtsne, where n is the number of rows of the loadings matrix included in the embedding (at most the value of the argument n) and d = dims; and rows, the rows of the loadings matrix included in the t-SNE embedding.
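
The effect of the n and scaling arguments described above can be sketched in base R. This is only an illustration on a made-up matrix, not the package's implementation: the helper name prepare_loadings is hypothetical, and it is an assumption that rows holds row indices.

```r
# Hypothetical helper mimicking the documented behaviour of "n" and
# "scaling"; this is NOT the fastTopics source code.
prepare_loadings <- function (L, n = 5000, scaling = NULL) {
  n0 <- nrow(L)

  # Keep all rows unless the matrix has more than n rows, in which
  # case keep a random selection of n rows.
  rows <- if (n0 > n) sort(sample(n0, n)) else seq_len(n0)
  L <- L[rows, , drop = FALSE]

  # Multiply column j by scaling[j]; larger values give topic j more
  # weight in the distances that t-SNE works with.
  if (!is.null(scaling)) {
    stopifnot(length(scaling) == ncol(L), all(scaling >= 0))
    L <- sweep(L, 2, scaling, "*")
  }
  list(L = L, rows = rows)
}

set.seed(1)
L   <- matrix(runif(40), 10, 4)   # made-up 10 x 4 "loadings" matrix
out <- prepare_loadings(L, n = 6, scaling = c(1, 1, 2, 0.5))
dim(out$L)        # rows kept x number of topics
out$rows          # which rows of L were selected
```

In this sketch the third column is doubled and the fourth is halved before any embedding would be computed, which is the sense in which scaling changes a topic's weight.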

Details

This is a lightweight interface for rapidly producing t-SNE embeddings from matrix factorizations or multinomial topic models. In particular, tsne_from_topics replaces the t-SNE defaults with settings that are more suitable for visualizing the structure of a matrix factorization or topic model (e.g., the PCA step in Rtsne is activated by default, but disabled in tsne_from_topics). See Kobak and Berens (2019) for guidance on choosing t-SNE settings such as the perplexity and the learning rate (eta).

Note that because tsne_plot uses a nonlinear transformation of the data, distances between points in the plot are less interpretable than distances under a linear transformation, such as the one visualized by pca_plot.
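
A minimal usage sketch follows. It assumes the fastTopics and Rtsne packages are installed and uses simulate_count_data and fit_poisson_nmf from fastTopics to build a small fit; the matrix sizes, number of iterations and perplexity are arbitrary choices for illustration, and the code assumes the return value is the list described under Value.

```r
# Illustrative only: runs the example when fastTopics and Rtsne are
# available, and otherwise leaves res as NULL.
res <- NULL
if (requireNamespace("fastTopics", quietly = TRUE) &&
    requireNamespace("Rtsne", quietly = TRUE)) {
  library(fastTopics)
  set.seed(1)

  # Simulate a small 200 x 50 counts matrix with k = 3 topics, and
  # drop any rows or columns that are entirely zero.
  X <- simulate_count_data(200, 50, k = 3)$X
  X <- X[rowSums(X) > 0, colSums(X) > 0]

  # Fit a Poisson NMF; 20 iterations is enough for a quick demo, not
  # for a converged fit.
  fit <- fit_poisson_nmf(X, k = 3, numiter = 20)

  # perplexity = 30 is chosen so that it is small relative to the
  # number of rows; all other settings are the defaults listed above.
  res <- tsne_from_topics(fit, dims = 2, perplexity = 30, verbose = FALSE)

  # Assumption: res is the list described under Value.
  dim(res$Y)        # one row per embedded sample, dims columns
  head(res$rows)    # rows of fit$L used in the embedding
}
```

The columns of res$Y can then be passed to any plotting function to draw the two-dimensional embedding.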

References

Kobak, D. and Berens, P. (2019). The art of using t-SNE for single-cell transcriptomics. Nature Communications 10, 5416. https://doi.org/10.1038/s41467-019-13056-x

See also