Visualize the "structure" of the Poisson NMF loadings or multinomial topic model mixture proportions by projecting them onto a 2-d surface. Samples in the projection are colored according to their loadings or mixture proportions. By default, t-SNE is used to compute the 2-d embedding from the loadings or mixture proportions.

tsne_plot(
  fit,
  color = c("mixprop", "loading"),
  k,
  tsne,
  ggplot_call = tsne_plot_ggplot_call,
  plot_grid_call = function(plots) do.call(plot_grid, plots),
  ...
)

tsne_plot_ggplot_call(dat, topic.label, font.size = 9)

Arguments

fit

An object of class “poisson_nmf_fit” or “multinom_topic_model_fit”.

color

The data mapped to the color “aesthetic” in the plot. When color = "loading", the estimated loadings (stored in the L matrix) of the Poisson NMF model are plotted; when color = "mixprop", the estimated mixture proportions (which are recovered from the loadings by calling poisson2multinom) are shown. When fit is a “multinom_topic_model_fit” object, the only available option is color = "mixprop". In most settings, the mixture proportions are preferred, even if the 2-d embedding is computed from the loading matrix of the Poisson NMF model.

k

The topic, or topics, selected by number or name. One plot is created per selected topic. When not specified, all topics are plotted.

tsne

A 2-d embedding of the samples (rows of X), or a subset of the samples, such as an output from tsne_from_topics. It should be a list object with the same structure as a tsne_from_topics output; see tsne_from_topics for details. If not provided, a 2-d t-SNE embedding will be estimated automatically by calling tsne_from_topics.

ggplot_call

The function used to create the plot. Replace tsne_plot_ggplot_call with your own function to customize the appearance of the plot.

plot_grid_call

When multiple topics are selected, this is the function used to arrange the plots into a grid using plot_grid. It should be a function accepting a single argument, plots, a list of ggplot objects.

...

Additional arguments passed to tsne_from_topics. These arguments are only used if tsne is not provided.

dat

A data frame passed as input to ggplot, containing, at a minimum, columns “d1”, “d2” (the first and second dimensions in the 2-d embedding), and “loading”.

topic.label

The name or number of the topic being plotted; it is only used to determine the plot title.

font.size

Font size used in the plot.

Value

A ggplot object.

Details

This is a lightweight interface primarily intended to expedite creation of scatterplots for visualizing the loadings or mixture proportions in 2-d; most of the “heavy lifting” is done by ggplot2. The 2-d embedding itself is computed by calling tsne_from_topics (unless the “tsne” input is provided). For more control over the plot's appearance, supply your own functions for the ggplot_call and plot_grid_call arguments.
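As a rough illustration of customizing the plot's appearance, the sketch below defines a replacement for tsne_plot_ggplot_call. The name my_ggplot_call, the color scale, the axis labels and the toy data frame are assumptions made for this example and are not part of the package; the only requirements taken from this page are the argument list (dat, topic.label, font.size) and the columns “d1”, “d2” and “loading”.

```r
library(ggplot2)

# Hypothetical replacement for tsne_plot_ggplot_call. It accepts the
# same arguments and expects columns d1, d2 and loading in dat.
my_ggplot_call <- function (dat, topic.label, font.size = 9)
  ggplot(dat, aes(x = d1, y = d2, color = loading)) +
    geom_point(size = 1) +
    scale_color_gradient(low = "lightgray", high = "darkblue") +
    labs(x = "t-SNE 1", y = "t-SNE 2",
         title = paste("topic", topic.label)) +
    theme_minimal(base_size = font.size)

# Toy data standing in for the data frame that tsne_plot would build.
set.seed(1)
dat <- data.frame(d1 = rnorm(50), d2 = rnorm(50), loading = runif(50))
p <- my_ggplot_call(dat, topic.label = 1)
```

Under these assumptions, the function would be supplied as tsne_plot(fit, ggplot_call = my_ggplot_call).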

An effective 2-d visualization may also require some fine-tuning of the t-SNE settings, such as the “perplexity”, or of the number of samples included in the plot. The t-SNE settings are controlled by the additional arguments (...) passed to tsne_from_topics; see tsne_from_topics for details. Alternatively, a 2-d embedding may be pre-computed and passed as argument tsne to tsne_plot.
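When fewer samples are included in the plot, the perplexity may need to shrink as well. The sketch below assumes the embedding is ultimately computed with the Rtsne package, which rejects a perplexity larger than (n - 1)/3 for n samples. The helper choose_perplexity and its default target of 100 are hypothetical and are not part of this package.

```r
# Hypothetical helper: return the largest perplexity not exceeding
# `target` that satisfies the Rtsne check 3 * perplexity <= n - 1,
# where n is the number of samples included in the embedding.
choose_perplexity <- function (n, target = 100) {
  stopifnot(n >= 4)
  min(target, floor((n - 1) / 3))
}

choose_perplexity(5000)  # the target is feasible, so it is kept
choose_perplexity(100)   # too few samples for 100, so a smaller value
```

The result could then be passed through the additional arguments, for example tsne_plot(fit, perplexity = choose_perplexity(n)), where n is the number of samples being embedded.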