Visualize the structure of the Poisson NMF loadings or the multinomial topic model topic proportions by projection onto a 2-d surface. pca_hexbin_plot is most useful for visualizing the PCs of a data set with thousands of samples or more.
embedding_plot_2d(
fit,
Y,
fill = "loading",
k,
fill.label,
ggplot_call = embedding_plot_2d_ggplot_call,
plot_grid_call = function(plots) do.call(plot_grid, plots)
)
embedding_plot_2d_ggplot_call(
Y,
fill,
fill.type = c("loading", "numeric", "factor", "none"),
fill.label,
font.size = 9
)
pca_plot(
fit,
Y,
pcs = 1:2,
n = 10000,
fill = "loading",
k,
fill.label,
ggplot_call = embedding_plot_2d_ggplot_call,
plot_grid_call = function(plots) do.call(plot_grid, plots),
...
)
tsne_plot(
fit,
Y,
n = 2000,
fill = "loading",
k,
fill.label,
ggplot_call = embedding_plot_2d_ggplot_call,
plot_grid_call = function(plots) do.call(plot_grid, plots),
...
)
umap_plot(
fit,
Y,
n = 2000,
fill = "loading",
k,
fill.label,
ggplot_call = embedding_plot_2d_ggplot_call,
plot_grid_call = function(plots) do.call(plot_grid, plots),
...
)
pca_hexbin_plot(
fit,
Y,
pcs = 1:2,
bins = 40,
breaks = c(0, 1, 10, 100, 1000, Inf),
ggplot_call = pca_hexbin_plot_ggplot_call,
...
)
pca_hexbin_plot_ggplot_call(Y, bins, breaks, font.size = 9)
fit: An object of class "poisson_nmf_fit" or "multinom_topic_model_fit".

Y: The n x 2 matrix containing the 2-d embedding, where n is the number of rows in fit$L. If not provided, the embedding is computed automatically.

fill: The quantity to map onto the fill colour of the points in the plot. Set fill = "loading" to vary the fill colour according to the loadings (or topic proportions) of the selected topic or topics. Alternatively, fill may be set to a data vector with one entry per row of fit$L, in which case the data are mapped to the fill colour of the points. When fill = "none", the fill colour is not varied.

k: The dimensions or topics selected by number or name. When fill = "loading", one plot is created per selected dimension or topic; when fill = "loading" and k is not specified, all dimensions or topics are plotted.

fill.label: The label used for the fill colour legend.

ggplot_call: The function used to create the plot. Replace embedding_plot_2d_ggplot_call or pca_hexbin_plot_ggplot_call with your own function to customize the appearance of the plot.

plot_grid_call: When fill = "loading" and multiple topics (k) are selected, the function used to arrange the plots into a grid using plot_grid. It should be a function accepting a single argument, plots, a list of ggplot objects.
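For example, a replacement plot_grid_call might arrange the per-topic plots in a single row. This sketch assumes the cowplot package (which provides plot_grid) is installed; the function name one_row_grid is illustrative only:

```r
# Hypothetical replacement for the default plot_grid_call: lay out
# the per-topic plots in one row rather than a square-ish grid.
library(cowplot)
one_row_grid <- function(plots)
  do.call(plot_grid, c(plots, list(nrow = 1)))

# e.g., pca_plot(fit, plot_grid_call = one_row_grid)
```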
fill.type: The type of variable mapped to fill colour. The fill colour is not varied when fill.type = "none".

font.size: Font size used in the plot.

pcs: The two principal components (PCs) to be plotted, specified by name or number.

n: The maximum number of points to plot. If n is less than the number of rows of fit$L, the rows are subsampled at random. This argument is ignored if Y is provided.

...: Additional arguments passed to pca_from_topics, tsne_from_topics or umap_from_topics. These additional arguments are only used if Y is not provided.

bins: Number of bins used to create the hexagonal 2-d histogram. Passed as the "bins" argument to stat_bin_hex.

breaks: To produce the hexagonal histogram, the counts are subdivided into intervals based on breaks. Passed as the "breaks" argument to cut.
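To illustrate how the default breaks partition the hexbin counts, here is a small base-R sketch of the cut call:

```r
# cut() assigns each count to a right-closed interval defined by
# breaks; e.g., a count of 5 falls in the interval (1,10].
counts <- c(1, 5, 50, 500, 5000)
cut(counts, breaks = c(0, 1, 10, 100, 1000, Inf))
```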
Value: A ggplot object.
This is a lightweight interface primarily intended to expedite creation of plots for visualizing the loadings or topic proportions; most of the heavy lifting is done by 'ggplot2'. The 2-d embedding itself is computed by invoking pca_from_topics, tsne_from_topics or umap_from_topics. For more control over the plot's appearance, the plot can be customized by modifying the ggplot_call and plot_grid_call arguments.
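A custom ggplot_call can wrap the default and then modify the returned ggplot object. This sketch assumes the replacement must accept the same arguments as embedding_plot_2d_ggplot_call (treat the exact calling convention as an assumption):

```r
# Hypothetical custom ggplot_call: delegate to the default plotting
# function, then add a title to the returned ggplot object.
my_ggplot_call <- function(Y, fill, fill.type, fill.label, font.size = 9) {
  p <- embedding_plot_2d_ggplot_call(Y, fill, fill.type, fill.label,
                                     font.size)
  p + ggplot2::ggtitle("Topic proportions, 2-d embedding")
}

# e.g., pca_plot(fit, ggplot_call = my_ggplot_call)
```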
An effective 2-d visualization may also require some fine-tuning of the settings, such as the t-SNE "perplexity" or the number of samples included in the plot. The PCA, t-SNE and UMAP settings can be controlled by the additional arguments (...). Alternatively, a 2-d embedding may be pre-computed and passed as argument Y.
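For instance, the embedding can be computed once with tsne_from_topics and reused across several plots; the arguments forwarded to tsne_from_topics here (n, and any Rtsne settings) are assumptions:

```r
# Sketch: pre-compute the t-SNE embedding, then pass it as Y so the
# (slow) embedding step runs only once for multiple plots.
Y  <- tsne_from_topics(fit2, n = 2000)
p1 <- tsne_plot(fit2, Y, fill = "none")
p2 <- tsne_plot(fit2, Y, fill = subpop)
```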
set.seed(1)
data(pbmc_facs)
# Get the Poisson NMF and multinomial topic models fitted to the
# PBMC data.
fit1 <- multinom2poisson(pbmc_facs$fit)
fit2 <- pbmc_facs$fit
# Plot the first two PCs of the loadings matrix (for the
# multinomial topic model, "fit2", the loadings are the topic
# proportions).
subpop <- pbmc_facs$samples$subpop
p1 <- pca_plot(fit1,k = 1)
p2 <- pca_plot(fit2)
p3 <- pca_plot(fit2,fill = "none")
p4 <- pca_plot(fit2,pcs = 3:4,fill = "none")
p5 <- pca_plot(fit2,fill = fit2$L[,1])
p6 <- pca_plot(fit2,fill = subpop)
p7 <- pca_hexbin_plot(fit1)
p8 <- pca_hexbin_plot(fit2)
# \donttest{
# Plot the loadings using t-SNE.
p1 <- tsne_plot(fit1,k = 1)
#> Read the 2000 x 6 data matrix successfully!
#> Using no_dims = 2, perplexity = 100.000000, and theta = 0.100000
#> Computing input similarities...
#> Building tree...
#> Done in 0.50 seconds (sparsity = 0.195510)!
#> Learning embedding...
#> Iteration 50: error is 56.566020 (50 iterations in 1.02 seconds)
#> Iteration 100: error is 49.586311 (50 iterations in 0.70 seconds)
#> Iteration 150: error is 48.779634 (50 iterations in 0.68 seconds)
#> Iteration 200: error is 48.481377 (50 iterations in 0.67 seconds)
#> Iteration 250: error is 48.321772 (50 iterations in 0.68 seconds)
#> Iteration 300: error is 0.499560 (50 iterations in 0.81 seconds)
#> Iteration 350: error is 0.354588 (50 iterations in 0.83 seconds)
#> Iteration 400: error is 0.303966 (50 iterations in 0.81 seconds)
#> Iteration 450: error is 0.280049 (50 iterations in 0.81 seconds)
#> Iteration 500: error is 0.266749 (50 iterations in 0.81 seconds)
#> Iteration 550: error is 0.258490 (50 iterations in 0.80 seconds)
#> Iteration 600: error is 0.252933 (50 iterations in 0.80 seconds)
#> Iteration 650: error is 0.248995 (50 iterations in 0.79 seconds)
#> Iteration 700: error is 0.246062 (50 iterations in 0.78 seconds)
#> Iteration 750: error is 0.243852 (50 iterations in 0.78 seconds)
#> Iteration 800: error is 0.242078 (50 iterations in 0.78 seconds)
#> Iteration 850: error is 0.240709 (50 iterations in 0.78 seconds)
#> Iteration 900: error is 0.239629 (50 iterations in 0.77 seconds)
#> Iteration 950: error is 0.238710 (50 iterations in 0.76 seconds)
#> Iteration 1000: error is 0.237995 (50 iterations in 0.76 seconds)
#> Fitting performed in 15.62 seconds.
p2 <- tsne_plot(fit2)
#> Read the 2000 x 6 data matrix successfully!
#> Using no_dims = 2, perplexity = 100.000000, and theta = 0.100000
#> Computing input similarities...
#> Building tree...
#> Done in 0.48 seconds (sparsity = 0.185092)!
#> Learning embedding...
#> Iteration 50: error is 55.169629 (50 iterations in 0.96 seconds)
#> Iteration 100: error is 48.296393 (50 iterations in 0.68 seconds)
#> Iteration 150: error is 47.207207 (50 iterations in 0.62 seconds)
#> Iteration 200: error is 46.770503 (50 iterations in 0.61 seconds)
#> Iteration 250: error is 46.531120 (50 iterations in 0.61 seconds)
#> Iteration 300: error is 0.483807 (50 iterations in 0.74 seconds)
#> Iteration 350: error is 0.337533 (50 iterations in 0.74 seconds)
#> Iteration 400: error is 0.280061 (50 iterations in 0.74 seconds)
#> Iteration 450: error is 0.251434 (50 iterations in 0.73 seconds)
#> Iteration 500: error is 0.235569 (50 iterations in 0.74 seconds)
#> Iteration 550: error is 0.226006 (50 iterations in 0.72 seconds)
#> Iteration 600: error is 0.219770 (50 iterations in 0.70 seconds)
#> Iteration 650: error is 0.215321 (50 iterations in 0.71 seconds)
#> Iteration 700: error is 0.212055 (50 iterations in 0.70 seconds)
#> Iteration 750: error is 0.209506 (50 iterations in 0.70 seconds)
#> Iteration 800: error is 0.207451 (50 iterations in 0.69 seconds)
#> Iteration 850: error is 0.205801 (50 iterations in 0.69 seconds)
#> Iteration 900: error is 0.204450 (50 iterations in 0.74 seconds)
#> Iteration 950: error is 0.203271 (50 iterations in 0.86 seconds)
#> Iteration 1000: error is 0.202383 (50 iterations in 0.75 seconds)
#> Fitting performed in 14.43 seconds.
p3 <- tsne_plot(fit2,fill = subpop)
#> Read the 2000 x 6 data matrix successfully!
#> Using no_dims = 2, perplexity = 100.000000, and theta = 0.100000
#> Computing input similarities...
#> Building tree...
#> Done in 0.49 seconds (sparsity = 0.184268)!
#> Learning embedding...
#> Iteration 50: error is 54.034875 (50 iterations in 1.06 seconds)
#> Iteration 100: error is 47.811985 (50 iterations in 0.73 seconds)
#> Iteration 150: error is 47.002541 (50 iterations in 0.72 seconds)
#> Iteration 200: error is 46.676555 (50 iterations in 0.72 seconds)
#> Iteration 250: error is 46.488674 (50 iterations in 0.71 seconds)
#> Iteration 300: error is 0.461843 (50 iterations in 0.75 seconds)
#> Iteration 350: error is 0.320256 (50 iterations in 0.76 seconds)
#> Iteration 400: error is 0.267543 (50 iterations in 0.77 seconds)
#> Iteration 450: error is 0.241771 (50 iterations in 0.76 seconds)
#> Iteration 500: error is 0.227141 (50 iterations in 0.77 seconds)
#> Iteration 550: error is 0.217905 (50 iterations in 0.76 seconds)
#> Iteration 600: error is 0.211658 (50 iterations in 0.76 seconds)
#> Iteration 650: error is 0.207181 (50 iterations in 0.77 seconds)
#> Iteration 700: error is 0.203827 (50 iterations in 0.78 seconds)
#> Iteration 750: error is 0.201253 (50 iterations in 0.78 seconds)
#> Iteration 800: error is 0.199169 (50 iterations in 0.79 seconds)
#> Iteration 850: error is 0.197529 (50 iterations in 0.79 seconds)
#> Iteration 900: error is 0.196210 (50 iterations in 0.78 seconds)
#> Iteration 950: error is 0.195116 (50 iterations in 0.78 seconds)
#> Iteration 1000: error is 0.194157 (50 iterations in 0.79 seconds)
#> Fitting performed in 15.54 seconds.
# Plot the loadings using UMAP.
p1 <- umap_plot(fit1,k = 1)
#> 09:21:01 UMAP embedding parameters a = 1.896 b = 0.8006
#> 09:21:01 Read 2000 rows and found 6 numeric columns
#> 09:21:01 Using FNN for neighbor search, n_neighbors = 30
#> 09:21:01 Commencing smooth kNN distance calibration using 4 threads
#> with target n_neighbors = 30
#> 09:21:02 Initializing from normalized Laplacian + noise (using irlba)
#> 09:21:02 Commencing optimization for 500 epochs, with 74134 positive edges
#> 09:21:04 Optimization finished
p2 <- umap_plot(fit2)
#> 09:21:04 UMAP embedding parameters a = 1.896 b = 0.8006
#> 09:21:04 Read 2000 rows and found 6 numeric columns
#> 09:21:04 Using FNN for neighbor search, n_neighbors = 30
#> 09:21:05 Commencing smooth kNN distance calibration using 4 threads
#> with target n_neighbors = 30
#> 09:21:05 56 smooth knn distance failures
#> 09:21:05 Found 3 connected components,
#> falling back to 'spca' initialization with init_sdev = 1
#> 09:21:05 Using 'irlba' for PCA
#> 09:21:05 PCA: 2 components explained 65.39% variance
#> 09:21:05 Scaling init to sdev = 1
#> 09:21:05 Commencing optimization for 500 epochs, with 72844 positive edges
#> 09:21:08 Optimization finished
p3 <- umap_plot(fit2,fill = subpop)
#> 09:21:08 UMAP embedding parameters a = 1.896 b = 0.8006
#> 09:21:08 Read 2000 rows and found 6 numeric columns
#> 09:21:08 Using FNN for neighbor search, n_neighbors = 30
#> 09:21:08 Commencing smooth kNN distance calibration using 4 threads
#> with target n_neighbors = 30
#> 09:21:08 54 smooth knn distance failures
#> 09:21:08 Found 2 connected components,
#> falling back to 'spca' initialization with init_sdev = 1
#> 09:21:08 Using 'irlba' for PCA
#> 09:21:08 PCA: 2 components explained 65.4% variance
#> 09:21:08 Scaling init to sdev = 1
#> 09:21:08 Commencing optimization for 500 epochs, with 72822 positive edges
#> 09:21:11 Optimization finished
# }