These are topic modeling results from the “20 Newsgroups” data, with k = 10 topics. The data were originally downloaded from http://qwone.com/~jason/20Newsgroups/ and prepared by running the code found in an R Markdown file in this GitHub repository: https://github.com/stephenslab/fastTopics-experiments. See the “inst” directory of this package for the scripts used to generate these results.
newsgroups is a list with the following elements:
topics: Original labeling of the documents; each document is from one of 20 “newsgroups”.
L: Estimated topic proportions matrix; rows are documents and columns are topics.
F: Matrix containing posterior mean estimates of log-fold changes (in base-2 logarithm). These were computed using de_analysis with lfc.stat = "vsnull". Rows are words and columns are topics.
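To illustrate how these three elements fit together, here is a minimal sketch in base R. It builds a small stand-in list named `toy` (hypothetical, with made-up values) that has the same layout as `newsgroups`: a factor of document labels, a documents-by-topics proportions matrix, and a words-by-topics matrix of log2-fold changes. The word names, the topic names `k1` to `k3`, and the variables `avg_by_group` and `top_words` are assumptions for illustration only. The same summaries could presumably be applied to `newsgroups` itself once the package data are loaded.

```r
# Toy stand-in with the same layout as `newsgroups` (all values are made up).
set.seed(1)
n <- 6  # documents
m <- 5  # words
k <- 3  # topics
toy <- list(
  topics = factor(c("sci.med", "sci.med", "rec.autos",
                    "rec.autos", "sci.space", "sci.space")),
  L = matrix(runif(n * k), n, k,
             dimnames = list(NULL, paste0("k", 1:k))),
  F = matrix(rnorm(m * k), m, k,
             dimnames = list(c("car", "doctor", "engine", "orbit", "patient"),
                             paste0("k", 1:k)))
)

# Rescale so that each document's topic proportions sum to 1.
toy$L <- toy$L / rowSums(toy$L)

# Average topic proportions within each original newsgroup label.
# rowsum() and table() both order the groups by factor level.
avg_by_group <- rowsum(toy$L, toy$topics) / as.vector(table(toy$topics))

# For each topic (column of F), the two words with the largest log2-fold change.
top_words <- apply(toy$F, 2,
                   function(x) names(sort(x, decreasing = TRUE))[1:2])

print(round(avg_by_group, 2))
print(top_words)
```

Rows of `avg_by_group` are newsgroups and columns are topics, so a large entry suggests that documents from that newsgroup load heavily on that topic. Each column of `top_words` lists the words most strongly associated with one topic.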
data(newsgroups)
table(newsgroups$topics)
#>
#>              alt.atheism            comp.graphics  comp.os.ms-windows.misc
#>                      798                      970                      963
#> comp.sys.ibm.pc.hardware    comp.sys.mac.hardware           comp.windows.x
#>                      979                      958                      982
#>             misc.forsale                rec.autos          rec.motorcycles
#>                      964                      987                      993
#>       rec.sport.baseball         rec.sport.hockey                sci.crypt
#>                      991                      997                      989
#>          sci.electronics                  sci.med                sci.space
#>                      984                      987                      985
#>   soc.religion.christian       talk.politics.guns    talk.politics.mideast
#>                      997                      909                      940
#>       talk.politics.misc       talk.religion.misc
#>                      774                      627
dim(newsgroups$L)
#> [1] 18774 10
dim(newsgroups$F)
#> [1] 18332 10