These are topic modeling results from the “20 Newsgroups” data, with k = 10 topics. The data were originally downloaded from http://qwone.com/~jason/20Newsgroups/ and prepared by running code found in an R Markdown file in this GitHub repository: https://github.com/stephenslab/fastTopics-experiments. See the “inst” directory of this package for the scripts used to generate these results.

Format

newsgroups is a list with the following elements:

topics

Original labeling of the documents: each document is from one of 20 “newsgroups”.

L

Estimated topic proportions matrix; rows are documents and columns are topics.

F

Matrix containing posterior mean estimates of log-fold changes (in base-2 logarithm). These were computed using de_analysis with lfc.stat = "vsnull". Rows are words and columns are topics.
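The sketch below shows one way the L and F elements might be explored together, assuming fastTopics is installed and that L is a dense numeric matrix. It assigns each document to its highest-proportion topic with base R's `max.col`, cross-tabulates those assignments against the original newsgroup labels, and lists the words with the largest estimated log2-fold change for topic 1. It assumes the rows of F are named by word; if they are not, the sorted values will print without labels.

```r
library(fastTopics)
data(newsgroups)

# Each row of L is a document's vector of topic proportions, so the row
# sums should be close to 1.
summary(rowSums(newsgroups$L))

# Assign each document to the topic with the largest estimated proportion
# (ties broken by taking the first such topic).
dominant <- max.col(newsgroups$L, ties.method = "first")

# Compare the dominant topics with the original newsgroup labels.
tab <- table(newsgroups$topics, dominant)
print(tab)

# Words with the largest posterior mean log2-fold change for topic 1
# (assumes the rows of F carry the words as row names).
lfc <- newsgroups$F[, 1]
head(sort(lfc, decreasing = TRUE), n = 10)
```

Using `max.col` rather than `apply(..., 1, which.max)` avoids a per-row R function call, which matters a little with 18,774 documents.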

Examples

data(newsgroups)
table(newsgroups$topics)
#> 
#>              alt.atheism            comp.graphics  comp.os.ms-windows.misc 
#>                      798                      970                      963 
#> comp.sys.ibm.pc.hardware    comp.sys.mac.hardware           comp.windows.x 
#>                      979                      958                      982 
#>             misc.forsale                rec.autos          rec.motorcycles 
#>                      964                      987                      993 
#>       rec.sport.baseball         rec.sport.hockey                sci.crypt 
#>                      991                      997                      989 
#>          sci.electronics                  sci.med                sci.space 
#>                      984                      987                      985 
#>   soc.religion.christian       talk.politics.guns    talk.politics.mideast 
#>                      997                      909                      940 
#>       talk.politics.misc       talk.religion.misc 
#>                      774                      627 
dim(newsgroups$L)
#> [1] 18774    10
dim(newsgroups$F)
#> [1] 18332    10