Simulate gene expression data (UMI counts) under a toy expression model. Samples (expression profiles) are drawn from a multinomial topic model in which topics are "gene programs".

simulate_toy_gene_data(n, m, k, s)

Arguments

n

The number of samples (gene expression profiles) to simulate.

m

The number of genes for which counts are simulated.

k

The number of topics ("gene programs") used to simulate the data.

s

A scalar specifying the total expression (total count) of each sample; it is passed as the "size" argument in the calls to rmultinom.

Value

The return value is a list containing the counts matrix X, and the gene frequencies F and mixture proportions L used to generate the counts.

Details

The mixture proportions are generated as follows. With probability 0.9, one proportion is one, or close to one, and the remaining proportions are zero, or close to zero; that is, the counts are primarily generated from a single gene program. Otherwise (with probability 0.1), the mixture proportions are roughly equal.
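
The scheme described above can be sketched in a few lines of R. This is only an illustration, not the function's actual implementation: the helper name sketch_mixture_proportions, the "close to zero" weight eps, and the size of the perturbation in the roughly-equal case are all assumptions made for the example.

```r
# Hypothetical helper illustrating the two-case scheme; the values of
# eps and the +/- 0.1 perturbation are assumptions, not package defaults.
sketch_mixture_proportions <- function (n, k, p_single = 0.9, eps = 0.01) {
  L <- matrix(0, n, k)
  for (i in seq_len(n)) {
    if (runif(1) < p_single) {
      # Single-program case: one randomly chosen topic gets nearly all
      # of the weight, and the others get a small weight eps.
      w <- rep(eps, k)
      w[sample(k, 1)] <- 1
    } else {
      # Mixed case: all topics get roughly equal weight.
      w <- 1 + runif(k, -0.1, 0.1)
    }
    # Normalize so that the proportions in each row sum to one.
    L[i, ] <- w / sum(w)
  }
  L
}

set.seed(1)
L <- sketch_mixture_proportions(200, 4)
```

Each row of L holds the mixture proportions for one sample, so roughly 90% of the rows should have a single entry near one.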

Gene frequencies are drawn uniformly at random from [0,1].
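
As a rough sketch of how the pieces could fit together, the code below draws gene frequencies uniformly from [0,1] and then generates each sample's counts with rmultinom. The matrix orientations (X is n x m, F is m x k, L is n x k) are assumptions made for this example, as are the equal mixture proportions used as a placeholder for L. Since rmultinom normalizes its prob argument internally, the sketch does not rescale F; whether the function itself rescales the frequencies is not stated here.

```r
set.seed(1)
n <- 5     # number of samples
m <- 20    # number of genes
k <- 3     # number of topics ("gene programs")
s <- 1000  # total expression of each sample

# Gene frequencies drawn uniformly at random from [0,1].
F <- matrix(runif(m * k), m, k)

# Placeholder mixture proportions (assumption): equal weight on
# every topic for every sample.
L <- matrix(1/k, n, k)

# Draw the counts for each sample from a multinomial distribution
# whose probabilities are proportional to a mixture of the topics.
X <- matrix(0L, n, m)
for (i in seq_len(n)) {
  p <- drop(F %*% L[i, ])
  X[i, ] <- as.vector(rmultinom(1, size = s, prob = p))
}
```

With this construction, every row of X sums to s, which matches the role of s as the total expression of each sample.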