Generate data from a GLM-PCA model with a specified rank.

generate_glmpca_data_pois(n, p, K, link = c("log", "log1p"))

Arguments

n

Number of rows (genes).

p

Number of columns (cells).

K

Rank of the underlying mean structure.

link

Character string specifying the link between the product of the loadings and factors and the mean of the data. Must be either "log" or "log1p".

Value

A list with the following components:

  • LL - loadings of the underlying mean structure. A K x n matrix.

  • FF - factors of the underlying mean structure. A K x p matrix.

  • Y - generated data. An n x p matrix.

Details

This function assumes that each column of the data is generated from a multinomial distribution. Let $$Y_j$$ denote column j of the generated data matrix. First, the column total $$\sum_i Y_{ij}$$ is set to a value drawn from a $$\mathrm{Uniform}(50, 5000)$$ distribution. Next, $$L$$ and $$F$$ are generated from mixture distributions, and $$H = \exp(L'F)$$ is computed. Finally, the elements of $$Y_j$$ are drawn from a multinomial distribution whose probabilities are the entries of column j of $$H$$, normalized to sum to one.
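The generative steps described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: the function name sketch_glmpca_multinom is hypothetical, the mixture used for $$L$$ and $$F$$ (zero with probability 0.5, otherwise standard normal) is an assumption because the mixture components are not specified here, the uniform draw is rounded to give an integer column total, and only the log link is covered.

```r
# Hypothetical sketch of the sampling scheme; not the package's own code.
sketch_glmpca_multinom <- function(n, p, K) {
  # ASSUMPTION: entries are 0 with probability 0.5, otherwise N(0, 1).
  # The actual mixture distributions are not given in this help page.
  draw_mixture <- function(nrow, ncol) {
    vals <- rnorm(nrow * ncol) * rbinom(nrow * ncol, size = 1, prob = 0.5)
    matrix(vals, nrow = nrow, ncol = ncol)
  }

  LL <- draw_mixture(K, n)      # K x n loadings
  FF <- draw_mixture(K, p)      # K x p factors
  H <- exp(crossprod(LL, FF))   # n x p matrix, H = exp(L'F)

  Y <- matrix(0L, nrow = n, ncol = p)
  for (j in seq_len(p)) {
    # ASSUMPTION: the Uniform(50, 5000) draw is rounded to an integer total.
    total <- round(runif(1, min = 50, max = 5000))
    # rmultinom() rescales prob internally, so column j of H is normalized.
    Y[, j] <- rmultinom(1, size = total, prob = H[, j])
  }

  list(LL = LL, FF = FF, Y = Y)
}

set.seed(1)
sketch <- sketch_glmpca_multinom(n = 20, p = 10, K = 2)
dim(sketch$Y)
range(colSums(sketch$Y))
```

Because each column is drawn with a fixed total, every column sum of Y lies between 50 and 5000 by construction.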

Examples

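# Fix the random seed so the simulated data are reproducible.
set.seed(1)
# Simulate a 1000 x 500 (genes x cells) data matrix with a rank-1 mean structure.
sim_data <- generate_glmpca_data_pois(1000, 500, 1)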