Generate data from a GLM-PCA model with a specified rank.
generate_glmpca_data_pois(n, p, K, link = c("log", "log1p"))
n - Number of rows (genes).
p - Number of columns (cells).
K - Rank of the underlying mean structure.
link - Character string specifying the link function that relates the product of the loadings and factors to the mean of the data. Either "log" or "log1p".
A list with the following components:
LL - Loadings of the underlying mean structure. A K x n matrix.
FF - Factors of the underlying mean structure. A K x p matrix.
Y - The generated data. An n x p matrix.
This function assumes that each column of the data is generated from a multinomial distribution. Let $$Y_j$$ denote column j of the generated data matrix. First, the column total $$sum(Y_j)$$ is set to a value drawn from a $$Uniform(50, 5000)$$ distribution. Next, $$L$$ and $$F$$ are generated from mixture distributions, and $$H = exp(L'F)$$ is computed, giving an n x p matrix. Finally, the elements of $$Y_j$$ are drawn from a multinomial distribution with total $$sum(Y_j)$$ and probabilities given by $$H_j$$, column j of $$H$$, normalized to sum to one.
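The generative steps described above can be sketched in base R. This is a rough illustration under stated assumptions, not the function's actual implementation. The helper `rmix` and the two-component normal mixture it draws from are hypothetical stand-ins, since the mixture distributions for $$L$$ and $$F$$ are not specified here. Rounding the column totals to integers is also an assumption.

```r
# Illustrative sketch only; not the code used by generate_glmpca_data_pois().
set.seed(1)
n <- 100  # rows (genes)
p <- 20   # columns (cells)
K <- 2    # rank

# ASSUMPTION: a two-component normal mixture stands in for the
# unspecified mixture distributions used to generate L and F.
rmix <- function(m) {
  narrow <- rbinom(m, 1, 0.5) == 1
  ifelse(narrow, rnorm(m, 0, 0.1), rnorm(m, 0, 1))
}

LL <- matrix(rmix(K * n), nrow = K, ncol = n)  # K x n loadings
FF <- matrix(rmix(K * p), nrow = K, ncol = p)  # K x p factors

# H = exp(L'F), an n x p matrix of unnormalized means (log link)
H <- exp(crossprod(LL, FF))

# Column totals drawn from Uniform(50, 5000).
# ASSUMPTION: totals are rounded to the nearest integer.
totals <- round(runif(p, 50, 5000))

# Draw each column from a multinomial with probabilities H_j / sum(H_j)
Y <- matrix(0L, nrow = n, ncol = p)
for (j in seq_len(p)) {
  prob <- H[, j] / sum(H[, j])
  Y[, j] <- rmultinom(1, size = totals[j], prob = prob)
}
```

In this sketch each column of `Y` sums exactly to its drawn total, which is the defining property of the multinomial step.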
set.seed(1)
sim_data <- generate_glmpca_data_pois(1000, 500, 1)