These data are a selection of the reference transcriptome profiles generated via single-cell RNA sequencing (RNA-seq) of 10 bead-enriched subpopulations of PBMCs (Donor A), described in Zheng et al (2017). The data are unique molecular identifier (UMI) counts for 16,791 genes in 3,774 cells. (Genes with no expression in any of the cells were removed.) Since the majority of the UMI counts are zero, they are efficiently stored as a 16,791 x 3,774 sparse matrix. These data are used in the vignette illustrating how ‘fastglmpca’ can be used to analyze single-cell RNA-seq data. Data for a separate set of 100 cells are provided as a “test set” to evaluate out-of-sample predictions.
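The storage benefit of a sparse matrix can be illustrated with a small simulation. This is only a sketch on made-up data, not the PBMC counts: the dimensions, the Poisson rate, and the names `dense` and `sparse` are assumptions chosen for illustration.

```r
library(Matrix)

set.seed(1)
n_genes <- 2000
n_cells <- 500

# Simulate counts with a low Poisson rate so that most entries are zero
# (hypothetical data, not the PBMC counts).
dense <- matrix(rpois(n_genes * n_cells, lambda = 0.05), n_genes, n_cells)
storage.mode(dense) <- "double"

# Convert to a compressed, column-oriented sparse matrix.
sparse <- Matrix(dense, sparse = TRUE)

# Compare the memory used by the two representations.
cat(sprintf("Dense:  %s\n", format(object.size(dense), units = "Kb")))
cat(sprintf("Sparse: %s\n", format(object.size(sparse), units = "Kb")))
cat(sprintf("Proportion non-zero: %0.1f%%\n", 100 * mean(sparse > 0)))
```

The sparse representation stores only the non-zero entries and their positions, so the saving grows as the proportion of zeros increases.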
pbmc_facs is a list with the following elements:
counts: 16,791 x 3,774 sparse matrix of UMI counts, with rows corresponding to genes and columns corresponding to cells (samples). It is an object of class "dgCMatrix".
counts_test: UMI counts for an additional test set of 100 cells.
samples: Data frame containing information about the samples, including cell barcode and source FACS population (“celltype” and “facs_subpop”).
samples_test: Sample information for the additional test set of 100 cells.
genes: Data frame containing information about the genes, including gene symbol and Ensembl identifier.
fit: GLM-PCA model that was fit to the UMI count data in the vignette.
G. X. Y. Zheng et al (2017). Massively parallel digital transcriptional profiling of single cells. Nature Communications 8, 14049. doi:10.1038/ncomms14049
library(Matrix)
data(pbmc_facs)
cat(sprintf("Number of genes: %d\n",nrow(pbmc_facs$counts)))
#> Number of genes: 16791
cat(sprintf("Number of cells: %d\n",ncol(pbmc_facs$counts)))
#> Number of cells: 3774
cat(sprintf("Proportion of counts that are non-zero: %0.1f%%.\n",
            100*mean(pbmc_facs$counts > 0)))
#> Proportion of counts that are non-zero: 4.3%.
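Because genes are stored in rows and cells in columns, per-cell and per-gene summaries are column and row operations, respectively. The sketch below uses a tiny toy matrix (`toy_counts`, a hypothetical stand-in with made-up values) so that it runs without the package data; the same calls should apply to a "dgCMatrix" such as pbmc_facs$counts.

```r
library(Matrix)

# Toy genes x cells UMI count matrix (hypothetical values).
toy_counts <- sparseMatrix(
  i = c(1, 3, 2, 3, 1),
  j = c(1, 1, 2, 3, 4),
  x = c(2, 1, 5, 1, 3),
  dims = c(3, 4),
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Total UMI count in each cell (column sums).
cell_totals <- colSums(toy_counts)

# Number of cells in which each gene is detected (row sums of the
# non-zero indicator).
genes_detected <- rowSums(toy_counts > 0)

print(cell_totals)
print(genes_detected)
```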