¹Department of Bio and Health Informatics, Technical University of Denmark; ²Human Immune Monitoring Center, Institute for Immunity, Stanford University
Single-cell cytometry has been an established scientific discipline for several decades. Hundreds of cell types and subtypes have been defined by the patterns of lineage marker expression measured with antibody-based assays such as flow cytometry. Deconvoluting the cell types present in a complex mixture such as peripheral blood mononuclear cells (PBMC) is traditionally done by stepwise biaxial gating, a practice strongly anchored in the biological knowledge applied to reporter panel design prior to data acquisition. Advances in mass cytometry have enabled the measurement of 40+ features per cell, making biaxial gating impractical. This has catalyzed the development of multidimensional analysis tools for data deconvolution, most of which strongly emphasize unsupervised clustering of cell types. While an unbiased, data-driven approach is useful when searching for rare or novel cell populations, we show here that precise stratification of well-known immune cell populations from a PBMC mixture benefits from explicitly accounting for both prior biological and technical knowledge. Using 24 lineage markers measured by mass cytometry in PBMC samples from more than 250 healthy donors, we designed a Bayesian (probabilistic) statistical model. The model is trained on the distributions of marker expression in manually gated sets of immune cell populations. It considers both the technical stochasticity of the CyTOF workflow and the class assignment properties of manual gating, such as multi-class or no-class memberships, to produce independent and thus deterministic class assignments. Given the vast amounts of cytometry data that have been, and continue to be, generated, this approach is extendable to other tissues and diseases, as well as to the identification of rare and novel cell populations.
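The general idea of training a probabilistic classifier on manually gated populations can be illustrated with a minimal sketch. It is not the model described above: it assumes a plain Gaussian naive Bayes classifier (one independent Gaussian per marker and population), and the function names `fit_class_gaussians`, `class_posteriors`, and `assign_cells`, as well as the `min_posterior` cutoff used to leave ambiguous cells unassigned, are hypothetical choices made for illustration only.

```python
import numpy as np

def fit_class_gaussians(X, labels):
    """Estimate a prior plus per-marker mean and variance for each gated population.

    X      : (n_cells, n_markers) array of marker expression values
    labels : (n_cells,) array of manually gated population names
    """
    params = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        params[c] = {
            "prior": Xc.shape[0] / X.shape[0],
            "mean": Xc.mean(axis=0),
            "var": Xc.var(axis=0) + 1e-6,  # small floor avoids division by zero
        }
    return params

def class_posteriors(X, params):
    """Return (classes, posterior matrix) via Bayes' rule, computed in log space."""
    classes = sorted(params)
    log_joint = np.empty((X.shape[0], len(classes)))
    for j, c in enumerate(classes):
        p = params[c]
        # Sum of independent per-marker Gaussian log-densities (naive Bayes assumption).
        log_lik = -0.5 * (np.log(2 * np.pi * p["var"])
                          + (X - p["mean"]) ** 2 / p["var"]).sum(axis=1)
        log_joint[:, j] = np.log(p["prior"]) + log_lik
    # Normalise with log-sum-exp for numerical stability.
    m = log_joint.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return classes, np.exp(log_joint - log_norm)

def assign_cells(X, params, min_posterior=0.9):
    """Assign each cell to its most probable population.

    Cells whose best posterior is below min_posterior (an assumed cutoff)
    are labelled "unassigned" rather than forced into a population.
    """
    classes, post = class_posteriors(X, params)
    labels = np.array(classes, dtype=object)[post.argmax(axis=1)]
    labels[post.max(axis=1) < min_posterior] = "unassigned"
    return labels
```

In this sketch, each cell is scored independently of all others, so the same input always yields the same assignment. A fuller treatment would also need to model measurement noise and cells that belong to several gates or to none, which a simple posterior cutoff only approximates.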