Description
Sensitive detection of rare disease-associated cell subsets via representation learning
Eirini Arvaniti1, Manfred Claassen1*
1Institute for Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
*Corresponding author
Rare cell populations play a pivotal role in the initiation and progression of diseases such as cancer. However, the identification of such subpopulations remains a difficult task. This work describes CellCnn, a representation learning approach to detect rare cell subsets associated with disease using high-dimensional single-cell measurements.
Existing approaches address the task of detecting phenotype-associated cell populations via small variations of the following pipeline: cell populations are defined via a clustering algorithm, a cluster-based representation of each sample is computed and, finally, a supervised learning module is used to associate this representation with the phenotype of interest. Successful application of such approaches may be compromised by the quality of the clustering result, especially for rare, hard-to-detect cell types.
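The middle step of this pipeline can be illustrated with a minimal sketch. It assumes that a clustering step has already produced centroids and that the cluster-based representation is the vector of per-cluster cell frequencies; the centroids, the toy sample and the function names are hypothetical and do not come from any specific published method.

```python
from collections import Counter

def nearest_centroid(cell, centroids):
    """Return the index of the centroid closest to this cell (squared Euclidean distance)."""
    dists = [sum((c - m) ** 2 for c, m in zip(centroid, cell)) for centroid in centroids]
    return dists.index(min(dists))

def cluster_frequencies(cells, centroids):
    """Represent a sample as the fraction of its cells assigned to each cluster."""
    counts = Counter(nearest_centroid(cell, centroids) for cell in cells)
    return [counts[k] / len(cells) for k in range(len(centroids))]

# Hypothetical centroids, as if returned by an earlier clustering step.
centroids = [[0.0, 0.0], [4.0, 4.0]]

# Toy sample: four cells measured on two markers.
sample = [[0.1, -0.2], [0.3, 0.1], [3.9, 4.2], [-0.1, 0.0]]

# One feature vector per sample; a supervised model would be trained on these.
features = cluster_frequencies(sample, centroids)
```

A rare subset that the clustering step fails to isolate never appears as its own entry in `features`, so the downstream supervised model cannot use it.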
To overcome this limitation, CellCnn does not separate the steps of extracting a cell population representation and associating it with disease status. Combining these two tasks requires an approach that operates on unordered sets of single-cell measurements and learns representations of these measurements that are specifically associated with the considered phenotype. We bring together concepts from multiple instance learning and convolutional neural networks to meet these requirements.
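The requirement of operating on unordered sets can be illustrated with a minimal forward-pass sketch. It is not the CellCnn implementation: it assumes that each filter is applied to every cell independently and that the per-cell responses are then pooled across cells, and the filter weights are set by hand here, whereas a trained network would learn them from the phenotype labels. All names and values are hypothetical.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def cell_filter_responses(cells, weights, bias):
    """Apply one filter to each cell independently: weighted sum of markers, then ReLU."""
    return [relu(sum(w * m for w, m in zip(weights, cell)) + bias) for cell in cells]

def pool(responses, mode="max"):
    """Collapse per-cell responses into one number per filter (max or mean over cells)."""
    if mode == "max":
        return max(responses)
    return sum(responses) / len(responses)

def sample_representation(cells, filters, mode="max"):
    """One pooled value per filter; independent of the order of the cells."""
    return [pool(cell_filter_responses(cells, w, b), mode) for w, b in filters]

# Toy sample: 990 background cells plus a rare subset (10 cells, 1%)
# with high expression of markers 0 and 2. Purely synthetic data.
rng = random.Random(0)
cells = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(990)]
cells += [[rng.gauss(4, 0.5), rng.gauss(0, 1), rng.gauss(4, 0.5)] for _ in range(10)]

# Hypothetical hand-set filter that responds to cells high in markers 0 and 2.
filters = [([1.0, 0.0, 1.0], -5.0)]

rep = sample_representation(cells, filters, mode="max")

# Shuffling the cells leaves the max-pooled representation unchanged.
shuffled = cells[:]
rng.shuffle(shuffled)
rep_shuffled = sample_representation(shuffled, filters, mode="max")
```

With max pooling, a single strongly responding cell determines the pooled value, so the response does not depend on how frequent the subset is; mean pooling instead scales with the subset's frequency. Which pooling suits a given task is a design choice this sketch leaves open.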
In this study, we apply CellCnn in a classification setting to reconstruct cell type-specific signaling responses in samples of peripheral blood mononuclear cells. We additionally apply CellCnn in a regression setting to identify abundant cell populations associated with disease onset after HIV infection. Finally, we demonstrate the unique ability of CellCnn to identify extremely rare (down to 0.01% frequency) phenotype-associated cell subsets by detecting memory-like NK cells associated with prior CMV infection and leukemic blasts in minimal residual disease-like situations.