Machine learning of gating strategies from high-dimensional cytometry data
Etienne Becht1, Elaine Coustan-Smith2, Dario Campana2, Evan Newell1
1Singapore Immunology Network, A*STAR; 2National University of Singapore
Recent advances in fluorescence flow and mass cytometry allow for simultaneous assessment of large numbers of cellular proteins at the single-cell level. Identification of cell populations that are relevant in a given biological context has thus shifted from relying exclusively on examination of sets of bi-axial projections and successive subsetting of the data (i.e., gating strategies) to automated clustering and information-maximizing low-dimensional projections. However, manual gating strategies are still important because they leverage accumulated knowledge, can be easier to interpret and reproduce, and are currently required for the purification of cell populations using fluorescence-activated cell sorters.
Here we propose a computational method, called Hypergate (for automated Hyperrectangular Gate), that, given a cytometry dataset containing a selected subset of cells of interest (usually defined by other high-dimensional analysis methods, such as t-SNE), outputs a gating strategy that identifies the cell population with high purity and yield. Hypergate operates by finding a high-dimensional rectangle (a hyperrectangle) such that the events falling inside it correspond to those identified by the user.
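To make the notions of a hyperrectangular gate, purity, and yield concrete, the following minimal Python sketch may help. It is not the Hypergate optimization itself: it assumes NumPy, uses hypothetical function names (`bounding_gate`, `in_gate`, `purity_and_yield`), and substitutes the simplest possible gate, the per-channel bounding box of the selected events, for the gate that Hypergate would search for. Purity is taken here as the fraction of gated events that belong to the selected population, and yield as the fraction of the selected population that falls inside the gate.

```python
import numpy as np


def bounding_gate(events, selected):
    """Naive hyperrectangle: per-channel min and max of the selected events.

    This is an illustrative stand-in, NOT the Hypergate search procedure.
    """
    chosen = events[selected]
    return chosen.min(axis=0), chosen.max(axis=0)


def in_gate(events, lower, upper):
    """Boolean mask of events lying inside the hyperrectangle [lower, upper]."""
    return np.all((events >= lower) & (events <= upper), axis=1)


def purity_and_yield(gated, selected):
    """Purity = TP / events in gate; yield = TP / selected events."""
    true_pos = np.sum(gated & selected)
    purity = true_pos / gated.sum() if gated.sum() else 0.0
    yield_ = true_pos / selected.sum() if selected.sum() else 0.0
    return float(purity), float(yield_)


# Toy dataset: 6 events measured on 2 channels (values are made up).
events = np.array([
    [1.0, 1.0],
    [2.0, 2.0],
    [3.0, 3.0],
    [2.0, 2.5],   # not selected, but falls inside the bounding box
    [10.0, 10.0],
    [2.0, 9.0],
])
selected = np.array([True, True, True, False, False, False])

lower, upper = bounding_gate(events, selected)
gated = in_gate(events, lower, upper)
purity, yield_ = purity_and_yield(gated, selected)
print(f"gate: {lower} to {upper}, purity={purity:.2f}, yield={yield_:.2f}")
```

In this toy example the bounding box captures every selected event but also one unselected event, so yield is maximal while purity is not. A method such as Hypergate would instead adjust the gate boundaries, and the channels used, to balance the two quantities.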
The output enables a human-readable phenotypic characterization of the cells of interest, facilitates the design of sorting experiments, and can be used to classify new events. We show using public datasets that Hypergate is able to re-discover the phenotypes of expert-defined cell populations. We also show that it outperforms unsupervised classification methods as well as Support Vector Machines in classification tasks. Given a reasonable number of training events, Hypergate is not prone to overfitting. For cell sorting experiments, the number of channels used can be tuned retrospectively to match technical constraints. We also demonstrate its potential for use in the unbiased identification of minimal residual disease in the context of relapsing acute lymphoblastic leukemia.
Hypergate thus translates the outputs of recent algorithmic methods into the graphical and practically relevant language of gating strategies. Our results suggest that this method will be useful for a wide range of applications.