Cepo uncovers cell identity through differential stability

Hani Jieun Kim1,2,3, Kevin Wang1, Carissa Chen2,3, Yingxin Lin1,3, Patrick PL Tam4,5, David M Lin6, Jean YH Yang1,3, Pengyi Yang1,2,3,5*

1 School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
2 Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
3 Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
4 Embryology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
5 School of Medical Science, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
6 Department of Biomedical Sciences, Cornell University, Ithaca, NY 14853, USA

* Corresponding author (email@example.com)

Defining cell identity is fundamental to understanding cellular heterogeneity in populations. Whilst rapid technological advances in genome-wide profiling of single cells have enabled the exploration of cell identity, only a few methods have been designed to identify genes associated with cell identity. None of the current approaches, among which the most widely used is differential expression (DE), has been systematically evaluated for its attributes and fidelity in defining cell identity genes from scRNA-seq data. Here, we present Cepo, a method to retrieve genes defining cell identity from scRNA-seq data. We propose a biologically motivated metric, differential stability (DS), to identify cell-type-specific genes on the premise that stable gene expression is a key indicator of cell identity.
We perform a comprehensive benchmark against several differential analysis methods to show that Cepo outperforms current methods in assigning cell identity and enhances several cell identification applications, such as cell-type characterisation, spatial mapping of single cells, and lineage inference of single cells. Moreover, Cepo is computationally fast and efficient, requiring only seconds to analyse datasets with tens of thousands of single cells. As a method for identifying cell identity genes, we foresee that Cepo will facilitate the mining of the growing resource of single-cell data and realise the potential of single-cell analytics technologies to pinpoint cell identities that are relevant to the cellular phenotype.
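
To make the premise of differential stability concrete, the following is a minimal illustrative sketch in Python and not the Cepo implementation. The abstract does not define the DS metric, so every detail here is an assumption made for illustration: stability is taken to be the average of two normalised ranks (a low proportion of zero counts and a low coefficient of variation), and the DS of a gene in a cell type is taken to be its stability in that cell type minus its mean stability in the other cell types. The function names `rank01`, `stability` and `differential_stability` are hypothetical.

```python
import numpy as np


def rank01(values):
    """Map values to (0, 1]; the largest value receives 1 (ties broken by position)."""
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks + 1) / len(values)


def stability(expr):
    """Per-gene stability for one cell type (expr: genes x cells).

    Assumed definition: genes with few zeros and a low coefficient of
    variation (CV) are treated as stably expressed.
    """
    zero_frac = (expr == 0).mean(axis=1)
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1)
    # Genes that are never expressed get an infinite CV (least stable).
    cv = np.where(mean > 0, sd / np.where(mean > 0, mean, 1.0), np.inf)
    # Negate so that fewer zeros / lower CV translate into a higher rank.
    return (rank01(-zero_frac) + rank01(-cv)) / 2


def differential_stability(expr, labels):
    """Return (cell types, genes x cell-types matrix of assumed DS scores)."""
    labels = np.asarray(labels)
    types = np.unique(labels)
    stab = np.column_stack([stability(expr[:, labels == t]) for t in types])
    ds = np.empty_like(stab)
    for j in range(len(types)):
        # Stability in this cell type minus mean stability in all others.
        ds[:, j] = stab[:, j] - np.delete(stab, j, axis=1).mean(axis=1)
    return types, ds


# Toy data: gene 0 is stable only in type A, gene 1 only in type B,
# and gene 2 is noisy in both.
expr = np.array([[5, 5, 5, 0, 0, 0],
                 [0, 0, 0, 4, 4, 4],
                 [0, 9, 1, 8, 0, 2]], dtype=float)
types, ds = differential_stability(expr, ["A"] * 3 + ["B"] * 3)
for j, t in enumerate(types):
    print(t, "top gene index:", int(np.argmax(ds[:, j])))
```

Under these assumptions, the score rewards consistency of expression within a cell type, not the size of the difference in mean expression, which is the contrast with DE that the abstract draws.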