Reference Component Analysis of Single Cell Transcriptomes Reveals Cellular Heterogeneity in Colorectal Cancer
Huipeng Li1,*, Elise T. Courtois1,*, Debarka Sengupta1, Yuliana Tan1, Say Li Kong2, Tan Wah Siew3, Mark Wong3, Lim Kiat Hon4, Lawrence Wee5, Axel Hillmer2, Iain Beehuat Tan2,6,7,+, Paul Robson8,+ and Shyam Prabhakar1,+
1Genome Institute of Singapore, Computational and Systems Biology, Singapore; 2Genome Institute of Singapore, Cancer Therapeutics and Stratified Oncology, Singapore; 3Singapore General Hospital, Department of Colorectal Surgery, Singapore; 4Singapore General Hospital, Department of Pathology, Singapore; 5Institute for Infocomm Research, Data Analysis Department, Singapore; 6National Cancer Centre Singapore, Department of Medical Oncology, Singapore; 7Program in Cancer & Stem Cell Biology, Duke-NUS Medical School, Singapore; 8The Jackson Laboratory for Genomic Medicine, Single Cell Biology Laboratory, Farmington, CT, USA
*These authors contributed equally to this work
Tumor heterogeneity is considered one of the greatest obstacles to cancer treatment, since it underlies resistance to both drugs and immunotherapy. Although many studies have profiled tumor heterogeneity at the level of DNA mutations, transcriptomic heterogeneity remains almost entirely unexplored. Single-cell RNA-seq can potentially address this gap, but due to high levels of noise, technical bias and batch effects, existing algorithms perform poorly at distinguishing cell types and cell states in clinical samples. We developed an algorithm, Reference Component Analysis (RCA), that uses a biologically informed distance measure to robustly cluster single cells by their transcriptomes. When tested on a novel benchmark data set, RCA was the only method that clustered cell types with accuracy approaching 100% despite batch effects and technical variability. To characterize intratumor heterogeneity, we applied RCA to over 1,500 single cells isolated from 11 colorectal tumors, along with matched normal tissue. Cell clusters identified by RCA revealed numerous functional differences between tumor and normal samples that were invisible in bulk-transcriptome analysis. These functional differences shed light on multiple aspects of tumor biology, including epithelial-mesenchymal transition (EMT), cancer stem cells (CSCs), and signaling pathways such as TGF-β. Notably, we identified two subtypes of cancer-associated fibroblasts (CAFs) with distinct transcriptome profiles. Moreover, we were able to predict colorectal cancer patient survival using single-cell expression signatures. Intriguingly, we identified significant similarity between tumor-specific expression signatures and perturbations induced in vitro by specific small molecules, some of which are already standard of care.
Our approach is the first to successfully identify cell-type clusters from single-cell RNA-seq data of tumor samples, thus facilitating a high-resolution view into intratumor heterogeneity and microenvironmental complexity in colorectal cancer.
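The abstract does not specify how the biologically informed distance in RCA is computed. As an illustration only, the following sketch assumes one plausible reading: each cell is projected onto a panel of reference transcriptomes by Pearson correlation, and cells are then compared by the correlation of their projection vectors. The function names, the toy data and the choice of Pearson correlation are all assumptions made for this example, not details taken from the work itself.

```python
import numpy as np

def project_onto_references(cells, references):
    """Pearson correlation of every cell with every reference profile.

    cells:      (n_cells, n_genes) expression matrix
    references: (n_refs,  n_genes) reference transcriptomes (assumed given)
    Returns an (n_cells, n_refs) projection matrix.
    """
    c = cells - cells.mean(axis=1, keepdims=True)
    r = references - references.mean(axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    return c @ r.T

def reference_distance(cells, references):
    """Hypothetical cell-to-cell distance: 1 - correlation of projections."""
    proj = project_onto_references(cells, references)
    return 1.0 - np.corrcoef(proj)

# Toy data (entirely synthetic): 5 reference profiles over 200 genes.
rng = np.random.default_rng(0)
references = rng.normal(size=(5, 200))

# Four cells: types 0, 0, 1, 1, with cells 0 and 2 in one batch and
# cells 1 and 3 in another (modelled here as a per-cell constant offset).
labels = np.array([0, 0, 1, 1])
batch_offset = np.array([0.0, 3.0, 0.0, 3.0])
cells = references[labels] + rng.normal(size=(4, 200)) + batch_offset[:, None]

dist = reference_distance(cells, references)
print(np.round(dist, 2))
```

In this toy setting, cells of the same type end up closer to each other than to cells of the other type, even across the simulated batches. The resulting distance matrix could then be passed to any standard clustering routine, such as hierarchical clustering.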