Prioritizing biology based on single-cell transcriptomics
Pascal Timshel, Tune H. Pers
The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen
Background: A key challenge in gaining biological insights from genetic associations is identifying which genes and pathways explain them. We have previously shown that reconstituted gene sets improve the biological interpretation of genome-wide association studies (GWAS; Pers, 2015). ‘Reconstituted gene sets’ are built from multi-tissue expression data and assign each gene a probability of membership in a given pathway. However, a major remaining limitation is that these gene sets are not specific to any tissue or cell type.
Methods: We introduce DEPICT for single-cell (DEPICT-sc), a single-cell extension of DEPICT (Pers, 2015) that uses single-cell gene expression data to highlight enriched pathways more accurately. We learn robust ‘single-cell transcriptional components’ (scTCs) by applying matrix decomposition models to single-cell transcriptomic data from >20,000 cells, comprising neuronal and metabolic cell types (Lake, 2016; Macosko, 2015; Baron, 2016). We then combine the scTCs with >14,000 predefined gene sets to construct reconstituted gene sets.
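The abstract does not say which matrix decomposition DEPICT-sc uses, or exactly how scTCs and predefined gene sets are combined. As a rough illustration only, this sketch uses plain PCA (power iteration with deflation) as a stand-in decomposition, then scores every gene for membership in one predefined gene set by the cosine similarity between the gene's component loadings and the gene set's profile (mean member loading minus mean non-member loading). The function names, the toy expression matrix and the scoring rule are hypothetical simplifications, not the DEPICT-sc implementation.

```python
import math
import random


def center_rows(matrix):
    """Subtract each gene's mean expression across cells (rows = genes)."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    return centered


def top_components(matrix, k, iters=100, seed=0):
    """Top-k eigenvectors of the gene-gene covariance (plain PCA, a stand-in).

    Power iteration with deflation; each returned vector holds one loading per gene.
    """
    x = center_rows(matrix)
    n_genes, n_cells = len(x), len(x[0])
    cov = [[sum(x[i][c] * x[j][c] for c in range(n_cells)) / (n_cells - 1)
            for j in range(n_genes)] for i in range(n_genes)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(n_genes)) for i in range(n_genes)]
            norm = math.sqrt(sum(val * val for val in w))
            if norm == 0.0:  # no variance left to explain
                return components
            v = [val / norm for val in w]
        # Rayleigh quotient gives the eigenvalue; deflate C <- C - lambda * v v^T
        cv = [sum(cov[i][j] * v[j] for j in range(n_genes)) for i in range(n_genes)]
        eigval = sum(v[i] * cv[i] for i in range(n_genes))
        components.append(v)
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(n_genes)]
               for i in range(n_genes)]
    return components


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def reconstitute(genes, components, members):
    """Hypothetical membership score of every gene for one predefined gene set."""
    loadings = {g: [comp[i] for comp in components] for i, g in enumerate(genes)}
    inside = [loadings[g] for g in genes if g in members]
    outside = [loadings[g] for g in genes if g not in members]
    profile = [sum(vec[c] for vec in inside) / len(inside)
               - sum(vec[c] for vec in outside) / len(outside)
               for c in range(len(components))]
    return {g: cosine(loadings[g], profile) for g in genes}


# Hypothetical toy data: 9 genes in 3 modules, 12 cells, each module "on" in 4 cells.
genes = ["A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3"]
expr = []
for g in genes:
    start = {"A": 0, "B": 4, "C": 8}[g[0]]
    expr.append([1.0 if start <= c < start + 4 else 0.0 for c in range(12)])

comps = top_components(expr, k=2)
scores = reconstitute(genes, comps, members={"A1", "A2"})
print(round(scores["A3"], 3), round(scores["B1"], 3))  # prints 1.0 -0.5
```

A3 is not annotated to the gene set but shares its expression pattern with the members A1 and A2, so it receives a high membership score; genes from the other modules score negatively. A real pipeline would also have to handle noise, choose the number of components and, for instance, leave each gene out of its own profile.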
Results: We show that scTCs reflect genuine biological differences between cell types, and that the top scTCs are enriched for relevant cell-type biology, e.g. ‘detection of light stimulus’ (P < 2.23 × 10⁻¹⁰⁸) for retina cells, ‘insulin secretion’ (P < 6.11 × 10⁻²⁸) for pancreatic cells and ‘GABAergic synaptic transmission’ (P < 1.85 × 10⁻²²) for cerebral cortex cells. To illustrate the relevance of DEPICT-sc for gaining biological insights from genetic associations, we integrate the reconstituted gene sets with relevant GWAS data. For example, for age-related macular degeneration we prioritize ‘response to reactive oxygen’ (P < 1.25 × 10⁻¹⁰).
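The abstract reports enrichment P-values for the top scTCs but does not name the test behind them. One common choice, assumed here, is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) of the overlap between a component's top-loading genes and a predefined gene set. The function names, the toy loadings and the cutoff `n_top` are hypothetical.

```python
from math import comb


def hypergeom_sf(overlap, n_universe, n_set, n_top):
    """One-sided P(X >= overlap), X ~ Hypergeometric(n_universe, n_set, n_top)."""
    total = comb(n_universe, n_top)
    upper = min(n_set, n_top)
    hits = sum(comb(n_set, i) * comb(n_universe - n_set, n_top - i)
               for i in range(overlap, upper + 1))
    return hits / total


def component_enrichment(loadings, gene_set, n_top):
    """Enrichment of gene_set among the n_top highest-loading genes of one component.

    loadings: dict gene -> loading on the component. Ties keep dict order,
    which a real analysis would need to handle explicitly.
    """
    universe = set(loadings)
    ranked = sorted(loadings, key=loadings.get, reverse=True)
    top = set(ranked[:n_top])
    in_set = gene_set & universe  # only genes measured in the data count
    return hypergeom_sf(len(top & in_set), len(universe), len(in_set), n_top)


# Hypothetical toy component: 10 genes, g0 has the highest loading.
loadings = {f"g{i}": float(10 - i) for i in range(10)}
p = component_enrichment(loadings, {"g0", "g1", "g2"}, n_top=3)
print(p)  # 1/120: all 3 top genes fall in the 3-gene set
```

Python's exact integer arithmetic keeps the tail sum precise even for genome-scale universes; the final division returns a float, which can still represent P-values as small as those reported above.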
Conclusion: Our results suggest that single-cell gene expression data leads to more specific prioritization of likely etiologic gene sets and pathways. DEPICT-sc can easily be adapted to prioritize genes, pathways and cell types for other traits and diseases.