Cell cycle characterization and discovery using principal graphs
Luca Albergante1, Andrei Zinovyev1, Emmanuel Barillot1
1Computational Systems Biology of Cancer, Unit 900, Institut Curie, Paris, France
The variability connected with the cell cycle constitutes one of the main sources of heterogeneity in single cell transcriptomic data. It is therefore crucial to develop conceptual and computational tools that infer the most probable cell cycle stage of each cell from the available data. Such tools will then be instrumental in understanding cell cycle-independent heterogeneity and in better characterizing the connections between proliferative dysregulation and pathological conditions (e.g., cancer).
To address this issue, we used a data-driven approach based on principal graphs to obtain circular and quasi-circular paths embedded in single cell RNA-Seq data, and showed that such paths are associated with cell cycle progression. Our approach has been validated across several mouse and human datasets and has been applied to distill and extend previous information on the association of specific genes with the different stages of the cell cycle.
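As an illustration of the idea of a circular path fitted to data, the following is a minimal sketch in Python. It is not the rpgraph implementation and does not reproduce its API: the function name `fit_principal_circle`, its parameters, and the synthetic two-dimensional data (standing in for, e.g., two leading principal components of expression data) are all assumptions made for this example. The sketch alternates between assigning points to their nearest node and moving each node towards a blend of its local mean and the midpoint of its two ring neighbours, a strongly simplified form of the elastic principal graph idea.

```python
import math
import random


def fit_principal_circle(points, n_nodes=12, n_iter=50, smooth=0.3):
    """Fit a closed chain of nodes to 2-D points (simplified, hypothetical sketch)."""
    # Initialise the nodes on a circle around the data centroid.
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    radius = sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / len(points)
    nodes = [
        (cx + radius * math.cos(2 * math.pi * k / n_nodes),
         cy + radius * math.sin(2 * math.pi * k / n_nodes))
        for k in range(n_nodes)
    ]
    for _ in range(n_iter):
        # Assignment step: accumulate each point on its nearest node.
        sums = [[0.0, 0.0, 0] for _ in range(n_nodes)]
        for x, y in points:
            k = min(
                range(n_nodes),
                key=lambda i: (x - nodes[i][0]) ** 2 + (y - nodes[i][1]) ** 2,
            )
            sums[k][0] += x
            sums[k][1] += y
            sums[k][2] += 1
        # Update step: blend the local mean with the midpoint of the two
        # ring neighbours; `smooth` controls how stiff the ring is.
        new_nodes = []
        for k in range(n_nodes):
            prev_n = nodes[(k - 1) % n_nodes]
            next_n = nodes[(k + 1) % n_nodes]
            mid = ((prev_n[0] + next_n[0]) / 2, (prev_n[1] + next_n[1]) / 2)
            if sums[k][2] > 0:
                mean = (sums[k][0] / sums[k][2], sums[k][1] / sums[k][2])
            else:
                mean = mid  # a node without points follows its neighbours
            new_nodes.append(
                ((1 - smooth) * mean[0] + smooth * mid[0],
                 (1 - smooth) * mean[1] + smooth * mid[1])
            )
        nodes = new_nodes
    return nodes


# Synthetic stand-in for cells embedded in two dimensions (assumption):
# a noisy ring of radius 5 centred at (2, -1).
random.seed(0)
cells = []
for _ in range(300):
    angle = random.uniform(0, 2 * math.pi)
    r = 5 + random.gauss(0, 0.3)
    cells.append((2 + r * math.cos(angle), -1 + r * math.sin(angle)))

ring = fit_principal_circle(cells)
```

The smoothing term pulls the ring slightly inwards, so the fitted nodes sit a little inside the true radius; a real elastic principal graph balances approximation error against stretching and bending penalties in a more principled way.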
Our methodology was implemented using the R package rpgraph, which we developed. The core functions of the package allow the construction and analysis of principal graphs with different topologies (currently curves, circles and trees) for arbitrary data. Specific functions designed to explore RNA-Seq data and to study the cell cycle are also included.
Our analysis produced several outcomes. First, we derived a new set of genes that display periodic behavior, providing new genetic associations with the cell cycle. Second, we showed that different cell types can have different sets of periodic genes, which supports tissue-specific cell cycle regulation. Finally, we associated each cell, and hence its gene expression, with a pseudo-time, allowing the exploration of the interplay between different genes over time.
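One simple way to associate a cell with a pseudo-time on a circular path can be sketched as follows. This is an illustrative assumption, not the procedure used in rpgraph: the function name `pseudotime_on_cycle` is hypothetical, the path is taken to be a closed polyline of distinct nodes in two dimensions, and pseudo-time is defined as the normalised arc length from the first node to the orthogonal projection of the cell onto the path.

```python
import math


def pseudotime_on_cycle(point, nodes):
    """Position in [0, 1) of `point` projected onto the closed polyline `nodes`.

    Hypothetical helper; assumes consecutive nodes are distinct.
    """
    n = len(nodes)
    seg_len = [math.dist(nodes[k], nodes[(k + 1) % n]) for k in range(n)]
    total = sum(seg_len)
    best_d2, best_pos, travelled = float("inf"), 0.0, 0.0
    for k in range(n):
        ax, ay = nodes[k]
        bx, by = nodes[(k + 1) % n]
        dx, dy = bx - ax, by - ay
        # Fraction along the segment of the orthogonal projection,
        # clamped so the projection stays on the segment.
        t = ((point[0] - ax) * dx + (point[1] - ay) * dy) / (dx * dx + dy * dy)
        t = max(0.0, min(1.0, t))
        px, py = ax + t * dx, ay + t * dy
        d2 = (point[0] - px) ** 2 + (point[1] - py) ** 2
        if d2 < best_d2:
            # Keep the closest projection and its arc length from node 0.
            best_d2, best_pos = d2, travelled + t * seg_len[k]
        travelled += seg_len[k]
    return (best_pos / total) % 1.0


# Toy closed path: a unit square traversed counter-clockwise.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
t_bottom = pseudotime_on_cycle((0.5, -0.2), square)
t_right = pseudotime_on_cycle((1.3, 0.5), square)
t_left = pseudotime_on_cycle((-0.1, 0.5), square)
```

The origin and direction of such a pseudo-time are arbitrary, since they depend only on which node comes first and on the node order. Relating positions on the path to actual cell cycle stages requires additional information, such as the expression of genes with known stage associations.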
Our findings support the idea that a clear trace of cell cycle progression is detectable in single cell RNA-Seq data and that objective methods can be used to assess the proliferative activity of cells and to derive their cell cycle stages. Future extensions of our approach will provide new insights into the topological structure of the different genetic programs of cells.