Promoting the development and evaluation of single-cell modelling methods using realistic synthetic data
Wouter Saelens1,2*, Robrecht Cannoodt1,2,3, Yvan Saeys1,2
1VIB Center for Inflammation Research; Ghent; Belgium; 2Ghent University; Ghent; Belgium
3Ghent University Hospital; Ghent; Belgium
* Corresponding author
The rapidly decreasing cost of single-cell profiling techniques has spurred the development of many new single-cell modelling algorithms. Yet, we have only uncovered the tip of the iceberg. Current methods, for example, cannot handle the concurrency of multiple intracellular dynamic processes, intercellular communication and the full complexity of differentiation processes (Wagner, 2016). Moreover, the development and evaluation of improved methods is hindered because public single-cell datasets are still frequently limited in the number of cells and in sequencing depth.
We therefore developed a first large-scale in silico single-cell data generator. It uses a detailed model of gene regulation based on differential equations, together with realistic models of cell differentiation and cell communication. Our method can generate synthetic data for individual cells undergoing multiple parallel dynamic processes (such as differentiation and cell cycle), can include communication between cells, and can handle complex differentiation patterns such as consecutive branching and convergence events.
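To make the idea of a differential-equation model of gene regulation concrete, the following is a minimal sketch and not the generator described above. It assumes a hypothetical three-gene cascade (A activates B, B represses C), Hill-type regulation functions, unit production and decay rates, and simple Euler integration. All function names and parameter values are illustrative assumptions.

```python
import random


def hill_activation(x, k=0.5, n=2):
    """Fraction of maximal transcription driven by an activator at level x."""
    return x**n / (k**n + x**n)


def hill_repression(x, k=0.5, n=2):
    """Fraction of maximal transcription remaining under a repressor at level x."""
    return k**n / (k**n + x**n)


def simulate_cell(steps=2000, dt=0.01, noise=0.0, seed=0):
    """Euler integration of a hypothetical cascade: A activates B, B represses C.

    Returns a list of (time, a, b, c) tuples. With noise > 0, Gaussian
    perturbations are added at each step (an illustrative choice only).
    """
    rng = random.Random(seed)
    a, b, c = 0.0, 0.0, 1.0          # assumed initial expression levels
    production, decay = 1.0, 1.0     # assumed rate constants
    trajectory = []
    for step in range(steps):
        da = production - decay * a                       # A is constitutively expressed
        db = production * hill_activation(a) - decay * b  # B is activated by A
        dc = production * hill_repression(b) - decay * c  # C is repressed by B
        scale = noise * dt**0.5
        a = max(0.0, a + dt * da + scale * rng.gauss(0, 1))
        b = max(0.0, b + dt * db + scale * rng.gauss(0, 1))
        c = max(0.0, c + dt * dc + scale * rng.gauss(0, 1))
        trajectory.append(((step + 1) * dt, a, b, c))
    return trajectory


def sample_cells(trajectory, n_cells=5, seed=1):
    """Draw random time points from one trajectory as a snapshot of 'cells'."""
    rng = random.Random(seed)
    return [rng.choice(trajectory) for _ in range(n_cells)]


if __name__ == "__main__":
    for t, a, b, c in sample_cells(simulate_cell(noise=0.05)):
        print(f"t={t:5.2f}  A={a:.3f}  B={b:.3f}  C={c:.3f}")
```

Sampling random time points from the simulated trajectory mimics the snapshot nature of single-cell data, in which each cell is observed at an unknown position along the underlying process.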
We will show how the synthetic data generator can be used to evaluate current methods for pseudotemporal ordering, dimensionality reduction and regulatory network inference. Our initial results indicate that simple linear or bifurcating trajectories can be handled by some state-of-the-art methods (such as Monocle v2), although these methods frequently have difficulties dealing with more complex synthetic data models. We will therefore also illustrate how to accurately evaluate new proof-of-concept modelling methods for cell differentiation and cell interactions. We believe that, while an additional evaluation on real data is essential, realistic in silico validation is an important first evaluation step, especially in the absence of real evaluation datasets.
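As an illustration of how synthetic data with known ground truth can be used to score a pseudotemporal ordering, the sketch below compares the true simulation time of each cell with an inferred pseudotime using a Spearman rank correlation. The choice of metric and the function names are assumptions made for this example, and not necessarily the evaluation used in this work. The absolute value is taken because the direction of an inferred pseudotime is arbitrary.

```python
def ranks(values):
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ordering_score(true_time, inferred_pseudotime):
    """Absolute Spearman correlation between true and inferred orderings."""
    return abs(pearson(ranks(true_time), ranks(inferred_pseudotime)))


if __name__ == "__main__":
    true_time = [0.0, 1.0, 2.0, 3.0]
    print(ordering_score(true_time, [10, 20, 30, 40]))  # perfect ordering
    print(ordering_score(true_time, [40, 30, 20, 10]))  # reversed ordering
    print(ordering_score(true_time, [1, 0, 3, 2]))      # neighbouring cells swapped
```

A score close to 1 indicates that the inferred ordering recovers the simulated progression. A single rank correlation of this kind is only meaningful for linear trajectories, so branching or converging trajectories would need a topology-aware comparison.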