Analysis framework for subcellular mRNA localization
Aubin Samacoits2, Racha Chouaib1, Abdel Traboulsi1, Adam Saffiedine1, Marion Peter1, Edouard Bertrand1, Thomas Walter3*, Florian Mueller2*.
1IGMM Montpellier, UMR 5535 CNRS - Montpellier, France; 2IMOD, Institut Pasteur and CNRS UMR 3691 - Paris, France; 3MINES ParisTech, PSL, Institut Curie, INSERM U900 - Paris, France
While studies of gene expression have traditionally focused on expression levels, interest in subcellular mRNA localization has greatly increased in recent years.
Single-molecule FISH (smFISH) makes it possible to visualize individual mRNA molecules and hence to investigate their spatial distribution in individual cells. Moreover, recent experimental advances allow smFISH to be performed at larger scales, opening the door to single-cell transcriptomics.
The large amounts of image data produced by such approaches create the need for a validated statistical framework for the analysis of mRNA localization.
The overall strategy is to map the spatial coordinates of single mRNAs inside each cell to a feature space and then use machine learning to identify the different mRNA localization patterns. Although this approach is promising in principle, the main problem remains: in the absence of annotated data sets, there is no way of assessing the quality of such an analysis pipeline.
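As a minimal sketch of this general strategy (not the workflow presented here), the Python snippet below maps the mRNA coordinates of each cell to a small feature vector and groups cells with a basic k-means clustering. The three features, the function names `localization_features`, `kmeans` and `toy_cell`, and the choice of k-means are illustrative assumptions; the abstract does not specify which features or which learning method are used.

```python
import numpy as np


def localization_features(spots, nucleus_center, cell_radius):
    """Map the (n, 3) mRNA coordinates of one cell to a feature vector.

    Illustrative features (assumptions, not the authors' feature set):
      0: mean distance to the nucleus center, normalized by cell radius
      1: standard deviation of that normalized distance
      2: mean nearest-neighbor distance, normalized by cell radius
    """
    spots = np.asarray(spots, dtype=float)
    dist = np.linalg.norm(spots - nucleus_center, axis=1) / cell_radius
    # Pairwise distances between spots; ignore self-distances on the diagonal.
    pair = np.linalg.norm(spots[:, None, :] - spots[None, :, :], axis=2)
    np.fill_diagonal(pair, np.inf)
    nearest = pair.min(axis=1) / cell_radius
    return np.array([dist.mean(), dist.std(), nearest.mean()])


def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm) on the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def toy_cell(rng, r_min, r_max, n_spots=50):
    """Hypothetical toy cell: spots at normalized radii in [r_min, r_max]."""
    directions = rng.normal(size=(n_spots, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.uniform(r_min, r_max, size=(n_spots, 1))
    return directions * radii
```

For example, building 20 toy cells with `toy_cell(rng, 0.1, 0.3)` and 20 with `toy_cell(rng, 0.8, 1.0)`, stacking their feature vectors and calling `kmeans(X, 2)` should separate the two groups. Without annotated data, however, nothing in such a pipeline indicates whether the recovered clusters correspond to real localization patterns.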
Here, we present a simulation environment for 3D smFISH with non-random mRNA localization. We base these simulations on experimental data, which provide accurate 3D shapes for cells and nuclei, and we simulate localization patterns based on experimental observations.
Based on this simulated image database, we developed and validated an analysis workflow that outperformed existing mRNA localization analysis tools. Finally, we applied this workflow to experimental data and were able to distinguish different mRNA localization patterns.
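To illustrate what simulating non-random localization can look like, the sketch below places mRNA positions in a toy cell, either uniformly in the cytoplasm or enriched near the nuclear envelope. It rests on strong simplifying assumptions: the cell and nucleus are concentric ellipsoids rather than the experimentally derived 3D shapes described here, and the names `simulate_cell`, `strength` and `shell` are hypothetical parameters introduced only for this example.

```python
import numpy as np


def _sample_cytoplasm(rng, n, cell_axes, nucleus_axes, max_rn=None):
    """Rejection-sample n points inside the cell and outside the nucleus.

    If max_rn is given, points are further restricted to a shell around the
    nucleus, measured in units of the nuclear semi-axes (1.0 = envelope).
    """
    if n == 0:
        return np.empty((0, 3))
    kept, count = [], 0
    while count < n:
        p = rng.uniform(-1.0, 1.0, size=(4 * n + 64, 3)) * cell_axes
        in_cell = np.sum((p / cell_axes) ** 2, axis=1) <= 1.0
        rn = np.sqrt(np.sum((p / nucleus_axes) ** 2, axis=1))
        keep = in_cell & (rn > 1.0)
        if max_rn is not None:
            keep &= rn <= max_rn
        kept.append(p[keep])
        count += int(keep.sum())
    return np.concatenate(kept)[:n]


def simulate_cell(n_spots, cell_axes, nucleus_axes, pattern="random",
                  strength=0.8, shell=1.3, seed=0):
    """Simulate 3D mRNA positions in an ellipsoidal toy cell.

    pattern="random":      all spots uniform in the cytoplasm.
    pattern="perinuclear": a fraction `strength` of spots lies in a shell
                           around the nucleus, the rest is uniform.
    """
    rng = np.random.default_rng(seed)
    cell_axes = np.asarray(cell_axes, dtype=float)
    nucleus_axes = np.asarray(nucleus_axes, dtype=float)
    if pattern == "random":
        n_localized = 0
    elif pattern == "perinuclear":
        n_localized = int(round(strength * n_spots))
    else:
        raise ValueError(f"unknown pattern: {pattern!r}")
    localized = _sample_cytoplasm(rng, n_localized, cell_axes, nucleus_axes,
                                  max_rn=shell)
    background = _sample_cytoplasm(rng, n_spots - n_localized, cell_axes,
                                   nucleus_axes)
    return np.vstack([localized, background])
```

Because the pattern and its strength are set explicitly, every simulated cell carries a known label, which is what makes it possible to score an analysis pipeline against ground truth.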
Altogether, our tool overcomes the current problem of unvalidated ad-hoc procedures for the analysis of localization patterns and therefore provides a solid and rigorous basis for the biological conclusions drawn from these rich and challenging data sets.