Addressing confounding factors and multi-omics integration for single cell dataset using machine learning
Nigatu Ayele Adossa1, Leif Schauser1,
1Qiagen A/S; 2University of Turku
The complexity of biological systems can be better understood by integrating different cellular measurements at single-cell resolution. Recently, several researchers have identified novel cell types using single-cell measurement techniques. These techniques produce large amounts of data with high levels of noise. There is therefore a strong demand for statistical methods that can handle such large, noisy datasets. Integrating multiple omics data types at single-cell resolution should also give a more accurate picture of the cellular system.
As part of this PhD project, we aim to address the technical and biological variability that arises in both single-cell RNA sequencing and single-cell bisulfite sequencing. In addition, by integrating genome-wide measurements of gene expression, DNA methylation, transcription factor binding proteins and binding sites with high-throughput imaging for single-cell analysis, we aim to report molecular markers that are crucial for differentiation and plasticity in helper T cells. The dataset has been generated by partner institutes of the ENLIGHT-TEN consortium.
Statistical methods such as factor analysis and latent variable modeling are being employed to remove confounding technical and biological variation. Probabilistic topic models such as latent Dirichlet allocation (LDA) are being implemented for cell type identification and differential expression analysis. Machine learning methods such as ensemble methods and neural networks, including the multi-layer perceptron (MLP), are being used to integrate the multi-omics measurements.
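To illustrate the idea of removing latent sources of variation, the following is a minimal sketch rather than the project's actual pipeline. It uses a truncated SVD as a simple stand-in for factor analysis, and it assumes that the leading low-rank component of a synthetic cells-by-genes matrix represents an unwanted technical factor. The function name `remove_latent_factors`, the simulated data, and all parameter values are hypothetical. In real data, leading components may also carry biological signal, so the number of factors to remove would need careful validation.

```python
import numpy as np


def remove_latent_factors(log_expr, n_factors):
    """Subtract the top `n_factors` low-rank components from a cells x genes matrix.

    Illustrative stand-in for a latent factor model: the leading singular
    components are assumed (hypothetically) to capture unwanted variation.
    """
    gene_means = log_expr.mean(axis=0, keepdims=True)
    centered = log_expr - gene_means
    # Thin SVD of the centered matrix: centered = U diag(s) Vt
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Low-rank reconstruction from the leading components only
    low_rank = (u[:, :n_factors] * s[:n_factors]) @ vt[:n_factors]
    # Remove the low-rank part and restore the per-gene means
    return centered - low_rank + gene_means


# Synthetic data (an assumption, not the ENLIGHT-TEN dataset):
# one strong per-cell technical factor plus small gene-level noise.
rng = np.random.default_rng(0)
n_cells, n_genes = 100, 20
technical = rng.normal(size=(n_cells, 1))   # hypothetical per-cell effect
loadings = rng.normal(size=(1, n_genes))    # how each gene responds to it
noise = rng.normal(scale=0.1, size=(n_cells, n_genes))
log_expr = 2.0 + 3.0 * technical @ loadings + noise

corrected = remove_latent_factors(log_expr, n_factors=1)
print("variance before:", log_expr.var())
print("variance after: ", corrected.var())
```

In this simulated setting the corrected matrix keeps the per-gene means but loses most of the variance driven by the single simulated factor. A probabilistic factor model would additionally estimate gene-specific noise, which this sketch does not attempt.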