Deep-ute: Single-cell RNA sequencing imputation using deep neural networks
Simon LM¹, Eraslan G¹, Theis FJ¹,*
¹Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
Single-cell RNA sequencing (scRNA-seq) profiles gene expression in individual cells. Because the amount of starting material is very small, capturing every RNA molecule is challenging, and transcripts of individual genes can be missed. This “dropout” effect generates a large number of zeros in the gene expression matrix and is exacerbated in the more recent droplet-based single-cell technologies, which can profile up to tens of thousands of single cells but with relatively low read coverage per cell. Because gene expression data are highly correlated, information from other genes can be used to impute the expression of affected genes and correct for the dropout effect. We have developed a deep neural network approach to correct gene expression estimates in scRNA-seq data that exploits the analogy between dropout in scRNA-seq and the dropout regularization technique used in neural networks. We predict the actual expression levels of dropout genes from the expression information of related cells within an autoencoder setting, which is similar to the matrix completion approach popularized by the Netflix challenge. To evaluate performance, we compare our approach with common imputation techniques, including the Least Absolute Shrinkage and Selection Operator (LASSO), the k-nearest neighbors algorithm (KNN) and Random Forest. We applied the imputation strategies to three real scRNA-seq data sets with up to tens of thousands of cells. To assess prediction accuracy, we simulated dropout, analyzed mixed-species data and examined within- versus between-pathway correlation structure.
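The simulated-dropout evaluation mentioned above can be illustrated with a minimal sketch. It is not the authors' network or code: the expression matrix is synthetic, the function names (`simulate_dropout`, `knn_impute`, `rmse`) are hypothetical, and a simple k-nearest-neighbors imputation stands in for the methods being compared. It also assumes every true value is positive, so that each zero in the corrupted matrix is a simulated dropout.

```python
import numpy as np

def simulate_dropout(X, rate, rng):
    """Set a random fraction of the nonzero entries to zero.

    Returns the corrupted copy and a boolean mask of the removed entries.
    """
    mask = (rng.random(X.shape) < rate) & (X > 0)
    X_drop = X.copy()
    X_drop[mask] = 0.0
    return X_drop, mask

def knn_impute(X_drop, k=5):
    """Fill each zero with the mean of the nonzero values of the same gene
    in the k nearest cells (Euclidean distance on the corrupted matrix)."""
    n_cells = X_drop.shape[0]
    # Pairwise squared Euclidean distances between cells (rows).
    sq = (X_drop ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X_drop @ X_drop.T)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    X_imp = X_drop.copy()
    for i in range(n_cells):
        neigh = X_drop[neighbors[i]]           # shape (k, n_genes)
        counts = (neigh > 0).sum(axis=0)       # observed values per gene
        sums = neigh.sum(axis=0)               # zeros add nothing to the sum
        means = np.divide(sums, counts,
                          out=np.zeros_like(sums), where=counts > 0)
        zeros = X_drop[i] == 0
        X_imp[i, zeros] = means[zeros]         # stays 0 if no neighbor has a value
    return X_imp

def rmse(truth, estimate, mask):
    """Root-mean-square error restricted to the masked (dropped) entries."""
    return float(np.sqrt(np.mean((truth[mask] - estimate[mask]) ** 2)))

# Synthetic data (an assumption for illustration): two cell groups, each
# with its own mean expression profile plus a small amount of noise.
rng = np.random.default_rng(0)
n_cells, n_genes = 60, 40
profiles = rng.uniform(5.0, 20.0, size=(2, n_genes))
groups = rng.integers(0, 2, size=n_cells)
noise = rng.normal(0.0, 0.5, size=(n_cells, n_genes))
X = np.clip(profiles[groups] + noise, 0.1, None)  # keep all true values positive

X_drop, mask = simulate_dropout(X, rate=0.2, rng=rng)
X_imp = knn_impute(X_drop, k=5)

rmse_zero = rmse(X, X_drop, mask)  # error if dropouts are left as zeros
rmse_knn = rmse(X, X_imp, mask)    # error after imputation
print(f"RMSE on dropped entries: no imputation {rmse_zero:.2f}, KNN {rmse_knn:.2f}")
```

Because the true values are known before corruption, the error can be computed on exactly the entries that were dropped. On real data, true zeros and dropouts cannot be told apart this way, which is why the simulation is needed.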