Description
Bias and variance in Single Particle Analysis

C.O.S. Sorzano, A. Jimenez-Moreno, D. Maluenda, M. Martínez, E. Ramirez-Aportela, R. Melero, A. Cuervo, J. Conesa, J. Filipovic, P. Conesa, L. del Cano, Y.C. Fonseca, J. Jimenez-de la Morena, P. Losana, R. Sanchez-Garcia, D. Strelak, E. Fernandez-Gimenez, F. de Isidro, D. Herreros, J.L. Vilas, R. Marabini, J.M. Carazo

Center of Biotechnology, Spanish Natl. Research Council (CSIC)

Cryo-electron microscopy (cryoEM) has become a well-established technique for elucidating the three-dimensional (3D) structure of biological macromolecules. Projection images of thousands of macromolecules, assumed to be structurally identical, are combined into a single 3D map that represents the Coulomb potential of the macromolecule under study. In this article, we discuss possible caveats along the image processing path and how to avoid them in order to obtain a reliable 3D structure. Some of these problems are very well known in the community and can be described as sample-related (for example, specimen denaturation at interfaces, or a non-uniform distribution of projection directions that leaves some directions underrepresented). The rest are algorithm-related. Some of these, such as an incorrect choice of the initial volume, have been discussed in depth in the literature; others have received much less attention, even though they are fundamental to any data analysis approach. Chief among them are instabilities in the estimation of many of the key parameters required for a correct 3D reconstruction. These instabilities occur all along the processing workflow and may significantly affect the reliability of the whole process. In the field, the term overfitting has been coined to refer to a particular kind of artifact. We argue that overfitting is actually statistical bias in key steps of the estimation of particle parameters during 3D reconstruction, including intrinsic algorithmic bias.
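To make the statistical vocabulary concrete, the following toy simulation (not the authors' method) separates bias from variance for two hypothetical estimators of a single in-plane rotation angle. Estimator A is unbiased but noisy; estimator B is less noisy but is pulled toward a hypothetical reference angle, loosely mimicking reference bias. The true angle, noise level and 40% pull are illustrative assumptions. The mean squared error splits exactly into bias squared plus variance, so a lower-variance estimator can still be the worse one.

```python
import numpy as np

def bias_variance(estimates, true_value):
    """Empirical bias, variance and mean squared error of repeated estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    variance = estimates.var()  # population variance (ddof=0)
    mse = np.mean((estimates - true_value) ** 2)
    return bias, variance, mse

rng = np.random.default_rng(0)
true_angle = 30.0   # hypothetical true in-plane rotation, in degrees
n_trials = 10_000   # number of repeated, independent estimations

# Estimator A: unbiased but noisy.
est_a = true_angle + rng.normal(0.0, 5.0, n_trials)

# Estimator B (hypothetical): each estimate is pulled 40% of the way toward
# a reference angle of 0 degrees, a crude stand-in for reference bias.
reference_angle = 0.0
est_b = 0.6 * (true_angle + rng.normal(0.0, 5.0, n_trials)) + 0.4 * reference_angle

for name, est in [("A", est_a), ("B", est_b)]:
    b, v, m = bias_variance(est, true_angle)
    # With ddof=0 the decomposition mse = bias^2 + variance holds exactly.
    print(f"{name}: bias={b:+.2f}  variance={v:.2f}  mse={m:.2f}  bias^2+var={b * b + v:.2f}")
```

Averaging more trials of estimator B shrinks its variance further but leaves its bias untouched, which is why a stable-looking result is not necessarily a correct one.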
We also show that common tools (such as the Fourier Shell Correlation, FSC) and strategies (such as the gold-standard approach) that we normally use to detect or prevent overfitting do not fully protect us against it. Instead, we propose that the biases that lead to overfitting are much easier to detect at the level of parameter estimation than after the particle images have been combined into a 3D map. Parameter bias can be detected by comparing the results of multiple algorithms (or, at least, of independent executions of the same algorithm). These multiple executions can then be averaged to obtain a lower-variance estimate of the underlying parameters.
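The comparison-and-averaging idea can be sketched with a simplified, hypothetical setup. Several independent runs each assign an in-plane angle to every particle. Particles on which the runs disagree by more than a tolerance are flagged as unstable, and the circular mean over runs serves as a consensus estimate for the rest. The array layout, the 10-degree tolerance and the simulated 180-degree failure are assumptions made for illustration. Real workflows estimate full 3D orientations and shifts, which this sketch does not attempt. A bias shared by every run would not show up as disagreement, which is one reason to compare different algorithms and not only repeated runs of one.

```python
import numpy as np

def circular_mean_deg(angles_deg, axis=0):
    """Circular mean of angles in degrees, correct across the 0/360 wrap."""
    rad = np.deg2rad(angles_deg)
    mean = np.arctan2(np.sin(rad).mean(axis=axis), np.cos(rad).mean(axis=axis))
    return np.rad2deg(mean) % 360.0

def angular_distance_deg(a, b):
    """Smallest absolute difference between angles a and b, in degrees."""
    d = (np.asarray(a) - np.asarray(b)) % 360.0
    return np.minimum(d, 360.0 - d)

def consensus(estimates_deg, tol_deg=5.0):
    """estimates_deg has shape (n_runs, n_particles). Returns the per-particle
    circular mean, the largest deviation of any run from it, and a mask of
    particles on which all runs agree within tol_deg."""
    estimates_deg = np.asarray(estimates_deg, dtype=float)
    mean = circular_mean_deg(estimates_deg, axis=0)
    spread = angular_distance_deg(estimates_deg, mean[None, :]).max(axis=0)
    return mean, spread, spread <= tol_deg

rng = np.random.default_rng(1)
n_runs, n_particles = 4, 1000
truth = rng.uniform(0.0, 360.0, n_particles)
runs = (truth[None, :] + rng.normal(0.0, 2.0, (n_runs, n_particles))) % 360.0

# Hypothetical failure mode: for ~10% of particles, run 0 converges to a
# solution rotated by 180 degrees.
flipped = rng.random(n_particles) < 0.10
runs[0, flipped] = (runs[0, flipped] + 180.0) % 360.0

mean, spread, agree = consensus(runs, tol_deg=10.0)
print(f"{(~agree).sum()} of {n_particles} particles flagged as unstable")

# On stable particles, averaging runs lowers the error compared with one run.
err_single = angular_distance_deg(runs[1, agree], truth[agree])
err_mean = angular_distance_deg(mean[agree], truth[agree])
print(f"RMS error, single run:     {np.sqrt(np.mean(err_single ** 2)):.2f} deg")
print(f"RMS error, consensus mean: {np.sqrt(np.mean(err_mean ** 2)):.2f} deg")
```

The circular mean is used instead of an arithmetic mean so that, for example, estimates of 350 and 10 degrees average to 0 degrees rather than 180.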