The team at DeepMind recently released AlphaFold2, a deep-learning-based protein structure prediction algorithm. AlphaFold2 achieved remarkable scores at the CASP14 competition and in many cases produces highly accurate models. However, it has limitations, and in some cases its predictions fail badly. Because domains are the functional unit of drug development, it is useful to characterize the accuracy of AlphaFold2's predictions for each domain. We performed a sequence-based structural alignment between the AlphaFold2-predicted structures and the corresponding experimentally determined protein structures over 1) the full length of the structures and 2) just the specific PFAM domain. For both the full-length and the domain alignments, we calculated the root mean square deviation (RMSD) between the two structures. The mean RMSD over the full-length alignments was 0.75 Angstroms, and the median was 0.64 Angstroms. Importantly, this dataset includes structures that are proprietary to Lilly and were not present in the RCSB PDB, which was used as AlphaFold2's training set. Of the 53 domains for which experimental structures covered at least 20% of the domain and had at least 90% sequence identity with the AlphaFold2 sequence, all had RMSD values below 1 Angstrom, and 29 had RMSD values below 0.5 Angstroms, suggesting that AlphaFold2 predicts these 53 domains very accurately. An additional 305 domains did not have sufficient data for this analysis, suggesting that predictions for proteins of interest containing those domains may be less reliable.
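The RMSD comparison described above can be illustrated with a minimal sketch. It is not the pipeline used in this work. It assumes that C-alpha coordinates have already been extracted and keyed by a shared residue numbering (standing in for the sequence-based pairing), and that the two structures are superposed with the Kabsch algorithm before the RMSD is computed. The function names `kabsch_rmsd`, `paired_ca_rmsd`, and `summarize_rmsds`, and the `domain` residue-range argument, are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def paired_ca_rmsd(pred, expt, domain=None):
    """RMSD over residues present in both structures.

    pred, expt: dicts mapping residue number -> (x, y, z) C-alpha coordinate.
    domain: optional (start, end) residue range, inclusive. When given, the
    superposition and the RMSD are both restricted to that range.
    """
    shared = sorted(set(pred) & set(expt))
    if domain is not None:
        start, end = domain
        shared = [r for r in shared if start <= r <= end]
    if len(shared) < 3:
        raise ValueError("need at least 3 paired residues")
    P = np.array([pred[r] for r in shared], dtype=float)
    Q = np.array([expt[r] for r in shared], dtype=float)
    return kabsch_rmsd(P, Q)


def summarize_rmsds(values):
    """Mean, median, and counts below the 1.0 and 0.5 Angstrom thresholds."""
    v = np.asarray(values, dtype=float)
    return {
        "mean": float(v.mean()),
        "median": float(np.median(v)),
        "n_below_1": int((v < 1.0).sum()),
        "n_below_0_5": int((v < 0.5).sum()),
    }
```

Calling `paired_ca_rmsd(pred, expt)` gives a full-length value, and `paired_ca_rmsd(pred, expt, domain=(start, end))` gives a domain-only value. The domain call recomputes the superposition on the domain residues alone, so a well-predicted domain is not penalized for errors elsewhere in the chain.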