Pseudogenization as a source of genetic variability to the Mycobacterium tuberculosis complex
Naila C. Soler-Camargo1,2, Cristina K. Zimpel1,2, Tainá Silva-Pereira1, Maurício F. Camacho3, André Zelanis3, Alexandre H. Aono3, Ana Marcia Sa Guimaraes1*
1Laboratory of Applied Research in Mycobacteria, Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, Brazil; 2Department of Preventive Veterinary Medicine and Animal Health, College of Veterinary Medicine, University of São Paulo, Brazil; 3Functional Proteomics Laboratory, Federal University of São Paulo (UNIFESP), São José dos Campos, SP.
* Corresponding author: firstname.lastname@example.org
Genomic analyses of the Mycobacterium tuberculosis complex (MTBC) are usually based on single nucleotide polymorphisms (SNPs) and insertions or deletions (indels), and only a few have investigated its gene content. MTBC pseudogenes have never been comprehensively evaluated. Herein, we show for the first time that in silico predicted pseudogenes are a source of genetic variability to MTBC members at the population level. We developed a new methodology to analyze over 27,000 pseudogenes from 201 MTBC and M. canettii strains, and combined transcriptomics and proteomics of M. tuberculosis H37Rv (Mtb) to provide insights into pseudogene expression. Our results indicate significant variability in the rate and conservation of pseudogenes among different lineages and species of tuberculous mycobacteria, yet increased pseudogenization in certain functional classes and virulence factors. The rate of pseudogenization was significantly higher in M. africanum, M. bovis and M. tuberculosis L2 strains compared to M. tuberculosis L4 (p<0.001), and is not linked to host specialization or shift as observed in other clonal bacteria. As expected, M. canettii strains showed a higher rate of pseudogenization than the MTBC, reflecting their recombinogenic genomes subjected to horizontal gene transfer. Most importantly, given the low conservation of pseudogenes among strains, we estimated that gene loci under pseudogenization correspond to almost 30% of the MTBC pan-genome. Despite this low conservation, the same functional categories are hotspots of pseudogenization among strains. Transposases comprise a fair proportion of the annotated pseudogenes (~21%), but important genes related to metabolism and virulence are also pseudogenized (e.g. PE/PPE genes, ESX-associated genes, Mce genes, acyl-CoA dehydrogenases, lipases, sigma factors, ABC transporters, etc.), highlighting the importance of pseudogenes to the phenotypic plasticity of the MTBC.
Transcriptomic analysis shows that the transcription machinery of Mtb is able to fully transcribe most pseudogenes, indicating intact promoters and recent evolutionary emergence of pseudogenes. A proportion (~14%) of Mtb pseudogenes are also translated into proteins, suggesting that some pseudogenes predicted in silico are actually functional genes, although reversal of nonsense mutations or phase variation cannot be ruled out. While indels (frameshifts) and IS elements (incomplete genes) were identified as the main genetic drivers of pseudogenization in these species, population bottlenecks and genetic drift are likely the evolutionary processes driving pseudogene emergence over time, particularly in M. africanum, M. bovis and M. tuberculosis L2 strains. Thus, our study unveils a novel evolutionary perspective of the MTBC and underscores pseudogenization as evidence of genetic drift and a source of genetic variability to the complex.