The abundant small proteome of Mycobacterium tuberculosis
Todd Gray1, Carol Smith1, Jill Canestrari1, Jing Wang1, Matthew Champion2, Keith Derbyshire1, Joseph Wade1
1NYSDOH Wadsworth Center, Division of Genetics, Albany, NY 12208
2University of Notre Dame, Dept. of Chemistry and Biochemistry, South Bend, IN 46556
Understanding the molecular microbiology of Mycobacterium tuberculosis requires an accurately annotated genome that captures the myriad RNA and protein products it encodes. Open reading frames (ORFs) have largely been drawn by gene prediction algorithms that often fail to predict ORFs with non-canonical features, such as those that are short, overlap other ORFs, or lack 5’ UTRs. While leaderless mRNAs generate clearly identifiable transcriptional (RNA-seq) and translational (Ribo-seq) profiles, translation initiation sites on traditional leadered mRNAs can be obscured by the surrounding signal in Ribo-seq profiles. Here, the antibiotic retapamulin was applied to Mycobacterium tuberculosis cultures to arrest initiating ribosomes at sites of active translation initiation. This initiating Ribo-seq approach (Ribo-RET) identified annotated ORFs and more than a thousand novel short ORFs (<50 amino acids). Sites of putative translation initiation are enriched in features required for initiation, including start codons, Shine-Dalgarno sequences, and unstructured mRNA contexts. As an independent method of validation, we examined standard Ribo-seq data sets and found that the ORFs identified from these initiation sites exhibit the elevated ribosome occupancy profiles typical of actively translated ORFs. Leveraging the GC-rich genome and associated codon bias of mycobacteria, we found that novel short ORFs collectively lacked the clear codon bias exhibited by annotated genes, suggesting that many of the encoded novel small proteins are not functional and instead constitute translational noise. Conversely, other short ORFs encode small proteins that are highly conserved, suggesting that they are functional. We have independently validated the expression of many small proteins using proteomics optimized for small proteome analyses. Too small to contain catalytic motifs, conserved small proteins likely play supporting or regulatory roles.
The unexpectedly large number of actively translated ORFs we identified throughout the Mycobacterium tuberculosis genome challenges current definitions and perceptions of genome annotation for all bacterial species.
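As a rough illustration of how codon bias in a GC-rich genome might be compared between sets of ORFs, the sketch below computes the GC fraction at each codon position. This is only one possible proxy: the abstract does not state which codon-bias metric was used, so the choice of third-position GC content, the function names, and the example sequences are all assumptions made for illustration.

```python
from typing import List


def gc_by_codon_position(orf: str) -> List[float]:
    """Return the GC fraction at codon positions 1, 2, and 3 of an in-frame ORF."""
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this codon position
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return fractions


def mean_gc3(orfs: List[str]) -> float:
    """Average third-position GC fraction across a set of ORFs (hypothetical helper)."""
    values = [gc_by_codon_position(orf)[2] for orf in orfs if len(orf) >= 3]
    return sum(values) / len(values) if values else 0.0


# Made-up sequences, not taken from the M. tuberculosis genome.
example_orfs = ["ATGGCCGCGTAA", "ATGAAATAA"]
print(gc_by_codon_position(example_orfs[0]))
print(mean_gc3(example_orfs))
```

Under this assumption, a set of ORFs shaped by selection in a GC-rich genome would be expected to show a higher mean third-position GC fraction than a set of ORFs translated by chance, so comparing the two means gives a coarse, collective view of the kind of contrast described above.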
This work was supported by NIH 1RO1AI09719101 and RO1 1R01GM139277