Mycobacterium Databases at BioCyc.org: Pathways, Omics Tools, and Enhanced Genome Annotation
BioCyc.org is a web portal for 18,000 sequenced microbes, including 235 Mycobacterium genomes. BioCyc couples high-quality curated data with a wide range of easy-to-use bioinformatics tools.
Each of the Mycobacterial databases in BioCyc was constructed using a similar methodology. First, a series of computational inferences was applied to the annotated genome from GenBank, including prediction of metabolic reactions and associated metabolic pathways, transport reactions, operons, Pfam domains, and orthologs with other genomes in BioCyc. Next, additional data were imported from related databases, including protein features and Gene Ontology terms from UniProt, protein localization data from PSORTDB, regulatory data from RegTransBase, and gene essentiality data from OGEE. We also imported drug screening data for more than 700 compounds from Ekins et al. Finally, manual curation was performed on the M. tuberculosis H37Rv genome to integrate relevant information from the experimental literature, including updates to gene functions, metabolic pathway information, and regulatory information. Curators authored mini-review summaries for selected proteins and pathways, and added literature references, Gene Ontology terms, and evidence codes.
The resulting M. tuberculosis H37Rv database contains 287 metabolic pathways, 13,494 protein features (such as enzyme active sites and sequence variants), and 2,573 operons, and was curated from 3,905 publications. During the curation process we updated 373 gene functions. The database contains the equivalent of 151 textbook pages of mini-review summaries.
The BioCyc website provides extensive bioinformatics tools for searching and analyzing these databases and leveraging them for analysis of omics datasets. Genome-related tools include a genome browser, sequence search and alignment tools, and extraction of sequence regions. Pathway-related tools include pathway diagrams and navigation of zoomable organism-specific metabolic map diagrams. Operons, regulatory sites, and the full regulatory network can be displayed when such data are present. Comparative analysis tools enable comparisons of genome organization, of orthologs, and of pathway complements. Omics data analysis tools support enrichment analysis and painting of transcriptomics and metabolomics data onto individual pathways and the full metabolic map diagrams. The Omics Dashboard tool enables hierarchical exploration of omics datasets.
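As an illustration of what pathway enrichment analysis computes, the following is a minimal Python sketch of a one-sided hypergeometric test. It is not BioCyc's implementation. The function names, the toy pathway name, and the gene identifiers are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(population, successes, draws, observed):
    """Upper-tail hypergeometric probability P(X >= observed).

    population: number of genes in the background set
    successes:  number of background genes that belong to the pathway
    draws:      number of genes in the user's list (e.g. differentially expressed)
    observed:   number of genes in the user's list that belong to the pathway
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


def pathway_enrichment(gene_list, background, pathway_members):
    """Return {pathway: p-value} for each pathway (hypothetical helper)."""
    background = set(background)
    hits = set(gene_list) & background
    results = {}
    for pathway, members in pathway_members.items():
        members = set(members) & background
        overlap = len(hits & members)
        results[pathway] = hypergeom_pvalue(
            len(background), len(members), len(hits), overlap
        )
    return results


# Toy data: placeholder gene identifiers, not real Mycobacterium genes.
toy_background = [f"gene{i}" for i in range(10)]
toy_pathways = {"toy-pathway": ["gene0", "gene1", "gene2"]}
toy_hits = ["gene0", "gene1", "gene2"]
toy_result = pathway_enrichment(toy_hits, toy_background, toy_pathways)
```

A small p-value for a pathway suggests that the gene list contains more members of that pathway than expected by chance under this simple model.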
SmartTables enable users to construct and store tables of genes, metabolites, or pathways, and to perform multiple analyses. Pre-generated "special" SmartTables enable browsing and manipulation of complete lists of database objects, such as all transporters. SmartTable transformations enable easy conversions from one type of data into another. For example, a list of genes or metabolites can be converted into a list of all pathways in which those genes or metabolites participate.
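The gene-to-pathway transformation described above can be pictured conceptually with a short Python sketch. This is only an illustration under stated assumptions: it does not use or reflect the BioCyc SmartTables interface, and the function name, pathway names, and gene identifiers are hypothetical placeholders.

```python
def genes_to_pathways(genes, pathway_members):
    """Map a gene list to the pathways containing at least one of those genes.

    genes:           iterable of gene identifiers supplied by the user
    pathway_members: dict mapping pathway name -> iterable of member genes
    Returns a dict mapping pathway name -> sorted list of matching genes.
    """
    wanted = set(genes)
    result = {}
    for pathway, members in pathway_members.items():
        matched = sorted(wanted & set(members))
        if matched:  # keep only pathways that contain a listed gene
            result[pathway] = matched
    return result


# Toy membership table with placeholder names (not real database content).
toy_membership = {
    "pathway-A": ["geneA", "geneB"],
    "pathway-B": ["geneB", "geneC"],
    "pathway-C": ["geneD"],
}
toy_table = genes_to_pathways(["geneB", "geneC"], toy_membership)
```

In this toy case the two-gene input expands to two pathways, each annotated with the input genes that place it in the result.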
P.D. Karp et al., "The BioCyc collection of microbial genomes and metabolic pathways," Briefings in Bioinformatics, 2017. https://doi.org/10.1093/bib/bbx085.
S. Ekins, A.M. Clark, and M. Sarker, J Cheminform, 5(10):13, 2013.