MutVis: Automated framework for analysis and visualization of mutational signatures in pathogenic bacterial strains
Akshatha Prasanna* and Vidya Niranjan
Department of Biotechnology, R.V. College of Engineering, Bengaluru, India.
Mutation is one of the main forces driving nucleotide content variation in bacteria. Characterizing the mutational biases that give rise to mutational signatures, across species or within bacterial genomes, helps in understanding bacterial evolution and adaptation. In recent years, mutational signature analysis has become routine practice in cancer genomics for diagnosis and classification. However, an integrated framework for the analysis and visualization of mutational signatures in bacterial genomes is lacking. Hence, we aim to develop an integrated, automated, open-source and user-friendly framework, MutVis, to analyze and classify bacterial whole-genome data based on mutational signatures. The framework integrates various open-source tools and is scripted in Python and R, with Snakemake as the workflow management system. The Snakemake implementation simplifies the bioinformatics analysis by removing the hurdles of setting up tools and executing them from the command line. We demonstrate the framework on resistant strains of Mycobacterium tuberculosis from different geographic regions worldwide, downloaded from the PATRIC TB-ARC Antibiotic Resistance Catalog (n=963). The framework supports variant calling, processing of VCF files, graphical representation of transitions and transversions, generation of the mutational count matrix, and graphical visualization of the base-pair substitution (BPS) spectrum and mutational signatures. The BPS spectrum is visualized as a plot of the 96 trinucleotide mutation types to study geography-specific variation. Mutational signatures are extracted from the BPS spectrum, and the contribution of each signature is estimated for each geographic type. The geographic types are hierarchically clustered and displayed as a heatmap based on the similarity of the derived mutational signatures. This provides information on the signatures active in each individual sample and helps to detect clusters of similar mutational processes.
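To make the 96 trinucleotide mutation types and the mutational count matrix concrete, the following minimal Python sketch shows one common way to build them. It is an illustration only and not the MutVis implementation: the function names (classify_snv, is_transition, count_matrix) and the input layout (a reference trinucleotide plus the alternate base per variant) are assumptions made for this example. Each substitution is folded onto the pyrimidine strand, giving 6 substitution classes times 16 flanking contexts.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Six substitution classes, expressed with a pyrimidine (C or T) reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
# 6 classes x 16 flanking-base combinations = 96 channels, e.g. "A[C>T]G".
CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify_snv(context, alt):
    """Map a single-base substitution to one of the 96 channels.

    context: 3-base reference sequence centred on the mutated base
             (hypothetical input layout for this sketch).
    alt:     the alternate base.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        # Purine reference: describe the change on the opposite strand.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes (ref != alt)."""
    purines = {"A", "G"}
    return (ref in purines) == (alt in purines)


def count_matrix(snvs_by_sample):
    """Build {sample: [96 counts]} from {sample: [(context, alt), ...]}."""
    matrix = {}
    for sample, snvs in snvs_by_sample.items():
        counts = Counter(classify_snv(context, alt) for context, alt in snvs)
        matrix[sample] = [counts.get(channel, 0) for channel in CHANNELS]
    return matrix
```

Under these assumptions, a G>A change in the context TGA is counted in the same channel as a C>T change in the context TCA, which is why 96 rather than 192 categories suffice. A matrix of this shape (samples by 96 channels) is the usual input for signature extraction and for the clustering step described above.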