Mutation Rate Meta-Analysis of Human Pathogenic Bacteria

Introduction

Understanding genetic variation and its effects is crucial to studying microbial evolution and antibiotic resistance. Deletions, missense mutations, and other mutation types create variants that alter proteins to differing degrees and play a significant role in the adaptability and survival of bacteria. Visualizing these mutations with Circos plots and stacked bar plots allows researchers to identify patterns and trends across strains. By interpreting these visualizations against the neutral theory of molecular evolution, under which most mutations that persist are selectively neutral, and against positive selection, which favors advantageous mutations that are more likely to persist, we can better comprehend how antibiotic resistance evolves in bacterial genomes. This project aims to process large genomic datasets of Staphylococcus aureus and Pseudomonas aeruginosa, automate variant analysis, and ultimately build a comprehensive database that improves our understanding of genomic mutations and informs further genomic studies of antibiotic resistance.

Data

The raw sequencing data were obtained from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) and serve as the input for the Variant Call Format (VCF) files analyzed here. The files retrieved were in FASTQ format, a widely used text-based format for raw sequence reads. This format is essential because it stores each nucleotide sequence together with its per-base quality scores (an estimate of how reliable each base call is).
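As a minimal illustration of what a FASTQ record holds, the sketch below parses one made-up four-line record and decodes its quality string. It assumes the common Phred+33 encoding, which is not confirmed for the files used in this project. The record contents and the helper name `decode_phred33` are hypothetical.

```r
# A made-up FASTQ record: header, sequence, separator, quality string.
record <- c("@example_read_1", "GATTACA", "+", "IIIII#!")

# Hypothetical helper: convert a Phred+33 quality string to integer scores.
decode_phred33 <- function(quality) {
  utf8ToInt(quality) - 33L
}

scores <- decode_phred33(record[4])

# A Phred score Q corresponds to an error probability p = 10^(-Q / 10).
error_prob <- 10^(-scores / 10)

# One row per base: the nucleotide, its score, and its error probability.
per_base <- data.frame(
  base = strsplit(record[2], "")[[1]],
  phred = scores,
  error_prob = signif(error_prob, 3),
  stringsAsFactors = FALSE
)
print(per_base)
```

A higher score means a more trustworthy base call: a score of 40 corresponds to an estimated error probability of 1 in 10,000.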

Code

Concatenate CSV.R

Summary: Merges all the annotated CSVs together into one .csv file.

It uses the dplyr, readr, and ggplot2 packages. The script retrieves the working directory and imports the CSV file of each SRA Run. These files are read into a list and merged into a single data frame with "bind_rows".
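The merge step described above can be sketched as follows. This is a self-contained illustration, not the repository script: the folder name, file names, and the columns `SRA_Run` and `Annotation_Impact` are assumptions, and the example writes its own tiny CSVs to a temporary folder.

```r
library(dplyr)
library(readr)

# Create two small made-up per-run CSVs so the sketch runs on its own.
csv_dir <- file.path(tempdir(), "annotated_csvs")
dir.create(csv_dir, showWarnings = FALSE)
write_csv(
  data.frame(SRA_Run = "RUN_A", Annotation_Impact = c("LOW", "MODIFIER")),
  file.path(csv_dir, "RUN_A.csv")
)
write_csv(
  data.frame(SRA_Run = "RUN_B", Annotation_Impact = "HIGH"),
  file.path(csv_dir, "RUN_B.csv")
)

# List every CSV in the folder and read each one into a list element.
# Reading all columns as character avoids type clashes when stacking.
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
run_tables <- lapply(
  csv_files,
  function(f) read_csv(f, col_types = cols(.default = col_character()))
)

# Stack the per-run tables into one data frame and save it.
merged <- bind_rows(run_tables)
write_csv(merged, file.path(tempdir(), "merged_annotations.csv"))
```

Writing the merged file outside the input folder keeps it from being picked up as an input on a later run.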

Annotation Impact Bar Plot.R

Summary: Visualizes the mutation impacts of each SRA Run in R.

It uses the dplyr, readr, and ggplot2 packages. The script reads each SRA Run's CSV file, filters out rows with missing values in Annotation Impact, and counts the occurrences of each Annotation Impact category for the run. It then orders the runs by their "low" impact count in descending order and draws a stacked bar plot.
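A minimal sketch of the filter, count, order, and plot steps is shown below. The data are made up, and the column names `SRA_Run` and `Annotation_Impact` as well as the impact labels are assumptions rather than the script's actual names.

```r
library(dplyr)
library(ggplot2)

# Made-up variant table; one row per variant.
variants <- data.frame(
  SRA_Run = c("RUN_A", "RUN_A", "RUN_A", "RUN_B", "RUN_B", "RUN_B", "RUN_B"),
  Annotation_Impact = c("LOW", "MODIFIER", NA, "LOW", "LOW", "HIGH", "MODERATE"),
  stringsAsFactors = FALSE
)

# Drop rows with a missing impact, then count each impact per run.
impact_counts <- variants %>%
  filter(!is.na(Annotation_Impact)) %>%
  count(SRA_Run, Annotation_Impact)

# Order runs by their LOW count, highest first.
run_order <- impact_counts %>%
  group_by(SRA_Run) %>%
  summarise(low_n = sum(n[Annotation_Impact == "LOW"])) %>%
  arrange(desc(low_n)) %>%
  pull(SRA_Run)

impact_counts$SRA_Run <- factor(impact_counts$SRA_Run, levels = run_order)

# geom_col() stacks the impact categories within each run by default.
p <- ggplot(impact_counts, aes(x = SRA_Run, y = n, fill = Annotation_Impact)) +
  geom_col() +
  labs(x = "SRA Run", y = "Variant count", fill = "Annotation impact")
```

Setting the factor levels of `SRA_Run` is what controls the left-to-right order of the bars; printing `p` renders the plot.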

QC plot_SA.R

Summary: Visualizes which strains are viable for the experiment, based on their coverage relative to the reference genome of the respective bacterium.

It uses the tidyverse, ggthemes, and ggplot2 packages. The script reads the CSV file into a data frame and uses ggplot2 to create a scatter plot, mapping Coverage to the x-axis and Count._of._SRA_Run to the y-axis, with points colored by the SRA_Run variable. It adds a polynomial regression line of degree 3.5 to the plot.
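The scatter plot with a polynomial fit could look roughly like the sketch below. The data values are invented, only the column names come from the description above, and the fit uses an integer degree of 3 as an assumption of this sketch, since polynomial degrees are normally whole numbers.

```r
library(ggplot2)

# Made-up QC table; values are invented for illustration only.
set.seed(1)
qc <- data.frame(
  SRA_Run = paste0("RUN_", 1:12),
  Coverage = seq(10, 120, by = 10),
  stringsAsFactors = FALSE
)
qc$Count._of._SRA_Run <- round(50 + 0.02 * (qc$Coverage - 60)^2 + rnorm(12, sd = 3))

p <- ggplot(qc, aes(x = Coverage, y = Count._of._SRA_Run)) +
  # Colour is mapped only in the point layer so the fit uses all points together.
  geom_point(aes(colour = SRA_Run)) +
  # Cubic least-squares fit; degree 3 is an assumption of this sketch.
  geom_smooth(method = "lm", formula = y ~ poly(x, 3), se = FALSE) +
  labs(x = "Coverage", y = "Count of SRA Run", colour = "SRA Run")
```

Mapping colour inside `geom_point()` rather than in the top-level `aes()` keeps `geom_smooth()` from fitting a separate curve per run.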

Oxygen and Temperature

Summary: Visualizes the counts of temperature and oxygen requirements.

It uses the tidyverse and dplyr packages. The script reads the CSV file and replaces every missing value with "missing". It then separates oxygen into its own dataset, does the same for temperature, and uses ggplot2 to make the chart.
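The handling of missing values described above can be sketched as follows. The metadata are made up, and the column names `Oxygen` and `Temperature` are assumptions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up strain metadata with some missing entries.
meta <- data.frame(
  Oxygen = c("Aerobic", NA, "Facultative", "Aerobic"),
  Temperature = c("Mesophilic", "Mesophilic", NA, NA),
  stringsAsFactors = FALSE
)

# Label missing values explicitly so they are counted rather than dropped.
meta <- replace_na(meta, list(Oxygen = "missing", Temperature = "missing"))

# Build a separate count table for each requirement.
oxygen_counts <- count(meta, Oxygen)
temperature_counts <- count(meta, Temperature)

# Bar chart of the oxygen requirement counts; temperature works the same way.
p_oxygen <- ggplot(oxygen_counts, aes(x = Oxygen, y = n)) +
  geom_col() +
  labs(x = "Oxygen requirement", y = "Count")
```

Keeping "missing" as its own category makes gaps in the metadata visible in the chart instead of silently shrinking the totals.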

Freq_of_CheckM

Summary: Visualizes the top 15 most frequent CheckM marker sets.

It uses ggplot2 and dplyr. The script reads the CSV file and works with the CheckM.marker.set column. It counts how often each marker set occurs, keeps the 15 most frequent, and creates a bar plot that portrays those marker sets and their frequencies.
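A minimal sketch of the count-and-keep-top-15 step is given below. Only the column name `CheckM.marker.set` comes from the description above; the marker-set labels and their frequencies are invented.

```r
library(dplyr)
library(ggplot2)

# Made-up table: 20 invented marker-set labels occurring 20, 19, ..., 1 times.
set_labels <- paste0("marker_set_", 1:20)
assemblies <- data.frame(
  CheckM.marker.set = rep(set_labels, times = 20:1),
  stringsAsFactors = FALSE
)

# Count each marker set, sort by frequency, and keep the 15 most frequent.
top_sets <- assemblies %>%
  count(CheckM.marker.set, sort = TRUE) %>%
  head(15)

# Horizontal bars, ordered by frequency, keep long labels readable.
p <- ggplot(top_sets, aes(x = reorder(CheckM.marker.set, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "CheckM marker set", y = "Frequency")
```

Note that `head(15)` cuts ties at the boundary arbitrarily; if tied marker sets should all be kept, a rank-based filter would be needed instead.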

Annotation_Count

Summary: Visualizes the top 10 most frequent CheckM marker sets, comparing pseudogene and protein-coding gene counts.

It uses ggplot2 and dplyr to read and visualize the CSV file. After importing the file, the script counts the frequency of each CheckM marker set and uses the marker set as the color scheme for a scatter plot of the annotated protein-coding gene count against the pseudogene count.

Combined Plots

Summary: Combined visual of the CheckM marker sets: scatter plot, density plot, bar plot, and summary table.

The code imports packages such as tidyverse and ggplot2 to analyze and visualize the genomic data. It reads the CSV files, filters the top 4 most frequent CheckM marker sets, and summarizes their average gene counts. It then creates a scatter plot of protein-coding genes and pseudogenes, a density plot, a bar plot with shortened marker names, and a summary table, all combined into a clean 2x2 layout.

Plots

Pseudomonas aeruginosa

Staphylococcus aureus

Results

Pseudomonas aeruginosa

The Annotation Impact plot shows that modifier and low mutation impacts are generally the most common in each SRA Run.

Staphylococcus aureus

The Annotation Impact plot visualizes the overall distribution of mutation impacts across the SRA Runs and highlights where most mutations fall. Modifier and low impacts are prevalent, indicating that most mutations have only a minor influence on the genetic sequence.

Acknowledgements

UMBC Translational Life Science Technology (TLST) student interns Nhi Luu, Aimee Icaza, and Ketsia Pierrelus are supported by the Merck Data Science Fellowship for Observational Research Program and the UMBC College of Natural and Mathematical Sciences. Nhi Luu developed the annotation scripts and the R-Shiny framework and integration, and provided reference scripts to be edited.
