This repository contains an integrative project that analyzes SARS-CoV-2 genetic sequences from the 20 countries with the highest number of reported cases. The project explores genetic similarities and differences between variants, focusing on their distribution across Asian, Hispanic, European, and African populations.
- `Evidence_2_Integrative_Project.Rmd`: R Markdown file containing the complete analysis, including hierarchical clustering, phylogenetic trees, and nucleotide composition comparisons.
- `fastas/`: Directory containing `.fasta` files with genetic sequences for each country included in the analysis.
The project includes:
- Phylogenetic Analysis:
  - Construction of a phylogenetic tree using the JC69 model.
  - Clustering analysis to identify relationships between variants from different countries.
- Sequence Composition:
  - Analysis of nucleotide composition for variants.
  - Visualization of DNA base counts for each variant.
- Key Findings:
  - Identification of groups with high genetic similarity.
  - Insights into the origins and descent of SARS-CoV-2 variants.
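
To make the two quantitative steps above more concrete, here is a minimal sketch of base counting and of the JC69 (Jukes-Cantor) distance between two aligned sequences. It is not taken from the repository: the actual analysis lives in the R Markdown file, and this Python version only illustrates the underlying arithmetic. The function names (`base_counts`, `p_distance`, `jc69_distance`) and the toy sequences are hypothetical, and the sketch assumes the sequences are already aligned to the same length.

```python
import math
from collections import Counter


def base_counts(seq):
    """Count A, C, G and T in a sequence, ignoring case.

    Any other symbol (for example N or a gap) is left out of the result.
    """
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: both sequences have the same length. Sites where either
    sequence holds a non-ACGT symbol are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jc69_distance(a, b):
    """JC69 distance: d = -3/4 * ln(1 - 4/3 * p), with p the p-distance.

    The formula is undefined for p >= 0.75, so infinity is returned there.
    """
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real SARS-CoV-2 data.
    seq_a = "ATGCATGCAT"
    seq_b = "ATGCATGCAA"
    print(base_counts(seq_a))
    print(p_distance(seq_a, seq_b))
    print(jc69_distance(seq_a, seq_b))
```

The JC69 correction is always at least as large as the raw proportion of differences, because it accounts for multiple substitutions at the same site. A matrix of such pairwise distances is the usual input to hierarchical clustering or distance-based tree construction.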
- Clone this repository:
  `git clone https://github.com/carlosagalicia/SARS-CoV-2-Sequence-Analysis.git`