A script to calculate pairwise SNP distances from a core-genome VCF file and generate a distance matrix. Designed for bioinformatics workflows, this tool provides an easy-to-use interface for analyzing genomic similarities and differences.
- Calculates pairwise SNP distances between samples in a VCF file.
- Outputs a distance matrix as a CSV file.
- Optionally simplifies the matrix by removing duplicate distances and diagonal entries using the -simplify flag.
- Automatically orders samples by their average SNP distances for better pattern visualization.
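The distance and ordering steps listed above can be illustrated with a minimal sketch. It is not the code from snps-matrix.R: the toy genotype matrix, the function names, the choice to skip missing calls, and the ascending sort order are all assumptions made for illustration.

```r
# Hypothetical toy genotype matrix: rows are variant sites, columns are samples.
# Values mimic haploid allele calls from a core-genome VCF; NA is a missing call.
gt <- matrix(
  c("0", "0", "1",
    "0", "1", "1",
    "1", "1", "0",
    "0", NA,  "1"),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

# Count the sites where two samples carry different calls.
# Assumption: sites with a missing call in either sample are ignored.
snp_distance <- function(x, y) sum(x != y, na.rm = TRUE)

# Build the full symmetric sample-by-sample distance matrix.
pairwise_snp_matrix <- function(gt) {
  samples <- colnames(gt)
  n <- length(samples)
  d <- matrix(0L, n, n, dimnames = list(samples, samples))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- snp_distance(gt[, i], gt[, j])
    }
  }
  d
}

# Reorder rows and columns by each sample's mean distance to all samples.
# Assumption: ascending order (closest-on-average samples first).
order_by_mean_distance <- function(d) {
  ord <- order(rowMeans(d))
  d[ord, ord, drop = FALSE]
}

dist_mat <- pairwise_snp_matrix(gt)
ordered  <- order_by_mean_distance(dist_mat)
print(ordered)
```

Sorting by mean distance tends to place closely related samples next to each other, which is what makes clusters easier to spot when the matrix is viewed as a table or heatmap.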
- R (version 3.5 or higher)
- vcfR package (installed automatically if missing)
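An install-if-missing check of the kind described above could look roughly like the sketch below. The helper name `ensure_package`, its `installer` parameter, and the CRAN mirror URL are assumptions for illustration; the script's actual mechanism may differ.

```r
# Install a package only when it is not already available.
# `installer` is a hypothetical parameter added so the behaviour can be
# exercised without touching the network.
ensure_package <- function(
  pkg,
  installer = function(p) {
    # Assumption: an explicit mirror is passed because non-interactive
    # Rscript sessions often have no default CRAN mirror configured.
    utils::install.packages(p, repos = "https://cloud.r-project.org")
  }
) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
    return(invisible(TRUE))   # an install was attempted
  }
  invisible(FALSE)            # package was already present
}

# "stats" ships with R, so this call never triggers an install.
already_there <- ensure_package("stats")

# For the real dependency the call would be:
# ensure_package("vcfR")
```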
Run the script from the command line:
Rscript snps-matrix.R [path to core-vcf] [-simplify (optional)]
- Generate Full Distance Matrix:
Rscript snps-matrix.R core.vcf
Output: distance_matrix_ordered.csv
- Generate Simplified Distance Matrix:
Rscript snps-matrix.R core.vcf -simplify
Output: distance_matrix_simplified.csv
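To make the command-line behaviour above concrete, here is a minimal sketch of how the two arguments might be interpreted. The function `parse_args` and the returned field names are hypothetical; only the -simplify flag and the two output file names come from this README.

```r
# Interpret the arguments shown in the usage line.
# In a real script, `args` would come from commandArgs(trailingOnly = TRUE).
parse_args <- function(args) {
  simplify   <- "-simplify" %in% args
  positional <- args[args != "-simplify"]
  if (length(positional) < 1) {
    stop("Usage: Rscript snps-matrix.R [path to core-vcf] [-simplify (optional)]")
  }
  list(
    vcf_path = positional[1],
    simplify = simplify,
    # Output names as documented in the examples above.
    output   = if (simplify) "distance_matrix_simplified.csv"
               else "distance_matrix_ordered.csv"
  )
}

full_opts   <- parse_args(c("core.vcf"))
simple_opts <- parse_args(c("core.vcf", "-simplify"))
```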
- Rows/Columns: Sample IDs (from the VCF file).
- Values: SNP distances between pairs of samples.
- Simplified Matrix: Lower triangle and diagonal entries are replaced with empty cells.
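The simplified layout described above can be reproduced with a short sketch. The example distances, sample names, and the helper `simplify_matrix` are made up for illustration and are not taken from the script.

```r
# Hypothetical symmetric distance matrix for three samples.
ids <- c("S1", "S2", "S3")
d <- matrix(c(0, 3, 7,
              3, 0, 5,
              7, 5, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(ids, ids))

# Blank out the lower triangle and the diagonal so that each pair of
# samples appears exactly once (in the upper triangle).
simplify_matrix <- function(d) {
  out <- d
  out[lower.tri(out, diag = TRUE)] <- NA
  out
}

simplified <- simplify_matrix(d)

# Writing with na = "" turns the NA entries into empty CSV cells.
out_file <- tempfile(fileext = ".csv")
write.csv(simplified, out_file, na = "")

# Reading the file back restores the empty cells as NA.
reloaded <- read.csv(out_file, row.names = 1)
print(reloaded)
```

For n samples the simplified matrix keeps n(n-1)/2 values, one per unordered pair, which makes it convenient to reshape into a long list of pairwise distances.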
For questions or issues, please contact sb474@st-andrews.ac.uk.