Convert a FASTA alignment to a SNP distance matrix
% cat test/good.aln
>seq1
AGTCAGTC
>seq2
AGGCAGTC
>seq3
AGTGAGTA
>seq4
TGTTAGAC
% snp-dists test/good.aln > distances.tab
Read 4 sequences of length 8
% cat distances.tab
snp-dists 0.7 seq1 seq2 seq3 seq4
seq1 0 1 2 3
seq2 1 0 3 4
seq3 2 3 0 4
seq4 3 4 4 0
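The matrix above can be reproduced with a short Python sketch. This is only an illustration of the counting rule, not how snp-dists itself is implemented (that is C). The names `snp_distance` and `distance_matrix` are hypothetical, and the sketch assumes the default behaviour: letters are uppercased, and a column counts only when both letters are A, G, T or C and they differ.

```python
ACGT = set("ACGT")

def snp_distance(a: str, b: str) -> int:
    """Count aligned columns where both letters are A/C/G/T and differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x in ACGT and y in ACGT
    )

def distance_matrix(seqs):
    """Return a square list-of-lists in the insertion order of `seqs`."""
    names = list(seqs)
    return [[snp_distance(seqs[r], seqs[c]) for c in names] for r in names]

# The four sequences from test/good.aln shown above.
alignment = {
    "seq1": "AGTCAGTC",
    "seq2": "AGGCAGTC",
    "seq3": "AGTGAGTA",
    "seq4": "TGTTAGAC",
}

matrix = distance_matrix(alignment)
for name, row in zip(alignment, matrix):
    print(name, *row, sep="\t")
```

The comparison is quadratic in the number of sequences, which is fine for a sketch but is one reason a compiled tool is preferable for large alignments.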
snp-dists is written in C to the C99 standard and only depends on zlib.
brew install brewsci/bio/snp-dists
conda install -c bioconda -c conda-forge snp-dists
git clone https://github.com/tseemann/snp-dists.git
cd snp-dists
make
# run tests
make check
# optionally install to a specific location (default: /usr/local)
make PREFIX=/usr/local install
SYNOPSIS
Pairwise SNP distance matrix from a FASTA alignment
USAGE
snp-dists [options] alignment.fasta[.gz] > matrix.tsv
OPTIONS
-h Show this help
-v Print version and exit
-q Quiet mode; do not print progress information
-a Count all differences not just [AGTC]
-k Keep case, don't uppercase all letters
-m Output MOLTEN instead of TSV
-c Use comma instead of tab in output
-b Blank top left corner cell
URL
https://github.com/tseemann/snp-dists
The -v option prints the name and version separated by a space, in standard Unix fashion.
snp-dists 0.7.0
The -q option suppresses informational messages; only errors are printed.
With -c, the output is comma-separated instead of tab-separated:
snp-dists 0.7.0,seq1,seq2,seq3,seq4
seq1,0,1,2,3
seq2,1,0,3,4
seq3,2,3,0,4
seq4,3,4,4,0
With -b, the top-left corner cell is left blank:
seq1 seq2 seq3 seq4
seq1 0 1 2 3
seq2 1 0 3 4
seq3 2 3 0 4
seq4 3 4 4 0
By default, all letters are (1) uppercased and (2) ignored if not A, G, T or C.
Normally one would not want to count ambiguous letters and gaps as a "difference", but the -a option enables this if you need it.
>seq1
NGTCAGTC
>seq2
AG-CAGTC
>seq3
AGTGNGTA
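To make the effect of -a concrete, here is a minimal Python sketch applied to the three sequences above. It is an illustration under stated assumptions rather than the tool's actual code: `count_differences` and its `count_all` parameter are hypothetical names, with `count_all=False` standing in for the default rule and `count_all=True` for the "count all differences" rule.

```python
ACGT = set("ACGT")

def count_differences(a, b, count_all=False):
    """Count differing columns; unless count_all, both letters must be A/C/G/T."""
    a, b = a.upper(), b.upper()
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and (count_all or (x in ACGT and y in ACGT))
    )

seqs = {
    "seq1": "NGTCAGTC",
    "seq2": "AG-CAGTC",
    "seq3": "AGTGNGTA",
}

pairs = [("seq1", "seq2"), ("seq1", "seq3"), ("seq2", "seq3")]
default_counts = [count_differences(seqs[a], seqs[b]) for a, b in pairs]
all_counts = [count_differences(seqs[a], seqs[b], count_all=True) for a, b in pairs]

print("default:", default_counts)
print("all:    ", all_counts)
```

Under these assumptions seq1 and seq2 are identical by default, because their only mismatches involve N and a gap, while counting all differences separates them by 2.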
The -k option preserves case, which is useful if you want lower-case characters to be masked in the comparison.
>seq1
AgTCAgTC
>seq2
AggCAgTC
>seq3
AgTgAgTA
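The masking effect can be sketched in Python using the three sequences above. This rests on an assumption about what "masked" means here: when case is kept, lower-case letters fall outside the upper-case A/G/T/C set and are therefore ignored. The function `count_differences` and its `keep_case` parameter are hypothetical names for illustration only.

```python
ACGT = set("ACGT")

def count_differences(a, b, keep_case=False):
    """Count columns where both letters are upper-case A/C/G/T and differ."""
    if not keep_case:
        # Default behaviour: fold everything to upper case first.
        a, b = a.upper(), b.upper()
    return sum(1 for x, y in zip(a, b) if x != y and x in ACGT and y in ACGT)

seqs = {
    "seq1": "AgTCAgTC",
    "seq2": "AggCAgTC",
    "seq3": "AgTgAgTA",
}

pairs = [("seq1", "seq2"), ("seq1", "seq3"), ("seq2", "seq3")]
uppercased = [count_differences(seqs[a], seqs[b]) for a, b in pairs]
case_kept = [count_differences(seqs[a], seqs[b], keep_case=True) for a, b in pairs]

print("uppercased:", uppercased)
print("case kept: ", case_kept)
```

With case kept, any column containing a lower-case letter drops out of the count, so only the final column (C versus A) still separates seq3 from the other two.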
With -m, the output is in "molten" (long) format, one pair of sequences per line:
seq1 seq1 0
seq1 seq2 1
seq1 seq3 2
seq1 seq4 3
seq2 seq1 1
seq2 seq2 0
seq2 seq3 3
seq2 seq4 4
seq3 seq1 2
seq3 seq2 3
seq3 seq3 0
seq3 seq4 4
seq4 seq1 3
seq4 seq2 4
seq4 seq3 4
seq4 seq4 0
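The molten layout is convenient for downstream scripts because each line is a self-contained record. As a rough sketch, the Python below reads such records into a nested dictionary and rebuilds the square matrix. It assumes three tab-separated columns (name, name, distance); `read_molten` is a hypothetical helper, and the sample text is the listing above with tabs restored.

```python
import csv
import io
from collections import defaultdict

# Sample records copied from the listing above; spaces are turned back into tabs.
MOLTEN = """
seq1 seq1 0
seq1 seq2 1
seq1 seq3 2
seq1 seq4 3
seq2 seq1 1
seq2 seq2 0
seq2 seq3 3
seq2 seq4 4
seq3 seq1 2
seq3 seq2 3
seq3 seq3 0
seq3 seq4 4
seq4 seq1 3
seq4 seq2 4
seq4 seq3 4
seq4 seq4 0
""".strip().replace(" ", "\t")

def read_molten(handle, delimiter="\t"):
    """Parse 'name<TAB>name<TAB>distance' records into dist[a][b] = int."""
    dist = defaultdict(dict)
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        a, b, d = row
        dist[a][b] = int(d)
    return dist

dist = read_molten(io.StringIO(MOLTEN))
names = list(dist)  # order of first appearance
matrix = [[dist[r][c] for c in names] for r in names]

print(dist["seq2"]["seq4"])
for name, row in zip(names, matrix):
    print(name, *row, sep="\t")
```

To read a real file instead, pass an open file handle to `read_molten`; for comma-separated output, pass `delimiter=","`.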
Report bugs and give suggestions on the Issues page.