Translate long fasta headers to short - and back!
Your alignment program X doesn't allow strings longer than n characters, but all your info is in the fasta headers of your file. What to do?
Use translate_fasta_headers.pl on your fasta file to create short labels and
a translation table. Run your program X, and then back-translate your fasta
headers by running translate_fasta_headers.pl again!
And if you created a tree with the short (or long) labels, try to
back-translate using replace_taxon_labels_in_newick.pl!
If you only wish to transform your long fasta headers to short, without keeping
the information about how they were translated, the quick solution might be to
use awk:
$ awk '/^>/{$0=">Seq_"++n}1' long.fas
But, if you want to be able to back-translate, read on!
Replace fasta headers with headers taken from a tab-delimited file. If no tab file is given, the (potentially long) fasta headers are replaced by short labels "Seq_1", "Seq_2", etc., and the short and original headers are printed to a translation file.
If you wish, you may choose your own prefix (instead of Seq_). This could be
handy if, for example, you wish to concatenate files.
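To make the idea concrete, here is a minimal sketch of the long-to-short step written with awk. It is not the code of translate_fasta_headers.pl, only an illustration of the behaviour described above. The function name shorten_headers and its argument order are assumptions made up for this example.

```shell
# Hypothetical helper, NOT part of translate_fasta_headers.pl.
# Usage: shorten_headers <in.fas> <out.fas> <table.tab> [prefix]
shorten_headers() {
    awk -v tab="$3" -v prefix="${4:-Seq_}" '
        /^>/ {
            # Build the next short label: prefix plus a running number.
            short = prefix (++n)
            # Table layout: short label to the left, original header to the right.
            printf "%s\t%s\n", short, substr($0, 2) > tab
            print ">" short
            next
        }
        { print }   # sequence lines pass through unchanged
    ' "$1" > "$2"
}
```

Called as `shorten_headers long.fas short.fas short.fas.translation.tab Own_`, the sketch writes the relabelled sequences to the second file and the two-column table to the third. Unlike the real script, it does not read gzip-compressed input.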
The script for translating labels in Newick trees is somewhat limited in capacity due to the restrictions and/or peculiarities of the Newick tree format. Use with caution.
$ translate_fasta_headers.pl [options] <fasta input>
$ replace_taxon_labels_in_newick.pl [options] <newick file>
From long to short labels:
$ translate_fasta_headers.pl --out=short.fas long.fas
And back, using a translation table:
$ translate_fasta_headers.pl --tabfile=short.fas.translation.tab short.fas
Slightly shorter version (see the note on the --out option below regarding the
name of the auto-generated translation file!):
$ translate_fasta_headers.pl long.fas > short.fas
$ translate_fasta_headers.pl -t long.fas.translation.tab short.fas
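The back-translation in the examples above amounts to a table lookup: each header is searched for in the left column of the translation table and, if found, replaced by the right column. A minimal awk sketch of that lookup follows. The function name restore_headers is hypothetical, and the sketch assumes a non-empty table whose first tab separates the short label from the long header.

```shell
# Hypothetical helper, NOT part of translate_fasta_headers.pl.
# Usage: restore_headers <table.tab> <short.fas>
restore_headers() {
    awk -F '\t' '
        # First file (the table): map the left column to the rest of the line.
        NR == FNR { full[$1] = substr($0, index($0, "\t") + 1); next }
        # Second file (the fasta): swap a header only if the table knows it.
        /^>/ {
            key = substr($0, 2)
            if (key in full) { print ">" full[key]; next }
        }
        { print }   # sequences and unknown headers are printed as they are
    ' "$1" "$2"
}
```

Since the lookup always goes from the left column to the right, swapping the two columns of the table would turn the same sketch into a long-to-short translator.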
Handle compressed (gzip) fasta files:
$ translate_fasta_headers.pl long.fas.gz | gzip > short.fas.gz
$ translate_fasta_headers.pl -t long.fas.gz.translation.tab short.fas.gz
Read fasta format from STDIN:
$ cat short.fas | ./translate_fasta_headers.pl -t short.fas.translation.tab -
Use your own prefix:
$ translate_fasta_headers.pl --prefix='Own_' long.fas
Translate short seq labels in Newick tree to long:
$ replace_taxon_labels_in_newick.pl -t long.fas.translation.tab short.fas.phy
Print seq labels in Newick tree:
$ replace_taxon_labels_in_newick.pl -l short.fas.phy
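One reason to be careful with labels in Newick trees is that a plain search-and-replace for Seq_1 would also hit the beginning of Seq_10. The sketch below avoids this by cutting the tree at the structural characters `(`, `)`, `,`, `:` and `;` and replacing whole tokens only. It is an illustration under simplifying assumptions (no quoted labels, no bracketed comments, long labels that are themselves valid in Newick), not the method used by replace_taxon_labels_in_newick.pl. The function name relabel_newick is hypothetical.

```shell
# Hypothetical helper, NOT part of replace_taxon_labels_in_newick.pl.
# Usage: relabel_newick <table.tab> <tree.phy>
relabel_newick() {
    awk -F '\t' '
        # First file (the table): map short labels to long labels.
        NR == FNR { full[$1] = substr($0, index($0, "\t") + 1); next }
        {
            out = ""; token = ""
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                if (index("(),:;", c) > 0) {
                    # A structural character ends the current token.
                    label = (token in full) ? full[token] : token
                    out = out label c
                    token = ""
                } else {
                    token = token c
                }
            }
            # Flush whatever follows the last structural character.
            label = (token in full) ? full[token] : token
            print out label
        }
    ' "$1" "$2"
}
```

Branch lengths and support values are tokens too, so a short label that looks like a number could be replaced in the wrong place. A non-numeric prefix such as the default Seq_ keeps that from happening.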
Options for translate_fasta_headers.pl:
-t, --tabfile=<filename> -- Specify tab-separated translation file with unique "short" labels to the left, and "long" names to the right. Translation will be from left to right.
-o, --out=<filename> -- Specify output file for the fasta sequences. Note: If --out=<filename> is specified, the translation file will be named <filename>.translation.tab. This simplifies back-translation. If, on the other hand, --out is not used, the translation file will be named after the infile (and written in the same folder as the infile)!
-i, --in=<filename> -- Specify name of fasta file (can be gzip-compressed). Can be skipped, as the script reads files from ARGV.
-n, --notab -- Do not create a translation file.
-p, --prefix=<string> -- Use your own prefix (default is Seq_). A number will be appended to the labels (e.g. Own_1, Own_2, ...).
-v, --version -- Print version number and quit.
-h, --help -- Show this help text and quit.
Options for replace_taxon_labels_in_newick.pl:
-t, --tabfile=<translation.tab> -- File with table describing what will be translated with what.
-l, -p, --labels -- Print taxon labels in tree. Option does not require a translation table.
--no-quotemeta -- Turn off escaping of special symbols in the replacements.
-o, --out=<out.file> -- Print to outfile out.file, else to STDOUT.
-v, --version -- Print version number and quit.
-h, --help -- Help text.
Johan Nylander
translate_fasta_headers.pl -- Perl script
replace_taxon_labels_in_newick.pl -- Perl script
data/long.fas -- Example file with long fasta headers
data/long.fas.gz -- Example file with long fasta headers, gzip compressed
data/short.fas.translation.tab -- Example translation table
data/short.fas -- Example output with short fasta headers
data/short.fas.phy -- Example Newick tree with short labels
README.md -- Documentation, markdown format
Copyright (c) 2013-2026 Johan Nylander