Faster FASTA is a collection of command-line utilities for processing FASTA and FASTQ files, which can be memory-mapped, streamed from external storage, or read from stdin.
It's a faster, SIMD-accelerated alternative to tools such as seqkit (pure Go) and fastp (C++).
It's implemented in Rust with StringZilla and detects the input format automatically by inspecting the first header character (@ for FASTQ, > for FASTA).
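The header inspection is simple enough to sketch in shell. The helper below, `detect_format`, is a hypothetical name and not part of Faster FASTA; it only illustrates the idea of looking at the first byte of the input. The real tools may treat empty input, leading whitespace, or compressed files differently.

```shell
# Hypothetical helper (not part of Faster FASTA): guess the format
# of a stream on stdin from its first byte.
detect_format() {
    first=$(head -c 1)
    case "$first" in
        '>') echo fasta ;;
        '@') echo fastq ;;
        *)   echo unknown ;;
    esac
}

printf '>sp|P12345\nMKV\n' | detect_format        # fasta
printf '@read1\nACGT\n+\nIIII\n' | detect_format  # fastq
```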
Multi-format tools for FASTA & FASTQ:
fasta-dedup - remove duplicate sequences
fasta-sample - randomly sample sequences using reservoir sampling
fasta-sort - sort by name, sequence, or length
fasta-revcomp - reverse complement DNA sequences (quality also reversed for FASTQ)
fasta-dna2rna - convert DNA to RNA (T → U)
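Reservoir sampling picks a fixed number of records in a single pass without knowing the total count in advance. The sketch below shows the classic "Algorithm R" variant in awk; it is not necessarily how fasta-sample is implemented. The function name `reservoir_sample` and its seed argument are hypothetical, and the sketch assumes that ">" appears only as the first character of header lines.

```shell
# Hypothetical helper, not part of Faster FASTA: reservoir_sample K [SEED]
# Reads FASTA on stdin and prints up to K records chosen uniformly at random.
# Assumes ">" appears only at the start of header lines.
reservoir_sample() {
    awk -v k="$1" -v seed="${2:-1}" '
        BEGIN { RS = ">"; srand(seed) }
        NR == 1 { next }                    # skip the text before the first ">"
        {
            n++
            if (n <= k) {
                slot[n] = $0                # fill the reservoir first
            } else {
                j = int(rand() * n) + 1     # uniform integer in 1..n
                if (j <= k) slot[j] = $0    # replace a slot with probability k/n
            }
        }
        END {
            m = (n < k) ? n : k
            for (i = 1; i <= m; i++) printf ">%s", slot[i]
        }'
}

# Keep 2 of 4 records; which ones depends on the seed and the awk implementation.
printf '>a\nAC\n>b\nGG\n>c\nTT\n>d\nCA\n' | reservoir_sample 2 42 | grep -c '^>'  # 2
```

Only the K records currently in the reservoir are held in memory, which is why this approach suits streamed input.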
FASTQ-specific tools:
fastq-filter - filter by quality, length, and N-content
fastq-trim - quality-based and fixed-position trimming
fastq-stats - comprehensive statistics with histograms
fastq-to-fasta - format conversion (drop quality scores)
fastq-interleave - merge paired-end files (R1 + R2 → interleaved)
fastq-deinterleave - split interleaved file (interleaved → R1 + R2)
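To make "filter by quality" concrete, here is a rough awk sketch of a mean-quality filter. It is an illustration under stated assumptions, not the fastq-filter implementation: the name `mean_quality_filter` is hypothetical, qualities are assumed to be Phred+33 encoded, records are assumed to be unwrapped (exactly 4 lines each), and the mean is taken as a plain arithmetic mean of the Phred scores, which may differ from how the tool averages them.

```shell
# Hypothetical helper, not part of Faster FASTA: mean_quality_filter MIN_Q
# Keeps FASTQ records whose arithmetic mean Phred score is >= MIN_Q.
# Assumes Phred+33 encoding and unwrapped 4-line records.
mean_quality_filter() {
    awk -v min_q="$1" '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        { rec[NR % 4] = $0 }                # lines 1-3 go to rec[1..3], line 4 to rec[0]
        NR % 4 == 0 {
            q = $0; sum = 0
            for (i = 1; i <= length(q); i++) sum += ord[substr(q, i, 1)] - 33
            if (length(q) > 0 && sum / length(q) >= min_q)
                printf "%s\n%s\n%s\n%s\n", rec[1], rec[2], rec[3], rec[0]
        }'
}

# "I" encodes Q40 and "!" encodes Q0, so only the first record is printed.
printf '@a\nACGT\n+\nIIII\n@b\nACGT\n+\n!!!!\n' | mean_quality_filter 25
```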
cargo install --git https://github.com/gata-bio/faster-fasta # install from GitHub
cargo install --path . --force # or install from a local clone
Remove duplicates:
fasta-dedup sequences.fasta -o unique.fasta
Sample 1000 sequences:
fasta-sample sequences.fasta --count 1000 -o sample.fasta
Sort by length:
fasta-sort --length sequences.fasta -o sorted.fasta
Reverse complement:
fasta-revcomp sequences.fasta -o revcomp.fasta
Convert to RNA:
fasta-dna2rna sequences.fasta -o rna.fasta
FASTQ quality filtering and trimming:
# Keep reads with mean Q≥25 and length ≥75
fastq-filter reads.fastq --min-quality 25 --min-length 75 -o filtered.fastq
# Trim low-quality tails and drop short reads
fastq-trim reads.fastq --quality-cutoff 20 --trim-tail 5 --min-length 50 -o trimmed.fastq
FASTQ stats and format conversions:
fastq-stats reads.fastq --histogram | head # quick QC summary
fastq-to-fasta reads.fastq -o reads.fasta # drop qualities
Paired-end juggling:
fastq-interleave R1.fastq R2.fastq -o interleaved.fastq
fastq-deinterleave interleaved.fastq -1 out_R1.fastq -2 out_R2.fastq
All utilities support stdin and stdout for composability:
cat sequences.fasta | fasta-dedup | fasta-sort -l -r > output.fasta
To benchmark performance, consider pulling a standard dataset, such as the UniProt Swiss-Prot database and paired-end Escherichia coli (E. coli) Illumina reads:
curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz && \
gunzip uniprot_sprot.fasta.gz && \
grep -c '^>' uniprot_sprot.fasta # contains 573'661 sequences
curl -L -O ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR250/013/SRR25083113/SRR25083113_1.fastq.gz && \
gunzip SRR25083113_1.fastq.gz && \
grep -c '^@' SRR25083113_1.fastq # contains 1'181'120 sequences
curl -L -O ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR250/013/SRR25083113/SRR25083113_2.fastq.gz && \
gunzip SRR25083113_2.fastq.gz && \
grep -c '^@' SRR25083113_2.fastq # contains 1'181'120 sequences
Run the following commands to compare the performance of fasta-dedup against a traditional awk approach for removing duplicate sequences:
time fasta-dedup uniprot_sprot.fasta -o unique_faster.fasta
grep -c '^>' unique_faster.fasta # prints 485'423 sequences after 0.4s
time awk '/^>/ {if (seq != "" && !seen[seq]++) {print header; print seq} header = $0; seq = ""; next} {seq = seq $0} END {if (seq != "" && !seen[seq]++) {print header; print seq}}' uniprot_sprot.fasta > unique_awk.fasta
grep -c '^>' unique_awk.fasta # prints 485'423 sequences after 11.3s
You can also compare against a popular toolkit like seqkit:
brew install seqkit hyperfine
# Deduplication: 0.4s vs 1.1s
hyperfine \
'fasta-dedup uniprot_sprot.fasta -o /tmp/ff.fasta' \
'seqkit rmdup -s uniprot_sprot.fasta -o /tmp/seqkit.fasta' --warmup 1
# Sorting by length: 0.8s vs 2.9s
hyperfine \
'fasta-sort --length SRR25083113_1.fastq -o /tmp/ff_sorted.fastq' \
'seqkit sort -l SRR25083113_1.fastq -o /tmp/seqkit_sorted.fastq' --warmup 1
# Sampling (10% fraction): 0.17s vs 0.20s
hyperfine \
'fasta-sample SRR25083113_1.fastq --fraction 0.1 -o /tmp/ff_sample.fastq' \
'seqkit sample -p 0.1 SRR25083113_1.fastq -o /tmp/seqkit_sample.fastq' --warmup 1
# FASTQ stats
hyperfine \
'fastq-stats SRR25083113_1.fastq' \
'seqkit stats SRR25083113_1.fastq' --warmup 1
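When checking read counts before and after a benchmark run, note that `grep -c '^@'` counts every line starting with "@", and in Phred+33 encoding a quality string can also start with "@" (it encodes Q31). A more conservative check, assuming unwrapped 4-line FASTQ records, is to divide the line count by four. The helper name `count_fastq_records` below is hypothetical.

```shell
# Hypothetical helper: count FASTQ records as (number of lines) / 4.
# Assumes unwrapped 4-line records (header, sequence, "+", quality).
count_fastq_records() {
    awk 'END { print NR / 4 }'
}

# Two reads; the second quality string happens to start with "@".
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n@III\n' | grep -c '^@'          # 3
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n@III\n' | count_fastq_records  # 2
```

On real data, `count_fastq_records < SRR25083113_1.fastq` can be compared with the grep-based count; the two agree only if no quality line begins with "@".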