MotifScan

Streaming, low-memory motif scanning CLI in Rust

MotifScan is a streaming, low-memory, multi-threaded Rust CLI for exact motif scanning in FASTA and FASTQ reads. It supports optional reverse-complement scanning, CSV motif tables, read-level hit reports, and Aho-Corasick acceleration for larger motif sets.

Build Dependency

  • Rust toolchain for building from source

Features

  • Exact matching only
  • Optional reverse-complement scanning
  • Single-motif or CSV motif input
  • Optional read-level hit output
  • FASTA, FASTQ, FASTA.GZ, and FASTQ.GZ support
  • Aho-Corasick acceleration when scanning many motifs
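
To illustrate what "streaming, low-memory" can mean for FASTQ input, here is a minimal sketch of a record-at-a-time reader built only on the standard library. It is not MotifScan's actual code: `Record` and `FastqReader` are hypothetical names, gzip decoding is left out (it would need an external crate), and uppercasing the sequence inside the reader is an assumption about where normalization happens.

```rust
use std::io::{self, BufRead};

/// One FASTQ record: identifier (without the leading '@') and sequence.
/// The quality line is read but discarded, since exact matching does not need it.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub id: String,
    pub seq: String,
}

/// Hypothetical streaming reader: yields one record at a time from any `BufRead`,
/// so memory use is bounded by the longest record rather than by the file size.
pub struct FastqReader<R: BufRead> {
    lines: io::Lines<R>,
}

impl<R: BufRead> FastqReader<R> {
    pub fn new(reader: R) -> Self {
        FastqReader { lines: reader.lines() }
    }
}

impl<R: BufRead> Iterator for FastqReader<R> {
    type Item = io::Result<Record>;

    fn next(&mut self) -> Option<Self::Item> {
        // End of input at a record boundary is a clean end of file.
        let header = match self.lines.next()? {
            Ok(line) => line,
            Err(e) => return Some(Err(e)),
        };
        // The remaining three lines: sequence, '+' separator, qualities.
        let mut rest = Vec::with_capacity(3);
        for _ in 0..3 {
            match self.lines.next() {
                Some(Ok(line)) => rest.push(line),
                Some(Err(e)) => return Some(Err(e)),
                None => {
                    return Some(Err(io::Error::new(
                        io::ErrorKind::UnexpectedEof,
                        "truncated FASTQ record",
                    )))
                }
            }
        }
        let id = match header.strip_prefix('@') {
            Some(h) => h.split_whitespace().next().unwrap_or("").to_string(),
            None => {
                return Some(Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "FASTQ header must start with '@'",
                )))
            }
        };
        // Assumption: sequences are uppercased here, before any matching.
        let seq = rest[0].trim_end().to_ascii_uppercase();
        Some(Ok(Record { id, seq }))
    }
}
```

Because the reader is an iterator, a caller can scan each record and drop it before the next one is read, for example `for rec in FastqReader::new(std::io::stdin().lock()) { ... }`.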

Installation

cargo build --release

Binary path:

./target/release/motifscan

Version and citation:

motifscan -v
motifscan --version

Quick Start

Multiple motifs:

motifscan count \
  -i reads.fastq \
  --motifs motifs.csv \
  --revcomp \
  -o count.csv

Single motif:

motifscan count \
  -i reads.fa \
  --motif ATTATGAGAATAGTGTG \
  --motif-name motif1 \
  -o count.csv

Read-level hits:

motifscan count \
  -i reads.fastq \
  --motifs motifs.csv \
  --report-read-hits read_hits.csv \
  -o count.csv

Main Options

  • -i, --input <FILE>: input reads file
  • --motif <SEQUENCE>: one motif provided on the command line
  • --motif-name <NAME>: name used for --motif (default: motif)
  • --motifs <FILE>: two-column CSV motif table
  • --revcomp: also scan reverse complements
  • -t, --threads <INT>: worker threads
  • --progress: show progress on stderr
  • --verbose: enable info-level logs
  • --debug: enable debug-level logs
  • -o, --output <FILE>: summary CSV output
  • --report-read-hits <FILE>: optional read-level hit CSV output

Motif CSV Format

name,sequence
motif1,ATTATGAGAATAGTGTG
motif2,TTCATTCATGGTGGCAGTAAAATGTTTATTGTG
motif3,ATGAA

Rules:

  • Comma-separated only
  • Optional header row
  • Exactly two columns: name,sequence
  • Motifs must contain only the unambiguous bases A, C, G, T, U
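
The rules above can be sketched as a small standard-library parser. This is an illustration, not the parser MotifScan ships: `Motif` and `parse_motif_table` are hypothetical names, and three behaviours are assumptions (a first row equal to `name,sequence` is treated as the header, blank lines are tolerated, and lowercase motifs are uppercased rather than rejected).

```rust
/// A named motif, stored in uppercase.
#[derive(Debug, PartialEq)]
pub struct Motif {
    pub name: String,
    pub sequence: String,
}

/// Parses a two-column, comma-separated motif table.
pub fn parse_motif_table(text: &str) -> Result<Vec<Motif>, String> {
    let mut motifs = Vec::new();
    let mut first_row = true;
    for (idx, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() {
            continue; // assumption: blank lines are ignored
        }
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        if fields.len() != 2 {
            return Err(format!(
                "line {}: expected 2 columns, found {}",
                idx + 1,
                fields.len()
            ));
        }
        if first_row {
            first_row = false;
            // Assumption: the optional header is recognised by its column names.
            if fields[0].eq_ignore_ascii_case("name")
                && fields[1].eq_ignore_ascii_case("sequence")
            {
                continue;
            }
        }
        // Assumption: lowercase input is normalized instead of rejected.
        let sequence = fields[1].to_ascii_uppercase();
        let valid = !sequence.is_empty()
            && sequence
                .bytes()
                .all(|b| matches!(b, b'A' | b'C' | b'G' | b'T' | b'U'));
        if !valid {
            return Err(format!(
                "line {}: motif '{}' must contain only A, C, G, T, U",
                idx + 1,
                fields[0]
            ));
        }
        motifs.push(Motif {
            name: fields[0].to_string(),
            sequence,
        });
    }
    Ok(motifs)
}
```

Returning an error that carries the line number makes a malformed table easy to fix before a long scan starts.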

Output CSV Columns

Summary:

motif,sequence,length,reads_with_hit,total_hits,forward_hits,revcomp_hits

Read hits:

read_id,motif,strand,position,matched_sequence
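
As a rough guide to how the summary columns relate, the sketch below builds one summary line from per-motif counters. `SummaryRow` is a hypothetical struct, not a type from the MotifScan source, and it assumes that `total_hits` is the sum of `forward_hits` and `revcomp_hits` and that `length` is the motif length in bases.

```rust
/// Column order of the summary CSV, copied from the header line above.
pub const SUMMARY_HEADER: &str =
    "motif,sequence,length,reads_with_hit,total_hits,forward_hits,revcomp_hits";

/// Per-motif counters (hypothetical; field names mirror the columns).
#[derive(Debug, Default)]
pub struct SummaryRow {
    pub motif: String,
    pub sequence: String,
    pub reads_with_hit: u64,
    pub forward_hits: u64,
    pub revcomp_hits: u64,
}

impl SummaryRow {
    /// Assumption: total_hits = forward_hits + revcomp_hits.
    pub fn total_hits(&self) -> u64 {
        self.forward_hits + self.revcomp_hits
    }

    /// Formats the row in the same column order as `SUMMARY_HEADER`.
    pub fn to_csv_line(&self) -> String {
        format!(
            "{},{},{},{},{},{},{}",
            self.motif,
            self.sequence,
            self.sequence.len(), // assumption: length is the motif length
            self.reads_with_hit,
            self.total_hits(),
            self.forward_hits,
            self.revcomp_hits
        )
    }
}
```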

Notes

  • Input sequences are normalized to uppercase before matching.
  • Overlapping hits are counted.
  • Palindromic motifs are not double-counted in reverse-complement mode.
  • If a motif is longer than a read, that read is skipped for that motif.
  • FASTQ input is parsed as standard four-line records.
  • The scanner currently supports exact matching only.
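
The counting rules in these notes can be sketched with a naive sliding-window scan. This is not the Aho-Corasick path and not MotifScan's implementation; the function names are hypothetical, the motif is assumed to be uppercase already, and complementing U to A is an assumption about RNA handling.

```rust
/// Reverse complement of an uppercase sequence.
/// Assumption: U is complemented to A, and A is always complemented to T.
pub fn reverse_complement(seq: &str) -> String {
    seq.bytes()
        .rev()
        .map(|b| match b {
            b'A' => 'T',
            b'C' => 'G',
            b'G' => 'C',
            b'T' | b'U' => 'A',
            other => other as char,
        })
        .collect()
}

/// Counts occurrences of `pattern` in `text`, including overlapping ones.
pub fn count_overlapping(text: &str, pattern: &str) -> usize {
    let (t, p) = (text.as_bytes(), pattern.as_bytes());
    if p.is_empty() || p.len() > t.len() {
        return 0; // a motif longer than the read cannot match
    }
    // Every window start is tested, so overlapping hits are all counted.
    t.windows(p.len()).filter(|w| *w == p).count()
}

/// Returns (forward_hits, revcomp_hits) for one read.
pub fn count_hits(read: &str, motif: &str, revcomp: bool) -> (usize, usize) {
    // Reads are normalized to uppercase before matching.
    let read = read.to_ascii_uppercase();
    let forward = count_overlapping(&read, motif);
    let rc = reverse_complement(motif);
    // A palindromic motif equals its own reverse complement, so scanning it
    // again would report every forward hit a second time.
    let reverse = if revcomp && rc != motif {
        count_overlapping(&read, &rc)
    } else {
        0
    };
    (forward, reverse)
}
```

For example, `count_overlapping("AAAA", "AA")` counts three hits, and the palindrome `GAATTC` contributes only forward hits even when `revcomp` is true.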

Citation

Mention this repository, or cite it as:

@software{motifscan,
  author = {jiehua1995},
  title = {MotifScan},
  url = {https://github.com/jiehua1995/MotifScan},
  version = {0.1.6}
}

Releases

If you do not want to build from source, you can download a prebuilt artifact from GitHub Releases generated by CI. Local builds are still recommended because they are the best way to ensure the binary matches your environment.
