SeqRush

SeqRush is a prototype pangenome graph construction tool inspired by seqwish. Planned features include lock-free union-find structures and WFA2-based alignments.

Overview

SeqRush is designed to build pangenome graphs in three steps:

Planned: perform all-vs-all pairwise alignments using WFA2 (Wavefront Alignment)
Planned: use a lock-free union-find data structure to merge matching positions
Construct a graph where sequences are embedded as paths

The design aims to leverage UFRush (lock-free union-find) for true parallel graph construction once implemented.
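
To make the merging step concrete, here is a minimal sketch of a union-find over global base positions. It is a plain single-threaded version, not the lock-free UFRush structure, and `PositionSets`, `find`, and `unite` are hypothetical names chosen for illustration rather than part of SeqRush's API.

```rust
/// Single-threaded union-find over global base positions (illustrative only).
pub struct PositionSets {
    parent: Vec<usize>,
}

impl PositionSets {
    /// Start with every base position in its own set.
    pub fn new(total_bases: usize) -> Self {
        Self { parent: (0..total_bases).collect() }
    }

    /// Return the representative of `x`, halving paths along the way.
    pub fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            self.parent[x] = self.parent[self.parent[x]];
            x = self.parent[x];
        }
        x
    }

    /// Merge the sets containing `a` and `b`. The smaller root always wins,
    /// so each set is represented by its lowest position.
    pub fn unite(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            let (lo, hi) = if ra < rb { (ra, rb) } else { (rb, ra) };
            self.parent[hi] = lo;
        }
    }
}
```

Positions that end up in the same set would collapse into one graph node. A lock-free variant would need atomic parent updates, which this sketch does not attempt.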

Quick Start

# Create a test FASTA file
cat > test.fasta << EOF
>seq1
ATCGATCGATCGATCG
>seq2
ATCGATGGATCGATCG
>seq3
ATCGATCGATCGATGG
EOF

# Build the project with CLI support
cargo build --release --features cli

# Build the pangenome graph using flags
./target/release/seqrush -s test.fasta -o test.gfa

# View the output
cat test.gfa

If you build without --features cli, use positional arguments instead:

./target/release/seqrush test.fasta test.gfa

Features

Planned: Lock-free Parallel Processing via UFRush
Planned: Memory Efficient alignment using WFA2's UltraLow mode
Planned: All-vs-All Alignment of all input sequences
Configurable Parameters: Alignment scoring, minimum match length
Standard GFA Output: Compatible with tools like odgi and vg
Path Integrity: Each input sequence is intended to be reconstructible from its path in the graph (see Known Issues for current exceptions)

Installation

Prerequisites

Rust via rustup (install the version pinned in rust-toolchain.toml or provide it via an offline setup)
Git

Build from Source

git clone https://github.com/KristopherKubicki/seqrush.git
cd seqrush
# rustup will automatically install the toolchain defined in `rust-toolchain.toml`
cargo build --release

To use the CLI flags, build the binary with the cli feature enabled:

cargo build --release --features cli

The binary will be available at target/release/seqrush.

Usage

Basic Usage

seqrush -s sequences.fasta -o graph.gfa

Enable the optional cli feature to use command-line flags.

Advanced Options

# -s  input FASTA file
# -o  output GFA file
# -t  number of threads (default: 1)
# -k  minimum match length (default: 15)
# -S  alignment scores: match,mismatch,gap1_open,gap1_extend,gap2_open,gap2_extend
# -v  verbose output
seqrush \
  -s sequences.fasta \
  -o graph.gfa \
  -t 8 \
  -k 15 \
  -S "0,5,8,2,24,1" \
  -v

Alignment Scoring

The -S/--scores parameter accepts comma-separated values:

# Two-piece affine gap model (default)
-S "0,5,8,2,24,1"  # match=0, mismatch=5, gap1_open=8, gap1_extend=2, gap2_open=24, gap2_extend=1

# Single affine gap model
-S "0,5,8,2"  # match=0, mismatch=5, gap_open=8, gap_extend=2

# Custom scoring for high-similarity sequences
-S "0,4,6,1"  # lower mismatch and gap penalties than the single affine example above

Note: Two-piece affine gap support requires a compatible version of the WFA2 library.
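
As an illustration of how a score string in this format could be interpreted, the sketch below parses four values as a single affine model and six values as a two-piece affine model. The `Scores` type and `parse_scores` function are hypothetical; SeqRush's actual parsing may differ.

```rust
/// Penalties parsed from a `-S`-style string (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Scores {
    /// match, mismatch, gap_open, gap_extend
    Affine { m: u32, x: u32, o: u32, e: u32 },
    /// match, mismatch, gap1_open, gap1_extend, gap2_open, gap2_extend
    TwoPiece { m: u32, x: u32, o1: u32, e1: u32, o2: u32, e2: u32 },
}

/// Parse a comma-separated list of 4 or 6 non-negative integers.
pub fn parse_scores(s: &str) -> Result<Scores, String> {
    let vals = s
        .split(',')
        .map(|f| {
            f.trim()
                .parse::<u32>()
                .map_err(|e| format!("bad score '{}': {}", f, e))
        })
        .collect::<Result<Vec<u32>, String>>()?;
    match vals.as_slice() {
        &[m, x, o, e] => Ok(Scores::Affine { m, x, o, e }),
        &[m, x, o1, e1, o2, e2] => Ok(Scores::TwoPiece { m, x, o1, e1, o2, e2 }),
        other => Err(format!("expected 4 or 6 values, got {}", other.len())),
    }
}
```

Rejecting any other number of values early keeps a mistyped `-S` argument from silently falling back to defaults.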

Example Workflow

# Build a pangenome graph
seqrush -s genomes.fasta -o pangenome.gfa

# Visualize with odgi
odgi build -g pangenome.gfa -o pangenome.og
odgi viz -i pangenome.og -o pangenome.png

# Check graph statistics
odgi stats -i pangenome.og -S

Algorithm Details

Core Algorithm

Load Sequences: Read FASTA file and assign global positions to each base
Planned: initialize a UFRush instance with one element per base
Planned: align all sequence pairs using WFA2
Planned: process matches ≥ min_match_length and unite positions
Build Graph: Walk sequences to identify nodes and edges
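
The first step above, assigning global positions, can be sketched as follows. This is a simplified reader under stated assumptions: `Record` and `load_fasta` are hypothetical names, the FASTA text is already in memory, and bases are upper-cased on load.

```rust
/// One input sequence plus the global index of its first base.
pub struct Record {
    pub name: String,
    pub bases: Vec<u8>,
    pub offset: usize,
}

/// Parse FASTA text and give each record a contiguous block of global positions.
pub fn load_fasta(text: &str) -> Vec<Record> {
    let mut records: Vec<Record> = Vec::new();
    for line in text.lines().map(str::trim).filter(|l| !l.is_empty()) {
        if let Some(header) = line.strip_prefix('>') {
            // The record name is the first whitespace-delimited token.
            let name = header.split_whitespace().next().unwrap_or("").to_string();
            records.push(Record { name, bases: Vec::new(), offset: 0 });
        } else if let Some(rec) = records.last_mut() {
            rec.bases.extend(line.bytes().map(|b| b.to_ascii_uppercase()));
        }
    }
    // Offsets are prefix sums of the sequence lengths.
    let mut next = 0;
    for rec in &mut records {
        rec.offset = next;
        next += rec.bases.len();
    }
    records
}
```

With this layout, base `i` of a record has global position `offset + i`, which is the index a union-find structure would operate on.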

Key Implementation Details

Planned: CIGAR Processing for WFA2's fine-grained output
Planned: Match Accumulation across CIGAR operations
Planned: Base Verification since 'M' may represent mismatch
Path Construction: Each sequence becomes a path through deduplicated nodes
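
The planned CIGAR handling (match accumulation plus base verification) might look roughly like the sketch below. It assumes a per-base CIGAR given as bytes, with 'I' consuming a base from the first sequence and 'D' from the second; that convention, and the `match_runs` name, are assumptions for illustration and not taken from WFA2 or SeqRush.

```rust
/// A run of identical bases: (start in a, start in b, length).
pub type MatchRun = (usize, usize, usize);

/// Walk per-base CIGAR ops over `a` and `b`, returning exact-match runs of
/// at least `min_len` bases.
pub fn match_runs(a: &[u8], b: &[u8], cigar: &[u8], min_len: usize) -> Vec<MatchRun> {
    let (mut i, mut j, mut run) = (0usize, 0usize, 0usize);
    let mut out = Vec::new();
    for &op in cigar {
        // Count 'M' (or '=') as a match only when the bases really agree.
        let is_match =
            matches!(op, b'M' | b'=') && i < a.len() && j < b.len() && a[i] == b[j];
        if is_match {
            run += 1;
        } else {
            // The run ended just before (i, j); keep it if long enough.
            if run > 0 && run >= min_len {
                out.push((i - run, j - run, run));
            }
            run = 0;
        }
        // Advance through the sequences according to the operation.
        match op {
            b'M' | b'=' | b'X' => {
                i += 1;
                j += 1;
            }
            b'I' => i += 1,
            b'D' => j += 1,
            _ => {}
        }
    }
    // Flush a run that reaches the end of the CIGAR.
    if run > 0 && run >= min_len {
        out.push((i - run, j - run, run));
    }
    out
}
```

Each returned run would then be fed to the union-find step, one pair of positions per base.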

Performance Characteristics

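Time Complexity: Roughly O(n² × L) for n highly similar sequences of length L. There are n(n-1)/2 pairwise alignments, and WFA takes O(L × s) time per pair for alignment score s, so divergent sequences cost more.
Space Complexity: O(N) where N is total sequence length
Parallel Scaling: Expected to be near-linear with thread count for the alignment phase, since pairs can be aligned independently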
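
The quadratic term comes from the number of sequence pairs. A small sketch of the pair enumeration, using a hypothetical `all_pairs` helper:

```rust
/// Enumerate every unordered pair (i, j) with i < j among `n` sequences.
pub fn all_pairs(n: usize) -> Vec<(usize, usize)> {
    (0..n)
        .flat_map(|i| ((i + 1)..n).map(move |j| (i, j)))
        .collect()
}
```

For n sequences this yields n(n-1)/2 pairs, and each pair is an independent unit of work that could be handed to a separate thread.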

Output Format

SeqRush generates GFA 1.0 format with:

H: Header with version
S: Segments (nodes) with single-character sequences
P: Paths representing input sequences
L: Links between adjacent nodes in paths

Example output:

H	VN:Z:1.0
S	1	A
S	2	C
S	3	G
P	seq1	1+,2+,3+	*
L	1	+	2	+	0M
L	2	+	3	+	0M
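
For reference, output in this shape could be produced by a writer along the following lines. This is a sketch, not SeqRush's actual writer: `write_gfa` is a hypothetical function, node IDs are assumed to be 1-based indices into `nodes`, and all path steps are assumed to be in forward orientation.

```rust
use std::collections::BTreeSet;
use std::fmt::Write;

/// Render single-character nodes and forward-strand paths as GFA 1.0 text.
pub fn write_gfa(nodes: &[char], paths: &[(&str, Vec<usize>)]) -> String {
    let mut out = String::from("H\tVN:Z:1.0\n");
    // One S line per node, numbered from 1.
    for (idx, base) in nodes.iter().enumerate() {
        writeln!(out, "S\t{}\t{}", idx + 1, base).unwrap();
    }
    // One P line per path; collect each adjacent pair of steps as a link.
    let mut links = BTreeSet::new();
    for (name, steps) in paths {
        let walk: Vec<String> = steps.iter().map(|id| format!("{}+", id)).collect();
        writeln!(out, "P\t{}\t{}\t*", name, walk.join(",")).unwrap();
        for pair in steps.windows(2) {
            links.insert((pair[0], pair[1]));
        }
    }
    // The set removes duplicate links shared by several paths.
    for (from, to) in links {
        writeln!(out, "L\t{}\t+\t{}\t+\t0M", from, to).unwrap();
    }
    out
}
```

Calling `write_gfa(&['A', 'C', 'G'], &[("seq1", vec![1, 2, 3])])` reproduces the example output shown above.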

Development

Running Tests

# Run all tests
cargo test --features cli

# Run with verbose output
cargo test -- --nocapture

# Run a specific test
cargo test run_seqrush_writes_output

Building Documentation

cargo doc --open

Project Structure

seqrush/
├── src/
│   ├── lib.rs          # Library interface
│   └── main.rs         # CLI binary
├── tests/
│   └── integration_tests.rs
├── Cargo.toml
└── README.md

Limitations

Input sequences must fit in memory
Currently builds the entire graph at once (no streaming)
Single-character nodes (no compaction)
Limited to DNA sequences (ACGT alphabet)

Known Issues

Path integrity verification may fail for some sequences with complex indel patterns. The graph structure is correct, but path reconstruction needs improvement for certain edge cases.
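
A path integrity check of the kind described here can be sketched as spelling out a path and comparing it with the input sequence. The functions `spell_path` and `path_matches` are hypothetical and assume single-character nodes with 1-based IDs and forward-only steps.

```rust
/// Rebuild a sequence by concatenating node labels along a path.
/// Returns `None` if any step refers to a node that does not exist.
pub fn spell_path(nodes: &[char], steps: &[usize]) -> Option<String> {
    steps
        .iter()
        .map(|&id| id.checked_sub(1).and_then(|i| nodes.get(i)).copied())
        .collect()
}

/// True when the path spells exactly the expected input sequence.
pub fn path_matches(nodes: &[char], steps: &[usize], expected: &str) -> bool {
    spell_path(nodes, steps).as_deref() == Some(expected)
}
```

Running such a check for every input sequence after graph construction would flag the affected paths.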

Citation

If you use SeqRush in your research, please cite:

SeqRush: Lock-free parallel pangenome graph construction
Kristopher Kubicki, 2025
https://github.com/KristopherKubicki/seqrush

License

This project is licensed under the MIT License. See LICENSE for details.

Acknowledgments

SeqRush is inspired by:

seqwish by Erik Garrison
WFA2-lib by Santiago Marco-Sola
UFRush lock-free union-find implementation

Future Work

Integrate WFA2 for efficient pairwise alignments
Implement lock-free union-find (UFRush) for parallel graph construction
Support streaming graph output for large datasets
Extend sequence alphabet beyond DNA
