rust-fastleng

fastleng is a tool created specifically for gathering sequence length information from a FASTQ, FASTA, or unaligned BAM file.

Why another FASTX stat tool?

While numerous tools generate summary statistics for FASTX files, I could not find one that computed all of the desired length metrics for both FASTQ and FASTA. pyfastx came closest, but it appears to limit certain statistics (e.g., N50) to a single file type.

In contrast, aside from the initial parsing step, fastleng is agnostic to the file type. However, it currently focuses only on metrics derived from sequence lengths. For more comprehensive metrics, tools like pyfastx or fastp may be a better fit.
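To illustrate why only the initial parsing depends on the file type, here is a simplified Rust sketch (not fastleng's actual parser) that reduces FASTA and FASTQ text to the same list of lengths. It assumes uncompressed input and unwrapped four-line FASTQ records, and it does not handle unaligned BAM at all; the function names fasta_lengths and fastq_lengths are hypothetical.

```rust
use std::io::BufRead;

/// Collect sequence lengths from FASTA text: a record starts at a '>' header
/// line, and its sequence may be wrapped across several lines.
fn fasta_lengths<R: BufRead>(reader: R) -> std::io::Result<Vec<usize>> {
    let mut lengths = Vec::new();
    let mut current: Option<usize> = None; // length of the record being read
    for line in reader.lines() {
        let line = line?;
        let line = line.trim_end(); // ignore trailing whitespace
        if line.starts_with('>') {
            // A new header closes the previous record, if any.
            if let Some(len) = current.take() {
                lengths.push(len);
            }
            current = Some(0);
        } else if let Some(len) = current.as_mut() {
            *len += line.len();
        }
    }
    if let Some(len) = current {
        lengths.push(len); // close the final record
    }
    Ok(lengths)
}

/// Collect sequence lengths from FASTQ text, assuming the common 4-line layout
/// (header, sequence, '+', qualities) with no line wrapping.
fn fastq_lengths<R: BufRead>(reader: R) -> std::io::Result<Vec<usize>> {
    let mut lengths = Vec::new();
    for (i, line) in reader.lines().enumerate() {
        let line = line?;
        if i % 4 == 1 {
            // The second line of each record is the sequence.
            lengths.push(line.trim_end().len());
        }
    }
    Ok(lengths)
}

fn main() -> std::io::Result<()> {
    let fasta = ">a\nACGT\nAC\n>b\nGGG\n";
    let fastq = "@r1\nACGTA\n+\nIIIII\n@r2\nAC\n+\nII\n";
    // &[u8] implements BufRead, so in-memory text works the same as a file.
    let from_fasta = fasta_lengths(fasta.as_bytes())?;
    let from_fastq = fastq_lengths(fastq.as_bytes())?;
    println!("FASTA lengths: {:?}", from_fasta); // [6, 3]
    println!("FASTQ lengths: {:?}", from_fastq); // [5, 2]
    Ok(())
}
```

After this point both inputs are just a Vec of lengths, so every downstream statistic can be shared.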

Installation

All installation options assume you have installed Rust along with Cargo, the Rust package manager.

From Cargo

cargo install fastleng
fastleng -h

From GitHub

git clone https://github.com/HudsonAlpha/rust-fastleng.git
cd rust-fastleng
# testing is optional
cargo test --release
cargo build --release
./target/release/fastleng -h
# for a local install
cargo install --path .

Usage

Typical Usage

The following command will invoke fastleng on a given FASTQ file and redirect the results from stdout into a JSON file:

fastleng {data.fq.gz} > {output.json}

Example output

{
  "total_bases": 21750112406,
  "total_sequences": 1305936,
  "mean_length": 16654.807284583625,
  "median_length": 16600.0,
  "n10": 18849,
  "n25": 17833,
  "n50": 16739,
  "n75": 15842,
  "n90": 15209
}

total_bases - the total number of base pairs across all sequences in the input file
total_sequences - the total number of sequences (i.e., strings) contained in the input file
mean_length - the average length of the counted sequences
median_length - the median length of the counted sequences
n10, n25, n50, n75, n90 - the N10, N25, N50, N75, and N90 scores of the sequence lengths, where Nx is the length L such that sequences of length L or longer contain at least x% of all bases; these values are monotonically non-increasing (n10 ≥ n25 ≥ n50 ≥ n75 ≥ n90)
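All of the metrics above can be derived from a plain list of sequence lengths. The following Rust sketch shows one way to do it, assuming the standard Nx definition (sort lengths longest first and report the length at which the running total first reaches x% of all bases) and assuming an even-count median averages the two middle values. These conventions, and the names compute_stats and LengthStats, are illustrative and may not match fastleng's internals.

```rust
/// Length statistics mirroring the fields of the example output.
#[derive(Debug)]
struct LengthStats {
    total_bases: u64,
    total_sequences: u64,
    mean_length: f64,
    median_length: f64,
    /// (x, Nx) pairs, e.g. (50, n50).
    n_scores: Vec<(u64, u64)>,
}

/// Compute summary statistics from sequence lengths; returns None for no input.
fn compute_stats(lengths: &[u64], thresholds: &[u64]) -> Option<LengthStats> {
    if lengths.is_empty() {
        return None;
    }
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable(); // ascending order

    let total_bases: u64 = sorted.iter().sum();
    let n = sorted.len();
    let mean_length = total_bases as f64 / n as f64;
    // Assumed convention: an even count averages the two middle lengths.
    let median_length = if n % 2 == 1 {
        sorted[n / 2] as f64
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) as f64 / 2.0
    };

    // Standard Nx: walk from the longest sequence down and report the length
    // at which the running total first reaches x% of all bases.
    // Integer arithmetic (cumulative * 100 >= x * total) avoids float rounding.
    let n_scores: Vec<(u64, u64)> = thresholds
        .iter()
        .map(|&x| {
            let mut cumulative = 0u64;
            let nx = sorted
                .iter()
                .rev()
                .find(|&&len| {
                    cumulative += len;
                    cumulative * 100 >= x * total_bases
                })
                .copied()
                .unwrap_or(0);
            (x, nx)
        })
        .collect();

    Some(LengthStats {
        total_bases,
        total_sequences: n as u64,
        mean_length,
        median_length,
        n_scores,
    })
}

fn main() {
    let lengths: [u64; 5] = [2, 3, 4, 5, 6];
    let stats = compute_stats(&lengths, &[10, 25, 50, 75, 90]).unwrap();
    println!("{:?}", stats);
}
```

With these conventions, the lengths 2, 3, 4, 5, 6 give n50 = 5 and n90 = 3: the two longest sequences (6 + 5 = 11 bases) already cover 55% of the 20 total bases, while the longest alone covers only 30%.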

Options to consider

-h - see full list of options and exit
-l, --length-json - saves the raw length counts to the specified JSON file
-o, --out-json - specifies the file to write the length statistics to (default: stdout)
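The exact layout of the raw length counts saved by -l is not described here, so this Rust sketch assumes a hypothetical length-to-count mapping (the names length_histogram and nx_from_histogram are illustrative, not part of fastleng). A histogram stores one entry per distinct length, and Nx can still be recovered from it by walking the buckets from longest to shortest.

```rust
use std::collections::BTreeMap;

/// Build a length -> count histogram (hypothetical stand-in for the -l output).
fn length_histogram(lengths: &[u64]) -> BTreeMap<u64, u64> {
    let mut hist = BTreeMap::new();
    for &len in lengths {
        *hist.entry(len).or_insert(0) += 1;
    }
    hist
}

/// Nx from the histogram: visit buckets from the longest length down,
/// adding length * count bases per bucket, until x% of all bases is covered.
fn nx_from_histogram(hist: &BTreeMap<u64, u64>, x: u64) -> Option<u64> {
    let total: u64 = hist.iter().map(|(len, count)| len * count).sum();
    let mut cumulative = 0u64;
    for (&len, &count) in hist.iter().rev() {
        cumulative += len * count;
        if cumulative * 100 >= x * total {
            return Some(len);
        }
    }
    None // only reached for an empty histogram
}

fn main() {
    let hist = length_histogram(&[2, 3, 4, 5, 6, 6]);
    println!("histogram: {:?}", hist); // {2: 1, 3: 1, 4: 1, 5: 1, 6: 2}
    println!("n50: {:?}", nx_from_histogram(&hist, 50)); // Some(5)
}
```

Because every sequence in a bucket has the same length, stopping at the first bucket that crosses the threshold gives the same Nx as walking the individual sorted lengths, while memory grows with the number of distinct lengths rather than the number of sequences.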

TODO List

Create an option for other N-score values (or maybe all integer N-score values)
If you have other length-based statistics, feel free to open a feature request on GitHub.
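For the TODO item on all integer N-score values, one possible approach is a single sweep: sort the lengths once, longest first, and emit each threshold x as soon as the running base total reaches x% of the total. This is a sketch under the standard Nx definition, not existing or planned fastleng code; all_integer_n_scores is a hypothetical name.

```rust
/// All integer N-scores N1..=N99 from one descending sweep over the lengths.
/// Returns (x, Nx) pairs; empty input gives an empty Vec.
fn all_integer_n_scores(lengths: &[u64]) -> Vec<(u64, u64)> {
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // longest first
    let total: u64 = sorted.iter().sum();

    let mut scores = Vec::with_capacity(99);
    let mut x = 1u64; // next threshold to fill
    let mut cumulative = 0u64;
    for &len in &sorted {
        cumulative += len;
        // A single long sequence can satisfy several thresholds at once.
        while x <= 99 && cumulative * 100 >= x * total {
            scores.push((x, len));
            x += 1;
        }
        if x > 99 {
            break; // every threshold has been filled
        }
    }
    scores
}

fn main() {
    let scores = all_integer_n_scores(&[2, 3, 4, 5, 6]);
    println!("n50 = {:?}", scores[49]); // (50, 5)
    println!("n90 = {:?}", scores[89]); // (90, 3)
}
```

The sweep costs one sort plus a single linear pass, instead of one pass over the lengths per threshold.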

Performance notes

We have not performed formal benchmarking. Anecdotally, the vast majority of the run time is spent loading the FASTX file, so the program is currently heavily I/O-bound.

Reference

Fastleng does not currently have a pre-print or paper associated with it.

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
