Check out the blog post for more information and full benchmarks!
I was inspired by the elegant data structure discussed in the MSFragger paper, and decided to implement an (open source) version of it in Rust - with great results.
Sage has excellent performance characteristics (5x faster than the closed-source MSFragger), but does not sacrifice code quality or size to do so!
- Incredible performance out of the box
- Effortlessly cross-platform (Linux/MacOS/Windows), effortlessly parallel (uses all of your CPU cores)
- Fragment indexing strategy allows for blazing fast narrow and open searches
- MS3-TMT quantification (experimental!)
- Capable of searching for chimeric/co-fragmenting spectra
- FDR calculation using target-decoy competition, with built-in linear discriminant analysis
- PEP calculation using a non-parametric model (KDE)
- Percolator/Mokapot compatible output
- Small and simple codebase
- Configuration by JSON files
- Hand-rolled, 100% pure Rust implementations of Linear Discriminant Analysis and KDE-mixture models for refinement of false discovery rates
- Both models demonstrate 1:1 results with scikit-learn, but have increased performance
- No need for a second post-search pipeline step
- Further boost PSM identification (by 1-3%) using prediction of retention times by a linear regression model
- Install the Rust programming language compiler
- Download Sage source code via git:
git clone https://github.com/lazear/sage.git
or by zip file
- Compile:
cargo build --release
- Run:
./target/release/sage config.json
Once you have Rust installed, you can copy and paste the following lines into your terminal to complete the above instructions and run Sage on the example mzML provided in the repository (a single scan from PXD016766):
git clone https://github.com/lazear/sage.git
cd sage
cargo run --release tests/config.json
Sage takes a single command line argument: a path to a JSON-encoded parameter file (see below).
Example usage: sage config.json
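As a rough illustration of the invocation above, the sketch below writes a minimal parameter file and assembles the command line. It assumes that the three required parameters named in the notes further down ("database.fasta", "precursor_tol", "fragment_tol") nest as shown. The FASTA path, the fragment tolerance value, and the helper name write_minimal_config are placeholders invented for this example, not Sage defaults.

```python
import json
import tempfile
from pathlib import Path


def write_minimal_config(path, fasta):
    """Write a parameter file containing only the required settings.

    The nesting ("database" -> "fasta", tolerances keyed by unit) is an
    assumption inferred from the dotted name "database.fasta" and the
    '"da": [-500, 100]' example; check it against the parameter reference.
    """
    config = {
        "database": {"fasta": str(fasta)},      # hypothetical FASTA path
        "precursor_tol": {"da": [-500, 100]},   # open-search window from the notes
        "fragment_tol": {"da": [-0.02, 0.02]},  # placeholder value, not a default
    }
    Path(path).write_text(json.dumps(config, indent=2))
    return config


config_path = Path(tempfile.mkdtemp()) / "config.json"
config = write_minimal_config(config_path, "proteome.fasta")

# Sage takes exactly one argument: the path to the parameter file.
command = ["sage", str(config_path)]
print(" ".join(command))
```

The command is only printed here. Running it requires a compiled sage binary on your PATH and a real FASTA file at the configured location.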
Running Sage will produce several output files:
- A record of search parameters (results.json) that details input/output paths and all search parameters used for the search
- MS2 search results, stored as a Percolator-compatible file (<mzML path>.sage.pin) - this is just a tab-separated file, which can be opened in Excel/Pandas/etc
- MS3 search results, stored as a CSV (<mzML path>.quant.csv) if the "quant" option is used in the parameter file
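Because the MS2 results are plain tab-separated text, they can be loaded with nothing more than the Python standard library. The sketch below parses a small made-up excerpt. The column names follow the general Percolator input layout and are assumptions here, as are all of the values, so inspect the header of your own .sage.pin file before relying on them.

```python
import csv
import io

# Hypothetical two-row excerpt in Percolator's tab-separated layout.
# The column names and values are illustrative, not actual Sage output.
SAMPLE_PIN = (
    "SpecId\tLabel\tScanNr\tPeptide\tProteins\n"
    "1\t1\t1001\tPEPTIDEK\tsp|P00001|EXAMPLE\n"
    "2\t-1\t1002\tKEDITPEP\trev_sp|P00001|EXAMPLE\n"
)


def read_pin(handle):
    """Parse a tab-separated results file into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_pin(io.StringIO(SAMPLE_PIN))

# In Percolator input files, Label is 1 for targets and -1 for decoys.
targets = [row for row in rows if row["Label"] == "1"]
print(len(rows), "rows,", len(targets), "target")
```

For a file on disk, open it with newline="" and pass the handle to read_pin. If a row lists several proteins in extra trailing columns, csv.DictReader collects the overflow under the key None.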
Sage search settings files have the following parameters:
Two notes:
- The majority of parameters are optional - only "database.fasta", "precursor_tol", and "fragment_tol" are required. Sage will try to use reasonable defaults for any parameters not supplied.
- Tolerances are specified on the experimental m/z values, so the window is the reverse of the mass shifts you want to allow. To perform a -100 to +500 Da open search (mass window applied to precursor), you would use
"da": [-500, 100]