mzrecal recalibrates mass spectrometry (MS1) data in mzML format, using peptide identifications in mzIdentML. mzRecal uses calibration functions based on the physics of the mass analyzer (FTICR, Orbitrap, TOF). The recalibration procedure was originally developed by Magnus Palmblad [1][2]. See also msRecal and recal2 for more information on the predecessors of mzRecal. Consuming and producing data in the same, open standard, format (mzML), mzRecal can be inserted into virtually any modular proteomics data analysis workflow, similar to msRecal [3]. This latest iteration of the software was described in an Application Note by Marissen and Palmblad in 2021 [4].
Check section Usage for a more complete description.
Ready-to-run executables of mzRecal for Linux and Microsoft Windows can be downloaded from https://github.com/524D/mzrecal/releases/latest (under "assets"). These executables have no external dependencies.
mzRecal is written in Go. The software was tested with Go version 1.16
On any recent Ubuntu/Debian, to install the prerequisites and download/build the executable:
- install Go using default install options
sudo apt install git
git clone https://github.com/524D/mzrecal
cd mzrecal; ./build.sh
The executables (both for Linux and for Windows) are put in directory ~/tools
.
On Windows, to install the prerequisites and download/build the executable:
- install Go using default install options
- Install git using default install options
- Restart Windows to add newly installed software to the PATH
- Open git bash (from the Windows start menu)
- Get mzRecal. From git bash prompt:
git clone https://github.com/524D/mzrecal
- Build mzRecal. From git bash prompt:
cd mzrecal; ./build.sh
. The executables (both for Windows and for Linux) are put in directorytools
relative to the user's home directory.
mzRecal uses file formats specified by the Proteomics Standards Initiative (PSI), notably mzML and mzIdentML.
For recalibration, a peak-picked mzML file and corresponding
mzIdentML (file extension .mzid) file are needed as input.
Running mzrecal
produces a recalibrated mzML file, plus a file with recalibration
parameters (.json format). The latter can be used to manually inspect
the calibration for each spectrum.
Note that the output mzML file will not contain the index wrapper (which is optional according to the mzML specification, but still required by some software). The msconvert program from the ProteoWizard toolkit is recommended to add the index.
Recalibration affects the MS1 spectra as well as the precursor masses of the MS2 spectra. Search engines commonly report the difference between theoretical mass and measured mass for identified peptides. The following plot shows the improvement of mzRecal on an Orbitrap and on a TOF dataset. This plot was made by running plot-recal.R (included in the mzRecal repository)
The current version of the code embeds two internal Go packages, one for reading mzIdentML and one for reading/writing mzML files. These packages will likely be split into a separate module at a later time.
The following is printed by running mzrecal -help
USAGE:
mzrecal [options] <mzMLfile>
This program can be used to recalibrate MS data in an mzML file
using peptide identifications in an accompanying mzID file.
OPTIONS:
-acceptprofile
Accept non-peak picked (profile) input.
This is a kludge, and will be removed when mzRecal can perform peak-picking.
By setting "acceptprofile", the value of option "calmult" is automatically
set to 0 and the default of "minpeak" is set to 100000
-cal filename
filename for output of computed calibration parameters
-calmult int
only the topmost (<calmult> * <number of potential calibrants>)
peaks are considered for computing the recalibration. <1 means all peaks. (default 10)
-charge range
charge range of calibrants, or the string "ident". If set to "ident",
only the charge as found in the mzIdentMl file will be used for calibration. (default "1:5")
-debug range
Print debug output for given spectrum range e.g. 3:6
-empty-non-calibrated
Empty MS2 spectra for which the precursor was not recalibrated.
-func function
recalibration function to apply. If empty, a suitable
function is determined from the instrument specified in the mzML file.
Valid function names:
FTICR, TOF, Orbitrap: Calibration function suitable for these instruments.
POLY<N>: Polynomial with degree <N> (range 1:5)
OFFSET: Constant m/z offset per spectrum.
-mincals int
minimum number of calibrants a spectrum should have to be recalibrated.
If 0 (default), the minimum number of calibrants is set to the smallest number
needed for the chosen recalibration function plus one. In any other case, if
the specified number is too low for the calibration function, it is increased to
the minimum needed value.
-minpeak float
minimum peak intensity to consider for computing the recalibration. (default 0)
-mzid filename
mzIdentMl filename
-o filename
filename of recalibrated mzML
-ppmcal float
0 (default): remove outlier calibrants according to HUPO-PSI mzQC,
the rest is accepted.
> 0: max mz error (ppm) for accepting a calibrant for calibration
-ppmuncal float
max mz error (ppm) for trying to use calibrant for calibration (default 10)
-quiet
Don't print any output except for errors
-rt range
rt window range(s) (default "-10.0:10.0")
-scorefilter string
filter for PSM scores to accept. Format:
<CVterm1|scorename1>([<minscore1>]:[<maxscore1>])...
When multiple score names/CV terms are specified, the first one on the list
that matches a score in the input file will be used.
The default contains reasonable values for some common search engines
and post-search scoring software:
MS:1002257 (Comet:expectation value)
MS:1001330 (X!Tandem:expectation value)
MS:1001159 (SEQUEST:expectation value)
MS:1002466 (PeptideShaker PSM score)
(default "MS:1002257(0.0:1e-2)MS:1001330(0.0:1e-2)MS:1001159(0.0:1e-2)MS:1002466(0.99:)")
-specfilter range
range of spectrum indices to calibrate (e.g. 1000:2000).
Default is all spectra
-stage int
0 (default): do all calibration stages in one run
1: only compute recalibration parameters
2: perform recalibration using previously computed parameters
-verbose
Print more verbose progress information
-version
Show software version
BUILD-IN CALIBRANTS:
In addition to the identified peptides, mzrecal will also use
for recalibration a number of compounds that are commonly found in many
samples. These compound are all assumed to have +1 charge. The following
list shows the build-in compounds with their (uncharged) masses:
cyclosiloxane6 (444.112748)
cyclosiloxane7 (518.131539)
cyclosiloxane8 (592.150331)
cyclosiloxane9 (666.169122)
cyclosiloxane10 (740.187913)
cyclosiloxane11 (814.206705)
cyclosiloxane12 (888.225496)
ENVIRONMENT VARIABLES:
When environment variable MZRECAL_DEBUG=1, extra information is added to the
JSON file that can help checking the performance of mzrecal.
USAGE EXAMPLES:
mzrecal yeast.mzML
Recalibrate yeast.mzML using identifications in yeast.mzid, write recalibrated
result to yeast-recal.mzML and write recalibration coefficients yeast-recal.json.
Default parameters are used.
mzrecal -ppmuncal 20 -scorefilter 'MS:1002257(0.0:0.001)' yeast.mzML
Idem, but accept peptides with 20 ppm mass error and Comet expectation value <0.001
as potential calibrants
NOTES:
The mzML file that is produced after recalibration does not contain an index. If an
index is required, we recommend post-processing the output file with msconvert
(http://proteowizard.sourceforge.net/download.html).
The authors gratefully acknowledge prior contributions from co-authors and collaborators in the development and testing of prior installments of the software.
mzRecal was made possible in part due to funding from the ELIXIR Implementation Study "Crowd-sourcing the annotation of public proteomics datasets to improve data reusability".
[1] Palmblad M, Bindschedler LV, Gibson TM, Cramer R (2006). Automatic internal calibration in liquid chromatography/Fourier transform ion cyclotron resonance mass spectrometry of protein digests. Rapid Commun. Mass Spectrom. 2006;20(20):3076-80.
[2] Palmblad M, van der Burgt YEM, Dalebout H, Derks RJE, Schoenmaker B, Deelder AM (2009). Improving mass measurement accuracy in mass spectrometry based proteomics by combining open source tools for chromatographic alignment and internal calibration. J. Proteomics. 2009;72(4):722-4.
[3] de Bruin JS, Deelder AM, Palmblad M (2012). Scientific workflow management in proteomics. Mol. Cell. Proteomics. 2012 Jul;11(7):M111.010595.
[4] Marissen R, Palmblad M (2021). mzRecal: universal MS1 recalibration in mzML using identified peptides in mzIdentML as internal calibrants. Bioinformatics. 2021 Feb 4;btab056.