timsread is a high-performance C++ tool for converting Bruker TimsTOF (TDF) files to MGF format for MS/MS proteomics analysis. This repository includes both the standalone C++ executable and a Nextflow pipeline for automated, scalable processing.
- Download Required SDK and Libraries
- Extract Files
- Install System Dependencies
- Set Up Include Paths
- Compile
- Run
- Nextflow Pipeline
- Performance Notes
- Quality Comparison
- License
- Bruker timsdata SDK: TDF-SDK 2.21 (6 MB)
  - Download the file: timsdata-2.21.0.4.zip (or latest version)
- CppSQLite3: CppSQLite3 GitHub
  - Download the source: CppSQLite-master.zip (or clone the repo)
- Extract timsdata-2.21.0.4.zip to a directory, e.g., Z:\Download\timsdata-2.21.0.4.
- Extract CppSQLite-master.zip to a directory, e.g., Z:\Download\CppSQLite-master.
sudo apt-get update
sudo apt-get install -y g++ sqlite3 libsqlite3-dev
You will need the following include directories:
- CppSQLite-master/src (contains CppSQLite3.h and CppSQLite3.cpp)
- timsdata-2.21.0.4/timsdata/include/c (contains timsdata.h)
- timsdata-2.21.0.4/timsdata/examples/timsdataSampleCpp/timsdataSampleCpp (contains timsdata_cpp.h)
From the project directory, run:
g++ -O3 -march=native -I/mnt/z/Download/CppSQLite-master/src -I/mnt/z/Download/timsdata-2.21.0.4/timsdata/include/c -I/mnt/z/Download/timsdata-2.21.0.4/timsdata/examples/timsdataSampleCpp/timsdataSampleCpp -o timsread timsread.cpp -L/mnt/z/Download/CppSQLite-master/src -lCppSQLite3 -L/mnt/z/Download/timsdata-2.21.0.4/timsdata/linux64 -ltimsdata -lsqlite3
- Adjust the /mnt/z/Download/... paths if your files are located elsewhere.
- The -O3 -march=native flags enable optimizations for better performance.
- On Windows, use the appropriate path format and a compatible compiler (e.g., MSVC).
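The repeated paths in the compile command can be factored into two variables so that only the base directories need editing. This is a minimal sketch, assuming the directory layout described above; `SDK_DIR`, `SQLITE_DIR`, and `build_cmd` are illustrative names and not part of the project. It prints the command for review instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: SDK_DIR, SQLITE_DIR and build_cmd are illustrative names.
# Defaults match the example paths used in this README.
SDK_DIR="${SDK_DIR:-/mnt/z/Download/timsdata-2.21.0.4/timsdata}"
SQLITE_DIR="${SQLITE_DIR:-/mnt/z/Download/CppSQLite-master/src}"

# Print the same g++ invocation as above, built from the two base directories.
build_cmd() {
    printf '%s ' \
        g++ -O3 -march=native \
        "-I$SQLITE_DIR" \
        "-I$SDK_DIR/include/c" \
        "-I$SDK_DIR/examples/timsdataSampleCpp/timsdataSampleCpp" \
        -o timsread timsread.cpp \
        "-L$SQLITE_DIR" -lCppSQLite3 \
        "-L$SDK_DIR/linux64" -ltimsdata -lsqlite3
    printf '\n'
}

build_cmd
```

Setting `SDK_DIR` or `SQLITE_DIR` in the environment before running the script overrides the defaults.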
The program converts Bruker TDF files to MGF format for MS/MS spectra and optionally extracts MS1 data.
- PXD045439: 230317_SIGRID_10_Slot1-41_1_4086.d
# Download example dataset
wget https://ftp.pride.ebi.ac.uk/pride/data/archive/2024/06/PXD045439/230317_SIGRID_10_Slot1-41_1_4086.d.tar
tar xvf 230317_SIGRID_10_Slot1-41_1_4086.d.tar
# sanity check
sqlite3 "230317_SIGRID_10_Slot1-41_1_4086.d/analysis.tdf" "SELECT COUNT(*) as null_mz FROM Precursors WHERE MonoisotopicMz IS NULL; SELECT COUNT(*) as negative_mz FROM Precursors WHERE MonoisotopicMz < 0;"
5314
0
Set the library path and run:
# Extract MS/MS data only (default and faster)
LD_LIBRARY_PATH=/mnt/z/Download/timsdata-2.21.0.4/timsdata/linux64 ./timsread 230317_SIGRID_10_Slot1-41_1_4086.d
# Extract both MS/MS and MS1 data
LD_LIBRARY_PATH=/mnt/z/Download/timsdata-2.21.0.4/timsdata/linux64 ./timsread "230317_SIGRID_10_Slot1-41_1_4086.d" -ms1
This takes a couple of minutes. Expected default output:
Loading metadata...
# TDF file 230317_SIGRID_10_Slot1-41_1_4086.d contains 66477 frames.
# Loaded 291844 precursors and 56501 MS/MS frames.
# MS/MS data written to: 230317_SIGRID_10_Slot1-41_1_4086.d_msms.mgf
Processing frames...
Progress: 100% (66477/66477) MS1:0 MS2:56501
Total frames processed: 66477
MS1 frames: 0
MS/MS frames: 56501
Skipped precursors with invalid m/z: 8268
Processing completed!
Expected output with the -ms1 switch:
LD_LIBRARY_PATH=/mnt/z/Download/timsdata-2.21.0.4/timsdata/linux64 ./timsread "230317_SIGRID_10_Slot1-41_1_4086.d" -ms1
Loading metadata...
# TDF file 230317_SIGRID_10_Slot1-41_1_4086.d contains 66477 frames.
# Loaded 291844 precursors and 56501 MS/MS frames.
# MS/MS data written to: 230317_SIGRID_10_Slot1-41_1_4086.d_msms.mgf
# MS1 data written to: 230317_SIGRID_10_Slot1-41_1_4086.d_ms1.txt
Processing frames...
Progress: 100% (66477/66477) MS1:9975 MS2:56501
Total frames processed: 66477
MS1 frames: 9976
MS/MS frames: 56501
Skipped precursors with invalid m/z: 8268
Processing completed!
- *.d_msms.mgf: MS/MS spectra in MGF format for protein identification
- *.d_ms1.txt: MS1 spectra (if the -ms1 flag is used) with format: Frame_ID RT_seconds Scan_Number m/z Intensity Mobility
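The six-column MS1 layout is easy to summarize with standard tools. This is a minimal sketch, assuming the columns are whitespace-separated in the order listed and that any header line starts with a non-numeric token (neither is confirmed here). `ms1_frame_totals` is an illustrative name, and the sample rows are made up. It sums intensity per frame.

```shell
#!/bin/sh
# Sum intensity (column 5) per Frame_ID (column 1) in an MS1 text file.
# Assumption: whitespace-separated columns in the order
# Frame_ID RT_seconds Scan_Number m/z Intensity Mobility.
ms1_frame_totals() {
    awk '$1 ~ /^[0-9]+$/ { total[$1] += $5 }
         END { for (f in total) printf "%s\t%.0f\n", f, total[f] }' "$1" | sort -n
}

# Tiny made-up sample for demonstration (not real instrument data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Frame_ID RT_seconds Scan_Number m/z Intensity Mobility
1 0.52 100 400.1234 150 1.05
1 0.52 101 512.3345 50 1.04
2 0.63 100 400.1236 75 1.05
EOF

ms1_frame_totals "$sample"
rm -f "$sample"
```

On real output the function would be called with the path to the `*.d_ms1.txt` file.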
This repository includes a Nextflow pipeline (nextflow.nf) for automated processing of TDF files using the timsread tool.
- Nextflow installed
- Bruker timsdata SDK available at /mnt/z/Download/timsdata-2.21.0.4/timsdata/linux64
- Compiled timsread executable in the project directory
# Extract MS/MS spectra only (default, faster)
nextflow run nextflow.nf --input 230317_SIGRID_10_Slot1-41_1_4086.d
# Extract both MS/MS and MS1 spectra
nextflow run nextflow.nf --input 230317_SIGRID_10_Slot1-41_1_4086.d --ms2_only False
# Custom output directory
nextflow run nextflow.nf --input <tdf_directory> --publishdir custom_output
- Automated Processing: Handles library path configuration and tool execution
- Scalable: Can process multiple TDF files in parallel
- Flexible Output: Optional MS1 extraction with --ms2_only False
- Quality Control: Generates summary statistics for processed files
- Resume Capability: Use -resume to continue interrupted runs
- MGF Files: MS/MS spectra ready for protein identification
- MS1 Files: Optional MS1 spectra data (if requested)
- Summary Report: results_file.tsv with processing statistics
# Download and extract test dataset
wget https://ftp.pride.ebi.ac.uk/pride/data/archive/2024/06/PXD045439/230317_SIGRID_10_Slot1-41_1_4086.d.tar
tar xf 230317_SIGRID_10_Slot1-41_1_4086.d.tar
# Run Nextflow pipeline
nextflow run nextflow.nf --input 230317_SIGRID_10_Slot1-41_1_4086.d
# Check results
ls nf_output/
# Output: 230317_SIGRID_10_Slot1-41_1_4086.d_msms.mgf results_file.tsv
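A quick check on the MGF output is to count its spectra. This is a minimal sketch, assuming timsread writes standard MGF in which every spectrum opens with a `BEGIN IONS` line. `count_spectra` is an illustrative name and the sample file is made up.

```shell
#!/bin/sh
# Count spectra in an MGF file by counting "BEGIN IONS" lines.
count_spectra() {
    grep -c '^BEGIN IONS' "$1"
}

# Tiny made-up MGF sample for demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
BEGIN IONS
TITLE=example 1
PEPMASS=500.25 1000
100.1 20
200.2 35
END IONS
BEGIN IONS
TITLE=example 2
PEPMASS=612.31 800
150.0 12
END IONS
EOF

count_spectra "$sample"
rm -f "$sample"
```

For the pipeline run above, the call would be `count_spectra nf_output/230317_SIGRID_10_Slot1-41_1_4086.d_msms.mgf`.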
- Ensure all required header and source files are present in the specified directories.
- If you encounter missing dependencies, verify the include paths and that all SDK/library files are extracted.
- The program automatically skips precursors with invalid (NULL) m/z values from the database.
- Use the -ms1 flag only when needed, as MS1 extraction significantly increases processing time and output file size.
- For automated processing: Use the included Nextflow pipeline (see Section 7), which handles library paths and batch processing automatically.
If you see an error about libtimsdata.so not being found, you must set the LD_LIBRARY_PATH environment variable to the directory containing libtimsdata.so, as shown above. This only affects the current terminal session.
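Because the setting does not persist, it can help to export it once per session through a small helper. This is a minimal sketch; `TIMSDATA_LIB` and `add_lib_path` are illustrative names, and the default path is the one used in the examples in this README.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH unless it is
# already listed, then export the variable.
add_lib_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

TIMSDATA_LIB="${TIMSDATA_LIB:-/mnt/z/Download/timsdata-2.21.0.4/timsdata/linux64}"
add_lib_path "$TIMSDATA_LIB"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

The snippet has to be sourced (for example with `. ./set_lib_path.sh`, a hypothetical file name) rather than executed, since a child process cannot change the environment of the calling shell.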
- Zero-filter approach: Removes only zero-intensity peaks for maximum data retention
- Optimized extraction: ~83% more spectra than Bruker's default export
- Batch processing: Efficient handling of large datasets (66K+ frames)
- Memory management: 4 MB I/O buffers with periodic flushing
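The zero-filter idea can be illustrated outside the tool. This is a minimal sketch and not timsread's actual implementation, assuming peak lines are `m/z intensity` pairs as in standard MGF. `drop_zero_peaks` is an illustrative name and the sample is made up. It drops only peaks whose intensity is exactly zero and passes every other line through unchanged.

```shell
#!/bin/sh
# Remove peak lines with zero intensity; keep headers and all other peaks.
# A peak line is taken to be exactly two numeric fields: m/z and intensity.
drop_zero_peaks() {
    awk 'NF == 2 && $1 ~ /^[0-9.]+$/ && $2 ~ /^[0-9.eE+-]+$/ && $2 + 0 == 0 { next }
         { print }' "$1"
}

# Tiny made-up spectrum with two zero-intensity peaks.
sample=$(mktemp)
cat > "$sample" <<'EOF'
BEGIN IONS
PEPMASS=500.25 1000
100.1 0
200.2 35
300.3 0
END IONS
EOF

drop_zero_peaks "$sample"
rm -f "$sample"
```

No intensity threshold is applied beyond zero, which is what keeps low-abundance peaks in the output.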
Our timsread extraction finds ~83% more spectra than Bruker's default MGF export due to:
- More comprehensive PASEF precursor extraction
- Less aggressive intensity filtering (zero-filter only)
- Complete processing of all MS/MS frames and precursors
Example comparison:
- Bruker export: 291,843 spectra
- timsread extraction: 534,186 spectra (83% more)
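The percentage follows from the two counts as (534,186 - 291,843) / 291,843. A minimal sketch of that calculation, where `pct_increase` is an illustrative name:

```shell
#!/bin/sh
# Percentage increase of the second count over the first, to one decimal place.
pct_increase() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Bruker export vs. timsread extraction, using the counts quoted above.
pct_increase 291843 534186
```

The result rounds to the roughly 83% figure quoted in this section.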
grep -A 30 "PEPMASS=1221.98873 17379" "230317_SIGRID_10_Slot1-41_1_4086.d/230317_SIGRID_10_Slot1-41_1_4086_6.0.313.mgf"
grep -A 50 "PEPMASS=1221.988770" "230317_SIGRID_10_Slot1-41_1_4086.d_msms.mgf"
Key Differences in Ion Handling
Bruker's Export (1 spectrum):
PEPMASS=1221.98873 17379
- 15 peaks total
- Peak at 1221.98649: 11508 intensity
- Peak at 1222.99170: 4364 intensity
- Clean, merged spectrum
Our timsread (Multiple spectra for same m/z):
PEPMASS=1221.988770 28038 (First occurrence)
PEPMASS=1221.988770 10418 (Second occurrence)
PEPMASS=1221.988770 10418 (Third occurrence)
PEPMASS=1221.988770 21162 (Fourth occurrence)
PEPMASS=1221.988770 3872 (Fifth occurrence)
PEPMASS=1221.988770 3872 (Sixth occurrence)
PEPMASS=1221.988770 10789 (Seventh occurrence)
PEPMASS=1221.988770 10789 (Eighth occurrence)
PEPMASS=1221.988770 18410 (Ninth occurrence)
Bruker merges multiple PASEF precursors with the same m/z into a single spectrum, while our method creates separate spectra for each PASEF precursor instance. This probably explains:
- Why we have 83% more spectra - We're not merging duplicate m/z precursors
- Different intensity values - Each instance has its own intensity
- Duplicate peaks - Same m/z fragments appear multiple times across different scans
Bruker's approach:
- Groups precursors by m/z
- Merges peaks from multiple PASEF instances
- Results in fewer, but "cleaner" spectra
timsread's approach:
- Each PASEF precursor gets its own spectrum
- Preserves scan-level detail and timing information
- More comprehensive but with redundancy
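The redundancy can be measured by counting how many spectra share a precursor m/z. This is a minimal sketch, assuming each spectrum carries a `PEPMASS=<m/z> <intensity>` line as in the excerpts above. `spectra_per_precursor` is an illustrative name, and the sample reuses two PEPMASS lines from the excerpt with made-up peaks.

```shell
#!/bin/sh
# Count spectra per precursor m/z (first value on each PEPMASS line),
# most frequent first.
spectra_per_precursor() {
    awk -F'[= ]' '/^PEPMASS=/ { n[$2]++ }
                  END { for (mz in n) printf "%d\t%s\n", n[mz], mz }' "$1" |
        sort -k1,1nr -k2,2
}

# Small sample: two spectra share one precursor m/z, a third differs.
sample=$(mktemp)
cat > "$sample" <<'EOF'
BEGIN IONS
PEPMASS=1221.988770 28038
100.0 5
END IONS
BEGIN IONS
PEPMASS=1221.988770 10418
100.0 7
END IONS
BEGIN IONS
PEPMASS=650.500000 900
120.0 3
END IONS
EOF

spectra_per_precursor "$sample"
rm -f "$sample"
```

Precursors with a count above 1 are the ones Bruker's export would presumably have merged into a single spectrum.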
This repository is a fork of gtluu/timsconvert, originally licensed under the Apache License 2.0.
All original code is © the original authors and remains under Apache 2.0.
Modifications and additional contributions by Animesh Sharma are © 2023–2025 and are available under the same Apache 2.0 license.
To the extent possible, Animesh Sharma’s contributions may also be reused under the MIT License.
See NOTICE and LICENSE-ANIMESH.md for details.