A Java library for reading and writing DIA-NN spectral library files (.speclib format).
SpecLibIO provides a pure Java implementation for working with DIA-NN spectral libraries, enabling seamless integration with Java-based proteomics tools and workflows. The library supports reading and writing the binary .speclib format used by DIA-NN.
This project, including all implementation code, unit tests, and this README, was generated with AI-assisted development ("vibe coding"). The code has been reviewed, partially modified, and tested by Fengchao Yu.
The DiaNNSpecLibReader is based on the DIA-NN SpecLib parser from the ProteoWizard project.
The DiaNNSpecLibWriter mirrors the DiaNNSpecLibReader implementation, providing symmetric write capabilities for the binary format.
Unit test files (.speclib and .tsv files) are from the ProteoWizard repository's test resources.
- Read DIA-NN spectral libraries: Parse binary .speclib files into Java objects
- Write DIA-NN spectral libraries: Generate binary .speclib files from Java objects
- Parquet to SpecLib conversion: Convert Parquet-formatted spectral libraries to SpecLib format
- Version support: Supports format versions up to -3 (latest DIA-NN format)
- Complete data model: Includes proteins, peptides, precursors, fragments, and metadata
- Stream-based I/O: Efficient handling of large library files
- Pure Java: No native dependencies, works on any platform with Java 11+
- Java 11 or higher
- Maven 3.x (for building)
import java.io.FileInputStream;
import java.io.InputStream;
import speclib.io.DiaNNSpecLibReader;
import speclib.io.SpectralLibrary;

// From file path
DiaNNSpecLibReader reader = new DiaNNSpecLibReader("library.speclib");
SpectralLibrary library = reader.read();

// From input stream (closed automatically by try-with-resources)
try (InputStream is = new FileInputStream("library.speclib")) {
    DiaNNSpecLibReader reader = new DiaNNSpecLibReader(is);
    SpectralLibrary library = reader.read();
}

import speclib.io.DiaNNSpecLibWriter;
import speclib.io.SpectralLibrary;
SpectralLibrary library = new SpectralLibrary();
library.setName("My Library");
library.setGenDecoys(true);
library.setGenCharges(true);
// ... populate library with entries
DiaNNSpecLibWriter writer = new DiaNNSpecLibWriter(library);
writer.write("output.speclib");

import speclib.io.ParquetToSpecLib;
// Convert Parquet file to SpecLib
ParquetToSpecLib converter = new ParquetToSpecLib("input.parquet");
converter.convertAndWrite("output.speclib");

Run unit tests with Maven:

mvn test

The test suite includes:
- DiaNNSpecLibReaderTest: Reading various DIA-NN library formats
- DiaNNSpecLibWriterTest: Round-trip read/write validation, edge cases, format version compatibility
- ParquetToSpecLibTest: Parquet to SpecLib conversion with comprehensive field validation
  - Tests conversion accuracy for all fields (precursors, fragments, proteins, RT, ion mobility)
  - Validates fragment types, loss types, charges, and intensities
  - Verifies round-trip conversion (Parquet → SpecLib → Read back)
The DIA-NN .speclib format is a binary format with the following structure:
- Header: Version, flags (gen_decoys, gen_charges, infer_proteotypicity)
- Metadata: Library name, FASTA names
- Proteins: Array of protein isoforms and protein groups
- Precursors: Array of precursor identifiers
- Names/Genes: Parallel arrays for protein names and gene names
- iRT range: Minimum and maximum indexed retention times
- Entries: Array of library entries with peptides and fragments
- Optional: Elution groups (version -1 and later)
All integers are stored as 4-byte little-endian, doubles as 8-byte little-endian IEEE 754.
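
The primitive encoding described above can be illustrated with a minimal sketch built on the standard `java.nio.ByteBuffer` API. The class `LittleEndianSketch` and its method names are hypothetical and are not part of SpecLibIO. The sketch covers only the byte order of individual integers and doubles and makes no claim about the order or layout of the fields in a real .speclib file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not part of SpecLibIO): shows the primitive encoding only.
final class LittleEndianSketch {

    // Reads exactly `size` bytes and wraps them in a little-endian buffer.
    private static ByteBuffer readBuffer(InputStream in, int size) throws IOException {
        byte[] bytes = new byte[size];
        // readFully throws EOFException if the stream ends before `size` bytes arrive.
        new DataInputStream(in).readFully(bytes);
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    // 4-byte little-endian signed integer.
    static int readInt(InputStream in) throws IOException {
        return readBuffer(in, Integer.BYTES).getInt();
    }

    // 8-byte little-endian IEEE 754 double.
    static double readDouble(InputStream in) throws IOException {
        return readBuffer(in, Double.BYTES).getDouble();
    }

    static void writeInt(OutputStream out, int value) throws IOException {
        out.write(ByteBuffer.allocate(Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array());
    }

    static void writeDouble(OutputStream out, double value) throws IOException {
        out.write(ByteBuffer.allocate(Double.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putDouble(value)
                .array());
    }

    public static void main(String[] args) throws IOException {
        // Round trip: encode an int and a double, then decode them again.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt(out, -3);       // example value only
        writeDouble(out, 12.5);  // example value only

        InputStream in = new ByteArrayInputStream(out.toByteArray());
        System.out.println(readInt(in));    // prints -3
        System.out.println(readDouble(in)); // prints 12.5
    }
}
```

Java's `DataInputStream.readInt()` and `DataOutputStream.writeInt()` always use big-endian order, which is why the sketch routes every value through a `ByteBuffer` with an explicit `ByteOrder.LITTLE_ENDIAN`.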