Add NDTensorsJLD2Ext #1748
Open
mtfishman wants to merge 13 commits into
Custom JLD2 serialization for NDTensors storage types (Dense, BlockSparse, Diag, DiagBlockSparse — uniform and nonuniform — and EmptyStorage), split out from the ITensor-stack JLD2 serialization prototype on Tennis.jl#31.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1748       +/-   ##
===========================================
- Coverage   81.02%    4.95%    -76.07%
===========================================
  Files          80       69       -11
  Lines        5074     5021       -53
===========================================
- Hits         4111      249     -3862
- Misses        963     4772     +3809
The on-disk struct names recorded in JLD2-written files now sit under NDTensors.* rather than NDTensorsJLD2Ext.*, matching the namespacing of the user-facing storage types. This avoids encoding the JLD2 extension's module path into the file format, which would otherwise lock that path in for any future cross-format or cross-language reader. NDTensorsJLD2Ext now holds only the JLD2.writeas / wconvert / rconvert methods.
The `_serialize_blockoffsets` / `_deserialize_blockoffsets` helpers are codec internals, not part of the on-disk format spec, so they belong alongside the writeas / wconvert / rconvert methods rather than in NDTensors proper. NDTensors/src/serialization_types.jl now contains only the struct layouts that define the file format.
…onvert methods
Same wording everywhere, plus a TODO to remove the idempotent methods once the JLD2 double-call bug is fixed upstream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ates
Cross-language readability changes to the on-disk Serialized* layouts:
- version :: UInt32 (matches Base.VersionNumber's field width).
- block_indices :: Matrix{Int64} shaped (ndims, num_blocks), the COO
convention used by Apache Arrow, PyData Sparse, and PyTorch sparse.
ndims is implicit in size(block_indices, 1) and is preserved by HDF5
even when num_blocks == 0, so the standalone ndims field is no longer
needed and has been removed.
- block_offsets :: Vector{Int64}.
- SerializedEmptyStorage now just carries version; the eltype is already
encoded in the parametric type record on disk, matching how the other
Serialized* storage types handle their eltype.
The block-offset helpers take Val(N) so the constructed BlockOffsets{N}
stays type-stable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
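The layout in this commit message can be sketched roughly as follows. This is a minimal illustration under assumptions: `SerializedBlockSparseSketch` and `blocks_from_matrix` are hypothetical names, and the real NDTensors structs and helpers may have different fields and signatures. It shows the `(ndims, num_blocks)` orientation described here and why taking `Val(N)` keeps the tuple length in the return type.

```julia
# Hypothetical illustration; names and fields are assumptions,
# not the NDTensors definitions.
struct SerializedBlockSparseSketch{T}
    version::UInt32
    data::Vector{T}
    block_indices::Matrix{Int64}   # shaped (ndims, num_blocks)
    block_offsets::Vector{Int64}
end

# Rebuild one NTuple{N,Int} per block (one block per column).
# Taking Val(N) makes N a compile-time constant, so the element type of the
# result is NTuple{N,Int} rather than an abstract Tuple.
function blocks_from_matrix(block_indices::Matrix{Int64}, ::Val{N}) where {N}
    size(block_indices, 1) == N || throw(DimensionMismatch("expected $N rows"))
    return [ntuple(d -> Int(block_indices[d, b]), Val(N)) for b in axes(block_indices, 2)]
end

# Two blocks, (1, 1) and (2, 2), stored as the columns of a 2x2 matrix.
s = SerializedBlockSparseSketch{Float64}(UInt32(1), zeros(6), Int64[1 2; 1 2], Int64[0, 4])
blocks = blocks_from_matrix(s.block_indices, Val(2))

# With zero blocks the matrix is 2x0, so ndims is still size(m, 1).
no_blocks = zeros(Int64, 2, 0)
```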
Storage and serialized value as the first positional argument, the type parameter Val(N) as the second. Matches the typical Julia idiom of putting the value being operated on first.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds docstrings to each Serialized* storage type documenting its field layout and on-disk schema version. Marks them as part of NDTensors's public API surface using the backwards-compatible `public` declaration form recommended in the Julia docs, so downstream users across versions can rely on them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
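For reference, the backwards-compatible declaration form described in the Julia manual looks roughly like the sketch below. The module and struct names (`PublicSketch`, `SerializedDenseSketch`) and the field layout are hypothetical stand-ins, not the definitions in this PR.

```julia
module PublicSketch

# Hypothetical stand-in for one of the Serialized* schema types.
"""
    SerializedDenseSketch{T}

Schema version 1. Fields: `version::UInt32`, `data::Vector{T}`.
"""
struct SerializedDenseSketch{T}
    version::UInt32
    data::Vector{T}
end

# `public` is only a keyword on Julia 1.11 and later, so the declaration is
# parsed at load time to keep this file loadable on older versions.
VERSION >= v"1.11.0-DEV.469" && eval(Meta.parse("public SerializedDenseSketch"))

end # module
```

On Julia versions before 1.11 the last line is a no-op and the name is simply an unexported binding.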
…vert
Refactors the backend-agnostic part of NDTensors serialization out of
NDTensorsJLD2Ext and into the main package, so other backends (a future
native HDF5 emitter, custom binary writers, anyone wanting to inspect the
on-disk shape from Julia) can use it without depending on JLD2:
- New `NDTensors.serialized_type(::Type{T})` returns the on-disk schema
type for an in-memory type T. Public API.
- `Base.convert` overloads handle the value-level transform in both
directions (in-memory <-> Serialized*). They live next to the schema
type definitions in NDTensors/src/serialization.jl.
- The block-offset (de)serialization helpers move along with them.
- NDTensorsJLD2Ext shrinks to a single `JLD2.writeas` declaration
delegating to `NDTensors.serialized_type` (scoped to the TensorStorage
hierarchy). JLD2's default `wconvert` / `rconvert` already delegate
to `Base.convert`, so no JLD2-specific value-conversion code remains.
- The per-type idempotent rconvert workarounds disappear: Julia's
`Base.convert(::Type{T}, x::T) = x` identity dispatch covers the JLD2
double-call bug for free.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
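The split described in this commit message can be sketched without JLD2 at all. The types below (`DenseDemo`, `SerializedDenseDemo`) are hypothetical stand-ins, and the `serialized_type` signature is assumed from the description above rather than copied from the PR.

```julia
# Hypothetical stand-ins; not the NDTensors definitions.
struct DenseDemo{T}
    data::Vector{T}
end

struct SerializedDenseDemo{T}
    version::UInt32
    data::Vector{T}
end

# Type-level mapping: in-memory type -> on-disk schema type.
serialized_type(::Type{DenseDemo{T}}) where {T} = SerializedDenseDemo{T}

# Value-level transform in both directions.
Base.convert(::Type{SerializedDenseDemo{T}}, x::DenseDemo{T}) where {T} =
    SerializedDenseDemo{T}(UInt32(1), x.data)
Base.convert(::Type{DenseDemo{T}}, s::SerializedDenseDemo{T}) where {T} =
    DenseDemo{T}(s.data)

# A JLD2 extension could then reduce to a single line (not executed here):
#   JLD2.writeas(::Type{T}) where {T<:DenseDemo} = serialized_type(T)

x = DenseDemo([1.0, 2.0])
s = convert(serialized_type(typeof(x)), x)   # in-memory -> schema
y = convert(DenseDemo{Float64}, s)           # schema -> in-memory

# Converting a value to its own type hits Base's identity method, which is
# why a repeated read-side conversion is harmless in this design.
z = convert(DenseDemo{Float64}, y)
```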
Merges NDTensors/src/serialization_types.jl into NDTensors/src/serialization.jl and reorders the file so each storage type's section contains:
1. The Serialized* struct definition (with docstring).
2. The serialized_type declaration.
3. The Base.convert overloads in both directions.
Keeps the shared block-offset helpers near the top of the file since they're used by two sections (BlockSparse, DiagBlockSparse). serialization_types.jl is deleted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…user
Previously sat at the top of the file as "shared helpers"; closer to their first use is clearer. They stay in this position because they're also reused by the DiagBlockSparse sections below.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`block_indices` was previously stored as (ndims, num_blocks) (column-major-natural
for Julia, mirroring Apache Arrow / PyData Sparse / PyTorch sparse conventions).
For this on-disk schema the practical case favors the opposite orientation,
(num_blocks, ndims):
- One row per block matches the natural "table of blocks" mental model.
- Cross-language readers (numpy / h5py) get `arr[i, :]` as the natural
row-major-contiguous access to one block's coordinates.
- The performance argument for column-major-natural Julia access doesn't
apply here: `block_indices` is metadata (small matrix, few blocks).
Also drops the "COO convention" jargon from docstrings: that terminology
isn't explanatory for someone reading our docs and isn't load-bearing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
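A small sketch of the one-row-per-block orientation, assuming a hypothetical helper name (`block_matrix`); it illustrates the matrix shape only and is not the PR's code.

```julia
# Hypothetical helper; shows the (num_blocks, ndims) orientation only.
# Packs a list of block coordinates into a matrix with one row per block.
function block_matrix(blocks::Vector{NTuple{N,Int}}) where {N}
    m = Matrix{Int64}(undef, length(blocks), N)
    for (i, b) in enumerate(blocks), d in 1:N
        m[i, d] = b[d]
    end
    return m
end

blocks = [(1, 1), (2, 3), (4, 2)]
m = block_matrix(blocks)

# Row i holds the coordinates of block i.
second = m[2, :]

# With no blocks the matrix is 0 x ndims, so ndims is still size(m, 2).
empty_m = block_matrix(NTuple{2,Int}[])
```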
…tion
Reintroduces a three-layer architecture:
- Permissive serialize_convert / deserialize_convert named functions own the value-level transform with duck-typed s.
- Hygienic typed Base.convert overloads delegate to them.
- The JLD2 extension restores permissive JLD2.rconvert plus the idempotency overload (workaround for the JLD2 double-rconvert bug on Pair-nested types).
Keeps Base.convert hygienic while preserving the source-side flexibility needed to absorb JLD2's AbstractReconstructedType when an on-disk schema struct's field layout drifts from the current definition.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
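The three layers might look roughly like this. Everything here is a sketch under assumptions: the types are hypothetical, the `serialize_convert` / `deserialize_convert` signatures are guessed from the description, and `backend_rconvert` is a made-up stand-in for the JLD2.rconvert methods so the example runs without JLD2.

```julia
# Hypothetical stand-ins; not the NDTensors definitions.
struct DiagDemo{T}
    data::Vector{T}
end

struct SerializedDiagDemo{T}
    version::UInt32
    data::Vector{T}
end

# Layer 1: permissive named functions. The source argument is duck-typed:
# anything with a `data` property is accepted.
deserialize_convert(::Type{DiagDemo{T}}, s) where {T} =
    DiagDemo{T}(convert(Vector{T}, s.data))
serialize_convert(::Type{SerializedDiagDemo{T}}, x) where {T} =
    SerializedDiagDemo{T}(UInt32(1), convert(Vector{T}, x.data))

# Layer 2: strictly typed Base.convert overloads that only delegate.
Base.convert(::Type{DiagDemo{T}}, s::SerializedDiagDemo{T}) where {T} =
    deserialize_convert(DiagDemo{T}, s)
Base.convert(::Type{SerializedDiagDemo{T}}, x::DiagDemo{T}) where {T} =
    serialize_convert(SerializedDiagDemo{T}, x)

# Layer 3: backend hook (stand-in for JLD2.rconvert). It accepts any source,
# plus an idempotency method so a second call on an already-converted value
# returns it unchanged.
backend_rconvert(::Type{DiagDemo{T}}, s) where {T} = deserialize_convert(DiagDemo{T}, s)
backend_rconvert(::Type{DiagDemo{T}}, x::DiagDemo{T}) where {T} = x

# A NamedTuple stands in for a reconstructed record whose layout has drifted
# (it carries an extra field the current schema does not know about).
drifted = (version = UInt32(1), data = [1.0, 2.0], extra = "ignored")
a = backend_rconvert(DiagDemo{Float64}, drifted)
b = backend_rconvert(DiagDemo{Float64}, a)   # second call is a no-op
```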
…ypes
Drops `serialized_type`, `serialize_convert`, and `deserialize_convert` from the `public` declaration. These remain reachable from the JLD2 extension (and from `Base.convert` via the shims) but are no longer part of the package's stable public surface: no current external consumer calls them directly, so committing to them now is premature.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds `NDTensorsJLD2Ext` (weak dep on JLD2 0.6) implementing `JLD2.writeas` / `wconvert` / `rconvert` for `Dense`, `BlockSparse`, `NonuniformDiag`, `UniformDiag`, `NonuniformDiagBlockSparse`, `UniformDiagBlockSparse`, and `EmptyStorage`. Mirrors the layout of `NDTensorsHDF5Ext/`.

Split out from the prototype JLD2 serialization PR on Tennis.jl#31; the ITensors-level and ITensorNetworks-level pieces will follow as separate PRs against their home packages.