
Add NDTensorsJLD2Ext#1748

Open
mtfishman wants to merge 13 commits into main from mf/ndtensors-jld2-ext

Conversation

@mtfishman
Member

Summary

Adds NDTensorsJLD2Ext (weak dep on JLD2 0.6) implementing JLD2.writeas / wconvert / rconvert for Dense, BlockSparse, NonuniformDiag, UniformDiag, NonuniformDiagBlockSparse, UniformDiagBlockSparse, and EmptyStorage. Mirrors the layout of NDTensorsHDF5Ext/.

Split out from the prototype JLD2 serialization PR on Tennis.jl#31; the ITensors-level and ITensorNetworks-level pieces will follow as separate PRs against their home packages.

Custom JLD2 serialization for NDTensors storage types
(Dense, BlockSparse, Diag, DiagBlockSparse — uniform and
nonuniform — and EmptyStorage), split out from the
ITensor-stack JLD2 serialization prototype on
Tennis.jl#31.
@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 4.95%. Comparing base (51c5b51) to head (6ce42f5).

❗ There is a different number of reports uploaded between BASE (51c5b51) and HEAD (6ce42f5).

HEAD has 8 uploads less than BASE
Flag BASE (51c5b51) HEAD (6ce42f5)
docs 3 1
12 6
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #1748       +/-   ##
==========================================
- Coverage   81.02%   4.95%   -76.07%     
==========================================
  Files          80      69       -11     
  Lines        5074    5021       -53     
==========================================
- Hits         4111     249     -3862     
- Misses        963    4772     +3809     
Flag Coverage Δ
docs 4.98% <ø> (ø)

mtfishman and others added 12 commits May 14, 2026 11:50
The on-disk struct names recorded in JLD2-written files now sit under
NDTensors.* rather than NDTensorsJLD2Ext.*, matching the namespacing of
the user-facing storage types. This avoids encoding the JLD2 extension's
module path into the file format, which would otherwise lock that path
in for any future cross-format or cross-language reader. NDTensorsJLD2Ext
now holds only the JLD2.writeas / wconvert / rconvert methods.

The `_serialize_blockoffsets` / `_deserialize_blockoffsets` helpers are
codec internals, not part of the on-disk format spec, so they belong
alongside the writeas / wconvert / rconvert methods rather than in
NDTensors proper. NDTensors/src/serialization_types.jl now contains
only the struct layouts that define the file format.

…onvert methods

Same wording everywhere, plus a TODO to remove the idempotent methods once the
JLD2 double-call bug is fixed upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ates

Cross-language readability changes to the on-disk Serialized* layouts:
- version :: UInt32 (matches Base.VersionNumber's field width).
- block_indices :: Matrix{Int64} shaped (ndims, num_blocks), the COO
  convention used by Apache Arrow, PyData Sparse, and PyTorch sparse.
  ndims is implicit in size(block_indices, 1) and is preserved by HDF5
  even when num_blocks == 0, so the standalone ndims field is no longer
  needed and has been removed.
- block_offsets :: Vector{Int64}.
- SerializedEmptyStorage now just carries version; the eltype is already
  encoded in the parametric type record on disk, matching how the other
  Serialized* storage types handle their eltype.
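
The field layout listed above can be pictured with a small stand-alone sketch. The struct name `SerializedBlockSparseSketch`, its `data` field, and the helper `pack_block_indices` are assumptions made for illustration and are not taken from the PR; only `version`, `block_indices`, and `block_offsets` follow the commit message. The orientation shown is the (ndims, num_blocks) one described in this commit, which a later commit in this PR flips.

```julia
# Hypothetical stand-in for a Serialized* block-sparse layout. The name and
# the `data` field are assumptions; the other fields follow the commit message.
struct SerializedBlockSparseSketch{T}
    version::UInt32               # schema version tag
    data::Vector{T}               # flat element buffer (assumed field)
    block_indices::Matrix{Int64}  # shape (ndims, num_blocks)
    block_offsets::Vector{Int64}  # one offset per block
end

# Pack a list of N-dimensional block coordinates into an (N, num_blocks) matrix.
function pack_block_indices(blocks::Vector{NTuple{N,Int}}) where {N}
    m = Matrix{Int64}(undef, N, length(blocks))
    for (j, b) in enumerate(blocks), d in 1:N
        m[d, j] = b[d]
    end
    return m
end

blocks = [(1, 1), (2, 3)]
# The buffer length and offsets below are illustrative values only.
s = SerializedBlockSparseSketch{Float64}(
    UInt32(1), zeros(5), pack_block_indices(blocks), Int64[0, 2]
)

# With zero blocks the matrix still has one row per tensor dimension,
# so the dimensionality can be read from its first axis.
no_blocks = pack_block_indices(NTuple{3,Int}[])
```

In this sketch `size(no_blocks, 1)` is still 3 when no blocks are present, which is the property the commit relies on to drop a standalone ndims field.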

The block-offset helpers take Val(N) so the constructed BlockOffsets{N}
stays type-stable.
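
A minimal sketch of why passing `Val(N)` helps, assuming an (N, num_blocks) coordinate matrix as input. The helper `unpack_block_indices` is hypothetical and does not reproduce the PR's `_deserialize_blockoffsets`; the argument order (value first, `Val(N)` second) follows the later commit in this PR.

```julia
# Hypothetical helper: rebuild block coordinates as NTuple{N,Int} from an
# (N, num_blocks) matrix. Passing Val(N) moves N into the type domain, so the
# element type NTuple{N,Int} is known to the compiler rather than only at run time.
function unpack_block_indices(m::Matrix{Int64}, ::Val{N}) where {N}
    size(m, 1) == N || throw(DimensionMismatch("expected $N rows, got $(size(m, 1))"))
    return [ntuple(d -> Int(m[d, j]), Val(N)) for j in axes(m, 2)]
end

m = Int64[1 2; 1 3]                 # columns are the blocks (1, 1) and (2, 3)
blocks = unpack_block_indices(m, Val(2))
```

If `N` were instead read from `size(m, 1)` inside the function, the tuple length would be a run-time value and the result type could not be inferred concretely.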

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The storage or serialized value is now the first positional argument and
the type parameter Val(N) the second, matching the typical Julia idiom of
putting the value being operated on first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds docstrings to each Serialized* storage type documenting its field
layout and on-disk schema version, and marks them as part of NDTensors's
public API surface using the Julia-docs-recommended backwards-compatible
public declaration form so cross-version downstream users can rely on
them.
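
For readers unfamiliar with the backwards-compatible declaration form mentioned above, here is a small sketch. The module `SchemaSketch` and the type `SerializedThing` are hypothetical; the version-gated `eval(Meta.parse(...))` line is the pattern the Julia documentation suggests for packages that also support Julia versions older than 1.11.

```julia
module SchemaSketch

"""
    SerializedThing

Hypothetical on-disk layout (schema version 1): a `version` tag plus a flat
`data` buffer.
"""
struct SerializedThing
    version::UInt32
    data::Vector{Float64}
end

# `public` only parses on Julia 1.11 and later, so it is kept inside a string
# and evaluated behind a version check; older versions skip the line entirely.
VERSION >= v"1.11.0-DEV.469" && eval(Meta.parse("public SerializedThing"))

end # module
```

The name stays unexported, so downstream code would still write `SchemaSketch.SerializedThing`, but on Julia 1.11+ it is marked as part of the public API.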

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…vert

Refactors the backend-agnostic part of NDTensors serialization out of
NDTensorsJLD2Ext and into the main package, so other backends (a future
native HDF5 emitter, custom binary writers, anyone wanting to inspect the
on-disk shape from Julia) can use it without depending on JLD2:

- New `NDTensors.serialized_type(::Type{T})` returns the on-disk schema
  type for an in-memory type T. Public API.
- `Base.convert` overloads handle the value-level transform in both
  directions (in-memory <-> Serialized*). They live next to the schema
  type definitions in NDTensors/src/serialization.jl.
- The block-offset (de)serialization helpers move along with them.
- NDTensorsJLD2Ext shrinks to a single `JLD2.writeas` declaration
  delegating to `NDTensors.serialized_type` (scoped to the TensorStorage
  hierarchy). JLD2's default `wconvert` / `rconvert` already delegate
  to `Base.convert`, so no JLD2-specific value-conversion code remains.
- The per-type idempotent rconvert workarounds disappear: Julia's
  `Base.convert(::Type{T}, x::T) = x` identity dispatch covers the JLD2
  double-call bug for free.
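
The shape of this refactor can be sketched without JLD2 or NDTensors. The types `DemoDense` and `SerializedDemoDense` and the local `serialized_type` function are stand-ins invented for illustration; the real signatures in NDTensors may differ.

```julia
# Hypothetical in-memory storage type and its on-disk schema type.
struct DemoDense{T}
    data::Vector{T}
end

struct SerializedDemoDense{T}
    version::UInt32
    data::Vector{T}
end

# In-memory type -> on-disk schema type. This plays the role described for
# NDTensors.serialized_type; the definition here is a local stand-in.
serialized_type(::Type{DemoDense{T}}) where {T} = SerializedDemoDense{T}

# Value-level transform in both directions via Base.convert.
Base.convert(::Type{SerializedDemoDense{T}}, s::DemoDense{T}) where {T} =
    SerializedDemoDense{T}(UInt32(1), s.data)
Base.convert(::Type{DemoDense{T}}, s::SerializedDemoDense{T}) where {T} =
    DemoDense{T}(s.data)

x = DemoDense([1.0, 2.0])
on_disk = convert(serialized_type(typeof(x)), x)
back = convert(DemoDense{Float64}, on_disk)

# A second conversion of an already-converted value dispatches to Base's
# generic convert(::Type{T}, x::T) = x and returns the same object.
again = convert(DemoDense{Float64}, back)
```

Because the identity case is already handled by Base, no extra method is needed here to make a repeated read-side conversion harmless.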

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merges NDTensors/src/serialization_types.jl into NDTensors/src/serialization.jl
and reorders the file so each storage type's section contains:
  1. The Serialized* struct definition (with docstring).
  2. The serialized_type declaration.
  3. The Base.convert overloads in both directions.

Keeps the shared block-offset helpers near the top of the file since they're
used by two sections (BlockSparse, DiagBlockSparse). serialization_types.jl
is deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…user

The block-offset helpers previously sat at the top of the file as "shared
helpers"; placing them closer to their first use is clearer. They stay in
this position because they're also reused by the DiagBlockSparse sections below.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`block_indices` was previously stored as (ndims, num_blocks) (column-major-natural
for Julia, mirroring Apache Arrow / PyData Sparse / PyTorch sparse conventions).
For this on-disk schema, practical considerations favor the opposite orientation,
(num_blocks, ndims):

  - One row per block matches the natural "table of blocks" mental model.
  - Cross-language readers (numpy / h5py) get `arr[i, :]` as the natural
    row-major-contiguous access to one block's coordinates.
  - Performance argument for column-major-natural Julia access doesn't
    apply here — `block_indices` is metadata (small matrix, few blocks).
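
A short sketch of the one-row-per-block orientation, using made-up block coordinates. Variable names are illustrative only.

```julia
blocks = [(1, 1), (2, 3), (4, 2)]   # three blocks of a 2-dimensional tensor

# One row per block: shape (num_blocks, ndims).
block_indices = Int64[b[d] for b in blocks, d in 1:2]

# Row i holds the coordinates of block i.
second_block = block_indices[2, :]

# The earlier (ndims, num_blocks) orientation is the transpose of this table.
old_layout = permutedims(block_indices)
```

Since the matrix has only as many rows as there are blocks, the choice between the two orientations is about readability of the table rather than access speed.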

Also drops the "COO convention" jargon from docstrings: that terminology
isn't explanatory for someone reading our docs and isn't load-bearing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tion

Reintroduces a three-layer architecture: permissive serialize_convert /
deserialize_convert named functions own the value-level transform with a
duck-typed `s` argument, hygienic typed Base.convert overloads delegate to them, and
the JLD2 extension restores permissive JLD2.rconvert plus the idempotency
overload (workaround for the JLD2 double-rconvert bug on Pair-nested
types). Keeps Base.convert hygienic while preserving the source-side
flexibility needed to absorb JLD2's AbstractReconstructedType when an
on-disk schema struct's field layout drifts from the current definition.
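
The three layers described above might look roughly like this in plain Julia. Everything here is a sketch under assumptions: `DemoDiag`, `SerializedDemoDiag`, and `demo_rconvert` are hypothetical names, the signatures of `serialize_convert` / `deserialize_convert` are guessed from the commit message, and a NamedTuple stands in for the reconstructed record JLD2 would hand back when a stored layout no longer matches the current struct.

```julia
struct DemoDiag{T}
    data::Vector{T}
end

struct SerializedDemoDiag{T}
    version::UInt32
    data::Vector{T}
end

# Layer 1: permissive named functions. The source `s` is untyped, so any
# object exposing a `data` property is accepted.
serialize_convert(::Type{SerializedDemoDiag{T}}, s) where {T} =
    SerializedDemoDiag{T}(UInt32(1), s.data)
deserialize_convert(::Type{DemoDiag{T}}, s) where {T} = DemoDiag{T}(s.data)

# Layer 2: narrowly typed Base.convert overloads that only delegate.
Base.convert(::Type{SerializedDemoDiag{T}}, s::DemoDiag{T}) where {T} =
    serialize_convert(SerializedDemoDiag{T}, s)
Base.convert(::Type{DemoDiag{T}}, s::SerializedDemoDiag{T}) where {T} =
    deserialize_convert(DemoDiag{T}, s)

# Layer 3: stand-in for a backend hook such as JLD2.rconvert. It is permissive
# on the source and has an identity method so a repeated call is harmless.
demo_rconvert(::Type{DemoDiag{T}}, s) where {T} = deserialize_convert(DemoDiag{T}, s)
demo_rconvert(::Type{DemoDiag{T}}, s::DemoDiag{T}) where {T} = s

# A record whose layout drifted: extra field, no `version`.
drifted = (data = [1.0, 2.0], legacy_flag = true)
once = demo_rconvert(DemoDiag{Float64}, drifted)
twice = demo_rconvert(DemoDiag{Float64}, once)
roundtrip = convert(SerializedDemoDiag{Float64}, once)
```

Keeping the untyped methods out of `Base.convert` means the permissive behavior is opt-in for backends that need it, while ordinary `convert` calls only match the exact type pairs.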

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ypes

Drops `serialized_type`, `serialize_convert`, and `deserialize_convert`
from the `public` declaration. These remain reachable from the JLD2
extension (and from `Base.convert` via the shims) but are no longer part
of the package's stable public surface — no current external consumer
calls them directly, so committing to them now is premature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>