@AlanMWatson
Contributor

A built-in OME-Zarr v3 writer for mesoSPIM.

Support:

  • On-the-fly assembly of multiscales during acquisition
  • High-performance multithreading that keeps pace with data acquisition
  • Compression
  • Sharding
  • Chunk size can be adjusted for each multiscale level
  • 2D downsampling for anisotropic data, then 3D downsampling once the multiscale levels converge to isotropic
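
The 2D-then-3D downsampling idea in the last bullet can be sketched as follows. This is a minimal illustration, not the writer's actual code: the function name `downsample_schedule` and the switching rule (downsample y/x only while the z spacing is at least twice the lateral spacing) are assumptions, since the thread does not state the exact criterion.

```python
def downsample_schedule(voxel_zyx, n_levels):
    """Return cumulative (z, y, x) downsample factors for each pyramid level.

    Assumed rule (not taken from the PR): while the z spacing is at least
    twice the lateral spacing, halve only y and x (2D step); afterwards
    halve all three axes (3D step).
    """
    z, y, x = voxel_zyx
    fz = fy = fx = 1
    schedule = [(fz, fy, fx)]  # level 0 is the full-resolution data
    for _ in range(n_levels - 1):
        lateral = max(y * fy, x * fx)  # current y/x spacing
        if z * fz >= 2 * lateral:
            fy *= 2
            fx *= 2
        else:
            fz *= 2
            fy *= 2
            fx *= 2
        schedule.append((fz, fy, fx))
    return schedule


# Example: 5 um z-steps with 1 um lateral pixels, four levels.
levels = downsample_schedule((5.0, 1.0, 1.0), 4)
```

With these inputs the first two steps shrink only y and x, and the third step shrinks all three axes, leaving voxels of roughly 10 x 8 x 8 um.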

Notes:

  • Each tile is stored in its own ome-zarr dataset with a similar naming convention to other data acquisition methods
  • Defaults were tested on several systems using the demo config and long-duration scans (>3 hours).

@nvladimus
Member

Dear Alan,
Thank you for this PR, very impressive!! 💯
I managed to test it in demo-hardware mode with two tiles, and it worked nicely overall. I have found a few issues so far:

  • There seems to be no transformation information (tile positions in the zarr.json file); each tile is written in its own ZARR file, with no connection between them.
  • the OME-NGFF validator found some errors, namely No dimension_names for dataset 0 (etc.)
  • the ReussMouseBrain-2x2Tiles-2Ch-2Arms dataset, which we can use as a reference for BigStitcher flavor of meta-information, packs each tile as a separate ZARR inside a root ZARR folder: Reuss-MousBrain2x2tiles-2ch.ome.zarr/s0-t0.zarr/i (i=0, 1, 2, 3)

I will look into these issues more closely; I just wanted to give some quick feedback.

@AlanMWatson
Contributor Author

Thank you Nikita!

  • There seems to be no transformation information (tile positions in the zarr.json file); each tile is written in its own ZARR file, with no connection between them.

I’ve added stage coordinates to the metadata for each array/scale in the multiscale tile. I tested by loading individual tiles in Neuroglancer, and they show the correct relative placement in the acquisition grid.
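
As an illustration of what per-scale stage coordinates can look like, the sketch below builds the `datasets` entries of an OME-Zarr `multiscales` block, with one `scale` followed by one `translation` per level. The helper name `multiscale_datasets` and its parameters are hypothetical, and it assumes the same physical origin (the stage position) is reused at every level; the writer in this PR may compute these values differently.

```python
def multiscale_datasets(voxel_zyx, stage_zyx, factors):
    """Build OME-Zarr 'datasets' entries with scale + translation per level.

    voxel_zyx: full-resolution voxel size in micrometers (z, y, x)
    stage_zyx: tile origin in stage coordinates, micrometers (z, y, x)
    factors:   cumulative downsample factors per level, e.g. [(1, 1, 1), (1, 2, 2)]
    """
    datasets = []
    for level, level_factors in enumerate(factors):
        scale = [v * f for v, f in zip(voxel_zyx, level_factors)]
        datasets.append(
            {
                "path": str(level),
                "coordinateTransformations": [
                    {"type": "scale", "scale": scale},
                    # Assumption: every level shares the tile's stage origin.
                    {"type": "translation", "translation": list(stage_zyx)},
                ],
            }
        )
    return datasets


# Example: second tile in a row, two pyramid levels.
datasets = multiscale_datasets((5.0, 1.0, 1.0), (0.0, 0.0, 666.0), [(1, 1, 1), (1, 2, 2)])
```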

  • the OME-NGFF validator found some errors, namely No dimension_names for dataset 0 (etc)

This was nested incorrectly; I’ve fixed it.
ome-zarr-models validate now returns: “✅ Valid OME-Zarr”.

  • the ReussMouseBrain-2x2Tiles-2Ch-2Arms dataset, which we can use as a reference for BigStitcher flavor of meta-information, packs each tile as a separate ZARR inside a root ZARR folder: Reuss-MousBrain2x2tiles-2ch.ome.zarr/s0-t0.zarr/i (i=0, 1, 2, 3)

What do you think is the best strategy for representing the collection of tiles? It isn’t clear to me whether OME-Zarr defines a recommended pattern for this at all. With the stage coordinates embedded as described above, the tiles are now positioned correctly relative to one another. However, there is still no metadata that defines these tiles as a collection (i.e. as parts of the same logical grid). We could write zarr.json group metadata in the root of the acquisition directory that points to the multiscale ome-zarr of each tile. Maybe something like this:

{
  "image_collection": {
    "version": "0.1-experimental",
    "images": [
      {"path": "r0_c0.ome.zarr"},
      {"path": "r0_c1.ome.zarr"},
      {"path": "r1_c0.ome.zarr"},
      {"path": "r1_c1.ome.zarr"}
    ]
  }
}

I think BigStitcher will want to see a more detailed XML, which could be produced by mesoSPIM and dropped in the same place. It also needs to define "Translation to Regular Grid". Following this model, and maybe more consistent with reusability of the ome-zarr tile data, the translation coordinate transforms could be written to the root zarr.json and removed from the individual tiles. That might look something like this:


{
  "axes": [
    {"name": "z", "type": "space", "unit": "micrometer"},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"}
  ],
  "image_collection": {
    "version": "0.1-experimental",
    "axes": ["z","y","x"],
    "unit": "micrometer",
    "images": [
      {
        "path": "r0_c0.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 0.0, 0.0]}
        ]
      },
      {
        "path": "r0_c1.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 0.0, 666.0]}
        ]
      },
      {
        "path": "r1_c0.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 666.0, 0.0]}
        ]
      },
      {
        "path": "r1_c1.zarr",
        "coordinateTransformations": [
          {"type": "translation", "translation": [0.0, 666.0, 666.0]}
        ]
      }
    ]
  }
}
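
A minimal sketch of how such a root-level document could be generated for a regular grid. It reproduces the experimental `image_collection` layout proposed above, which is not part of the OME-Zarr specification; the function name `tile_collection` and its parameters are hypothetical.

```python
import json


def tile_collection(n_rows, n_cols, step_y_um, step_x_um):
    """Build the proposed (experimental) collection metadata for a tile grid.

    Rows advance along y and columns along x; translations are in micrometers
    and ordered (z, y, x), matching the example above.
    """
    images = []
    for row in range(n_rows):
        for col in range(n_cols):
            images.append(
                {
                    "path": f"r{row}_c{col}.zarr",
                    "coordinateTransformations": [
                        {
                            "type": "translation",
                            "translation": [0.0, row * step_y_um, col * step_x_um],
                        }
                    ],
                }
            )
    return {
        "axes": [
            {"name": name, "type": "space", "unit": "micrometer"}
            for name in ("z", "y", "x")
        ],
        "image_collection": {
            "version": "0.1-experimental",
            "axes": ["z", "y", "x"],
            "unit": "micrometer",
            "images": images,
        },
    }


collection = tile_collection(2, 2, 666.0, 666.0)
collection_json = json.dumps(collection, indent=2)
```

For a 2x2 grid with a 666 um step this yields the same four paths and translations as the hand-written example.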

@nvladimus
Member

nvladimus commented Oct 28, 2025

Dear Alan,
Thanks for these updates! From what I can see in the ReussMouseBrain-2x2Tiles-2Ch-2Arms dataset, BigStitcher currently exports with the following ZARR structure:

  • root: dataset.ome.zarr group + dataset.xml
  • unique combinations of channels (488, 561, c=2), illuminations (left/right, i=2), and tiles T=4 create N=c*i*T=16 setups: dataset.ome.zarr/s0-t0.zarr, .. dataset.zarr/s15-t0.zarr. Parsing of these into correct channel/illumination names and tile positions happens via the BigStitcher-specific dataset.xml file.
  • each setup, e.g. dataset.ome.zarr/s0-t0.zarr, is a group that has children /0 .. /3, which are the downsampled levels of this setup, with the meta-info shown below. I presume that the small pixel translations in the higher levels (>=1) are due to averaging offsets, so that all pyramid levels stay aligned across scales.
  • each setup always has two nested zero sub-folders, e.g. dataset.ome.zarr/s0-t0.zarr/3/0/0/z/y/x. I presume /0/0 are placeholders for time and channel (always 0 in this version). @StephanPreibisch can you weigh in on this?
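
To make the setup count concrete, the sketch below enumerates N = c*i*T setup folder names in the `s{index}-t{timepoint}.zarr` pattern. The ordering used here (tile outermost, then channel, then illumination) is an assumption for illustration only; in a real BigStitcher dataset the mapping from setup index to channel, illumination and tile is defined by dataset.xml. `enumerate_setups` is a hypothetical helper.

```python
from itertools import product


def enumerate_setups(channels, illuminations, n_tiles, timepoint=0):
    """Map setup folder names to (tile, channel, illumination).

    Assumed ordering: tile varies slowest, illumination fastest. The real
    assignment is whatever dataset.xml declares.
    """
    setups = {}
    combos = product(range(n_tiles), channels, illuminations)
    for index, (tile, channel, illumination) in enumerate(combos):
        setups[f"s{index}-t{timepoint}.zarr"] = {
            "tile": tile,
            "channel": channel,
            "illumination": illumination,
        }
    return setups


# Two channels, two illumination arms, four tiles: 2 * 2 * 4 = 16 setups.
setups = enumerate_setups(("488", "561"), ("left", "right"), 4)
```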

I started a new branch for testing ome_zarr_writer, so you can see what I changed, since I cannot push my commits directly into this PR. It currently writes the dataset.ome.zarr/s0-t0.zarr/0/c/z/y/x format, but for some reason I cannot get rid of c in this path. It looks like this c/ folder is part of the Zarr 3 specification?

@StephanPreibisch do you expect any changes to this structure in the near future?
@m-albert is this compatible with multi-view stitcher?

# BigStitcher meta-info on setup level, e.g. `dataset.zarr/s0-t0.zarr`
{
  "multiscales": [
    {
      "name": "/",
      "version": "0.4",
      "axes": [
        {
          "type": "time",
          "name": "t",
          "unit": "millisecond",
          "discrete": false
        },
        {
          "type": "channel",
          "name": "c",
          "discrete": false
        },
        {
          "type": "space",
          "name": "z",
          "unit": "micrometer",
          "discrete": false
        },
        {
          "type": "space",
          "name": "y",
          "unit": "micrometer",
          "discrete": false
        },
        {
          "type": "space",
          "name": "x",
          "unit": "micrometer",
          "discrete": false
        }
      ],
      "datasets": [
        {
          "path": "0",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 1.0, 1.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 0.0, 0.0, 0.0],
              "type": "translation"
            }
          ]
        },
        {
          "path": "1",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 1.0, 2.0, 2.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 0.0, 0.5, 0.5],
              "type": "translation"
            }
          ]
        },
        {
          "path": "2",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 2.0, 4.0, 4.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 0.5, 1.5, 1.5],
              "type": "translation"
            }
          ]
        },
        {
          "path": "3",
          "coordinateTransformations": [
            {
              "scale": [1.0, 1.0, 4.0, 8.0, 8.0],
              "type": "scale"
            },
            {
              "translation": [0.0, 0.0, 1.5, 3.5, 3.5],
              "type": "translation"
            }
          ]
        }
      ],
      "coordinateTransformations": [
        {
          "scale": [1.0, 1.0, 1.0, 1.0, 1.0],
          "type": "scale"
        }
      ],
      "basePath": "",
      "paths": ["0", "1", "2", "3"],
      "units": ["micrometer", "micrometer", "micrometer", "micrometer"]
    }
  ]
}
# BigStitcher meta-info for `dataset.zarr/s0-t0.zarr/0/` array level
{
  "shape": [1, 1, 615, 2048, 2048],
  "chunks": [1, 1, 64, 128, 128],
  "fill_value": 0,
  "dtype": ">u2",
  "filters": [],
  "dimension_separator": "/",
  "zarr_format": 2,
  "order": "C",
  "compressor": {
    "id": "zstd",
    "level": 3
  }
}
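
The per-level translations in the setup-level metadata above match the half-pixel shift expected when a block of f pixels is averaged and coordinates refer to pixel centers: the offset is (f - 1) / 2 in level-0 pixel units. The sketch below only checks that relation against the numbers quoted above; it does not claim to show how BigStitcher computes them, and `averaging_offset` is a hypothetical helper.

```python
def averaging_offset(scale_factors):
    """Center-of-block offset, in level-0 pixels, for each axis factor."""
    return [(f - 1) / 2 for f in scale_factors]


# (scale, translation) pairs copied from the metadata above, axes (t, c, z, y, x).
levels_from_metadata = [
    ([1.0, 1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0]),
    ([1.0, 1.0, 1.0, 2.0, 2.0], [0.0, 0.0, 0.0, 0.5, 0.5]),
    ([1.0, 1.0, 2.0, 4.0, 4.0], [0.0, 0.0, 0.5, 1.5, 1.5]),
    ([1.0, 1.0, 4.0, 8.0, 8.0], [0.0, 0.0, 1.5, 3.5, 3.5]),
]

all_match = all(
    averaging_offset(scale) == translation
    for scale, translation in levels_from_metadata
)
```

Here `all_match` evaluates to `True`, which supports the averaging-offset reading of these translations.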

@AlanMWatson
Contributor Author

Nikita,

It currently writes dataset.ome.zarr/s0-t0.zarr/0/c/z/y/x format

It is still writing each tile in a separate ...tile{}_dataset.ome.zarr/s{}-t{}.zarr/... structure. The tiles are already formatted as fully defined ome-zarrs. I suggest that, if we want to use this structure, we nest all of the s{}-t{}.zarr named tiles in a single <some_name>.ome.zarr folder, which is what the BigStitcher example is doing.

for some reason I cannot get rid of c in this path. It looks like this c/ folder is part of Zarr 3 specification?

This is what I am finding as well. The c/ prefix appears to come from the Zarr v3 chunk key encoding ('chunk key namespace').
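
For reference, the default chunk key encoding in the Zarr v3 core specification prefixes every chunk key with `c`, which is where the `c/` folder comes from; the specification also defines a `v2` encoding without that prefix, selected through `chunk_key_encoding` in the array's zarr.json. The sketch below reimplements both encodings in plain Python to show the difference. `chunk_key` is a hypothetical helper, not a zarr-python function, and whether the mesoSPIM writer or BigStitcher works with the `v2` encoding was not tested in this thread.

```python
def chunk_key(coords, encoding="default", separator=None):
    """Return the storage key of a chunk for the two Zarr v3 key encodings.

    encoding="default": "c" prefix, separator defaults to "/"
    encoding="v2":      no prefix, separator defaults to "."
    """
    parts = [str(c) for c in coords]
    if encoding == "default":
        sep = "/" if separator is None else separator
        return sep.join(["c", *parts])
    if encoding == "v2":
        sep = "." if separator is None else separator
        # A zero-dimensional array has the single chunk key "0" in v2.
        return sep.join(parts) if parts else "0"
    raise ValueError(f"unknown chunk key encoding: {encoding!r}")


coords = (0, 0, 3, 1, 2)  # (t, c, z, y, x) chunk indices
default_key = chunk_key(coords)                     # Zarr v3 default
v2_key = chunk_key(coords, "v2", separator="/")     # v2-style nested layout
```

With the default encoding the chunk lands under `c/0/0/3/1/2`; with the `v2` encoding and a `/` separator it lands under `0/0/3/1/2`, matching the layout of the BigStitcher example.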

@AlanMWatson
Contributor Author

Nikita,

I suggest that if we want to use this structure, that we nest all of the s{}-t{}.zarr named tiles in a single <some_name>.ome.zarr folder - which is what the BigStitcher example is doing.

I pushed a fix that will nest the multiscale tiles in a single directory. Todo: Write zarr.json group metadata in this folder.
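
A minimal sketch of the remaining to-do, assuming the enclosing folder should simply become a valid Zarr v3 group. The three top-level keys (`zarr_format`, `node_type`, `attributes`) follow the Zarr v3 group metadata layout; the attribute content reuses the experimental `image_collection` key proposed earlier in this thread, and `write_root_group` is a hypothetical helper rather than code from the PR.

```python
import json
import tempfile
from pathlib import Path


def write_root_group(root, attributes=None):
    """Write a minimal Zarr v3 group zarr.json into `root` and return its path."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    metadata = {
        "zarr_format": 3,
        "node_type": "group",
        "attributes": attributes or {},
    }
    path = root / "zarr.json"
    path.write_text(json.dumps(metadata, indent=2))
    return path


# Demo in a temporary directory; tile names follow the s{}-t{}.zarr pattern.
with tempfile.TemporaryDirectory() as tmp:
    attrs = {
        "image_collection": {
            "version": "0.1-experimental",
            "images": [{"path": "s0-t0.zarr"}, {"path": "s1-t0.zarr"}],
        }
    }
    written_path = write_root_group(Path(tmp) / "dataset.ome.zarr", attrs)
    written = json.loads(written_path.read_text())
```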
