Add support for validation of zarr filesets #1281

Open
yarikoptic opened this issue Apr 19, 2023 · 6 comments
Labels: blocked (Blocked by some needed development/fix), zarr

Comments

@yarikoptic
Member

At the moment, I believe we only test whether we can open them, plus two custom checks (the group is not empty and the hierarchy is not too deep). The "Initial validate support, with --strict option" PR in ome-zarr-py was recently merged, so we should make use of it for our OME .zarrs.
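The two structural checks mentioned above can be approximated on a local directory store with `find`. This is a minimal sketch, not dandi's implementation: the reading of "empty group" (nothing besides `.zgroup`/`.zattrs` at the top level) and the `MAX_DEPTH` default of 5 are hypothetical placeholders, and it relies on GNU `find` (`-printf`, `-quit`).

```bash
#!/bin/bash
# Sketch of the two structural checks on a local directory store.
# MAX_DEPTH is a hypothetical placeholder; dandi's real limit and exact
# depth semantics may differ.
MAX_DEPTH=${MAX_DEPTH:-5}

zarr_structure_check() {
    local z="$1" deepest
    # Hypothetical reading of "not an empty group": the zarr must contain
    # something besides the top-level group metadata (.zgroup/.zattrs).
    if [ -z "$(find "$z" -mindepth 1 -maxdepth 1 ! -name '.zgroup' ! -name '.zattrs' -print -quit)" ]; then
        echo "$z: empty group"
        return 1
    fi
    # "Not too deep": deepest entry below the zarr root, in directory levels
    # (GNU find's %d is 1 for entries directly inside $z).
    deepest=$(find "$z" -mindepth 1 -printf '%d\n' | sort -n | tail -n 1)
    if [ "$deepest" -gt "$MAX_DEPTH" ]; then
        echo "$z: too deep ($deepest > $MAX_DEPTH)"
        return 1
    fi
    echo "$z: ok"
}
```

`zarr_structure_check sample.ome.zarr` prints either `<path>: ok` or the reason for failure, and returns nonzero on failure so it can be used in a loop or `if`.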

@yarikoptic yarikoptic changed the title Add support for validation of zarr files Add support for validation of zarr filesets Apr 19, 2023
@jwodder
Member

jwodder commented Apr 20, 2023

@yarikoptic

  • What makes you say that ome-zarr-py PR was merged? It was clearly closed without being merged.
  • It appears that the given validation method loads all(?) of the Zarr data into memory, which will be a problem for arbitrarily large Zarrs.

@yarikoptic yarikoptic added the blocked Blocked by some needed development/fix label Apr 20, 2023
@yarikoptic
Member Author

yarikoptic commented Apr 20, 2023

  • What makes you say that ome-zarr-py PR was merged? It was clearly closed without being merged.

heh, not sure why I read "Closed" as "Merged" ;) I left a question on that PR about what the plan there is for validation.

  • It appears that the given validation method loads all(?) of the Zarr data into memory, which will be a problem for arbitrarily large Zarrs.

that would really make it unlikely to be usable by default... where do they do it?

From glancing over https://github.com/ome/ome-zarr-py/pull/142/files#diff-b50d9715cc6e4017cfc055fd0ed73ecb5d9158e17f4d58ca5b3ba08b89c46657R206, I thought it would just validate the structure/metadata against a JSON schema.

@jwodder
Member

jwodder commented Apr 20, 2023

@yarikoptic ome_zarr.utils.validate() calls visit(), which iterates over the return values of Reader.__call__(). That either descends through the node (I haven't yet found what populates the "descend" structures) or (line 698) calls ZarrLocation.load(), which calls out to a third-party library I haven't looked at yet; the name certainly suggests that it loads data.

@yarikoptic
Member Author

I followed ome/ome-zarr-py#142 (comment) and ran `check-jsonschema --schemafile /home/dandi/proj/ngff/0.4/schemas/image.schema <(curl --silent "$url")` for each zarr, as coded in `/mnt/backup/dandi/dandizarrs/tools/jsonschema-check-zattrs` on drogon:
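```bash
#!/bin/bash

# inspired by https://github.com/ome/ome-zarr-py/pull/142#issuecomment-1517024760

set -eu
#set -x
for z in "$@"; do
        zattrs="$z/.zattrs"
        # -L also accepts a git-annex symlink whose content is not present locally
        if [ ! -e "$zattrs" ] && [ ! -L "$zattrs" ]; then
                echo "$z - no .zattrs, skipping"
                continue
        fi
        url=$(git -C "$z" annex whereis .zattrs | grep https://dandiarchive | awk '{print $2;}' | head -n 1)
        if [ -z "$url" ]; then
                echo "$z - no dandiarchive URL for .zattrs, skipping"
                continue
        fi
        echo "$z - $url"
        # no pipefail: a validation failure is reported but does not abort the loop
        check-jsonschema --schemafile /home/dandi/proj/ngff/0.4/schemas/image.schema <(curl --silent "$url") | sed -e 's/^/  /'
done
```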

and got the following list of failures: http://www.oneukrainian.com/tmp/dandizarrs-jsonschema-checks.out. The majority of zarrs have

  Schema validation errors were encountered.
    /dev/fd/63::$.omero.channels[0].window: 'start' is a required property
    /dev/fd/63::$.omero.channels[0].window: 'end' is a required property

In fact, only 137 zarrs pass validation, while over 4k do not.
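To get pass/fail counts like these directly rather than by grepping the log, the per-zarr loop can be wrapped in a tally. A minimal sketch, assuming a hypothetical validator command or shell function that takes a zarr path and exits 0 on success (for example, a wrapper around the per-zarr `check-jsonschema` call in the script above); it keeps the script's skip rule for a missing `.zattrs`.

```bash
#!/bin/bash
# Sketch: count how many zarrs pass/fail a validator, ending with one
# summary line. "$validator" is a hypothetical hook: any command or shell
# function that takes a zarr path and exits 0 when validation passes.
tally_zattrs_validation() {
    local validator="$1"
    shift
    local pass=0 fail=0 skip=0 z
    for z in "$@"; do
        # Same skip rule as the script: no .zattrs at all (a dangling
        # git-annex symlink still counts as present).
        if [ ! -e "$z/.zattrs" ] && [ ! -L "$z/.zattrs" ]; then
            skip=$((skip + 1))
            continue
        fi
        if "$validator" "$z" >/dev/null 2>&1; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
    done
    echo "pass=$pass fail=$fail skip=$skip"
}
```

Usage: `tally_zattrs_validation my_check_zarr "$@"`, where `my_check_zarr` is the hypothetical wrapper. The counters use `$((x + 1))` rather than `((x++))`, which would make the command fail (and abort a `set -e` script) when the counter is still 0.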

@slaytonmarx could you please run a similar command (`check-jsonschema --schemafile https://raw.githubusercontent.com/ome/ngff/main/0.4/schemas/image.schema YOUR.zarr/.zattrs`) on the zarr files you have?
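For a local tree of zarrs (regular directories, not git-annex symlinks), the same command can be run over every `*.zarr` directory at once. A minimal sketch, assuming `check-jsonschema` is on `PATH` and a `find` that supports `-print0`; `check_local_zarrs` is a hypothetical helper name, and the schema URL is the one quoted above.

```bash
#!/bin/bash
# Sketch: run check-jsonschema on the .zattrs of every local *.zarr
# directory under a root, printing only failures (indented) and skips.
SCHEMA_URL="https://raw.githubusercontent.com/ome/ngff/main/0.4/schemas/image.schema"

check_local_zarrs() {
    local root="$1" z out status=0
    # -prune keeps find from descending into each zarr's chunk hierarchy,
    # which can hold a very large number of files.
    while IFS= read -r -d '' z; do
        if [ ! -f "$z/.zattrs" ]; then
            echo "$z - no .zattrs, skipping"
            continue
        fi
        if ! out=$(check-jsonschema --schemafile "$SCHEMA_URL" "$z/.zattrs" 2>&1); then
            echo "$z - FAILED"
            printf '%s\n' "$out" | sed -e 's/^/  /'
            status=1
        fi
    done < <(find "$root" -type d -name '*.zarr' -prune -print0)
    return "$status"
}
```

Usage: `check_local_zarrs /path/to/dandiset` returns nonzero if any zarr fails. Only `.zattrs` is read, so no array data is loaded regardless of the zarr's size.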

@slaytonmarx

I'll check tomorrow morning and let you know!

@slaytonmarx

I received the same validation errors as Yarik:

smarx@leviathan:/mnt/beegfs/Lee/dandi/sub-MITU01/ses-20211001h11m49s01/micr$ check-jsonschema --schemafile https://raw.githubusercontent.com/ome/ngff/main/0.4/schemas/image.schema sub-MITU01_ses-20211001h11m49s01_sample-103_stain-LEC_run-1_chunk-10_SPIM.ome.zarr/.zattrs
Schema validation errors were encountered.
  sub-MITU01_ses-20211001h11m49s01_sample-103_stain-LEC_run-1_chunk-10_SPIM.ome.zarr/.zattrs::$.omero.channels[0].window: 'start' is a required property
  sub-MITU01_ses-20211001h11m49s01_sample-103_stain-LEC_run-1_chunk-10_SPIM.ome.zarr/.zattrs::$.omero.channels[0].window: 'end' is a required property
