Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial validate support, with --strict option #142

Closed
wants to merge 28 commits into from
Closed
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
ef746b1
Initial validate support, with --warnings option
will-moore Dec 2, 2021
ace592a
Add jsonschema to deps in setup.py
will-moore Dec 2, 2021
7d26157
Copy schemas from ngff. Use LocalRefResolver to load schemas
will-moore Dec 3, 2021
145b938
Add validation tests to test_cli.py
will-moore Dec 3, 2021
4888698
Add jsonschema to requirements-test.txt
will-moore Dec 3, 2021
a20c8aa
Fix jsonschema dependency
will-moore Dec 3, 2021
7520081
Remove .DS_Store files
will-moore Dec 9, 2021
cbd4ee6
Add .DS_Store to .gitignore
will-moore Dec 9, 2021
f0683a6
Use latest schema version by default
will-moore Dec 9, 2021
d093cda
Use logging instead of print
will-moore Dec 9, 2021
33298bd
Move top-level /schemas to ome_zarr/schemas
will-moore Dec 9, 2021
4c28024
remove get_strict_schema(). use get_schema(strict)
will-moore Dec 9, 2021
b388a3a
fix duplicate validate()
will-moore Dec 9, 2021
674ebb7
Import schemas from ngff
will-moore Dec 14, 2021
7b3ab42
Delete schemas
will-moore Dec 14, 2021
e380451
Merge remote-tracking branch 'origin/master' into validate_json_schema
will-moore Mar 22, 2022
d301229
Update to use 'ome_ngff' package
will-moore Mar 22, 2022
c37e116
visit(path: str, func: callable)
will-moore Mar 23, 2022
c1ed016
Remove use of ngff repo. Use cached_path instead
will-moore Jul 1, 2022
9ccebe0
Merge remote-tracking branch 'origin/master' into validate_json_schema
will-moore Jul 1, 2022
e3abc33
Remove ome_ngff dependency
will-moore Jul 1, 2022
5fef946
Add cached_path to dependencies in environment.yml
will-moore Jul 4, 2022
3ad5315
Rename --warnings to --strict. Support Plate and Well
will-moore Jul 4, 2022
5fcfe94
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 4, 2022
6774922
Update test to use --strict
will-moore Jul 4, 2022
2bbce18
Support --clear_cache argument for validate
will-moore Jul 5, 2022
96961bd
Replace spec.get_schema() with SCHEMA_NAME
will-moore Jul 5, 2022
c03b0a0
Use format.get_metadata_version(data) to get version
will-moore Jul 5, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .isort.cfg
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
[settings]
known_third_party = dask,numpy,pytest,scipy,setuptools,skimage,zarr
known_third_party = dask,jsonschema,numpy,pytest,scipy,setuptools,skimage,zarr
multi_line_output = 3
include_trailing_comma = True
force_grid_wrap = 0
Expand Down
1 change: 1 addition & 0 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ channels:
dependencies:
- flake8
- ipython
- jsonschema
- mypy
- omero-py
- pip
Expand Down
13 changes: 13 additions & 0 deletions ome_zarr/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
from .scale import Scaler
from .utils import download as zarr_download
from .utils import info as zarr_info
from .utils import validate as zarr_validate


def config_logging(loglevel: int, args: argparse.Namespace) -> None:
Expand All @@ -29,6 +30,12 @@ def info(args: argparse.Namespace) -> None:
list(zarr_info(args.path, stats=args.stats))


def validate(args: argparse.Namespace) -> None:
"""Wrap the :func:`~ome_zarr.utils.validate` method."""
config_logging(logging.WARN, args)
list(zarr_validate(args.path, args.warnings))


def download(args: argparse.Namespace) -> None:
"""Wrap the :func:`~ome_zarr.utils.download` method."""
config_logging(logging.WARN, args)
Expand Down Expand Up @@ -99,6 +106,12 @@ def main(args: List[str] = None) -> None:
parser_info.add_argument("--stats", action="store_true")
parser_info.set_defaults(func=info)

# validate
parser_validate = subparsers.add_parser("validate")
parser_validate.add_argument("path")
parser_validate.add_argument("--warnings", action="store_true")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the role of this argument to trigger the validation of the recommended attributes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. We have various terminology: There are Spec rules that SHOULD be followed (so it's recommended that you do). These are covered by a strict schema and if you don't then this gives you warnings (if you use that flag)!
It would be nice to settle on ONE term that could cover all these cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Trying to look into the terminology used by others, I came back to SHACL which we have not decided to adopt at this stage but we might want to come back to in the near future.

A relevant concept is the idea of severity defined here and used by the validation report. In particular, looking at the pySHACL library, the --allow-warnings option effectively controls the severity level beyond which the result will be considered as invalid.

parser_validate.set_defaults(func=validate)

# download
parser_download = subparsers.add_parser("download")
parser_download.add_argument("path")
Expand Down
6 changes: 3 additions & 3 deletions ome_zarr/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,19 +147,19 @@ def create_zarr(
"channels": [
{
"color": "FF0000",
"window": {"start": 0, "end": 255},
"window": {"start": 0, "end": 255, "min": 0, "max": 255},
"label": "Red",
"active": True,
},
{
"color": "00FF00",
"window": {"start": 0, "end": 255},
"window": {"start": 0, "end": 255, "min": 0, "max": 255},
"label": "Green",
"active": True,
},
{
"color": "0000FF",
"window": {"start": 0, "end": 255},
"window": {"start": 0, "end": 255, "min": 0, "max": 255},
"label": "Blue",
"active": True,
},
Expand Down
37 changes: 37 additions & 0 deletions ome_zarr/reader.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,11 @@
import dask.array as da
import numpy as np
from dask import delayed
from jsonschema import validate
from jsonschema.validators import validator_for

from .io import ZarrLocation
from .schemas import LocalRefResolver, get_schema, get_strict_schema
from .types import JSONDict

LOGGER = logging.getLogger("ome_zarr.reader")
Expand Down Expand Up @@ -104,6 +107,12 @@ def load(self, spec_type: Type["Spec"]) -> Optional["Spec"]:
return spec
return None

def validate(self, warnings: bool) -> None:
# Validation for a node is delegated to each spec
# e.g. Labels may have spec for multiscales and labels
for spec in self.specs:
spec.validate(warnings)

def add(
self,
zarr: ZarrLocation,
Expand Down Expand Up @@ -175,6 +184,10 @@ def __init__(self, node: Node) -> None:
def lookup(self, key: str, default: Any) -> Any:
return self.zarr.root_attrs.get(key, default)

def validate(self, warnings: bool = False) -> None:
# If not implemented, ignore for now
pass


class Labels(Spec):
"""Relatively small specification for the well-known "labels" group which only
Expand Down Expand Up @@ -315,6 +328,30 @@ def array(self, resolution: str, version: str) -> da.core.Array:
# data.shape is (t, c, z, y, x) by convention
return self.zarr.load(resolution)

def validate(self, warnings: bool = False) -> None:
multiscales = self.lookup("multiscales", [])
version = multiscales[0].get("version", "0.1")
will-moore marked this conversation as resolved.
Show resolved Hide resolved
print("Validating Multiscales spec at", self.zarr)
sbesson marked this conversation as resolved.
Show resolved Hide resolved
print("Using Multiscales schema version", version)
will-moore marked this conversation as resolved.
Show resolved Hide resolved
image_schema = get_schema(version)

# Always do a validation with the MUST rules
# Will throw ValidationException if it fails
json_data = self.zarr.root_attrs
validate(instance=json_data, schema=image_schema)
sbesson marked this conversation as resolved.
Show resolved Hide resolved

# If we're also checking for SHOULD rules,
# we want to iterate all errors and show as "Warnings"
if warnings:
strict_schema = get_strict_schema(version)
cls = validator_for(strict_schema)
cls.check_schema(strict_schema)
# Use our local resolver subclass to resolve local documents
localResolver = LocalRefResolver.from_schema(strict_schema)
validator = cls(strict_schema, resolver=localResolver)
for error in validator.iter_errors(json_data):
print("WARNING", error.message)
will-moore marked this conversation as resolved.
Show resolved Hide resolved


class OMERO(Spec):
@staticmethod
Expand Down
43 changes: 43 additions & 0 deletions ome_zarr/schemas.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
import json
import os
from typing import Dict

from jsonschema import RefResolver


class LocalRefResolver(RefResolver):
def resolve_remote(self, url: str) -> Dict:
# Use remote URL to generate local path
rel_path = url.replace("https://ngff.openmicroscopy.org/", "../schemas/")
sbesson marked this conversation as resolved.
Show resolved Hide resolved
curr_dir = os.path.dirname(__file__)
path = os.path.join(curr_dir, rel_path)
path = os.path.normpath(path)
# Load local document and cache it
document = load_json(path)
self.store[url] = document
return document


def load_json(path: str) -> Dict:
with open(path) as f:
document = json.loads(f.read())
return document


def get_schema(version: str) -> Dict:
curr_dir = os.path.dirname(__file__)
# The paths here match the paths in the ngff repo (and public schemas)
path = os.path.join(
curr_dir, f"../schemas/{version}/schemas/json_schema/image.schema"
)
path = os.path.normpath(path)
return load_json(path)


def get_strict_schema(version: str) -> Dict:
sbesson marked this conversation as resolved.
Show resolved Hide resolved
curr_dir = os.path.dirname(__file__)
path = os.path.join(
curr_dir, f"../schemas/{version}/schemas/json_schema/strict_image.schema"
)
path = os.path.normpath(path)
return load_json(path)
20 changes: 20 additions & 0 deletions ome_zarr/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,26 @@ def info(path: str, stats: bool = False) -> Iterator[Node]:
yield node


def validate(path: str, warnings: bool) -> Iterator[Node]:
"""
Validate OME-NGFF data

All :class:`Nodes <ome_utils.reader.Node>` that are found from the given path will
be visited recursively.
"""
zarr = parse_url(path)
assert zarr, f"not a zarr: {zarr}"
reader = Reader(zarr)
for node in reader():
if not node.specs:
print(f"not an ome-zarr node: {node}")
continue

if hasattr(node, "validate"):
node.validate(warnings)
yield node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the logic is duplicate from def info i.e. the graph is traversal with an action being perform at each level node.print vs node.validate.

This raises the question of whether the two commands should be combined e.g. to satisfy the use cases where someone will want to inspect an OME-Zarr, validate it and output a minimal report if the validation passes? There is likely an argument to doing both actions independently (validation or display information) but could that be equally achieved via flags (--validate/--no-validate and --report/--no-report)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if you just do this: does it report by default (and validate)?

$ ome_zarr info image.zarr

Then you can disable both with: ?

$ ome_zarr info image.zarr --no-validate --no-report

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my 2cents here: I think the cleanest design would be to implement def visit(path: str, func: callable) that evaluates func for each node. This could then be used to implement both info and validate without much code duplication, i.e. both info and validate would just use visit with an appropriate function.

This has the added advantage that it allows further customization by users. If you go for this functionality I would also adopt the "usual" syntax for visit in python: If func returns something other than None, visit terminates and returns it.



def download(input_path: str, output_dir: str = ".") -> None:
"""Download an OME-Zarr from the given path.

Expand Down
Binary file added schemas/.DS_Store
Binary file not shown.
Binary file added schemas/0.1/schemas/.DS_Store
Binary file not shown.
Binary file added schemas/0.1/schemas/json_schema/.DS_Store
Binary file not shown.
112 changes: 112 additions & 0 deletions schemas/0.1/schemas/json_schema/image.schema
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
{
will-moore marked this conversation as resolved.
Show resolved Hide resolved
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://ngff.openmicroscopy.org/0.1/schemas/json_schema/image.schema",
"title": "NGFF Image",
"description": "JSON from OME-NGFF .zattrs",
"type": "object",
"properties": {
"multiscales": {
"description": "The multiscale datasets for this image",
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"datasets": {
"type": "array",
"minItems": 1,
"items": {
"type": "object",
"properties": {
"path": {
"type": "string"
}
},
"required": ["path"]
}
},
"version": {
"type": "string",
"enum": [
"0.1"
]
},
"metadata": {
"type": "object",
"properties": {
"method": {
"type": "string"
},
"version": {
"type": "string"
}
}
}
},
"required": [
"datasets"
]
},
"minItems": 1,
"uniqueItems": true
},
"omero": {
"type": "object",
"properties": {
"channels": {
"type": "array",
"items": {
"type": "object",
"properties": {
"window": {
"type": "object",
"properties": {
"end": {
"type": "number"
},
"max": {
"type": "number"
},
"min": {
"type": "number"
},
"start": {
"type": "number"
}
},
"required": [
"start",
"min",
"end",
"max"
]
},
"label": {
"type": "string"
},
"family": {
"type": "string"
},
"color": {
"type": "string"
},
"active": {
"type": "boolean"
}
},
"required": [
"window",
"color"
]
}
}
},
"required": [
"channels"
]
}
},
"required": [ "multiscales" ]
}
Loading