Use config to select implementation #1982

brokkoli71 · 2024-06-20T14:26:54Z

Using the config (https://github.com/pytroll/donfig), the user can now specify the implementation of all codecs, the CodecPipeline, Buffer and NDBuffer.
For each of these objects, the codec registry can hold multiple implementations and will use the one selected by the config.

Further changes:

All class-level calls on Buffer and NDBuffer are now made on the selected implementation
Registry was expanded to register codec-pipelines, buffers and ndbuffers
Moved registry.py from zarr.codecs.registry to zarr.registry
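
To illustrate the selection mechanism described above, here is a minimal, self-contained sketch. It is not the code from this PR: the registry dict, `register_buffer`, `get_buffer_class` and the `GpuBuffer` class are hypothetical stand-ins, and a plain dict takes the place of the donfig config object.

```python
from typing import Dict, Type


def fully_qualified_name(cls: type) -> str:
    """Return 'module.QualName' for a class."""
    return f"{cls.__module__}.{cls.__qualname__}"


class Buffer:
    """Stand-in for a default buffer implementation."""


class GpuBuffer(Buffer):
    """Stand-in for an alternative implementation from another package."""


# Hypothetical registry: fully-qualified name -> class.
_buffer_registry: Dict[str, Type[Buffer]] = {}


def register_buffer(cls: Type[Buffer]) -> None:
    """Make an implementation available for selection."""
    _buffer_registry[fully_qualified_name(cls)] = cls


def get_buffer_class(config: dict) -> Type[Buffer]:
    """Return the registered class whose name matches config['buffer']."""
    selected = config["buffer"]
    try:
        return _buffer_registry[selected]
    except KeyError:
        raise KeyError(f"Buffer implementation {selected!r} is not registered") from None


register_buffer(Buffer)
register_buffer(GpuBuffer)

# The config names the implementation; the registry resolves it to a class.
config = {"buffer": fully_qualified_name(GpuBuffer)}
selected_cls = get_buffer_class(config)
```

The point of the sketch is only that the registry may know several implementations at once, while the config decides which one is handed out.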


normanrz · 2024-06-27T13:12:20Z

@madsbk I was wondering what you think about overriding the default_buffer_prototype via config. Do you think that is a good idea or unnecessary?


madsbk · 2024-06-28T11:04:21Z

> @madsbk I was wondering what you think about overriding the default_buffer_prototype via config. Do you think that is a good idea or unnecessary?

I think it would be a good idea. Or maybe add a default attribute to each AsyncArray instance, which would be set to the config value if not specified when creating the array?

brokkoli71 · 2024-06-28T20:55:46Z

@madsbk

> I think it would be a good idea. Or maybe add a default attribute to each AsyncArray instance, which would be set to the config value if not specified when creating the array?

Do you mean that, in addition to the BufferPrototype parameter in e.g. setitem, there would be another fallback BufferPrototype stored in the AsyncArray instance, which could be set when the array is created? The decision of which buffer to use would then be:

prototype in setitem → prototype in AsyncArray instance → config → numpy
(with "→" being the fallback if previous was not set)
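
The fallback chain above could be sketched as follows. This is only an illustration of the proposed lookup order, not an implementation from the PR: `resolve_prototype` is a hypothetical helper, and plain strings stand in for BufferPrototype objects.

```python
from typing import Optional


def resolve_prototype(
    call_prototype: Optional[str],
    array_prototype: Optional[str],
    config_prototype: Optional[str],
) -> str:
    """Return the first prototype that is set, falling back to numpy."""
    # Order matters: per-call argument, then per-array default, then config.
    for candidate in (call_prototype, array_prototype, config_prototype):
        if candidate is not None:
            return candidate
    return "numpy"


# Nothing set anywhere: numpy is used.
nothing_set = resolve_prototype(None, None, None)
# Only the config is set: the config value wins over numpy.
config_only = resolve_prototype(None, None, "config")
# A per-call prototype overrides everything else.
per_call = resolve_prototype("call", "array", "config")
```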

madsbk · 2024-06-29T09:05:18Z

Yes exactly, but I see your point, it might be a few too many fallbacks :)

In any case, if we allow modification of default_buffer_prototype, I think we need another constant like numpy_buffer_prototype that is always backed by a numpy array for internal use. E.g. when reading the shard index, we always want to use numpy: https://github.com/zarr-developers/zarr-python/blob/v3/src/zarr/codecs/sharding.py#L610
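
A small sketch of that separation, under stated assumptions: the `BufferPrototype` dataclass, the `buffer_backend` config key and both functions below are hypothetical stand-ins, chosen only to show a configurable default living next to a fixed numpy-backed constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferPrototype:
    """Stand-in: only records which backend the buffers use."""

    backend: str


# Fixed constant for internal use; never affected by configuration.
NUMPY_BUFFER_PROTOTYPE = BufferPrototype(backend="numpy")

# Hypothetical user configuration selecting a different default backend.
config = {"buffer_backend": "gpu"}


def default_buffer_prototype() -> BufferPrototype:
    """Build the default prototype from the current configuration."""
    return BufferPrototype(backend=config.get("buffer_backend", "numpy"))


def shard_index_prototype() -> BufferPrototype:
    """Internal reads such as the shard index always use the numpy constant."""
    return NUMPY_BUFFER_PROTOTYPE
```

With this split, changing the config affects user-facing reads and writes but leaves internal bookkeeping on numpy.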

brokkoli71 · 2024-07-01T11:30:29Z

good point! @madsbk

jhamman

@brokkoli71 - thanks for pushing this forward. Almost everything is looking great! Just a few comments and requests for documentation.

src/zarr/abc/codec.py

src/zarr/config.py

tests/v3/test_buffer.py

tests/v3/test_config.py

src/zarr/config.py

src/zarr/registry.py

normanrz · 2024-07-08T09:53:51Z

src/zarr/registry.py

+    entry_points = get_entry_points()
+    for e in entry_points.select(group="zarr.codecs"):
+        __lazy_load_codecs[e.name] = e
+    for e in entry_points.select(group="zarr"):


What happens if multiple libraries in my env declare a buffer class? How will I be able to pick the right one?

do you mean multiple implementations per entrypoint?

oh, multiple implementations of the same codec would overwrite each other in the __lazy_load_codecs dict

I think that needs to change. The idea is that multiple libraries can provide implementations for a specific codec (e.g. CPU- and GPU-based gzip) via entrypoints and the user can select one of these implementations via config.

I was also concerned about the other entrypoints (e.g. buffer, pipeline). It would be nice if libraries could declare more than one of each. Again, the user could then select the class via the config through the fully-qualified name.
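
The overwriting problem and one possible fix can be shown with plain dicts. This is a sketch, not the registry code: the package names and the `select` helper are made up for illustration, and strings stand in for loaded entry points.

```python
from typing import Dict, List, Tuple

# (codec name, "package:Class" target) pairs as two libraries might declare them.
declared: List[Tuple[str, str]] = [
    ("gzip", "cpu_package:GzipCodec"),
    ("gzip", "gpu_package:GzipCodec"),
]

# A flat dict keeps only the last declaration per codec name.
flat: Dict[str, str] = {}
for name, target in declared:
    flat[name] = target

# Keeping a list per codec name preserves every implementation.
multi: Dict[str, List[str]] = {}
for name, target in declared:
    multi.setdefault(name, []).append(target)


def select(name: str, configured_target: str) -> str:
    """Pick the implementation named in the config, if it was declared."""
    if configured_target not in multi.get(name, []):
        raise KeyError(f"{configured_target!r} is not a declared {name!r} implementation")
    return configured_target


chosen = select("gzip", "cpu_package:GzipCodec")
```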

Hmm, how would one specify that in entry_points.txt?
My ideas were:

1. Separate different implementations with commas, like

    [zarr.codecs]
    gzip = package:EntrypointGzipCodec1, package:EntrypointGzipCodec2

    [zarr]
    buffer = package:TestEntrypointBuffer, package:AnotherTestEntrypointBuffer

   but that would deviate from the entry points specification for entry_points.txt.

2. Provide names for the different implementations (which have no effect) and have a group for buffer, ndbuffer, pipeline and every codec:

    [zarr.codecs.gzip]
    some_name = package:EntrypointGzipCodec1
    another = package:EntrypointGzipCodec2

    [zarr.buffer]
    some_name = package:TestEntrypointBuffer
    another = package:AnotherTestEntrypointBuffer

I'd prefer the second, but I don't like having arbitrary names for each implementation that are not used when parsing.
Any other ideas?

Would it be possible to support both single items (e.g. zarr.buffer = package:TestEntrypointBuffer) and groups (e.g. zarr.buffer = { gpu_buffer = package:TestEntrypointBuffer, cpu_buffer = package:AnotherTestEntrypointBuffer })?

good idea! allowed syntax is now all of the following:

[zarr.codecs]
gzip = package:EntrypointGzipCodec1

[zarr.codecs.gzip]
some_name = package:EntrypointGzipCodec2
another = package:EntrypointGzipCodec3

[zarr]
buffer = package:TestBuffer1

[zarr.buffer]
xyz = package:TestBuffer2
abc = package:TestBuffer3
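
One way to read both forms into a single mapping is sketched below with the standard-library `configparser`, since entry_points.txt is an INI-style file. The `collect` function and its key scheme (`codecs.gzip`, `buffer`) are assumptions for illustration and are not the PR's `_collect_entrypoints`.

```python
import configparser
from typing import Dict, List

ENTRY_POINTS_TXT = """
[zarr.codecs]
gzip = package:EntrypointGzipCodec1

[zarr.codecs.gzip]
some_name = package:EntrypointGzipCodec2
another = package:EntrypointGzipCodec3

[zarr]
buffer = package:TestBuffer1

[zarr.buffer]
xyz = package:TestBuffer2
abc = package:TestBuffer3
"""


def collect(text: str) -> Dict[str, List[str]]:
    """Map 'codecs.gzip', 'buffer', ... to every declared target."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    collected: Dict[str, List[str]] = {}
    for group in parser.sections():
        if group != "zarr" and not group.startswith("zarr."):
            continue
        # "" for [zarr], otherwise "codecs", "codecs.gzip", "buffer", ...
        prefix = group[len("zarr"):].lstrip(".")
        for name, target in parser.items(group):
            if prefix in ("", "codecs"):
                # Single-item form: the entry point name identifies the object.
                key = f"{prefix}.{name}" if prefix else name
            else:
                # Group form: the group identifies the object; the name is ignored.
                key = prefix
            collected.setdefault(key, []).append(target)
    return collected


implementations = collect(ENTRY_POINTS_TXT)
```

Both the single-item form and the group form end up under the same key, so the config can later pick one target from the combined list.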

src/zarr/config.py


normanrz · 2024-07-10T13:44:53Z

src/zarr/config.py

            "json_indent": 2,
+            "codec_pipeline": {
+                "path": "zarr.codecs.pipeline.BatchedCodecPipeline",


Seeing this, I wonder if we should use an object with a path (or class?) for codecs as well. While we don't need it right now, this could be useful in the future for backwards-compat. We could also add that in the future, though.
cc @jhamman @d-v-b

Sorry for the back and forth on this!

What I am proposing is

{
    "array": {"order": "C"},
    "async": {"concurrency": None, "timeout": None},
    "json_indent": 2,
    "codec_pipeline": {
        "path": "zarr.codecs.pipeline.BatchedCodecPipeline",
        "batch_size": 1,
    },
    "codecs": {
        "blosc": {"path": "zarr.codecs.blosc.BloscCodec"},
        "gzip": {"path": "zarr.codecs.gzip.GzipCodec"},
        "zstd": {"path": "zarr.codecs.zstd.ZstdCodec"},
        "bytes": {"path": "zarr.codecs.bytes.BytesCodec"},
        "endian": {"path": "zarr.codecs.bytes.BytesCodec"},  # compatibility with earlier versions of ZEP1
        "crc32c": {"path": "zarr.codecs.crc32c_.Crc32cCodec"},
        "sharding_indexed": {"path": "zarr.codecs.sharding.ShardingCodec"},
        "transpose": {"path": "zarr.codecs.transpose.TransposeCodec"},
    },
    "buffer": "zarr.buffer.Buffer",
    "ndbuffer": "zarr.buffer.NDBuffer",
}
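
As a rough sketch of how such a nested layout might be read and overridden, the helpers below walk a dict with dotted keys such as `codecs.gzip.path`. The `get` and `with_override` functions and the `my_package.GpuGzipCodec` path are hypothetical and only illustrate the shape of the proposal; they are not the donfig API.

```python
from typing import Any

# Abbreviated copy of the proposed defaults.
defaults = {
    "codec_pipeline": {
        "path": "zarr.codecs.pipeline.BatchedCodecPipeline",
        "batch_size": 1,
    },
    "codecs": {
        "gzip": {"path": "zarr.codecs.gzip.GzipCodec"},
        "endian": {"path": "zarr.codecs.bytes.BytesCodec"},
    },
    "buffer": "zarr.buffer.Buffer",
}


def get(config: dict, dotted_key: str) -> Any:
    """Walk a nested dict using a key such as 'codecs.gzip.path'."""
    node: Any = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def with_override(config: dict, dotted_key: str, value: Any) -> dict:
    """Return a copy of config with one nested value replaced."""
    head, _, rest = dotted_key.partition(".")
    updated = dict(config)  # shallow copy so the input stays untouched
    updated[head] = with_override(config.get(head, {}), rest, value) if rest else value
    return updated


# Hypothetical user override pointing gzip at a different implementation.
custom = with_override(defaults, "codecs.gzip.path", "my_package.GpuGzipCodec")
```

Giving each codec its own object with a `path` leaves room for further per-codec options later without changing the key layout.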

src/zarr/registry.py


brokkoli71 added 27 commits June 17, 2024 16:46

make codec pipeline implementation configurable

bdf58a6

add test_config_codec_pipeline_class_in_env

e2a5e11

make codec implementation configurable

311cfcc

remove snake case support for class names in config

364c403

use registry for codec pipeline config

9e94e36

typing

11f184d

load codec pipeline from entrypoints

216a5d4

test if configured codec implementation and codec pipeline is used

2a3b7ea

make ndbuffer implementation configurable

02d1f6e

fix circular import

6467149

change class method calls on NDBuffer to use get_ndbuffer_class()

460f853

make buffer implementation configurable

acc7f17

Merge branch 'refs/heads/master' into use-config-to-select-codecs

bbb8822

# Conflicts: # src/zarr/array.py # src/zarr/config.py # src/zarr/group.py # src/zarr/metadata.py # tests/v3/test_config.py

format

b060722

fix tests

1ca197d

ignore mypy in tests

26329f6

add test to lazy load (nd)buffer from entrypoint

9601a6f

better assertion message

ffe5832

Merge branch 'refs/heads/master' into use-config-to-select-codecs

556eed6

# Conflicts: # src/zarr/abc/codec.py # src/zarr/array.py # src/zarr/codecs/pipeline.py # src/zarr/codecs/sharding.py # src/zarr/codecs/transpose.py # src/zarr/metadata.py # tests/v3/test_indexing.py

fix merge

d07a127

fix merge

7448f36

Merge remote-tracking branch 'origin/use-config-to-select-codecs' int…

5142d95

…o use-config-to-select-codecs

formatting

57ad3b4

fix mypy

0b2cf9a

fix ruff formatting

4f6d690

Merge branch 'refs/heads/master' into use-config-to-select-codecs

3b34b60

# Conflicts: # src/zarr/store/remote.py # tests/v3/test_buffer.py

fix merge

96676b7

normanrz self-requested a review June 27, 2024 13:09

normanrz assigned brokkoli71 Jun 27, 2024

normanrz added this to the 3.0.0 milestone Jun 27, 2024

brokkoli71 added 2 commits June 27, 2024 15:32

Merge branch 'refs/heads/master' into use-config-to-select-codecs

8b2a60d

# Conflicts: # src/zarr/codecs/registry.py # src/zarr/indexing.py

fix mypy

c098d9a

use numpy_buffer_prototype for reading shard index

97e004b

jhamman reviewed Jul 1, 2024


jhamman added the V3 label Jul 1, 2024

jhamman mentioned this pull request Jul 3, 2024

[v3] Release 3.0.0.alpha.1 #2008

Closed

brokkoli71 added 3 commits July 4, 2024 16:15

Merge branch 'refs/heads/master' into use-config-to-select-codecs

efbab6b

rename buffer and entrypoint test-classes

01ab484

document interaction registry and config

9627157

normanrz reviewed Jul 8, 2024


brokkoli71 added 9 commits July 8, 2024 13:46

change config prefix from zarr_python to zarr

5e83002

use fully_qualified_name for implementation config

cc5f93c

Merge branch 'refs/heads/master' into use-config-to-select-codecs

885329f

# Conflicts: # tests/v3/conftest.py

refactor registry dicts

ae1023c

fix default_buffer_prototype access in tests

2d89931

allow multiple implementations per entry_point

168efff

add tests for multiple implementations per entry_point

a13e7de

fix DeprecationWarning: SelectableGroups in registry.py

56335e4

fix DeprecationWarning: EntryPoints list interface in registry.py

ca27b1d

brokkoli71 requested a review from normanrz July 10, 2024 09:37

normanrz reviewed Jul 10, 2024


brokkoli71 and others added 2 commits July 10, 2024 17:38

clarify _collect_entrypoints docstring

d470ec6

Co-authored-by: Norman Rzepka <code@normanrz.com>

Merge branch 'refs/heads/master' into use-config-to-select-codecs

d210403

# Conflicts: # src/zarr/metadata.py

normanrz merged commit cbc0887 into zarr-developers:v3 Jul 26, 2024
24 checks passed
