-
-
Notifications
You must be signed in to change notification settings - Fork 327
Use config to select implementation #1982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use config to select implementation #1982
Conversation
# Conflicts: # src/zarr/array.py # src/zarr/config.py # src/zarr/group.py # src/zarr/metadata.py # tests/v3/test_config.py
# Conflicts: # src/zarr/abc/codec.py # src/zarr/array.py # src/zarr/codecs/pipeline.py # src/zarr/codecs/sharding.py # src/zarr/codecs/transpose.py # src/zarr/metadata.py # tests/v3/test_indexing.py
…o use-config-to-select-codecs
# Conflicts: # src/zarr/store/remote.py # tests/v3/test_buffer.py
@madsbk I was wondering what you think about overriding the default_buffer_prototype via config. Do you think that is a good idea or unneccessary? |
# Conflicts: # src/zarr/codecs/registry.py # src/zarr/indexing.py
I think it would a good idea, or maybe add a default attribute to each |
Do you mean to have additionally to the prototype in |
Yes exactly, but I see your point, it might be a bit too many fall backs :) In any case, if we allow modification of |
good point! @madsbk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brokkoli71 - thank for pushing this forward. Most everything is looking great! Just a few comments and requests for documentation.
src/zarr/registry.py
Outdated
entry_points = get_entry_points() | ||
for e in entry_points.select(group="zarr.codecs"): | ||
__lazy_load_codecs[e.name] = e | ||
for e in entry_points.select(group="zarr"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in multiple libraries in my env declare a buffer
class? How will I be able to pick the right one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean multiple implementations per entrypoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, multiple implementations of the same codec would overwrite each other in the __lazy_load_codecs dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that needs to change. The idea is that multiple libraries can provide implementations for a specific codec (e.g. CPU- and GPU-based gzip) via entrypoints and the user can select one of these implementations via config.
I was also concerned about the other entrypoints (e.g. buffer, pipeline). It would be nice if libraries could declare more than one of each. Again, the user could then select the class via the config through the fully-qualified name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm how would one specify that in entry_points.txt
?
my ideas were
- seperate different implementations comma-separated like
[zarr.codecs]
gzip = package:EntrypointGzipCodec1, package:EntrypointGzipCodec2
[zarr]
buffer = package:TestEntrypointBuffer, package:AnotherTestEntrypointBuffer
but that would deviate from the PEP 508 standard of entry_points.txt
- provide names for different implementations (which do not have an effect) and have a group for buffer, ndbuffer, pipeline and every Codec
[zarr.codecs.gzip]
some_name = package:EntrypointGzipCodec1
another = package:EntrypointGzipCodec2
[zarr.buffer]
some_name = package:TestEntrypointBuffer
another = package:AnotherTestEntrypointBuffer
I'd prefer the second but I dont like having random names for each implementation that will not be used at parsing
any other ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to support both single items (e.g. zarr.buffer = package:TestEntrypointBuffer
) and groups (e.g. zarr.buffer = { gpu_buffer = package:TestEntrypointBuffer, cpu_buffer = package:AnotherTestEntrypointBuffer }
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea! allowed syntax is now all of the following:
[zarr.codecs]
gzip = package:EntrypointGzipCodec1
[zarr.codecs.gzip]
some_name = package:EntrypointGzipCodec2
another = package:EntrypointGzipCodec3
[zarr]
buffer = package:TestBuffer1
[zarr.buffer]
xyz = package:TestBuffer2
abc = package:TestBuffer3
# Conflicts: # tests/v3/conftest.py
"json_indent": 2, | ||
"codec_pipeline": { | ||
"path": "zarr.codecs.pipeline.BatchedCodecPipeline", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the back and forth on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am proposing is
{
"array": {"order": "C"},
"async": {"concurrency": None, "timeout": None},
"json_indent": 2,
"codec_pipeline": {
"path": "zarr.codecs.pipeline.BatchedCodecPipeline",
"batch_size": 1,
},
"codecs": {
"blosc": { "path": "zarr.codecs.blosc.BloscCodec" },
"gzip": { "path": "zarr.codecs.gzip.GzipCodec" },
"zstd": { "path": "zarr.codecs.zstd.ZstdCodec" },
"bytes": { "path": "zarr.codecs.bytes.BytesCodec" },
"endian": { "path": "zarr.codecs.bytes.BytesCodec" }, # compatibility with earlier versions of ZEP1
"crc32c": { "path": "zarr.codecs.crc32c_.Crc32cCodec" },
"sharding_indexed": { "path": "zarr.codecs.sharding.ShardingCodec" },
"transpose": { "path": "zarr.codecs.transpose.TransposeCodec" },
},
"buffer": "zarr.buffer.Buffer",
"ndbuffer": "zarr.buffer.NDBuffer",
}
Co-authored-by: Norman Rzepka <code@normanrz.com>
# Conflicts: # src/zarr/metadata.py
fixes #1878
Using the config (https://github.com/pytroll/donfig), the user can specify now the implementation of all codecs, the CodecPipeline, Buffer and NDBuffer.
For each of these objects, the codec registry can deal with multiple different implementations and will use the one selected by the config.
Further changes:
zarr.codecs.registry
tozarr.registry