Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to converge on internal types consistency. #574

Closed
wants to merge 1 commit into from

Conversation

Carreau
Copy link
Contributor

@Carreau Carreau commented Jul 1, 2020

While it is convenient for user on high level API to be able to pass
arround many types (array, bytes, or even dictionary, list, strings),
It can make it relatively hard to follow internally if the types are not
consistant.

In particular this can relatively complicate weaving the v3 spec into
the core of Zarr and add a number of conditional if we are not sure of
the type we get.

This try to be a little more consistent in what the implementation
emits right now this try to make sure that all the core only try to
converge toward bytes, so that store users can expect to alway get bytes
back from Zarr-Python stores, and as much as possible give bytes back,
and mostly affects the ConsolidatedMetadataStore which used to – unlike
other store – return objects instead of bytes, as well as the tests
directly testing the stores.

We of course can't be too strict as some existing store may have stored
some values as not-bytes (pickle), and we still want to support this for
the time being,but with this changes, we can add many assert isinstance(x, bytes_like) in many places ans still have the test suite
to pass.

In particular in all the decode/encode metadata functions, and Group
class.

One extra change in test is the removal of an old conditional import of
h5py in the test suite which is unnecessary since introduction of the
pytest's importorskip for h5py tests.

[Description of PR]

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • AppVeyor and Travis CI passes
  • Test coverage is 100% (Coveralls passes)

While it is convenient for user on high level API to be able to pass
arround many types (array, bytes, or even dictionary, list, strings),
It can make it relatively hard to follow internally if the types are not
consistant.

In particular this can relatively complicate weaving the v3 spec into
the core of Zarr and add a number of conditional if we are not sure of
the type we get.

This try to be a little more consistent in what the implementation
_emits_ right now this try to make sure that all the core only try to
converge toward bytes, so that store users can expect to alway get bytes
back from Zarr-Python stores, and as much as possible give bytes back,
and mostly affects the ConsolidatedMetadataStore which used to – unlike
other store – return objects instead of bytes, as well as the tests
directly testing the stores.

We of course can't be too strict as some existing store may have stored
some values as not-bytes (pickle), and we still want to support this for
the time being,but with this changes, we can add many `assert
isinstance(x, bytes_like)` in many places ans still have the test suite
to pass.

In particular in all the decode/encode metadata functions, and Group
class.

One extra change in test is the removal of an old conditional import of
h5py in the test suite which is unnecessary since introduction of the
pytest's `importorskip` for h5py tests.
@Carreau
Copy link
Contributor Author

Carreau commented Jul 1, 2020

Extra notes, returning object with Consolidated metadata mean that you actually get reference to mutable objects and have a risk of modifying the internal state of the consolidated store if you don't pay attention.

Would it be ok in a separate PR to add a "strict" mode (opt-in), that would trigger a bunch of assert isinstance(xxx, bytes_likes) or similar in a couple of places to help with type stability ?

@jakirkham
Copy link
Member

Sorry for not raising this yesterday. I'm a bit unsure about this as we have uses cases where we would like to handle other non-bytes objects like mmap ( #377 ) and CuPy ndarrays ( zarr-developers/numcodecs#212 ) ( #501 ).

@Carreau
Copy link
Contributor Author

Carreau commented Jul 6, 2020

I think it would be ok to ensure only bytes-like in python. My main problem is to poke at it to figure out whether it's already decoded or not.

Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Carreau, in general agree with spirit of trying to achieve type consistency, but a couple of points worth some discussion here.

@@ -1489,7 +1489,7 @@ def _set_basic_selection_zd(self, selection, value, fields=None):
chunk[selection] = value

# encode and store
cdata = self._encode_chunk(chunk)
cdata = ensure_bytes(self._encode_chunk(chunk))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N.B., ensuring bytes may introduce a memory copy, which may be unnecessary. Memory copies can noticeably affect performance. Ensuring we have something that supports the buffer interface would be better.

Comment on lines +2514 to +2516
self.meta_store = {}
for k, v in meta["metadata"].items():
self.meta_store[k] = json_dumps(v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this was originally how this worked, but this introduces an extra unnecessary JSON encode/decode round trip into the processing of consolidated metadata. In practice that may not be a performance issue, but it seemed awkward to have this extra JSON encode/decode if it wasn't necessary. Hence I introduced a special case which avoided the necessity for this, see the parse_metadata function in the original PR.

@Carreau Carreau marked this pull request as draft September 9, 2020 17:13
@joshmoore
Copy link
Member

I assume this is now handled by #898 et al. cc: @grlee77

@joshmoore joshmoore closed this May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants