Add links to numcodecs docs in tutorial #1535


Merged: 3 commits, Oct 31, 2023
1 change: 1 addition & 0 deletions docs/conf.py
@@ -331,6 +331,7 @@ def setup(app):
intersphinx_mapping = {
"python": ("https://docs.python.org/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
+"numcodecs": ("https://numcodecs.readthedocs.io/en/stable/", None),
}
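
As a rough sketch of what this mapping does: with ``sphinx.ext.intersphinx`` enabled, Sphinx fetches an ``objects.inv`` inventory from each base URL (a ``None`` second element means the default location) and resolves roles such as ``:class:`` against it. The ``extensions`` line and the ``inventory_url`` helper below are illustrative assumptions, not part of this PR.

```python
from urllib.parse import urljoin

# Assumed: the project enables intersphinx in its Sphinx extensions list.
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    "python": ("https://docs.python.org/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "numcodecs": ("https://numcodecs.readthedocs.io/en/stable/", None),
}


def inventory_url(name):
    """Hypothetical helper: default location of a mapped project's inventory."""
    base_url, inventory = intersphinx_mapping[name]
    # A None inventory means "objects.inv directly under the base URL".
    return urljoin(base_url, inventory or "objects.inv")


print(inventory_url("numcodecs"))
```

A cross-reference only resolves when its target matches a name recorded in that inventory, which is presumably why the tutorial hunks in this PR switch to fully qualified paths such as ``numcodecs.vlen.VLenUTF8``.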


8 changes: 8 additions & 0 deletions docs/release.rst
@@ -18,6 +18,12 @@ Release notes
Unreleased
----------

+Docs
+~~~~
+
+* Add links to ``numcodecs`` docs in the tutorial.
+By :user:`David Stansby <dstansby>` :issue:`1535`.
+
Maintenance
~~~~~~~~~~~

@@ -33,6 +39,8 @@ Maintenance
* Allow ``black`` code formatter to be run with any Python version.
By :user:`David Stansby <dstansby>` :issue:`1549`.

+
+
.. _release_2.16.1:

2.16.1
17 changes: 9 additions & 8 deletions docs/tutorial.rst
@@ -1175,8 +1175,9 @@ A fixed-length unicode dtype is also available, e.g.::
For variable-length strings, the ``object`` dtype can be used, but a codec must be
provided to encode the data (see also :ref:`tutorial_objects` below). At the time of
writing there are four codecs available that can encode variable length string
-objects: :class:`numcodecs.VLenUTF8`, :class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`.
-and :class:`numcodecs.Pickle`. E.g. using ``VLenUTF8``::
+objects: :class:`numcodecs.vlen.VLenUTF8`, :class:`numcodecs.json.JSON`,
+:class:`numcodecs.msgpacks.MsgPack`. and :class:`numcodecs.pickles.Pickle`.
+E.g. using ``VLenUTF8``::

>>> import numcodecs
>>> z = zarr.array(text_data, dtype=object, object_codec=numcodecs.VLenUTF8())
@@ -1201,8 +1202,8 @@ is a short-hand for ``dtype=object, object_codec=numcodecs.VLenUTF8()``, e.g.::
'Helló, világ!', 'Zdravo svete!', 'เฮลโลเวิลด์'], dtype=object)

Variable-length byte strings are also supported via ``dtype=object``. Again an
-``object_codec`` is required, which can be one of :class:`numcodecs.VLenBytes` or
-:class:`numcodecs.Pickle`. For convenience, ``dtype=bytes`` (or ``dtype=str`` on Python
+``object_codec`` is required, which can be one of :class:`numcodecs.vlen.VLenBytes` or
+:class:`numcodecs.pickles.Pickle`. For convenience, ``dtype=bytes`` (or ``dtype=str`` on Python
2.7) can be used as a short-hand for ``dtype=object, object_codec=numcodecs.VLenBytes()``,
e.g.::

@@ -1218,7 +1219,7 @@ e.g.::
b'\xe0\xb9\x80\xe0\xb8\xae\xe0\xb8\xa5\xe0\xb9\x82\xe0\xb8\xa5\xe0\xb9\x80\xe0\xb8\xa7\xe0\xb8\xb4\xe0\xb8\xa5\xe0\xb8\x94\xe0\xb9\x8c'], dtype=object)

If you know ahead of time all the possible string values that can occur, you could
-also use the :class:`numcodecs.Categorize` codec to encode each unique string value as an
+also use the :class:`numcodecs.categorize.Categorize` codec to encode each unique string value as an
integer. E.g.::

>>> categorize = numcodecs.Categorize(greetings, dtype=object)
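
The idea of encoding each known string as an integer can be illustrated in plain Python. This is only a sketch of the concept, not the ``numcodecs`` implementation; ``make_categorizer``, the label set, and the choice to reserve code 0 for unknown values are assumptions made for illustration.

```python
def make_categorizer(labels):
    """Return (encode, decode) functions mapping known strings to small integers."""
    # Assumption for this sketch: code 0 is reserved for values not in `labels`.
    codes = {label: i for i, label in enumerate(labels, start=1)}

    def encode(values):
        return [codes.get(v, 0) for v in values]

    def decode(encoded):
        # Unknown values (code 0) come back as an empty string.
        return [labels[c - 1] if c > 0 else "" for c in encoded]

    return encode, decode


greetings = ["hello", "hola", "hej"]  # hypothetical label set
encode, decode = make_categorizer(greetings)
encoded = encode(["hola", "hello", "hola", "ciao"])
print(encoded)          # [2, 1, 2, 0]
print(decode(encoded))  # ['hola', 'hello', 'hola', '']
```

Storing small integers instead of repeated strings is what makes this approach compact, at the cost of needing the full label set up front.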
@@ -1245,7 +1246,7 @@ The best codec to use will depend on what type of objects are present in the arr

At the time of writing there are three codecs available that can serve as a general
purpose object codec and support encoding of a mixture of object types:
-:class:`numcodecs.JSON`, :class:`numcodecs.MsgPack`. and :class:`numcodecs.Pickle`.
+:class:`numcodecs.json.JSON`, :class:`numcodecs.msgpacks.MsgPack`. and :class:`numcodecs.pickles.Pickle`.

For example, using the JSON codec::

@@ -1258,7 +1259,7 @@ For example, using the JSON codec::
array([42, 'foo', list(['bar', 'baz', 'qux']), {'a': 1, 'b': 2.2}, None], dtype=object)

Not all codecs support encoding of all object types. The
-:class:`numcodecs.Pickle` codec is the most flexible, supporting encoding any type
+:class:`numcodecs.pickles.Pickle` codec is the most flexible, supporting encoding any type
of Python object. However, if you are sharing data with anyone other than yourself, then
Pickle is not recommended as it is a potential security risk. This is because malicious
code can be embedded within pickled data. The JSON and MsgPack codecs do not have any
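
The pickle risk described in this hunk can be demonstrated with the standard library alone. The sketch below uses a deliberately harmless expression; ``Payload`` is a hypothetical stand-in for attacker-controlled data and is unrelated to the ``numcodecs`` codec classes.

```python
import json
import pickle


class Payload:
    """Stand-in for attacker-controlled data; the expression here is harmless."""

    def __reduce__(self):
        # pickle records "call eval with this string" and replays it on load.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # runs eval("6 * 7") during deserialization
print(result)  # 42

# JSON decoding, by contrast, only ever builds plain data structures.
decoded = json.loads('{"a": 1, "b": [2, 3]}')
print(decoded)  # {'a': 1, 'b': [2, 3]}
```

Loading the pickle never returns a ``Payload`` instance: it returns whatever the embedded call produced, which is why unpickling untrusted data is unsafe.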
@@ -1270,7 +1271,7 @@ Ragged arrays

If you need to store an array of arrays, where each member array can be of any length
and stores the same primitive type (a.k.a. a ragged array), the
-:class:`numcodecs.VLenArray` codec can be used, e.g.::
+:class:`numcodecs.vlen.VLenArray` codec can be used, e.g.::

>>> z = zarr.empty(4, dtype=object, object_codec=numcodecs.VLenArray(int))
>>> z