Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc updates for 3.0 release #2568

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Dec 18, 2024

This PR includes a fairly large reorganization of the Zarr-Python documentation in preparation for the 3.0 release. Major changes include:

  • new top level quickstart page
  • break up tutorial.rst. See user-guide/{parts}.rst
    • use ipython directives to run examples at doc build time
    • new user guide sections for advanced topics (async, extensions, etc.)
  • add new site section for developers (contributors guide, roadmap, etc.)

TODOs

  • Get feedback and decide on what to do with the docs in todo.rst. These are mostly features that have not been ported to v3.
  • I would like to migrate the content from Doc/v3 migration guide #2102 into user-guide/whatsnew_v3.rst.

Closes: #1765, #1767, #1770, #1798, #2550

docs/user-guide/todo.rst Outdated Show resolved Hide resolved
@@ -7,7 +7,7 @@ build:

sphinx:
configuration: docs/conf.py
fail_on_warning: true
fail_on_warning: false
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: revert this change before merging!

in the following ways:

1. Writing custom stores
2. Writing custom codecs
Copy link
Member Author

@jhamman jhamman Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanrz - could you suggest a short bit of text on how the codec system is meant to work. I will do the same for stores.

docs/index.rst Outdated Show resolved Hide resolved
@dstansby
Copy link
Contributor

This is amazing! But 🙏 please could this PR be split up into smaller bits where possible? That way it will be much easier to review

@dstansby
Copy link
Contributor

could this PR be split up into smaller bits where possible?

To add, I'd be happy to pull out independent bits (e.g., removing the license page) myself if that would be helpful?

@jhamman
Copy link
Member Author

jhamman commented Dec 18, 2024

could this PR be split up into smaller bits where possible?

This is possible but I think the size here is appropriate given the intent and scope of the work. I'm fine with #2570 but before I split things out any further, I'd like to get feedback on the structure of the new docs. I wrote very few new lines here, it is mostly a reshuffle of existing content.


To clarify, at this stage, I'm looking for "30%" feedback. What do you think of the new structure? Do you think this will make the docs easier to navigate, maintain, and/or extend? What other sections should we add (in follow up PRs)?

@jhamman jhamman changed the title docs: major refactor in preparation for the 3.0 release Doc updates for 3.0 release Dec 18, 2024
@dstansby
Copy link
Contributor

Had a look, and I'm 👍 👍 for splitting the tutorial up into sub-pages. I think I'm 👍 on renaming it to "guide" too. So I think at a high level this looks great!

Sorry for my negativity further up - I got a bit spooked by some of the content changes and didn't look past them to the tutorial rearrangement.

@dstansby dstansby added the documentation Improvements to the documentation label Dec 18, 2024
Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful @jhamman! I really like the new organization and finally understand vindex vs. oindex 😅

I left a few in-line suggestions and have some general suggestions below, but this would be good to go after a bit of cleanup even if you don't take any of them.

  • I'm not sure the comparisons to h5py add a lot of value. I would put them as Notes after rather than in the main part of the explanation, or just remove them to not assume familiarity with h5py.
  • I think using more subsections in groups would be helpful, similar to arrays (i.e., Creating groups, Persistent Groups)
  • I think Attributes would be better off as a separate section rather than nested under "working with groups" since it pertains to groups and arrays
  • Since storage is implicit in the arrays and groups sections, I recommend renaming "storage section" to "working with storage backends" or "working with storage"
  • I assume the TODOs in array compression are known issues?
  • I thought extensibility was a major reason for Zarr V3? I think it's worth being as comprehensive as possible in the motivations list of the migration guide since the major version bump will place work on users.
  • In the v3 migration guide, it would be extra nice if you could include a pointer to those who rely on the deprecated stores on what to do. Perhaps a link to the extension page?
  • It's worth being super clear up-front that the migration page isn't comprehensive. E.g., aren't there some small but meaningful changes like listdir() -> list_dir?
  • There are some existing references to RemoteStore that are beyond where GH supports in-line suggestions

zarr.load('data/example.zarr')

Please note that there are a number of other options for persistent array
storage, see the section on :ref:`tutorial_storage` below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
storage, see the section on :ref:`tutorial_storage` below.
storage, see the :ref:`tutorial_storage` guide for more details.

Indexing with a mask array
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. Items can also be extracted by providing a Boolean mask. E.g.:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. Items can also be extracted by providing a Boolean mask. E.g.:
Items can also be extracted by providing a Boolean mask. E.g.:

This sentence wasn't being rendered in the docs

Comment on lines 90 to 91
The :class:`zarr.storage.FsspecStore` a in-memory store that allows for serialization of
Zarr data (metadata and chunks) to a dictionary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The :class:`zarr.storage.FsspecStore` a in-memory store that allows for serialization of
Zarr data (metadata and chunks) to a dictionary.
The :class:`zarr.storage.MemoryStore` allows for serialization of Zarr data (metadata and chunks) to an im-memory dictionary.

>>> zarr.open(store=store)
<Group file://data/foo/bar>
store = zarr.storage.LocalStore("data/foo/bar", read_only=True)
zarr.open(store=store, mode='r')

Zip Store
~~~~~~~~~

The :class:`zarr.storage.ZipStore` stores the contents of a Zarr hierarchy in a single
Zip file. The `Zip Store specification_` is currently in draft form.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a link in the form of Zip Store specification <insert link>_

logical layout as the ``LocalStore``, except the store is assumed to be on a remote storage system
such as cloud object storage (e.g. AWS S3, Google Cloud Storage, Azure Blob Store). The
:class:`zarr.storage.RemoteStore` is backed by `Fsspec_` and can support any Fsspec backend
that implements the `AbstractFileSystem` API,
:class:`zarr.storage.FsspecStore` is backed by `Fsspec_` and can support any Fsspec backend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:class:`zarr.storage.FsspecStore` is backed by `Fsspec_` and can support any Fsspec backend
:class:`zarr.storage.FsspecStore` is backed by `Fsspec <https://filesystem-spec.readthedocs.io/en/latest/>`_ and can support any Fsspec backend


The goals described above necessitated some breaking changes to the API (hence the
major version update), but we have attempted to maintain ~95% backwards compatibility
in the most widely used parts of the API. This in the :class:`zarr.Array` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
in the most widely used parts of the API. This in the :class:`zarr.Array` and
in the most widely used parts of the API. This includes the :class:`zarr.Array` and

The Array class
~~~~~~~~~~~~~~~

1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.create_array`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.create_array`
1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.Group.create_array`

The Group class
~~~~~~~~~~~~~~~

1. Disallow direct construction - use :func:`zarr.open_group` or :func:`zarr.create_group`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Disallow direct construction - use :func:`zarr.open_group` or :func:`zarr.create_group`
1. Disallow direct construction - use :func:`zarr.open_group` or :func:`zarr.Group.create_group`


- The following options in the top-level API have not been ported to Zarr-Python 3 yet.
If these options are important to you, please open a
`GitHub issue <https://github.com/zarr-developers/zarr-python/issues/new>` describing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`GitHub issue <https://github.com/zarr-developers/zarr-python/issues/new>` describing
`GitHub issue <https://github.com/zarr-developers/zarr-python/issues/new>`_ describing

Migration Guide
---------------

The following sections provide details on the most important changes in Zarr-Python 3.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add that zarr_format=3 is the new default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that is missing, is the new zarr.config mechanism

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some text for these items. Feel free to edit liberally.


.. ipython:: python

z1 = zarr.open('data/example.zarr', mode='w', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace all uses of zarr.open, zarr.array, zarr.group in favor of zarr.{create,open}_{array,group} once #2463 lands?


.. _tutorial_compress:

Compressors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a section for sharding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

top-level note on the differences between V2 and V3 would be good. (I'm sure this is on your todo list, just a reminder)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is sharding? 😆

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a Sharding section with a "coming soon" label.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like someone has already added some docs for the config in main in the meantime.

Comment on lines 64 to 66
z[:, :] = np.random.random((100, 100))

print("Array stored at 'data/example-1.zarr'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z[:, :] = np.random.random((100, 100))
print("Array stored at 'data/example-1.zarr'")
# write to the array
z[:, :] = np.random.random((100, 100))


import zarr

z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
z = zarr.open((10000, 10000), chunks=(1000, 1000), dtype='i4')

I don't think the first example on this page should initialize the data at creation. The next section talks about writing values, so simply open or whatever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think the "persistent storage" bit should be moved up here. Most new users are looking for a way to store data on disk, starting off with MemoryStore is confusing (from my personal experience)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The main point of using zarr over numpy is persistence


.. _tutorial_compress:

Compressors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

top-level note on the differences between V2 and V3 would be good. (I'm sure this is on your todo list, just a reminder)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link to the spec in the first line is broken.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements to the documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v3] Documentation: Update developer docs
6 participants