ZEP0004 Review - Zarr Conventions #262

Draft · wants to merge 4 commits into base: main
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,3 +6,5 @@ docs/_build

# pycharm
.idea

.DS_Store
141 changes: 141 additions & 0 deletions docs/conventions/index.rst
@@ -0,0 +1,141 @@
===========
Conventions
===========

Why Conventions?
~~~~~~~~~~~~~~~~

Zarr Conventions provide a mechanism to standardize the metadata and layout of Zarr data
in order to meet domain-specific application needs without changes to the
core Zarr data model and specification, and without specification extensions.

Conventions must fit entirely within the Zarr data / metadata model of groups, arrays, and their attributes,
requiring no changes or extensions to the specification.
A Zarr implementation itself need not even be aware that a convention exists.
The line between a convention and an extension may be blurry in some cases.
The key distinction lies in the implementation: the responsibility for interpreting a *convention* rests entirely with downstream,
domain-specific software, while an *extension* must be handled by the Zarr implementation itself.
A good rule of thumb is that a user should be able to safely ignore the convention and still be able to interact with the data via the core Zarr library,
even if some domain-specific context or functionality is missing.
If the data are completely meaningless or unintelligible without the convention, then it should be an extension instead.

Conventions can also help users switch between different storage libraries more flexibly.
Since Zarr and HDF5 implement nearly identical data models, a single convention can be applied to both formats.
This allows downstream software to maintain better separation of concerns between storage and domain-specific logic.

Conventions are modular and composable. A single group or array can conform to multiple conventions.


Describing Conventions
~~~~~~~~~~~~~~~~~~~~~~

Conventions Document
--------------------

Conventions are described by a *convention document*.
TODO: say more about the structure and format of this document

Explicit Conventions
--------------------

The preferred way of identifying the presence of a convention in a Zarr group or array is via the attribute ``zarr_conventions``.
This attribute must be an array of strings; each string is an identifier for the convention.
Multiple conventions may be present.

For example, a group metadata JSON document with conventions present might look like this:

.. code-block:: json

   {
       "zarr_format": 3,
       "node_type": "group",
       "attributes": {
           "zarr_conventions": ["units-v1", "foo"]
@yarikoptic (Mar 26, 2024):

any chance to make it more "specific" but also descriptive to potentially "decentralize" such conventions, while still allowing for a generic validation of zarrs. E.g. it could become here a dict of conversions, with their versions and schema (jsonschema ? or may be linkml?) URLs . e.g.

Suggested change:

    "zarr_conventions": ["units-v1", "foo"],
    "zarr_conventions": {
        "units": {
            "version": 1,
            "homepage": " ... URL which has potential to describe what that is about ...",
            "schema_url": "... hosted somewhere ..."
        },
        "foo": {}
    },

where in above units is a well defined convention and foo is not so good (just for an example).

Providing schema to go along would open opportunity for a generic zarr validator to validate embedded in a zarr attributes following the schema. It is reflective of an approach NWB standard took - it stores a copy of the schema for itself of each of the extensions within .nwb (hdf5) file so it becomes feasible to do generic validation and also open it up following those embedded schemas even if extension library is not installed.

Separation of version from the convention name also would make it cleaner and diff upon upgrade from one version to another becoming "to the point" (instead of changing every attribute name) thus making it easier to review etc.

I am not that savvy in zarr and thus acknowledge that development of the schema formalization for conventions might be a larger effort than intended for this ZEP, so might better be postponed. But establishing record of zarr_conventions as a collection of records instead of just a list, would at least open such possibility without in the future requiring breaking type changes. Or may be it is already "easy" to add basic "schema" support here?

Contributor:

I think this is a great suggestion @yarikoptic!

@tasansal (Apr 15, 2024):

@yarikoptic should we rename schema-url to schemaUrl to adhere to JSON common practices? Hyphens, when parsed in some languages, cause issues / require special handling.

@d-v-b (Contributor, Apr 15, 2024):

What exactly is the use case for storing a schema (or url to a schema) for the attributes alongside the attributes? I don't see how validation is attractive in this situation, because presumably if the attributes don't pass validation from that schema, you wouldn't write them to disk in the first place. If i'm a client reading a Zarr group that implements some schema that I am aware of, then by definition I already have the schema, so including the schema in the Zarr attributes is useless here; whereas, if it's a schema I'm not aware of, then why should I care if validation of that schema succeeds or fails?

I can see why a data stores that support partial reads would expose schemas, because you don't want clients to read everything just to know what's in it, but Zarr attributes are just JSON documents, so partial reading isn't really part of the picture there.

It's very likely that I don't understand the use case, so a motivating example would really help here.

Reply:

> I don't see how validation is attractive in this situation, because presumably if the attributes don't pass validation from that schema, you wouldn't write them to disk in the first place.

That is quite a big assumption which would be impossible to verify unless schema is stored/pointed to explicitly. "Explicit better than implicit" (Zen of Python #2). There can be a number of buggy client implementations, etc. Absent formalization of schema on Zarr level would facilitate "schema-free" conventions down-stream, thus facilitate breeding unformalized conventions/extensions.

Besides validation, having a schema over the fields might open opportunities for automated metadata-visualization/editing UI constructions (e.g. using smth like https://github.com/koumoul-dev/vuetify-jsonschema-form/ for vue) etc.

FWIW, having machine readable schema is a great feature for a standard to have: e.g. a foundational design principle within https://www.nwb.org/ (https://github.com/NeurodataWithoutBorders/nwb-schema), and recently (well -- years back but still being formalized) established within https://bids-specification.readthedocs.io/ (src/schema), but already acknowledged to be of great importance.

Reply:

not exactly sure what you propose to reply constructively, but sounds like "dump everything into a dict" which would be counter-effective to the original intention of this ZEP to (citing from https://zarr.dev/zeps/draft/ZEP0004.html; emphasis is mine)

> ... standardize conventions around metadata and layout of Zarr data using user-defined attributes ...

Contributor:

It is correct that I don't agree with the proposal of the ZEP, insofar as it proposes to embed schema / type information inside the thing being schematized / typed, but I'm not really advocating for "dumping everything in a dict" either.

What I advocate is very simple: Schemas and associated tooling should be used to generate and validate zarr hierarchies. E.g., defining Zarr hierarchies as typed data, and checking that instances of Zarr hierarchies pass type-checking. See pydantic-zarr for an example of this approach. This is a very simple idea: take some unstructured data, apply a type system to it, get structured data, move on. What's missing from this picture is the need to staple the type information to the data after you have type checked it, but that's essentially what this ZEP proposes, and what I disagree with.

@jbms (Contributor, Apr 15, 2024):

I think there seems to be a tension between making the format more robustly machine verifiable/machine parseable (embed schema URL) and making the format more readily human readable and human writable (use only short identifier).

If attributes have to be (in practice) identified by a URL, then it becomes challenging for humans to write the format except by copy and paste.

I can see the merits of both sides, though personally I am inclined towards human readable/writable, similar to HTML and CSS, where short identifiers are used.

I would be inclined to just say that the convention is identified by the name of the attribute itself, and there is no separate "zarr_conventions" attribute, and it generally becomes easier to work with and edit, compared to having to keep both a dictionary of attributes and a list of conventions in sync. It also avoids the possibility that two conventions would assign different meanings to the same attribute, which would prevent using both conventions at the same time. If URLs are used to identify conventions, then the attribute name would itself be the URL, which would be awkward, though.

Reply:

@jbms, I like the idea of not having a separate zarr_conventions. Can you elaborate on your short identifier idea? Do you mean like the original idea in the ZEP; i.e. units_v1 or something different? In this scenario, how would we avoid conflicts? If we make the key more elaborate, then the identifier is going to be not short.

Or are you thinking more of a hierarchical way to define the attributes?

i.e.

{
  "units":
  {
    "length": "meter",
    "_convention": "< some-identifier >"
  }
}
{
  "stats":
  {
    "std": 42,
    "mean": 100,
    "_convention": "< some-identifier >"
  }
}

Contributor:

Yes, I mean something like units_v1 --- a nested _convention seems worse than just using the convention as the property identifier itself.

However, I think we certainly do need a way to distinguish arbitrary non-standard metadata (which will presumably be used very widely) from standardized metadata properties that should be listed in some registry to ensure the identifier is unique.

I'm not sure exactly what sort of syntax makes sense --- possibly something like "std:units" or "zarr:units" or "$unit" or "@Units".

However, I think the idea behind this zarr_conventions proposal is that you may already have a collection of datasets with various metadata, and software that consumes that metadata, and therefore do not want to change the representation of the metadata at all. Instead you just want to tack on an additional property to indicate what metadata conventions are in use without having to modify the existing data or software. Potentially these "legacy" conventions could still be handled as a separate property per-convention though --- e.g. you could set "std:cf-conventions": true but then there would be additional non-prefixed properties as defined by the convention.

       }


Choice of schema-language

wanted to create a separate thread since the prior one is overloaded. If the idea of "reference/contain a schema for a convention" would generally be accepted, it might be worth looking into defining it in https://linkml.io/ instead of jsonschema since 1. it is more human readable/friendly; 2. it can be converted to jsonschema (or pydantic or ... see https://linkml.io/linkml/intro/overview.html#feature-rich-modeling-language )

Might be easier to establish such schemas. Not yet sure if would be easier to use in some cases, so might be worthwhile accompanying with both linkml and jsonschema urls... sorry if I am adding another level of complexity right away - but wanted to establish the "target horizon" right away ;-)

   }

where ``units-v1`` and ``foo`` are the convention identifiers.
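Because ``zarr_conventions`` lives in ordinary attributes, downstream software can discover declared conventions with plain JSON handling; no Zarr library is required. A minimal sketch (the helper name ``declared_conventions`` is ours, not part of any specification):

```python
import json

def declared_conventions(group_metadata: dict) -> list:
    """Return the convention identifiers declared in a v3 group document."""
    conventions = group_metadata.get("attributes", {}).get("zarr_conventions", [])
    if not (isinstance(conventions, list)
            and all(isinstance(c, str) for c in conventions)):
        raise ValueError("zarr_conventions must be an array of strings")
    return conventions

doc = json.loads("""
{
    "zarr_format": 3,
    "node_type": "group",
    "attributes": {"zarr_conventions": ["units-v1", "foo"]}
}
""")
print(declared_conventions(doc))  # prints ['units-v1', 'foo']
```

A document with no ``zarr_conventions`` attribute simply yields an empty list, matching the rule that conventions are safely ignorable.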


Legacy Conventions
------------------

A legacy convention is a convention already in use that predates this ZEP.
Data conforming to a legacy convention will not have the ``zarr_conventions`` attribute.
The convention document must therefore specify how software can identify the presence of the convention through a series of rules or tests.

In conformance-testing terminology, a legacy convention can be thought of as defining a "conformance class" with a corresponding "conformance test".
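A conformance test for a legacy convention amounts to a predicate over a node's attributes. A sketch of how software might run such tests (the registry and its keys are illustrative, not a normative list; the xarray rule is taken from the Xarray convention described later in this PR):

```python
# Each entry maps a convention identifier to a predicate over an
# attributes dict. Identifiers here are hypothetical examples.
CONFORMANCE_TESTS = {
    # Xarray's legacy convention is signalled by _ARRAY_DIMENSIONS
    "xarray": lambda attrs: isinstance(attrs.get("_ARRAY_DIMENSIONS"), list),
}

def detect_legacy_conventions(attrs: dict) -> list:
    """Return the identifiers of all legacy conventions whose test passes."""
    return [name for name, test in CONFORMANCE_TESTS.items() if test(attrs)]

print(detect_legacy_conventions({"_ARRAY_DIMENSIONS": ["time", "lat", "lon"]}))
# prints ['xarray']
```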

Namespacing
-----------

Conventions may choose to store their attributes in a specific namespace.
This ZEP does not specify how namespacing works; that is left to the convention.
For example, the namespace may be specified as a prefix on attribute names, e.g.

.. code-block:: json

   {
       "attributes": {"units-v1:units": "m^2"}
   }


or via a nested JSON object, e.g.

.. code-block:: json

   {
       "attributes": {"units-v1": {"units": "m^2"}}
   }

The use of namespacing is optional; each convention decides whether and how to apply it.
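Since both namespacing styles carry the same information, a reader can normalize them. A sketch (the helper name ``convention_attrs`` is hypothetical):

```python
def convention_attrs(attributes: dict, namespace: str) -> dict:
    """Collect a convention's attributes under either namespacing style.

    Handles the nested-object style ({"ns": {...}}) and the prefix
    style ("ns:key"). Returns an empty dict when neither is present.
    """
    nested = attributes.get(namespace)
    if isinstance(nested, dict):
        return dict(nested)
    prefix = namespace + ":"
    return {k[len(prefix):]: v for k, v in attributes.items()
            if k.startswith(prefix)}

print(convention_attrs({"units-v1:units": "m^2"}, "units-v1"))       # prefix style
print(convention_attrs({"units-v1": {"units": "m^2"}}, "units-v1"))  # nested style
```

Both calls yield ``{'units': 'm^2'}``, illustrating that the choice of style need not leak into downstream logic.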


Proposing Conventions
~~~~~~~~~~~~~~~~~~~~~

New conventions are proposed via a pull request to the ``zarr-specs`` repo which adds a new convention document.
If the convention is already documented elsewhere, the convention document can simply reference the external documentation.
The author of the PR is expected to convene the relevant domain community to review and discuss the proposal.
This includes posting a link to the PR on relevant forums, mailing lists, and social-media platforms.

The goal of the discussion is to reach a *consensus* among the domain community regarding the convention.
The Zarr steering council, together with the PR author, will determine if a consensus has been reached, at which point the PR
can be merged and the convention published on the website.
If a consensus cannot be reached, the steering council may still decide to publish the convention, accompanied by a
disclaimer that it is not a consensus, and noting any objections that were raised during the discussion.

It is also possible that multiple, competing conventions exist in the same domain. While not ideal, it's not up to
the Zarr community to resolve such domain-specific debates.
These conventions should still be documented in a central location, which hopefully helps move towards alignment.

Conventions should be versioned using incremental integers, starting from 1.
Alternatively, if a community already has an existing versioning system for its convention (e.g. the CF Conventions), that can be used instead.
The community is free to update their convention via a pull request using the same consensus process described above.
The conventions document should include a changelog.
Details of how to manage changes and backwards compatibility are left to the domain community.


Existing Conventions
~~~~~~~~~~~~~~~~~~~~


This page lists known Zarr conventions. The proposal to formalize conventions is introduced in `ZEP0004 <https://zarr.dev/zeps/draft/ZEP0004.html>`_.

Some of the widely used conventions are:

- `GDAL <https://gdal.org/drivers/raster/zarr.html>`_
- `OME-NGFF <https://ngff.openmicroscopy.org/>`_
- `NCZarr <https://docs.unidata.ucar.edu/nug/current/nczarr_head.html>`_
- `Xarray <https://docs.xarray.dev/en/stable/internals/zarr-encoding-spec.html>`_

Any new conventions accepted by the `ZEP <https://zarr.dev/zeps/active/ZEP0000.html>`_ process will be listed here.

.. toctree::
   :glob:
   :maxdepth: 1
   :titlesonly:
   :caption: Contents:

   xarray

99 changes: 99 additions & 0 deletions docs/conventions/xarray.rst
@@ -0,0 +1,99 @@
======================
Xarray Zarr Convention
======================

+---------------------+----------------------+
| Convention Type | Legacy |
+---------------------+----------------------+
| Zarr Spec Versions | V2 |
+---------------------+----------------------+
| Status | Active |
+---------------------+----------------------+
| Active Dates | 2018 - present |
+---------------------+----------------------+
| Version | 1 |
+---------------------+----------------------+

See also `Zarr Encoding Specification <https://docs.xarray.dev/en/latest/internals/zarr-encoding-spec.html>`_
in the Xarray docs.


Description
-----------

`Xarray`_ is a Python library for working with labeled multi-dimensional arrays.
Xarray was originally designed to read only `NetCDF`_ files, but has since added support for
other formats.
In implementing support for the `Zarr <https://zarr.dev>`_ storage format, Xarray developers
made some *ad hoc* choices about how to store NetCDF-style data in Zarr.
These choices have become a de facto convention for mapping the Zarr data model to the
`NetCDF data model <https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html>`_.

First, Xarray can only read and write Zarr groups. There is currently no support
for reading / writing individual Zarr arrays. Zarr groups are mapped to
Xarray ``Dataset`` objects, which correspond to NetCDF-4 / HDF5 groups.

Second, from Xarray's point of view, the key difference between
NetCDF and Zarr is that all NetCDF arrays have *dimension names* while Zarr
arrays do not. Therefore, in order to store NetCDF data in Zarr, Xarray must
somehow encode and decode the name of each array's dimensions.

To accomplish this, Xarray developers decided to define a special Zarr array
attribute: ``_ARRAY_DIMENSIONS``. The value of this attribute is a list of
dimension names (strings), for example ``["time", "lon", "lat"]``. When writing
data to Zarr, Xarray sets this attribute on all variables based on the variable
dimensions. When reading a Zarr group, Xarray looks for this attribute on all
arrays, raising an error if it can't be found. The attribute is used to define
the variable dimension names and then removed from the attributes dictionary
returned to the user.
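The decode step described above can be sketched in a few lines of plain Python (a simplified illustration of the behaviour, not Xarray's actual implementation):

```python
def decode_xarray_dims(attrs: dict):
    """Pop _ARRAY_DIMENSIONS from a copy of the attrs and return (dims, rest).

    Mirrors the described behaviour: the attribute defines the variable's
    dimension names and is removed from the attributes returned to the user;
    its absence is an error.
    """
    rest = dict(attrs)
    try:
        dims = rest.pop("_ARRAY_DIMENSIONS")
    except KeyError:
        raise KeyError("array lacks _ARRAY_DIMENSIONS; not readable as Xarray data")
    return dims, rest

dims, rest = decode_xarray_dims({"_ARRAY_DIMENSIONS": ["time", "lat"], "units": "K"})
print(dims, rest)  # prints ['time', 'lat'] {'units': 'K'}
```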

Because of these choices, Xarray cannot read arbitrary array data, but only
Zarr data with valid ``_ARRAY_DIMENSIONS`` attributes on each array.

After decoding the ``_ARRAY_DIMENSIONS`` attribute and assigning the variable
dimensions, Xarray proceeds to (optionally) decode each variable using the
standard `CF Conventions`_ decoding machinery it uses for NetCDF data.

Finally, it's worth noting that Xarray writes (and attempts to read)
"consolidated metadata" by default (the ``.zmetadata`` file), which is another
non-standard Zarr extension, albeit one implemented upstream in Zarr-Python.

.. _Xarray: http://xarray.dev
.. _NetCDF: https://www.unidata.ucar.edu/software/netcdf
.. _CF Conventions: http://cfconventions.org


Identifying the Presence of this Convention
-------------------------------------------

In implementing this convention, Xarray developers made the unfortunate choice of not
including any explicit identifier in the Zarr metadata. The only way to determine
whether the convention is in use is therefore to examine the contents of the
Zarr dataset and look for the following properties:

* A single flat group containing one or more arrays
* The presence of the ``_ARRAY_DIMENSIONS`` attribute on each array, whose contents are
a list of dimension names (strings)
* If a dimension name corresponds to another array name within the group, that array is
  assumed to be a dimension coordinate. Dimension coordinate arrays must be 1D
  and have the same length as the corresponding dimension.
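The identification rules above can be sketched as a conformance test over an in-memory description of the group. The mapping of array name to a dict with ``"shape"`` and ``"attributes"`` is an assumption of the sketch, not a real API:

```python
def is_xarray_convention_group(arrays: dict) -> bool:
    """Check the Xarray-convention identification rules on a flat group.

    `arrays` maps array name -> {"shape": [...], "attributes": {...}}.
    """
    for name, arr in arrays.items():
        dims = arr.get("attributes", {}).get("_ARRAY_DIMENSIONS")
        # every array must carry a list of dimension-name strings
        if not (isinstance(dims, list) and all(isinstance(d, str) for d in dims)):
            return False
        for axis, dim in enumerate(dims):
            coord = arrays.get(dim)
            if coord is None:
                continue  # no dimension coordinate for this dimension
            # dimension coordinates must be 1-D with matching length
            if len(coord["shape"]) != 1 or coord["shape"][0] != arr["shape"][axis]:
                return False
    return True

group = {
    "time": {"shape": [3], "attributes": {"_ARRAY_DIMENSIONS": ["time"]}},
    "temp": {"shape": [3, 4], "attributes": {"_ARRAY_DIMENSIONS": ["time", "lon"]}},
}
print(is_xarray_convention_group(group))  # prints True
```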


CF Conventions
--------------

It is common for data stored in Zarr using the Xarray convention to also follow
the `CF Conventions`_ (Climate and Forecast Metadata Conventions).

A high-level description of these conventions, quoted from the CF documentation, is as follows:

    The NetCDF library [NetCDF] is designed to read and write data that has been structured
    according to well-defined rules and is easily ported across various computer platforms.
    The netCDF interface enables but does not require the creation of self-describing datasets.
    The purpose of the CF conventions is to require conforming datasets to contain sufficient
    metadata that they are self-describing in the sense that each variable in the file has an
    associated description of what it represents, including physical units if appropriate,
    and that each value can be located in space (relative to earth-based coordinates) and time.

The CF Conventions are massive and cover a wide range of topics. Readers should consult the
`CF Conventions`_ documentation for more information.
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -7,7 +7,7 @@ A good starting point is the :ref:`zarr-core-specification-v3.0`.
.. toctree::

Home <https://zarr.dev>
specs
conventions
ZEPs <https://zarr.dev/zeps>
Implementations <https://github.com/zarr-developers/zarr_implementations>

6 changes: 6 additions & 0 deletions docs/specs.rst
@@ -12,6 +12,12 @@ Specifications
v3/stores
v3/array-storage-transformers

.. toctree::
   :maxdepth: 1
   :caption: Conventions

   Conventions <conventions/index.rst>

.. toctree::
:maxdepth: 1
:caption: v2