Update Terminology page to account for multidimensional coordinates #3410
doc/terminology.rst
Outdated
@@ -27,15 +27,15 @@ Terminology

 ----

-**Coordinate:** An array that labels a dimension of another ``DataArray``. Loosely, the coordinate array's values can be thought of as tick labels along a dimension. There are two types of coordinate arrays: *dimension coordinates* and *non-dimension coordinates* (see below). A coordinate named ``x`` can be retrieved from ``arr.coords[x]``. A ``DataArray`` can have more coordinates than dimensions because a single dimension can be assigned multiple coordinate arrays. However, only one coordinate array can be a assigned as a particular dimension's dimension coordinate array. As a consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
+**Coordinate:** An array that labels a dimension or set of dimensions of another ``DataArray``. In the one-dimensional case, the coordinate array's values can loosely be thought of as tick labels along a dimension, whereas :doc:`multidimensional coordinates are often used when the data's physical coordinates differ from their logical coordinates <examples/multidimensional-coords>`. There are two types of coordinate arrays: *dimension coordinates* and *non-dimension coordinates* (see below). A coordinate named ``x`` can be retrieved from ``arr.coords[x]``. A ``DataArray`` can have more coordinates than dimensions because a single dimension can be assigned multiple coordinate arrays. However, only one coordinate array can be a assigned as a particular dimension's dimension coordinate array. As a consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
> are often used when the data's physical coordinates differ from their logical coordinates
What does this mean? Is it climate-focused? (I'm not sure I have something better in mind yet, though)
This was taken from https://xarray.pydata.org/en/latest/examples/multidimensional-coords.html, and it was the best I could think of for a concise explanation. I interpreted it to refer to physical coordinates (like latitude and longitude) that do not always line up with the axes ("logical coordinates") of one's data (which is often the case when working with Earth-based data that are on some grid other than latitude/longitude). I'd be glad to change it to something else though if there are any suggestions!
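A minimal sketch of that interpretation, assuming a small hypothetical grid with invented values (the names `lat`, `lon`, and `temperature` are illustrative and not taken from the linked example). The logical axes are `y` and `x`, while the physical latitude and longitude each vary along both axes and therefore need two-dimensional coordinate arrays:

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 grid; every value here is invented for illustration.
data = np.arange(6.0).reshape(2, 3)

# Physical coordinates change along both logical axes (as on a rotated or
# curvilinear grid), so each one is a 2-D array with dims ("y", "x").
lat = np.array([[40.0, 40.5, 41.0],
                [41.0, 41.5, 42.0]])
lon = np.array([[-100.0, -99.0, -98.0],
                [-100.5, -99.5, -98.5]])

arr = xr.DataArray(
    data,
    dims=("y", "x"),
    coords={
        "x": [0, 1, 2],            # dimension coordinate: 1-D, named after a dim
        "lat": (("y", "x"), lat),  # non-dimension coordinate, 2-D
        "lon": (("y", "x"), lon),  # non-dimension coordinate, 2-D
    },
    name="temperature",
)

# A coordinate is a dimension coordinate when its name is also a dimension name.
dim_coords = sorted(name for name in arr.coords if name in arr.dims)
non_dim_coords = sorted(name for name in arr.coords if name not in arr.dims)

print(dim_coords)
print(non_dim_coords)
print(arr.coords["lat"].dims)
```

Here `x` lines up with a logical axis, whereas `lat` and `lon` label positions on the `(y, x)` grid without being tied to a single dimension.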
That makes sense. I think it's probably too climate-focused atm. I think we could either add `for example, in climate datasets...`, or remove that clause, or does anyone have other suggestions?
I think `in geoscience datasets` may be more appropriate/general than `in climate datasets` (speaking as a meteorologist 😁), but I can easily make that change if no other suggestions come up.
Since no other suggestions came in, I made this update. It also seemed to work better moving this to the Non-dimension coordinate definition (if I should move it back, just let me know).
 ----

-**Dimension coordinate:** A coordinate array assigned to ``arr`` with both a name and dimension name in ``arr.dims``. Dimension coordinates are used for label-based indexing and alignment, like the index found on a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact, dimension coordinates use :py:class:`pandas.Index` objects under the hood for efficient computation. Dimension coordinates are marked by ``*`` when printing a ``DataArray`` or ``Dataset``.
+**Dimension coordinate:** A one-dimensional coordinate array assigned to ``arr`` with both a name and dimension name in ``arr.dims``. Dimension coordinates are used for label-based indexing and alignment, like the index found on a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact, dimension coordinates use :py:class:`pandas.Index` objects under the hood for efficient computation. Dimension coordinates are marked by ``*`` when printing a ``DataArray`` or ``Dataset``.
👍
0d2a187 to 98044fb
Looks great! Any other suggestions before we merge?
* upstream/master:
  minor lint tweaks (pydata#3429)
  Hack around pydata#3440 (pydata#3442)
  Update Terminology page to account for multidimensional coordinates (pydata#3410)
  Use cftime master for upstream-dev build (pydata#3439)
…e-multiple-dims
* upstream/master:
  minor lint tweaks (pydata#3429)
  Hack around pydata#3440 (pydata#3442)
  Update Terminology page to account for multidimensional coordinates (pydata#3410)
  Use cftime master for upstream-dev build (pydata#3439)
* upstream/master:
  minor lint tweaks (pydata#3429)
  Hack around pydata#3440 (pydata#3442)
  Update Terminology page to account for multidimensional coordinates (pydata#3410)
  Use cftime master for upstream-dev build (pydata#3439)
  MAGA (Make Azure Green Again) (pydata#3436)
  Test that Dataset and DataArray resampling are identical (pydata#3412)
  Avoid multiplication DeprecationWarning in rasterio backend (pydata#3428)
  Sync with latest version of cftime (v1.0.4) (pydata#3430)
  Add cftime git tip to upstream-dev + temporarily pin cftime (pydata#3431)
As discussed in #3352, this PR modifies the Terminology page in the docs to briefly address multidimensional coordinates. Sorry for the delay in getting this in!
Also, when attempting to test the doc build, I found that the `doc/environment.yml` file was no longer present, so I updated it to `ci/requirements/doc.yml`.

- `whats-new.rst` for all changes and `api.rst` for new API