|
1 | 1 | ---
|
2 | 2 | title: 'Xarray indexes: unleash the power of coordinates'
|
3 |
| -date: '2023-08-07' |
| 3 | +date: '2025-06-05' |
4 | 4 | authors:
|
5 | 5 | - name: Benoît Bovy
|
6 | 6 | github: benbovy
|
| 7 | + - name: Scott Henderson |
| 8 | + github: scottyhq |
7 | 9 | summary: 'It is now possible to take full advantage of coordinate data via Xarray explicit and flexible indexes'
|
8 | 10 | ---
|
9 | 11 |
|
10 |
| -_TLDR: Xarray has been through a major refactoring of its internals that makes coordinate-based data selection and alignment (almost) fully customizable, via built-in and/or 3rd party indexes. It also addresses a good amount of long-standing issues with "dimension coordinates" implicitly backed by pandas (multi-)indexes._ |
| 12 | +_TLDR: Xarray has been through a major refactoring of its internals that makes coordinate-based data selection and alignment more customizable, via built-in and/or third-party indexes! In this post we highlight a few examples that take advantage of this new superpower._ |
11 | 13 |
|
12 | 14 | ## Introduction
|
13 | 15 |
|
14 |
| -[link to Joe's CZI blog post] |
| 16 | +Xarray is a large project that is constantly evolving to meet the needs of its users and to keep pace with novel data formats and use cases. One area of improvement identified in the [Development Roadmap](https://docs.xarray.dev/en/stable/roadmap.html#flexible-indexes) is the ability to add new coordinate indexing capabilities beyond the original `pandas.Index`. Let's look at a few examples to understand what is now possible! |
15 | 17 |
|
16 |
| -## The concept of "dimension coordinate" and its shortcomings |
| 18 | +TODO: Insert Benoit's awesome schematic from indexing sprint :) |
17 | 19 |
|
18 |
| -Some datasets could not be loaded with Xarray (dimension name and coordinate with same name but different dimensions) |
| 20 | +## Alternatives to pandas.Index |
19 | 21 |
|
20 |
| -Complicated workarounds (swap_dims, etc.) |
| 22 | +Generally useful index alternatives are already part of Xarray! |
21 | 23 |
|
22 |
| -Limited and/or challenging for data cubes representing arbitrary grids (curvilinear grids, unstructured meshes, etc.). |
| 24 | +### RangeIndex |
23 | 25 |
|
24 |
| -## Better index vs. coordinate separation |
| 26 | +By default, a `pandas.Index` materializes every coordinate value and holds it in memory. For many 1-D coordinates it is more efficient to store only the start, stop, and step, and to compute specific coordinate values on the fly. This is what `RangeIndex` accomplishes: |
25 | 27 |
|
26 |
| -Refactor index logic in `Index` classes. More easily maintainable. May help Pandas become optional dependency in the future? (cf. Xarray-lite). |
| 28 | +```python |
| 29 | +import xarray as xr |
| 30 | +from xarray.indexes import RangeIndex |
27 | 31 |
|
28 |
| -Also allowed to solve lots of issues with multi-indexes, for which each level has now its own real coordinate. |
| 32 | +index = RangeIndex.arange(0.0, 100_000, 0.1, dim='x') |
| 33 | +ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index)) |
| 34 | +ds |
| 35 | +``` |
29 | 36 |
|
30 |
| -Dataset / DataArray section has now an "indexes" section. |
| 37 | +<RawHTML filePath='/posts/flexible-indexes/rangeindex-repr.html' /> |
31 | 38 |
|
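To make the start/stop/step idea concrete, here is a minimal pure-Python sketch of a range-backed coordinate. The `LazyRange` class and its method names are hypothetical and exist only for illustration; this is not how Xarray's `RangeIndex` is implemented, just the arithmetic that makes an in-memory array of values unnecessary.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyRange:
    """Hypothetical stand-in for a range-backed 1-D coordinate (not Xarray's RangeIndex)."""

    start: float
    stop: float
    step: float

    def __len__(self) -> int:
        # Number of points in the half-open interval [start, stop)
        return max(0, math.ceil((self.stop - self.start) / self.step))

    def value_at(self, i: int) -> float:
        # Compute a single coordinate value on demand instead of storing it
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.start + i * self.step

    def nearest_position(self, label: float) -> int:
        # Label-based lookup by arithmetic rather than by searching an array
        i = round((label - self.start) / self.step)
        return min(max(i, 0), len(self) - 1)


# Same parameters as the RangeIndex example: only three floats are stored
x = LazyRange(0.0, 100_000, 0.1)
print(len(x))
print(x.nearest_position(1234.56))
```

Only three numbers are kept regardless of the coordinate's length, and a label lookup reduces to one division and a rounding step.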
32 |
| -## Selection using non-dimension, 1-d coordinates |
33 | 39 |
|
34 |
| -Set an index for non-dimension coordinates! (No more swap_dims anymore or coordinate renaming) |
| 40 | +### IntervalIndex |
35 | 41 |
|
36 |
| -```python |
37 |
| -ds.set_xindex(“non_dim_coord”).sel(non_dim_coord=“something”) |
38 |
| -``` |
39 |
| - |
40 |
| -## Alternatives to pandas.Index |
| 42 | +TODO: Not sure if this one is ready to highlight (https://github.com/pydata/xarray/pull/10296) |
41 | 43 |
|
42 |
| -E.g., Numpy index (much faster to build, much more expensive to query), Geometry index (xvec) |
43 | 44 |
|
44 |
| -Out-of-core index, etc. |
| 45 | +## Third-party custom Indexes |
45 | 46 |
|
46 |
| -...or no index at all! (Create dataset with no default index, `drop_indexes`) |
47 | 47 |
|
48 |
| -## Create custom indexes from arbitrary coordinates and dimensions |
| 48 | +### Xvec GeometryIndex |
49 | 49 |
|
50 |
| -Not limited to 1-dimensional coordinates, even more flexible! |
| 50 | +TODO: Highlight https://xvec.readthedocs.io/en/v0.2.0/generated/xvec.GeometryIndex.html |
51 | 51 |
|
52 |
| -RasterIndex, FunctionalIndex, etc. |
| 52 | +### RasterIndex |
53 | 53 |
|
54 |
| -See xarray discussion for examples |
| 54 | +TODO: Highlight https://github.com/dcherian/rasterix |
55 | 55 |
|
56 | 56 | ## What’s next
|
57 | 57 |
|
58 |
| -Still unfinished [link: indexes next steps GH issue], extension entry points, etc. |
| 58 | +While we're extremely excited about what can *already* be accomplished with the new indexing capabilities, there are plenty of exciting ideas for future work. If you're interested in getting involved, we recommend following [this GitHub issue](https://github.com/pydata/xarray/issues/6293)! |
59 | 59 |
|
60 | 60 | ## Acknowledgments
|
61 | 61 |
|
62 |
| -CZI, Xarray core developers, etc. |
| 62 | +This work would not have been possible without technical input from the Xarray core team and community! |
| 63 | +Several developers received essential funding from a [CZI Essential Open Source Software for Science (EOSS) grant](https://xarray.dev/blog/czi-eoss-grant-conclusion) as well as NASA's Open Source Tools, Frameworks, and Libraries (OSTFL) grant 80NSSC22K0345. |