-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor index vs. coordinate variable(s) #5636
Conversation
- Pass Variable objects to xarray.Index constructor - The index should create IndexVariable objects (`coords` attribute) - PandasIndex: IndexVariable wraps PandasIndexingAdpater wraps pd.Index
Revert some changes made in pydata#5102 + additional (temporary) fixes.
Hello @benbovy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
|
This is now ready for review! The new features (e.g., When cleaning up things I noticed that, e.g., for a dataset There are also some places where the indexes are treated as arrays and that probably need better refactoring than the dirty fixes made here. This is the case for |
Dataset constructor doesn't accept xarray indexes yet. Create new coordinates from the underlying pandas indexes.
During the last meeting on flexible indexes #5452 we discussed (and agreed) about the general approach implemented here for Thanks to everyone who attended to the meeting, and thanks @shoyer and @dcherian for the review/help here. If there's no objection I'm going to merge this PR tomorrow so that I can move forward with the refactoring. |
* main: (31 commits) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) pre-commit: autoupdate hook versions (pydata#5617) Add dataarray scatter with 3d support (pydata#4909) ...
* upstream/main: (31 commits) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) pre-commit: autoupdate hook versions (pydata#5617) Add dataarray scatter with 3d support (pydata#4909) ...
* upstream/main: (34 commits) Use same bool validator as other inputs (pydata#5703) conditionally disable bottleneck (pydata#5560) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) ...
* upstream/main: (307 commits) Use same bool validator as other inputs (pydata#5703) conditionally disable bottleneck (pydata#5560) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) ...
pre-commit run --all-files
whats-new.rst
This implements option 3 (sort of) described in #5553 (comment):
xarray.Index
into anxarray.Variable
and keep those two concepts distinct from each other.xarray.Index.from_variables
class constructor accepts a dictionary ofxarray.Variable
objects as argument and may (or should?) also return correspondingxarray.IndexVariable
objects to ensure immutability.PandasIndex
, the new returnedxarray.IndexVariable
wraps the underlyingpd.Index
via aPandasIndexingAdapter
(this reverts some changes made in Flexible indexes: add Index base class and xindexes properties #5102).PandasMultiIndex
, this PR addsPandasMultiIndexingAdapter
so that we can wrap the pandas multi-index in separate coordinate variables objects: one for the dimension + one for each level. The level coordinates data internally hold a reference to the dimension coordinate data to avoid indexing the same underlyingpd.MultiIndex
for each of those coordinates (PandasMultiIndexingAdapter.__getitem__
is memoized for that purpose).This is very much work in progress, I need to update (or revert) all related parts of Xarray's internals, update tests, etc. At this stage any comment on the approach described above is welcome.