Skip identical indexes with non-unique values in align? #956

Closed
shoyer opened this issue Aug 9, 2016 · 3 comments

@shoyer
Member

shoyer commented Aug 9, 2016

Currently, when objects with non-unique (duplicated) values in one of their indexes are passed to align, an error surfaces from pandas:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
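For reference, the error can be reproduced with pandas alone. This is a minimal sketch: `reindex_error_message` is a hypothetical helper, and it calls `Index.get_indexer`, which pandas uses to look up label positions when reindexing. The exact exception class and message may differ between pandas versions (for example, newer versions of `Series.reindex` report duplicate labels with different wording).

```python
import pandas as pd


def reindex_error_message(index, target):
    """Hypothetical helper: return the error pandas raises when looking up
    `target` labels in `index`, or None if the lookup succeeds."""
    try:
        # get_indexer maps each target label to a position in `index`;
        # pandas refuses to do this when `index` has duplicate labels.
        index.get_indexer(target)
    except Exception as err:  # caught broadly because the class has moved between versions
        return f"{type(err).__name__}: {err}"
    return None


# An index where the label 0 appears twice.
print(reindex_error_message(pd.Index([0, 0, 1]), [0, 1]))
# A unique index works fine, so no error message is returned.
print(reindex_error_message(pd.Index([0, 1]), [0, 1]))
```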

We could certainly give a more informative error here (see this complaint on StackOverflow), but the bigger issue is that this error probably isn't strictly necessary. Instead, we could skip aligning indexes that are already equal. This is slightly less principled (a non-unique index may indicate that something has gone wrong), but certainly more convenient and more in line with how pandas works (e.g., pandas even allows arithmetic between objects with non-unique indexes, which I believe does not currently work in xarray).
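The pandas behaviour mentioned above is easy to check: when two Series have equal indexes, pandas skips alignment, so arithmetic works even with duplicated labels. A small sketch (the variable names are arbitrary):

```python
import pandas as pd

# Both Series share the same non-unique index ("x" appears twice).
a = pd.Series([1, 2, 3], index=["x", "x", "y"])
b = pd.Series([10, 20, 30], index=["x", "x", "y"])

# Because a.index.equals(b.index), no reindexing is needed and the
# values are combined position by position.
total = a + b
print(total.index.tolist())  # ['x', 'x', 'y']
print(total.tolist())        # [11, 22, 33]
```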

Currently, we do this as a special case when merging arrays and exactly one has labels (see _align_for_merge in #950). But we could probably do this in general, either by default or with a flag to enable it (or turn it off). This would then propagate to every xarray operation that uses align under the covers.
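A rough sketch of the proposed rule, written against plain pandas indexes rather than xarray's internals. `align_indexes` and its `skip_identical` flag are hypothetical names for illustration, not an existing xarray API; the `join` options are loosely modelled on the outer/inner joins that `align` offers.

```python
from functools import reduce

import pandas as pd


def align_indexes(indexes, join="outer", skip_identical=True):
    """Hypothetical sketch: compute the common index for a list of pandas Indexes.

    If skip_identical is True and every index equals the first one, that index
    is returned unchanged, even if it has duplicate labels. Otherwise every
    index must be unique so that the join is well defined.
    """
    first = indexes[0]
    if skip_identical and all(first.equals(other) for other in indexes[1:]):
        return first
    for idx in indexes:
        if not idx.is_unique:
            # Report the offending labels instead of a generic reindexing error.
            dupes = idx[idx.duplicated()].unique().tolist()
            raise ValueError(f"cannot align non-unique index; duplicated labels: {dupes}")
    if join == "outer":
        return reduce(lambda left, right: left.union(right), indexes)
    if join == "inner":
        return reduce(lambda left, right: left.intersection(right), indexes)
    raise ValueError(f"unsupported join: {join!r}")


same = [pd.Index(["x", "x", "y"]), pd.Index(["x", "x", "y"])]
print(align_indexes(same).tolist())  # ['x', 'x', 'y']
```

With `skip_identical=False`, the identical non-unique indexes above would hit the uniqueness check instead, which roughly matches the current behaviour described in this issue, though with a more informative error.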

@max-sixty
Collaborator

Another option is to disallow non-unique indexes entirely. I'm not sure how big a use case they are, so this might be a non-starter.

But xarray has two wondrous features that make this more feasible:

  • A smaller and better-defined surface than pandas, so we aren't forced to support all of these workarounds
  • Coordinates that aren't dimensions, so labels can be stored there instead of in indexes
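For comparison, the stricter alternative of disallowing non-unique indexes could amount to a check when an index is created. `require_unique_index` is a hypothetical helper sketched with pandas, not part of xarray; it also shows what a more informative error message might look like.

```python
import pandas as pd


def require_unique_index(index, dim):
    """Hypothetical check: reject an index that contains duplicate labels.

    Returns the index unchanged when every label is unique.
    """
    if index.is_unique:
        return index
    # Collect each label that occurs more than once, in order of first repeat.
    dupes = index[index.duplicated()].unique().tolist()
    raise ValueError(
        f"index for dimension {dim!r} has duplicate values {dupes}; "
        "non-unique indexes are not supported"
    )


require_unique_index(pd.Index([1, 2, 3]), "time")  # passes through unchanged
```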

@shoyer
Member Author

shoyer commented Aug 10, 2016

I think it's important to preserve the ability to work with non-unique indexes insofar as it is necessary for cleaning your data. That's why we support non-unique indexes in merge, which is internally used anytime you assign to a Dataset. I'm less sure about the use cases for arithmetic and the like.

@shoyer
Member Author

shoyer commented Aug 17, 2016

I just checked on xarray v0.7.2 and the example from the StackOverflow post worked. I think we should treat this as a regression -- we were already doing this prior to v0.8.
