Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow incomplete hypercubes in combine_by_coords #3649

Merged
merged 6 commits into from
Dec 24, 2019

Conversation

bolliger32
Copy link
Contributor

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, thanks for the super clear issue and neat PR @bolliger32 !

I very much like this, my only hesitation is whether or not this should be default behaviour? Imagine if someone passes a bunch of datasets that they think form a hypercube but don't, then with the PR they would get returned a dataset with a panel of NaNs that they would not be explicitly told are there...
On the other hand xarray's other combining functions (e.g. merge and concat) will just insert NaNs.

Perhaps there should be an option to keep the stricter behaviour by passing fill_value=None?

@bolliger32
Copy link
Contributor Author

bolliger32 commented Dec 20, 2019

yeah - that makes sense! So if fill_value is None then we can have the additional check to make sure it's a complete hypercube. The next question, like you mention, is whether we change the default fill_value to None. It seems like that would make the most sense to me. That way, the default is to throw an error with incomplete hypercubes (the previous behavior), and the incompleteness is only allowed if you provide a value to fill with. But I can see your point about having the default be dtypes.NA for consistency with merge and concat. Thoughts?

@dcherian
Copy link
Contributor

Imagine if someone passes a bunch of datasets that they think form a hypercube but don't, then with the PR they would get returned a dataset with a panel of NaNs that they would not be explicitly told are there... On the other hand xarray's other combining functions (e.g. merge and concat) will just insert NaNs.

Yeah this "automatic alignment" behaviour is everywhere so I think it makes sense for it to be the default.

@TomNicholas
Copy link
Member

Yeah this "automatic alignment" behaviour is everywhere so I think it makes sense for it to be the default.

Okay, in that case shall we add an option to opt-in to the stricter check by passing fill_value=None? With a quick note about it in the docstring for that argument?

@bolliger32
Copy link
Contributor Author

Will do!

@bolliger32
Copy link
Contributor Author

Done

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks!

Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me, too.

@TomNicholas TomNicholas merged commit 651f27f into pydata:master Dec 24, 2019
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 30, 2019
* upstream/master:
  added pyinterp to related projects (pydata#3655)
  Allow incomplete hypercubes in combine_by_coords (pydata#3649)
  concat keeps attrs from first variable. (pydata#3637)
  Extend DatetimeAccessor properties and support `.dt` accessor for Timedelta (pydata#3612)
  update readthedocs.yml (pydata#3639)
  silence sphinx warnings round 3 (pydata#3602)
  Fix/quantile wrong errmsg (pydata#3635)
  Provide shape info in shape mismatch error. (pydata#3619)
  Minor doc fixes (pydata#3615)
  Respect user-specified coordinates attribute. (pydata#3487)
  Add Facetgrid.row_labels & Facetgrid.col_labels (pydata#3597)
  Fix pint integration tests (pydata#3600)
  Minor fix to combine_by_coords to allow for the combination of CFTimeIndexes separated by large time intervals (pydata#3543)
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 30, 2019
* upstream/master:
  Add nanmedian for dask arrays (pydata#3604)
  added pyinterp to related projects (pydata#3655)
  Allow incomplete hypercubes in combine_by_coords (pydata#3649)
  concat keeps attrs from first variable. (pydata#3637)
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 30, 2019
…oken

* 'master' of github.com:pydata/xarray:
  Add nanmedian for dask arrays (pydata#3604)
  added pyinterp to related projects (pydata#3655)
  Allow incomplete hypercubes in combine_by_coords (pydata#3649)
  concat keeps attrs from first variable. (pydata#3637)
  Extend DatetimeAccessor properties and support `.dt` accessor for Timedelta (pydata#3612)
  update readthedocs.yml (pydata#3639)
  silence sphinx warnings round 3 (pydata#3602)
  Fix/quantile wrong errmsg (pydata#3635)
  Provide shape info in shape mismatch error. (pydata#3619)
  Minor doc fixes (pydata#3615)
  Respect user-specified coordinates attribute. (pydata#3487)
  Add Facetgrid.row_labels & Facetgrid.col_labels (pydata#3597)
  Fix pint integration tests (pydata#3600)
  Minor fix to combine_by_coords to allow for the combination of CFTimeIndexes separated by large time intervals (pydata#3543)
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 30, 2019
…equiv

* 'master' of github.com:pydata/xarray: (28 commits)
  Add nanmedian for dask arrays (pydata#3604)
  added pyinterp to related projects (pydata#3655)
  Allow incomplete hypercubes in combine_by_coords (pydata#3649)
  concat keeps attrs from first variable. (pydata#3637)
  Extend DatetimeAccessor properties and support `.dt` accessor for Timedelta (pydata#3612)
  update readthedocs.yml (pydata#3639)
  silence sphinx warnings round 3 (pydata#3602)
  Fix/quantile wrong errmsg (pydata#3635)
  Provide shape info in shape mismatch error. (pydata#3619)
  Minor doc fixes (pydata#3615)
  Respect user-specified coordinates attribute. (pydata#3487)
  Add Facetgrid.row_labels & Facetgrid.col_labels (pydata#3597)
  Fix pint integration tests (pydata#3600)
  Minor fix to combine_by_coords to allow for the combination of CFTimeIndexes separated by large time intervals (pydata#3543)
  Fix map_blocks HLG layering (pydata#3598)
  Silence sphinx warnings: Round 2 (pydata#3592)
  2x~5x speed up for isel() in most cases (pydata#3533)
  remove xarray again (pydata#3591)
  fix plotting with transposed nondim coords. (pydata#3441)
  make coarsen reductions consistent with reductions on other classes (pydata#3500)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 30, 2019
* upstream/master:
  Add nanmedian for dask arrays (pydata#3604)
  added pyinterp to related projects (pydata#3655)
  Allow incomplete hypercubes in combine_by_coords (pydata#3649)
  concat keeps attrs from first variable. (pydata#3637)
  Extend DatetimeAccessor properties and support `.dt` accessor for Timedelta (pydata#3612)
  update readthedocs.yml (pydata#3639)
  silence sphinx warnings round 3 (pydata#3602)
  Fix/quantile wrong errmsg (pydata#3635)
  Provide shape info in shape mismatch error. (pydata#3619)
  Minor doc fixes (pydata#3615)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

combine_by_coords should allow for missing panels in hypercube
4 participants