Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for N-dimensional combine #2616

Merged
merged 111 commits into from
Jun 25, 2019
Merged

API for N-dimensional combine #2616

merged 111 commits into from
Jun 25, 2019

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Dec 17, 2018

Continues the discussion from #2553 about how the API for loading and combining data from multiple datasets should work. (Ultimately part of the solution to #2159)

@shoyer this is for you to see how I envisaged the API would look, based on our discussion in #2553. For now you can ignore all the changes except the ones to the docstrings of auto_combine here, manual_combine here and open_mfdataset here.

Feedback from anyone else is also encouraged, as really the point of this is to make the API as clear as possible to someone who hasn't delved into the code behind auto_combine and open_mfdataset.

It makes sense to first work out the API, then change the internal implementation to match, using the internal functions developed in #2553. Therefore the tasks include:

  • Decide on API for 'auto_combine' and 'open_mfdataset'
  • Appropriate documentation
  • Write internal implementation of manual_combine
  • Write internal implementation of auto-combine
  • Update open_mfdataset to match
  • Write and reorganise tests
  • Automatically ordering of string and datetime coords
  • What's new explaining changes
  • Make sure auto_combine and manual_combine appear on the API page of the docs
  • PEP8 compliance
  • Python 3.5 compatibility
  • AirSpeedVelocity tests for auto_combine
  • Finish all TODOs
  • Backwards-compatible API to start deprecation cycle
  • Add examples from docstrings to main documentation pages

@TomNicholas
Copy link
Member Author

Thanks for the detailed review @shoyer !

I'm loving all the documentation and test coverage here (though I haven't read them all carefully yet!).

Great - the only test I think you should definitely look at is test_manual_concat_too_many_dims_at_once, and my diagnosis of the issue it exposes in #2975.

xarray/core/combine.py Outdated Show resolved Hide resolved
return False
else:
# There is a different problem
raise err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could raise a default deprecation warning, then if no errors are raised while trying to get through _dimension_coords_exist() raise another warning with more info? Then the user will always get a deprecation warning regardless of what error is thrown?

This sounds like a good solution to me.

Okay, but what other way is there of doing this? We have to distinguish between these errors if we want to be able to check what type of input the user gave and still carry on going.

The alternative is either (1) making dedicated error classes or (2) refactoring into lower level functions with more specific functionality. That said, it's probably not worth bothering with either of these here.

issue some sort of "default" deprecation warning if something unexpected happens

I could raise a default deprecation warning, then if no errors are raised while trying to get through _dimension_coords_exist() raise another warning with more info? Then the user will always get a deprecation warning regardless of what error is thrown?

matching strings in error messages, which can be fragile

Okay, but what other way is there of doing this? We have to distinguish between these errors if we want to be able to check what type of input the user gave and still carry on going.

If ValueError is raised for another reason, the user could get a surprising traceback.

If ValueError is raised for another reason then the user will see that because it will be caught and raised, no?

Do you mean that I should raise the particular error caught by _infer_concat_order_from_coords as a second warning?

I think it's better just to silence the error.

xarray/core/combine.py Outdated Show resolved Hide resolved
xarray/tests/test_concat.py Outdated Show resolved Hide resolved
@shoyer
Copy link
Member

shoyer commented Jun 18, 2019

I left a couple of minor comments, but I think this is pretty much ready to go! We should merge this and let users give it a try :).

@TomNicholas
Copy link
Member Author

Okay I've changed those things. @shoyer if you want to just double check that the new error catching behaviour in dimension_coords_exist() is what you wanted, then this should be good to go!

@shoyer shoyer merged commit 6b33ad8 into pydata:master Jun 25, 2019
@shoyer
Copy link
Member

shoyer commented Jun 25, 2019

OK, in it goes! Thanks @TomNicholas for your perseverance here

@max-sixty
Copy link
Collaborator

Thanks @TomNicholas !

@TomNicholas
Copy link
Member Author

🎉 Thanks for your help!

@shoyer
Copy link
Member

shoyer commented Jun 25, 2019

I started thinking about names for these functions again....

I don't love the non-descriptive nature of combine_auto and combine_manual. All "auto" and "manual" tells me is that one is more automatic than the other, without saying how it's more automatic. This is probably even more confusing because we used "auto" previously to mean some hybrid behavior between what we now call "manual" and "auto".

What about calling these combine_by_coords and combine_nested instead? On open_mfdataset, these would be combine='by_coords' and combine='nested'.

@TomNicholas
Copy link
Member Author

@shoyer I like that suggestion a lot - not sure why we didn't think of something similar before.

How does it work if I push to a merged PR?

@shoyer
Copy link
Member

shoyer commented Jun 25, 2019

Pushing to a merged PR definitely doesn't work :). You should start a new branch after syncing master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants