Add a perception of a __xarray__ magic method #8413
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
Thanks for the issue @swamidass! There is precedent for this; for example in Rust there are One thing this could interface with is subtyping: if a I agree that it could start as another library, or just in your own code initially. I don't think there's actually much need for this to be in xarray at the start. Probably the helpful thing here is to get feedback from others, and then coalesce on a standard over time.
The slice idea is a good one too. Yup, asking for feedback. Any idea if and when Datatree is gonna get rolled in?
Thanks for the interesting suggestion @swamidass! I might be missing something, but what's the advantage of doing this over the other class just implementing a I'm also wondering if there is any precedent for this pattern in Pandas? That might be useful to know as a similar prior example. I guess this suggestion is somewhat similar to pandas using apache arrow...?
When I get around to it / when I get some help 😅 If that's something you're interested in then that would be amazing. It honestly shouldn't actually be particularly hard, mostly just copy-pasting and making sure it's all up to xarray code review standards. However, at this point it may make sense to wait for the
Is your feature request related to a problem?
I am often moving data from external objects (of all sorts!) into xarray. This is a common use case.
Much of this code would be greatly simplified if there was a way of giving non-xarray classes a way of declaring to xarray how these objects can be marshaled into
Describe the solution you'd like
So here is an initial proposal for comment. Much of this could be implemented in a third party library. But doing this in xarray itself would likely be best.
Magic Methods
It would be great to see these magic method signatures become integrated throughout the library:
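For illustration only, the magic methods could be described as structural types. In this sketch the names `__xarray_dataarray__` and `__xarray_dataset__` follow the names used later in this proposal, while the `SupportsDataArray` and `SupportsDataset` protocols and the `Slide` class are hypothetical. Return types are left as `Any` so the sketch runs without xarray installed; in practice they would presumably be `xr.DataArray` and `xr.Dataset`.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsDataArray(Protocol):
    """Hypothetical protocol: the object can describe itself as a DataArray."""

    def __xarray_dataarray__(self) -> Any:  # would be xr.DataArray in practice
        ...


@runtime_checkable
class SupportsDataset(Protocol):
    """Hypothetical protocol: the object can describe itself as a Dataset."""

    def __xarray_dataset__(self) -> Any:  # would be xr.Dataset in practice
        ...


class Slide:
    """Hypothetical third-party class that opts in to DataArray conversion."""

    def __init__(self, pixels: list, dims: tuple) -> None:
        self.pixels = pixels
        self.dims = dims

    def __xarray_dataarray__(self) -> Any:
        # A real implementation would return xr.DataArray(self.pixels, dims=self.dims).
        return {"data": self.pixels, "dims": self.dims}
```

With `runtime_checkable`, `isinstance(obj, SupportsDataArray)` only checks that the method is present, which is about as much as a library could verify before calling it.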
Conversion Registry
And these extension functions to register converters:
A clash occurs when registering a converter if cls already implements a corresponding __xarray_*__ method or if another converter is already registered for cls. Perhaps add an argument that specifies whether the converter should be added when there is a clash. Perhaps these functions could return the replaced converter so it can be added back in if needed?
Ideally, "deregister" versions of these functions would also be available, so that context managers that change marshaling behavior could easily be constructed.
User API
Along with the following new user API functions:
"as_xarray" returns (in order of precedence):
The rationale for putting the registered functions first is that this would enable
"as_dataarray" would be similar, but it would only call x.__xarray_dataarray__ and well-known aliases.
"as_dataset" would be similar, but it would only call x.__xarray_dataset__ and well-known aliases, perhaps falling back to calling x.__xarray_dataarray__ and converting the result to a dataset if it has a name attribute.
"as_datatree" would be similar, but it would only call x.__xarray_datatree__, perhaps falling back to calling x.__xarray_dataarray__ and wrapping the result in a single-node datatree. (Though of course at this point this method would probably be implemented by the DataTree package, not xarray.)
The design decisions are flexible from my point of view, and might be decided in whatever way makes the code base simplest or most usable. There is also a question of whether or not these functions should fall back to the backup methods by default. These decisions can also be deferred entirely by delegating to the converter registry.
Across the Xarray Library
Finally, across the xarray library, there may be places where passing input arguments through as_xarray, as_dataarray, or as_dataset would make a lot of sense. This could be the final thing to do, but cannot be handled by a third party library.
Doing this would give third-party libraries another pathway to integrate with xarray, one far easier than the converter registry or explicit calls to as_* functions.
Describe alternatives you've considered
This can be done with a private library. But it seems to be a lot of code that is pretty useful to other use cases.
Most of this (but not all) can be accomplished in a third-party library, but that wouldn't allow the kind of seamless integration seen in, for example, xarray's use of _repr_html_ to integrate with pandas.
The existing backend hooks work great when we are marshaling from file-based sources. See, for example, tiffslide-xarray (https://github.com/swamidasslab/tiffslide-xarray). This approach is seamless for reading files, but cannot marshal objects. For example, this is possible:
But this doesn't work.
This is an important use case because there are cases where we want to create an xarray like this from objects that are never stored on the filesystem.
Additional context
No response