@zacharyburnett @SorooshMani-NOAA As mentioned in the meeting, these are some notes on extending xarray/pandas. They are copy-pasted from a different issue on a private repo, so not everything may be relevant (e.g. my suggestion at the end is probably out of context), but they should give you enough to get going.
Using `dem` and `adjust()` as an example, AFAIK there are the following options when it comes to extending the upstream API:

```python
dem_ds.adjust()           # Monkey-patch / subclass
dem_ds.dem.adjust()       # Register a `dem`-specific accessor (i.e. a different accessor per pyposeidon module)
dem_ds.poseidon.adjust()  # Register a `poseidon` accessor (i.e. a single accessor for all pyposeidon modules)
dem_ds(adjust)            # Monkey-patch `__call__()`
dem_ds.pipe(adjust)       # Use `.pipe()`
adjust(dem_ds)            # Just convert adjust to a function and be done with it
```
Directly subclassing/monkey-patching `xarray` objects should be relatively simple, but the xarray devs generally discourage it and suggest using accessors instead (see next point).

```python
import xarray as xr


class MyDem(xr.Dataset):
    # xarray emits a warning for subclasses that don't define __slots__
    __slots__ = ()

    def adjust(self):
        ...
```
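For comparison, a minimal sketch of the monkey-patching variant. The `adjust` body here is a hypothetical stand-in (it just adds an offset to an assumed `elevation` variable), not the pyposeidon implementation. Unlike the subclass, the patch applies to every `xr.Dataset` in the process.

```python
import numpy as np
import xarray as xr


def adjust(ds: xr.Dataset, offset: float = 1.0) -> xr.Dataset:
    """Hypothetical stand-in for adjust(): shift `elevation` by a constant."""
    return ds.assign(elevation=ds["elevation"] + offset)


# Monkey-patch: the method is attached to the class, so *every* Dataset gets it.
xr.Dataset.adjust = adjust

dem_ds = xr.Dataset({"elevation": ("x", np.array([0.0, 5.0, 10.0]))})
adjusted = dem_ds.adjust(offset=2.0)
```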
The problem with registering accessors like `ds.dem.adjust()` is that the accessors are global. Essentially, each accessor is a namespace. If we want a different accessor for each pyposeidon module, then we will be introducing multiple accessors, e.g.

```python
ds.meteo.to_output()
ds.dem.adjust()
```
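A minimal sketch of per-module accessors using xarray's `register_dataset_accessor` decorator. The `DemAccessor`/`MeteoAccessor` classes and their method bodies are hypothetical stand-ins; the point is that registration is global, so both namespaces end up on every `Dataset`.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("dem")
class DemAccessor:
    """Hypothetical `dem` namespace; the method body is a stand-in."""

    def __init__(self, xarray_obj: xr.Dataset):
        self._obj = xarray_obj

    def adjust(self, offset: float = 1.0) -> xr.Dataset:
        return self._obj.assign(elevation=self._obj["elevation"] + offset)


@xr.register_dataset_accessor("meteo")
class MeteoAccessor:
    """Hypothetical `meteo` namespace; the method body is a stand-in."""

    def __init__(self, xarray_obj: xr.Dataset):
        self._obj = xarray_obj

    def to_output(self) -> dict:
        # Stand-in for writing output files: just list the data variables.
        return {"variables": sorted(self._obj.data_vars)}


dem_ds = xr.Dataset({"elevation": ("x", np.array([0.0, 5.0]))})
meteo_ds = xr.Dataset({"msl": ("time", np.array([101325.0, 101300.0]))})

# Registration is global: every Dataset gets both namespaces.
print(hasattr(meteo_ds, "dem"), hasattr(dem_ds, "meteo"))  # True True
```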
Registering a single accessor is IMHV also a problem, since all the methods will be available on all `Dataset` objects. What's the point of calling `meteo.poseidon.adjust()`?
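A minimal sketch of the single-accessor variant, assuming a hypothetical `poseidon` accessor that bundles a DEM method and a meteo method (stand-in bodies). The DEM-only `adjust()` is reachable from a meteo dataset and only fails when called.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("poseidon")
class PoseidonAccessor:
    """Hypothetical single namespace bundling methods from every module."""

    def __init__(self, xarray_obj: xr.Dataset):
        self._obj = xarray_obj

    def adjust(self, offset: float = 1.0) -> xr.Dataset:
        # DEM-specific stand-in: only meaningful if `elevation` exists.
        return self._obj.assign(elevation=self._obj["elevation"] + offset)

    def to_output(self) -> dict:
        # Meteo-specific stand-in.
        return {"variables": sorted(self._obj.data_vars)}


meteo = xr.Dataset({"msl": ("time", np.array([101325.0, 101300.0]))})

# adjust() is exposed on the meteo dataset too, but fails at call time.
try:
    meteo.poseidon.adjust()
except KeyError as err:
    print("adjust() is not meaningful here:", err)
```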
Monkey-patching `__call__()` I just plain dislike. No one expects it.
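To make the objection concrete, a minimal sketch of what monkey-patching `__call__()` would involve. `_call` and the `adjust` body are hypothetical stand-ins.

```python
import numpy as np
import xarray as xr


def _call(self, func, *args, **kwargs):
    # Patched __call__: `ds(func, ...)` is turned into `func(ds, ...)`.
    return func(self, *args, **kwargs)


# Makes every Dataset callable, process-wide.
xr.Dataset.__call__ = _call


def adjust(ds: xr.Dataset, offset: float = 1.0) -> xr.Dataset:
    """Hypothetical stand-in for adjust()."""
    return ds.assign(elevation=ds["elevation"] + offset)


dem_ds = xr.Dataset({"elevation": ("x", np.array([0.0, 5.0]))})
adjusted = dem_ds(adjust, offset=3.0)
```

Nothing at the call site `dem_ds(adjust, offset=3.0)` hints that a `Dataset` has been made callable.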
`.pipe()` could be a solution if someone is really keen on chaining function calls. Another use case for `pipe` is deciding dynamically, at runtime, which function to call.
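A minimal sketch of both `.pipe()` uses (chaining, and picking the function at runtime), assuming two hypothetical stand-ins: `adjust` adds an offset to an `elevation` variable and `clip_positive` clips it at zero. Neither is pyposeidon code.

```python
import numpy as np
import xarray as xr


# Hypothetical stand-ins, not the real pyposeidon functions.
def adjust(ds: xr.Dataset, offset: float = 1.0) -> xr.Dataset:
    return ds.assign(elevation=ds["elevation"] + offset)


def clip_positive(ds: xr.Dataset) -> xr.Dataset:
    return ds.assign(elevation=ds["elevation"].clip(min=0.0))


dem_ds = xr.Dataset({"elevation": ("x", np.array([-4.0, 0.0, 5.0]))})

# 1. Chaining: ds.pipe(f, **kw) is just f(ds, **kw).
chained = dem_ds.pipe(adjust, offset=2.0).pipe(clip_positive)
# same as: clip_positive(adjust(dem_ds, offset=2.0))

# 2. Runtime dispatch: the function is looked up from data/config.
steps = {"adjust": adjust, "clip": clip_positive}
step_name = "clip"  # e.g. read from a config file at runtime
dispatched = dem_ds.pipe(steps[step_name])
```

Since every `.pipe()` call is a plain function call underneath, the functions themselves are what you end up writing and testing.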
Nevertheless, it is a somewhat obscure idiom (also available in pandas), and I suspect most people don't know about it. In practical terms it means converting `adjust()`, `to_output()`, etc. into plain functions (which is not a bad idea, since it makes them somewhat easier to test). Since we don't chain a lot of calls, I don't really see it as superior to a plain `adjust(dem_ds)`.
All things considered, I would suggest proceeding either with subclassing or with plain functions. To be more precise, if it were my call, I would just go for `adjust(dem_ds)`. In my experience, keeping things simple and explicit usually pays off more in the long run. Furthermore, it makes testing easier, and nothing prevents you from exposing a more object-oriented API in the future.
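A minimal sketch of the plain-function route, again with a hypothetical stand-in body for `adjust` (offset on an assumed `elevation` variable). It shows how little a test needs, and how an object-oriented API could later be layered on top as a thin, hypothetical accessor that only delegates.

```python
import numpy as np
import xarray as xr


def adjust(dem_ds: xr.Dataset, offset: float = 1.0) -> xr.Dataset:
    """Hypothetical plain-function adjust(); Dataset in, new Dataset out."""
    return dem_ds.assign(elevation=dem_ds["elevation"] + offset)


def test_adjust() -> None:
    # No accessor registration or subclass needed: just data in, data out.
    dem_ds = xr.Dataset({"elevation": ("x", np.array([0.0, 5.0]))})
    result = adjust(dem_ds, offset=2.0)
    assert result["elevation"].values.tolist() == [2.0, 7.0]
    assert dem_ds["elevation"].values.tolist() == [0.0, 5.0]  # input untouched


# Later, an OO API can be added without touching adjust() itself,
# e.g. a thin (hypothetical) accessor that delegates to the function.
@xr.register_dataset_accessor("dem")
class DemAccessor:
    def __init__(self, xarray_obj: xr.Dataset):
        self._obj = xarray_obj

    def adjust(self, **kwargs) -> xr.Dataset:
        return adjust(self._obj, **kwargs)
```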