Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extending xarray/pandas using accessors #22

Open
pmav99 opened this issue Mar 16, 2022 · 1 comment
Open

Extending xarray/pandas using accessors #22

pmav99 opened this issue Mar 16, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@pmav99
Copy link
Member

pmav99 commented Mar 16, 2022

@zacharyburnett @SorooshMani-NOAA as mentioned in the meeting, these are some notes for extending xarray/pandas. It's just copy paste from a different issue on a private repo, so not everything might be relevant (e.g. my suggestion in the end is probably out of context) but it should give you enough to get going

Relative links:


Using dem and adjust() as an example, AFAIK, there are the following options when it comes to extending the upstream API:

dem_ds.adjust()             # Monkey-patch / subclass
dem_ds.dem.adjust()         # Register a `dem` specific accessor (i.e. different accessor per pyposeidon module)
dem_ds.poseidon.adjust()    # Register a `poseidon` accessor (i.e. a single accessor for all pyposeidon modules)
dem_ds(adjust)              # Monkey-patch `__call__()`
dem_ds.pipe(adjust)         # Use `.pipe()`
adjust(dem_ds)              # Just convert adjust to a function and be done with it

Directly subclassing/monkey-patching xarray objects should be relatively simple, but the xarray
devs generally discourage it and suggest that accessors are used instead (see next point).

class MyDem(xr.Dataset)
    def adjust(self):
        ...

The problem with registering accessors like e.g. ds.dem.adjust(). is that the accessors are
global. Essentially, each accessor is a namespace. If we want to have different accessors for each
pyposeidon module, then we will be introducing multiple accessors. E.g.

  • ds.meteo.to_output()
  • ds.dem.adjust()

Registering a single accessor is IMHV also a problem since all the methods will be available
on all the Dataset objects. What's the point of calling meteo.poseidon.adjust() ?

Monkey-patching __call__() I just plain dislike. No one expects it.

.pipe().
could be a solution if someone is really keen on chaining function calls. E.g.

ds.pipe(func1, arg1, arg2).pipe(func, kwarg1=1, kwarg2=2)

Another use case for pipe is if you want to dynamically decide which function to call (on runtime!). E.g.

def process(ds, func, *args, **kwargs):
    return ds.pipe(func, *args, **kwargs)

Nevertheless, it is a somewhat obscure idiom that is also available in pandas. I guess that most
people don't know about it. In practical terms it means that you convert adjust(), to_output()
etc as functions (which is not a bad idea since it will make writing tests for them somewhat
easier). When all things are considered, and since we don't chain a lot of calls, I don't really see it
as superior to a plain adjust(dem_ds)


All things being considered, I would suggest to proceed either with subclassing or with plain functions.

To be more precise, if it was my call, I would just go for adjust(dem_ds). In my experience, keeping things simple and
explicit usually gives more of a benefit in the long run. Furthermore, it makes testing easier + nothing
prevents you to expose a more Object Oriented API in the future.

@pmav99 pmav99 closed this as completed Mar 16, 2022
@ghost
Copy link

ghost commented Mar 16, 2022

Thank you, this is very informative!

@ghost ghost reopened this Mar 31, 2022
@ghost ghost added the enhancement New feature or request label Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant