Description
As a related discussion to #3959, I wanted to see what possibilities exist for a user or API developer building on Xarray to enforce Dataset/DataArray structure through static analysis.
In my specific scenario, I would like to model several different types of data in my domain as Dataset objects, but I'd like to be able to enforce that names and dtypes associated with both data variables and coordinates meet certain constraints.
@keewis mentioned an example of this in #3959 (comment) where it might be possible to use something like a TypedDict to constrain variable/coord names and array dtypes, but this won't work with TypedDict as it's currently implemented. Another possibility could be generics, and I took a stab at that in #3959 (comment) (though this would certainly be more intrusive).
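For context, here is a minimal sketch of the checking that TypedDict already provides for plain dictionaries, which is the behaviour one would want mirrored on a Dataset. The names `ZeroCheckVars` and `is_zero` are hypothetical, and plain Python lists stand in for arrays; none of this is part of Xarray.

```python
from typing import List, TypedDict


class ZeroCheckVars(TypedDict):
    # Hypothetical schema: one variable named "data" holding integers.
    data: List[int]


def is_zero(variables: ZeroCheckVars) -> List[bool]:
    # A static type checker such as mypy validates the literal key "data";
    # variables["DATA"] would be flagged before the code ever runs.
    return [value == 0 for value in variables["data"]]


good: ZeroCheckVars = {"data": [0, 1]}
print(is_zero(good))
```

At runtime a TypedDict is an ordinary dict, so the key and value checks happen only in the type checker. Dataset indexing accepts arbitrary names, which is why the same guarantee does not carry over as things stand.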
An example of where this would be useful is in adding extensions through accessors:
import xarray as xr

@xr.register_dataset_accessor('ext')
class ExtAccessor:
    def __init__(self, ds):
        # Keep a reference to the wrapped Dataset
        self.ds = ds

    def is_zero(self):
        return self.ds['data'] == 0

ds = xr.Dataset(dict(DATA=xr.DataArray([0.0])))
# I'd like to catch that "data" was misspelled as "DATA" and that
# this particular method shouldn't be run against floats prior to runtime
ds.ext.is_zero()
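Until constraints like these can be expressed statically, one fallback is a runtime check, for instance called from the accessor's `__init__`. This is not static analysis, only a rough sketch of the same constraints enforced at run time. The names `SCHEMA` and `schema_errors` are hypothetical, and the schema format (variable name mapped to allowed NumPy dtype kind codes) is an assumption made for illustration. The function only relies on each variable exposing `.dtype.kind`, so it should work on `ds.data_vars`; the demo uses stand-in objects so that it runs without Xarray installed.

```python
from types import SimpleNamespace
from typing import Any, List, Mapping

# Hypothetical schema: variable name -> allowed NumPy dtype kind codes
# ("i" signed integer, "u" unsigned integer, "f" float, ...).
SCHEMA: Mapping[str, str] = {"data": "iu"}


def schema_errors(variables: Mapping[str, Any], schema: Mapping[str, str]) -> List[str]:
    """Return human-readable schema violations (an empty list if valid)."""
    errors = []
    for name, kinds in schema.items():
        if name not in variables:
            errors.append(f"missing variable {name!r}")
        elif variables[name].dtype.kind not in kinds:
            kind = variables[name].dtype.kind
            errors.append(f"variable {name!r} has dtype kind {kind!r}, expected one of {kinds!r}")
    return errors


# Stand-in for ds.data_vars from the example above: one float variable "DATA".
fake_vars = {"DATA": SimpleNamespace(dtype=SimpleNamespace(kind="f"))}
print(schema_errors(fake_vars, SCHEMA))
```

With a real Dataset, the call would be `schema_errors(ds.data_vars, SCHEMA)`. The misspelled name is reported as a missing variable, and a correctly named float variable would be reported as a dtype mismatch.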
I probably care more about this as someone looking to build an API on top of Xarray, but I imagine typical users would find a solution to this problem beneficial too.
There is a related conversation on doing something like this for Pandas DataFrames at python/typing#28 (comment), so that might be helpful context for possibilities with TypedDict.