
Support static type analysis  #3967

@eric-czech


As a follow-up to the discussion in #3959, I wanted to explore what options exist for users or API developers building on Xarray to enforce Dataset/DataArray structure through static analysis.

In my specific scenario, I would like to model several different types of data in my domain as Dataset objects, but I'd like to be able to enforce that names and dtypes associated with both data variables and coordinates meet certain constraints.

@keewis mentioned an example of this in #3959 (comment) where it might be possible to use something like a TypedDict to constrain variable/coord names and array dtypes, but this won't work with TypedDict as it's currently implemented. Another possibility could be generics, and I took a stab at that in #3959 (comment) (though this would certainly be more intrusive).
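
To make the TypedDict limitation concrete, here is a minimal sketch (the schema name ExtVars is hypothetical). A TypedDict lets a type checker verify key names and value types, but only for plain dicts. An xr.Dataset is not a dict, so a type checker won't accept one where ExtVars is expected, and at runtime the schema does no checking at all.

```python
from typing import TypedDict


class ExtVars(TypedDict):
    """Hypothetical schema: variable name -> type of that variable's values."""

    data: list[int]


# Accepted by a type checker: the key and the value type match the schema.
good: ExtVars = {"data": [0, 1]}

# Rejected by a type checker such as mypy ("DATA" is not a declared key and
# "data" is missing), but only because this is a dict literal:
# bad: ExtVars = {"DATA": [0.0]}

# At runtime a TypedDict value is just a dict; the schema is only metadata.
print(type(good) is dict)                  # True
print(sorted(ExtVars.__required_keys__))   # ['data']
```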

An example of where this would be useful is in adding extensions through accessors:

import xarray as xr

@xr.register_dataset_accessor('ext')
class ExtAccessor:
    def __init__(self, ds):
        self._ds = ds

    def is_zero(self):
        return self._ds['data'] == 0

ds = xr.Dataset(dict(DATA=xr.DataArray([0.0])))
# I'd like to catch, before runtime, that "data" was misspelled as "DATA"
# and that this particular method shouldn't be run against floats.
# Today this only fails at runtime, with a KeyError for 'data'.
ds.ext.is_zero()
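
In the spirit of the generics approach mentioned above, one way to catch both mistakes in this example statically is a typed view whose fields are the expected variable names and whose type parameter is the element type. This is only a hypothetical sketch: ExtSchema and is_zero are illustrative names, not xarray API, and plain Python sequences stand in for an actual xr.Dataset so the code runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar

# Element type of the "data" variable (hypothetical type parameter).
T = TypeVar("T")


@dataclass(frozen=True)
class ExtSchema(Generic[T]):
    """Hypothetical typed view of a Dataset: one field per expected variable name."""

    data: Sequence[T]


def is_zero(view: ExtSchema[int]) -> list[bool]:
    # Declared to accept only views whose "data" variable holds ints.
    return [v == 0 for v in view.data]


ok = ExtSchema(data=[0, 1])
print(is_zero(ok))  # [True, False]

# Both mistakes from the accessor example are reported by a type checker such as mypy:
# ExtSchema(DATA=[0.0])           # unexpected keyword "DATA" (also a TypeError at runtime)
# is_zero(ExtSchema(data=[0.0]))  # float data where ExtSchema[int] is expected
```

The obvious cost is that the schema lives outside xarray, so a real version would still need conversion to and from xr.Dataset.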

I probably care more about this as someone looking to build an API on top of Xarray, but I imagine typical users would find a solution to this problem beneficial too.

There is a related conversation about doing something similar for pandas DataFrames at python/typing#28 (comment), which may be helpful context for what is possible with TypedDict.
