This repository was archived by the owner on Oct 24, 2024. It is now read-only.

Add dataset methods at class definition time rather than object instantiation time? #18

Closed
@TomNicholas

Currently I'm adding the xarray.Dataset methods to DataTree via a pattern basically like this:

_DATASET_API_TO_COPY = ['isel', '__add__', ...]

class DatasetAPIMixin:
    def _add_dataset_api(self):
        for method_name in _DATASET_API_TO_COPY:
            ds_method = getattr(xarray.Dataset, method_name)

            # Decorate method so that when called it acts over whole subtree
            mapped_method = map_over_subtree(ds_method)

            setattr(self, method_name, mapped_method)


class DataTree(DatasetAPIMixin):
    def __init__(self, *args):
        self._add_dataset_api()

The idea was that the use of Mixins would echo how these methods were defined on xarray.Dataset originally, and also keep a distinction between methods that are actually unique to DataTree objects (such as .groups), and methods that are merely copied over from xarray.Dataset like .isel (albeit with modifications such as mapping over child nodes).

I like my Mixin idea, but one weird thing about this pattern is that the Dataset methods are only added to the DataTree once a dt instance is instantiated, not when the DataTree class is defined. I don't know if this is likely to cause problems, but at the very least it seems inefficient, because we are running the code to loop through and attach all these methods every single time we create a new DataTree object. It's also not really an example of class inheritance right now - the mixins aren't actually doing anything other than being a different place for me to put the definition of _add_dataset_api().
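One concrete difference between the two attachment points, as a minimal sketch: a function stored on an instance is not bound to that instance, and operators such as + look up __add__ on the type rather than on the instance. The classes and helper below (PerInstance, PerClass, outcome) are hypothetical and only illustrate the Python behaviour; they are not the real DataTree code, which may handle binding differently.

```python
def double(self):
    return self.value * 2


def add(self, other):
    return self.value + other


class PerInstance:
    def __init__(self, value):
        self.value = value
        # Stored in the instance __dict__: these stay plain functions.
        self.double = double
        self.__add__ = add


class PerClass:
    def __init__(self, value):
        self.value = value


# Stored on the class: attribute access binds them to the instance.
PerClass.double = double
PerClass.__add__ = add


def outcome(thunk):
    """Return the result of thunk(), or the name of the exception it raises."""
    try:
        return thunk()
    except Exception as err:
        return type(err).__name__


per_class = PerClass(3)
per_instance = PerInstance(3)

results = {
    "class method": outcome(lambda: per_class.double()),
    "class operator": outcome(lambda: per_class + 1),
    "instance method": outcome(lambda: per_instance.double()),
    "instance operator": outcome(lambda: per_instance + 1),
}
print(results)
```

In this sketch the class-level calls succeed, while both instance-level calls raise TypeError: the method call receives no self, and the operator never consults the instance attribute.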


It would be better if the dataset methods were added at class definition time rather than object instantiation time, and ideally fully defined on the mixin before it is inherited. Then we wouldn't need to call any _add_dataset_api() method on the dt instance, because the methods would already be there.

The only way I can think of to actually do this within the class definitions is using a metaclass.
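A rough sketch of what the metaclass route might look like, so that the methods exist as soon as the class body has been processed. FakeDataset, DatasetAPIMeta, and the simplified map_over_subtree here are hypothetical stand-ins so the example runs without xarray; the real decorator and the real Dataset methods behave differently.

```python
import functools


class FakeDataset:
    """Hypothetical stand-in for xarray.Dataset."""

    def __init__(self, value):
        self.value = value

    def isel(self, offset=0):
        return self.value + offset

    def __add__(self, other):
        return self.value + other


_DATASET_API_TO_COPY = ["isel", "__add__"]


def map_over_subtree(func):
    """Stand-in decorator: call func on every dataset held by the tree."""

    @functools.wraps(func)
    def mapped(tree, *args, **kwargs):
        return [func(ds, *args, **kwargs) for ds in tree.datasets]

    return mapped


class DatasetAPIMeta(type):
    def __new__(mcls, name, bases, namespace):
        # Runs once, when the class statement is executed.
        for method_name in _DATASET_API_TO_COPY:
            ds_method = getattr(FakeDataset, method_name)
            # setdefault keeps any method the class body defines itself.
            namespace.setdefault(method_name, map_over_subtree(ds_method))
        return super().__new__(mcls, name, bases, namespace)


class DataTree(metaclass=DatasetAPIMeta):
    def __init__(self, *datasets):
        self.datasets = list(datasets)


dt = DataTree(FakeDataset(1), FakeDataset(2))
isel_result = dt.isel(offset=10)
add_result = dt + 5
print(isel_result, add_result)
```

Because the mapped functions live in the class namespace, no per-instance setup is needed in __init__, and operators resolve through the class as usual.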

I could also possibly set the attribute outside of the mixin definition but before the definition of DataTree like this:

class DatasetAPIMixin:
    pass


for method_name in _DATASET_API_TO_COPY:
    ds_method = getattr(xarray.Dataset, method_name)

    # Decorate method so that when called it acts over whole subtree
    mapped_method = map_over_subtree(ds_method)

    setattr(DatasetAPIMixin, method_name, mapped_method)


class DataTree(DatasetAPIMixin):
    ...
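To check that the module-level loop above really attaches the methods before any object exists, here is a small runnable version. FakeDataset and this cut-down map_over_subtree are hypothetical stand-ins for xarray.Dataset and the real decorator, assumed only for illustration.

```python
import functools


class FakeDataset:
    """Hypothetical stand-in for xarray.Dataset."""

    def __init__(self, value):
        self.value = value

    def isel(self, offset=0):
        return self.value + offset


_DATASET_API_TO_COPY = ["isel"]


def map_over_subtree(func):
    """Stand-in decorator: call func on every dataset held by the tree."""

    @functools.wraps(func)
    def mapped(tree, *args, **kwargs):
        return [func(ds, *args, **kwargs) for ds in tree.datasets]

    return mapped


class DatasetAPIMixin:
    pass


# Executed once at import time, before DataTree is even defined.
for method_name in _DATASET_API_TO_COPY:
    ds_method = getattr(FakeDataset, method_name)
    setattr(DatasetAPIMixin, method_name, map_over_subtree(ds_method))


class DataTree(DatasetAPIMixin):
    def __init__(self, *datasets):
        self.datasets = list(datasets)


# The method is inherited from the mixin without creating any instance.
available_before_instantiation = hasattr(DataTree, "isel")

first = DataTree(FakeDataset(1))
second = DataTree(FakeDataset(2), FakeDataset(3))

# Every instance shares the single function object stored on the mixin.
shared = first.isel.__func__ is second.isel.__func__
second_result = second.isel(offset=1)
print(available_before_instantiation, shared, second_result)
```

This keeps the mixin as the one place where the copied API lives, at the cost of the loop sitting outside the class body.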
