Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xarray.concat for datasets fails when object is a child class of Dataset #6827

Closed
4 tasks done
gabicca opened this issue Jul 26, 2022 · 6 comments
Closed
4 tasks done

Comments

@gabicca
Copy link

gabicca commented Jul 26, 2022

What happened?

The following error is thrown:

>       result = type(datasets[0])(result_vars, attrs=result_attrs)
E       TypeError: __init__() got an unexpected keyword argument 'attrs'

This is coming from the change merged as part of this PR:
https://github.com/pydata/xarray/pull/6784/files (line 593)

When "datasets" is a list of class objects which are the child of Dataset, type(datasets[0]) will return that class as the type. This class however, doesn't necessary has attrs in its own init, hence the code breaks.

I'm not convinced that this is a correct behaviour, and if you want to initialize the Dataset class, you should revert that line of change.

What did you expect to happen?

Code to run without any errors.

Minimal Complete Verifiable Example

import xarray

class MyDataset(xarray.Dataset):
    __slots__ = ()

    def __init__(self, a, b, c, d=None):
        super().__init__()

        attrs = {
            'at1': b,
            'at2': d
        }

        super().__init__(a, coords=c, attrs=attrs)


if __name__ == '__main__':
    a = {
        "x": 1,
        "y": 3
    }
    b = ["x", "y"]
    c = {
        "coord1": 0,
        "coord2": 1
    }
    ds1 = MyDataset(a, b, c)
    ds2 = MyDataset(a, b, c)
    ds3 = MyDataset(a, b, c)

    xarray.concat([ds1, ds2, ds3], dim="coord1")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

xarray==2022.6.0

python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:27:05)

@gabicca gabicca added bug needs triage Issue that has not been reviewed by xarray team member labels Jul 26, 2022
@rhkleijn
Copy link
Contributor

rhkleijn commented Jul 26, 2022

In the example the naming and meaning of the arguments of the __init__ method are different from the Dataset base class.

Subclassing is often more reliable when it adheres to the Liskov Substitution Principle which states that objects of a superclass should be replaceable with objects of its subclasses and in other words: a subclass should only add functionality and not reduce it.

There are some other workarounds though:

If all you need is a convenient custom constructor then you could add a classmethod factory method to your subclass or even just use a plain function which returns a Dataset.

Another option is to convert your MyDataset instances to plain Dataset instances before passing them into xarray.concat, e.g.
xarray.concat([ds.to_dataset() for ds in [ds1, ds2, ds3]], dim="coord1")

where your subclass may define as a convenience method:

def to_dataset(self):
    """Convert to plain Dataset."""
    return xarray.Dataset(self, attrs=self.attrs)

@gabicca
Copy link
Author

gabicca commented Jul 27, 2022

In the example the naming and meaning of the arguments of the __init__ method are different from the Dataset base class.

Subclassing is often more reliable when it adheres to the Liskov Substitution Principle which states that objects of a superclass should be replaceable with objects of its subclasses and in other words: a subclass should only add functionality and not reduce it.

There are some other workarounds though:

If all you need is a convenient custom constructor then you could add a classmethod factory method to your subclass or even just use a plain function which returns a Dataset.

Another option is to convert your MyDataset instances to plain Dataset instances before passing them into xarray.concat, e.g. xarray.concat([ds.to_dataset() for ds in [ds1, ds2, ds3]], dim="coord1")

where your subclass may define as a convenience method:

def to_dataset(self):
    """Convert to plain Dataset."""
    return xarray.Dataset(self, attrs=self.attrs)

Thank you for the answer. I'll look into implementing one of your suggestions (they're much better than what I had in mind).

I still don't understand the change that causes the error, though. If you want to instantiate a Dataset object, which you clearly do according to the args you pass in, why did that line have to be changed from explicitly calling Dataset? (By "you" I don't mean you personally, unless you actually made the change :) )

@dcherian dcherian added regression and removed needs triage Issue that has not been reviewed by xarray team member labels Jul 27, 2022
@Illviljan
Copy link
Contributor

Illviljan commented Jul 28, 2022

Because we want to get out the same type of Dataset (subclassed or not) as we input. That's what the T_Dataset type implies. And since we forced a Dataset at the end that wasn't the case.

@dcherian
Copy link
Contributor

Shall we close? It seems like the current code is what we want.

@gabicca
Copy link
Author

gabicca commented Aug 19, 2022

I'll leave that with you guys to decide. I've reported it, it's been looked at, and it has a record of it. So if the code is as you intended, I don't mind if the issue is closed.

@TomNicholas
Copy link
Member

@gabicca given that we do say in our docs that we don't recommend subclassing xarray objects, I'm going to close this issue as a downstream "subclass at your own risk" problem. If you would like it to be easier to subclass, we would welcome your input here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants