Description
If anybody decorates a stateful class with @register_dataarray_accessor
or @register_dataset_accessor
, the instance will lose its state on any method that invokes _to_temp_dataset
, as well as on a shallow copy.
In [1]: @xarray.register_dataarray_accessor('foo')
...: class Foo:
...: def __init__(self, obj):
...: self.obj = obj
...: self.x = 1
...:
...:
In [2]: a = xarray.DataArray()
In [3]: a.foo.x
Out[3]: 1
In [4]: a.foo.x = 2
In [5]: a.foo.x
Out[5]: 2
In [6]: a.roll().foo.x
Out[6]: 1
In [7]: a.copy(deep=False).foo.x
Out[7]: 1
While in the case of _to_temp_dataset
it could be possible to spend (substantial) effort to retain the state, on the case of copy() it's impossible without modifying the accessor duck API, as one would need to tamper with the accessor instance in place and modify the pointer back to the DataArray/Dataset.
This issue is so glaring that it makes me strongly suspect that nobody saves any state in accessor classes. This kind of use would also be problematic in practical terms, as the accessor object would have a hard time realising when its own state is no longer coherent with the referenced DataArray/Dataset.
This design also carries the problem that it introduces a circular reference in the DataArray/Dataset. This means that, after someone invokes an accessor method on his DataArray/Dataset, then the whole object - including the numpy buffers! - won't be instantly collected when it's dereferenced by the user, and it will have to instead wait for the next gc
pass. This could cause huge increases in RAM usage overnight in a user application, which would be very hard to logically link to a change that just added a custom method.
Finally, with #3250, this statefulness forces us to increase the RAM usage of all datasets and dataarrays by an extra slot, for all users, even if this feature is quite niche.
Proposed solution
Get rid of accessor caching altogether, and just recreate the accessor object from scratch every time it is invoked. In the documentation, clarify that the __init__
method should not perform anything computationally intensive.