Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__slots__ #3250

Merged
merged 13 commits into from
Aug 29, 2019
Merged

__slots__ #3250

merged 13 commits into from
Aug 29, 2019

Conversation

crusaderky
Copy link
Contributor

@crusaderky crusaderky commented Aug 23, 2019

What changes:

  • Most classes now define __slots__
  • removed _initialized property
  • Enforced checks that all subclasses must also define __slots__. For third-party subclasses, this is for now a DeprecationWarning and should be changed into a hard crash later on.
  • 22% reduction in RAM usage
  • 5% performance speedup for a DataArray method that performs a _to_temp_dataset roundtrip

DISCUSS: support for third party subclasses is very poor at the moment (#1097). Should we skip the deprecation altogether?

Performance benchmark:

import timeit
import psutil
import xarray

a = xarray.DataArray([1, 2], dims=['x'], coords={'x': [10, 20]})
RUNS = 10000
t = timeit.timeit("a.roll(x=1, roll_coords=True)", globals=globals(), number=RUNS)
print("{:.0f} us".format(t / RUNS * 1e6))

p = psutil.Process()
N = 100000
rss0 = p.memory_info().rss
x = [
    xarray.DataArray([1, 2], dims=['x'], coords={'x': [10, 20]})
    for _ in range(N)
]
rss1 = p.memory_info().rss
print("{:.0f} bytes".format((rss1 - rss0) / N))

Output:

test env master slots
DataArray.roll py35-min 332 us 360 us
DataArray.roll py37 354 us 337 us
RAM usage of a DataArray py35-min 2755 bytes 2074 bytes
RAM usage of a DataArray py37 1970 bytes 1532 bytes

The performance degradation on Python 3.5 is caused by the deprecation mechanism - see changes to common.py.

I honestly never realised that xarray objects are measured in kilobytes (vs. 32 bytes of underlying buffers!)

@crusaderky crusaderky requested a review from shoyer August 23, 2019 12:16
@max-sixty
Copy link
Collaborator

V interesting!

Is there any downside? IIRC objects don't have a __dict__, so attributes can't be set, which I don't think matters in this case at all? Is there any risk that downstream packages break?

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad to see you were able to get rid of _initialized and speed things up!

It might be (harmless) premature optimization to use __slots__ for every internal xarray class. But certainly doing it for class involved in constructing every xarray object (e.g., Variable/DataArray) makes sense.

xarray/core/common.py Outdated Show resolved Hide resolved
This check is only triggered in Python 3.6+.
"""
if not hasattr(object.__new__(cls), "__dict__"):
cls.__setattr__ = cls._setattr_slots
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, in the future (Python 3.6+ only, post deprecation warning) will we be able to remove assignment to cls.__setattr__ from __init_subclass__? I'd rather avoid dynamically building methods if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post deprecation warning and after dropping support for py35, there will only be __setattr__, statically implemented with what is now the body of _setattr_slots. __init_subclass__ will retain only the health check.

xarray/core/indexing.py Outdated Show resolved Hide resolved
@crusaderky
Copy link
Contributor Author

@max-sixty any package that for any reason expects xarray objects to have a __dict__ will break overnight. Any package that subclasses from xarray will get a warning.

@shoyer
Copy link
Member

shoyer commented Aug 23, 2019 via email

@crusaderky
Copy link
Contributor Author

@shoyer this is theoretically done, however the design I had to come up with to preserve the current statefulness of user-defined accessors displeases me and I'd rather clean it up. See #3268 for design decision.

@crusaderky
Copy link
Contributor Author

Added automated unit tests for the automated check on __slots__ for all objects inside the xarray library. I could not find a straightforward way to automatically test that objects defined outside of the library log FutureWarning instead - I tested it by hand.

@crusaderky
Copy link
Contributor Author

crusaderky commented Aug 27, 2019

Python 3.7:

In [1]: class A(xarray.DataArray): 
   ...:     pass 
   ...:                                                                                                                                                                                                                                                        
miniconda3/envs/xarray-py37/bin/ipython:1: FutureWarning: xarray subclass A should explicitly define __slots__

In [2]: a = A()                                                                                                                                                                                                                                                

In [3]: a.x = 1                                                                                                                                                                                                                                                
miniconda3/envs/xarray-py37/bin/ipython:1: FutureWarning: Setting attribute 'x' on a 'A' object. Explicitly define __slots__ to suppress this warning for legitimate custom attributes and raise an error when attempting variables assignments.

In [4]: a.x                                                                                                                                                                                                                                                   
Out[4]: 1

In [5]: a.__dict__                                                                                                                                                                                                                                             
Out[5]: {'x': 1}

Python 3.5: Same, but without the first FutureWarning.

@crusaderky crusaderky changed the title WIP: __slots__ __slots__ Aug 27, 2019
xarray/core/dataset.py Outdated Show resolved Hide resolved
@crusaderky
Copy link
Contributor Author

Ready for merge if there are no additional comments

@shoyer shoyer merged commit 41fecd8 into pydata:master Aug 29, 2019
@shoyer
Copy link
Member

shoyer commented Aug 29, 2019

thanks @crusaderky !

@max-sixty
Copy link
Collaborator

👍 Thanks @crusaderky

@crusaderky
Copy link
Contributor Author

👍

@crusaderky crusaderky deleted the slots branch August 30, 2019 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants