Description
Feature or enhancement
Improve the performance of asdict/astuple in common cases by adding a shortcut in the inner loop for common types that are unaffected by deepcopy. Also special-case the default dict_factory=dict to construct the dictionary directly.
The goal here is to improve performance in common cases without significantly impacting less common cases, while not changing the API or output in any way.
Pitch
In cases where a dataclass contains a lot of data of common Python types (e.g. bool/str/int/float), the inner loops for asdict and astuple currently check whether each value is a dataclass, a namedtuple, a list, a tuple, and then a dictionary before passing it to deepcopy. This proposes to special-case and shortcut objects of types for which deepcopy returns the object unchanged.
It is much faster in these cases to check for such types at the first opportunity and return the value immediately, skipping the recursive call and all of the other comparisons. When this is used to prepare an object for serialization to JSON, the gain can be quite significant, as these types cover most of the remaining types handled by the stdlib json module.
Note: Anything that skips deepcopy with this alteration is already unchanged, as deepcopy(obj) is obj is always True for these types.
Currently, when constructing the dict for a dataclass, a list of tuples is created and passed to the dict_factory constructor. In the case where the dict_factory constructor is the default (dict), it is faster to construct the dictionary directly.
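The difference between the two construction paths can be sketched as follows. This is a simplified, non-recursive illustration under my own assumptions, not the actual dataclasses code; Point, to_dict_via_factory and to_dict_direct are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int


def to_dict_via_factory(obj, dict_factory=dict):
    # General path: build a list of (name, value) tuples,
    # then hand it to whatever factory the caller supplied.
    pairs = [(f.name, getattr(obj, f.name)) for f in fields(obj)]
    return dict_factory(pairs)


def to_dict_direct(obj):
    # Fast path for the default factory: build the dict in one step,
    # without the intermediate list of tuples.
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


p = Point(1, 2)
via_factory = to_dict_via_factory(p)
direct = to_dict_direct(p)
```

Both paths produce the same dictionary, so the fast path can be selected with a check such as dict_factory is dict without changing the output.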
Previous discussion
Discussed here with a few more details and earlier examples: https://discuss.python.org/t/dataclasses-make-asdict-astuple-faster-by-skipping-deepcopy-for-objects-where-deepcopy-obj-is-obj/24662
Code Details
Types to skip deepcopy
This is the current set of types to be checked for and shortcut returned, ordered in a way that I think makes more sense for dataclasses than the original ordering copied from the copy module. These are known to be safe to skip as they are all sent to _deepcopy_atomic (which returns the original object) in the copy module.
# Types for which deepcopy(obj) is known to return obj unmodified
# Used to skip deepcopy in asdict and astuple for performance
_ATOMIC_TYPES = {
    # Common JSON Serializable types
    types.NoneType,
    bool,
    int,
    float,
    complex,
    bytes,
    str,
    # Other types that are also unaffected by deepcopy
    types.EllipsisType,
    types.NotImplementedType,
    types.CodeType,
    types.BuiltinFunctionType,
    types.FunctionType,
    type,
    range,
    property,
    # weakref.ref,  # weakref is not currently imported by dataclasses directly
}
Function changes
With that added, the change is essentially replacing each instance of
_asdict_inner(v, dict_factory)
inside _asdict_inner with
v if type(v) in _ATOMIC_TYPES else _asdict_inner(v, dict_factory)
Instances of subclasses of these types are not guaranteed to have deepcopy(obj) is obj, so this checks specifically for instances of the base types.
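The subclass caveat can be illustrated with a small example. TaggedInt is a hypothetical class invented here; it is an int subclass carrying mutable state, so deepcopy has to create a new object for it.

```python
import copy


class TaggedInt(int):
    """Hypothetical int subclass that carries a mutable attribute."""

    def __init__(self, value):
        self.tags = []


original = TaggedInt(5)
original.tags.append("seen")
duplicate = copy.deepcopy(original)

# An isinstance check would wrongly treat this as a plain int ...
passes_isinstance = isinstance(original, int)
# ... while the exact type check used by the proposal does not.
passes_exact_check = type(original) in {int}
```

Here duplicate is a distinct object with its own copy of tags, which is why the shortcut uses type(v) in _ATOMIC_TYPES rather than isinstance.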
Performance tests
Test file: https://gist.github.com/DavidCEllis/a2c2ceeeeda2d1ac509fb8877e5fb60d
Results on my development machine (not a perfectly stable test machine, but these differences are large enough to be meaningful).
Main
Current main Python branch:
Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: main
Test Iterations: 10000
List of Int case asdict: 5.80s
Test Iterations: 1000
List of Decimal case asdict: 0.65s
Test Iterations: 1000000
Basic types case asdict: 3.76s
Basic types astuple: 3.48s
Test Iterations: 100000
Opaque types asdict: 2.15s
Opaque types astuple: 2.11s
Test Iterations: 100
Mixed containers asdict: 3.66s
Mixed containers astuple: 3.28s
Modified
Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: faster_dataclasses_serialize
Test Iterations: 10000
List of Int case asdict: 0.53s
Test Iterations: 1000
List of Decimal case asdict: 0.68s
Test Iterations: 1000000
Basic types case asdict: 1.33s
Basic types astuple: 1.28s
Test Iterations: 100000
Opaque types asdict: 2.14s
Opaque types astuple: 2.13s
Test Iterations: 100
Mixed containers asdict: 1.99s
Mixed containers astuple: 1.84s
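For readers who want a quick local measurement without the full test file, a much smaller timing harness could look like the sketch below. It is not the linked gist: BasicTypes, time_call and run_benchmark are hypothetical names, and the iteration count is arbitrary. Absolute numbers will vary by machine and Python build.

```python
import timeit
from dataclasses import asdict, astuple, dataclass


@dataclass
class BasicTypes:  # hypothetical class holding only atomic values
    a: int = 1
    b: float = 2.0
    c: str = "three"
    d: bool = True
    e: None = None


def time_call(func, obj, iterations=10_000):
    """Return the total seconds taken by `iterations` calls of func(obj)."""
    return timeit.timeit(lambda: func(obj), number=iterations)


def run_benchmark(iterations=10_000):
    """Time asdict and astuple on one instance; returns seconds per function."""
    inst = BasicTypes()
    return {
        "asdict": time_call(asdict, inst, iterations),
        "astuple": time_call(astuple, inst, iterations),
    }
```

Calling run_benchmark() on two interpreter builds (main and the modified branch) gives a rough comparison for the "basic types" case only.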