
Dataclasses - Improve the performance of asdict/astuple for common types and default values #103000

Closed
@DavidCEllis

Description


Feature or enhancement

Improve the performance of asdict/astuple in common cases by adding a shortcut in the inner loop for common types that are unaffected by deepcopy, and by special-casing the default dict_factory=dict so the dictionary is constructed directly.

The goal is to improve performance in common cases without significantly impacting less common cases, and without changing the API or output in any way.

Pitch

When a dataclass contains a lot of data of common Python types (e.g. bool/str/int/float), the inner loops of asdict and astuple currently check each value in turn for being a dataclass, a namedtuple, a list, a tuple, and then a dict, before finally passing it to deepcopy. This proposal special-cases objects of types for which deepcopy returns the object unchanged, and returns them immediately.

It is much faster to check for these types at the first opportunity and return them directly, skipping the recursive call and all of the other comparisons. When asdict is used to prepare an object for serialization to JSON, the gain can be quite significant, as these types cover most of the remaining types handled by the stdlib json module.

Note: anything that skips deepcopy with this change is already returned unchanged, as deepcopy(obj) is obj is always True for these types.
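A quick check of this note using only the public copy.deepcopy function (the sample values are arbitrary): each common type comes back as the very same object, while a container is genuinely copied and so still needs the full path.

```python
import copy

# Arbitrary sample values of the common types named in the proposal
samples = [None, True, 42, 3.14, 2 + 3j, b"raw", "text"]

for value in samples:
    # deepcopy hands these back unchanged, so skipping it cannot alter the output
    assert copy.deepcopy(value) is value, type(value)

# By contrast, containers are genuinely copied and must still go through deepcopy
items = [1, 2, 3]
assert copy.deepcopy(items) is not items
```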

Currently, when constructing the dict for a dataclass, a list of (key, value) tuples is created and passed to dict_factory. When dict_factory is the default, dict, it is faster to construct the dictionary directly.
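A minimal sketch of the dict_factory special case, assuming a hypothetical helper fields_to_mapping that only builds the outer mapping for a single dataclass instance (nested values are not converted, unlike the real asdict):

```python
import dataclasses
from collections import OrderedDict


def fields_to_mapping(obj, dict_factory=dict):
    """Hypothetical helper: build the mapping for one dataclass instance.

    Only the outer construction is shown; nested values are not converted.
    """
    fields = dataclasses.fields(obj)
    if dict_factory is dict:
        # Fast path: build the dict directly, with no intermediate list of tuples
        return {f.name: getattr(obj, f.name) for f in fields}
    # General path: custom factories still receive a list of (key, value) pairs
    return dict_factory([(f.name, getattr(obj, f.name)) for f in fields])


@dataclasses.dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)
print(fields_to_mapping(p))               # {'x': 1, 'y': 2}
print(type(fields_to_mapping(p, OrderedDict)).__name__)  # OrderedDict
```

Testing dict_factory is dict by identity (rather than equality or issubclass) keeps every custom factory, including dict subclasses, on the original code path, so their output is unchanged.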

Previous discussion

Discussed here with a few more details and earlier examples: https://discuss.python.org/t/dataclasses-make-asdict-astuple-faster-by-skipping-deepcopy-for-objects-where-deepcopy-obj-is-obj/24662

Code Details

Types to skip deepcopy

This is the current set of types to check for and return directly, ordered in a way that I think makes more sense for dataclasses than the original ordering copied from the copy module. They are known to be safe to skip because the copy module dispatches all of them to _deepcopy_atomic, which returns the original object.

import types

# Types for which deepcopy(obj) is known to return obj unmodified
# Used to skip deepcopy in asdict and astuple for performance
_ATOMIC_TYPES = {
    # Common JSON Serializable types
    types.NoneType,
    bool,
    int,
    float,
    complex,
    bytes,
    str,
    # Other types that are also unaffected by deepcopy
    types.EllipsisType,
    types.NotImplementedType,
    types.CodeType,
    types.BuiltinFunctionType,
    types.FunctionType,
    type,
    range,
    property,
    # weakref.ref,  # weakref is not currently imported by dataclasses directly
}

Function changes

With that set defined, the change essentially replaces each call of

_asdict_inner(v, dict_factory)

inside _asdict_inner with

v if type(v) in _ATOMIC_TYPES else _asdict_inner(v, dict_factory)

Instances of subclasses of these types are not guaranteed to satisfy deepcopy(obj) is obj, so the check uses type(v) in _ATOMIC_TYPES to match only the exact base types, rather than isinstance.
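To see why the check uses exact types, consider a hypothetical str subclass carrying instance state: deepcopy rebuilds it as a new object and deep-copies its attributes, so returning the original object would change the output.

```python
import copy


class TaggedStr(str):
    """Hypothetical str subclass that carries extra per-instance state."""


tag = TaggedStr("label")
tag.meta = ["source"]

clone = copy.deepcopy(tag)
# The subclass is rebuilt via __reduce_ex__, so a new object comes back
assert clone is not tag and clone == tag
# Its instance state is deep-copied too, which a shortcut would silently skip
assert clone.meta == ["source"] and clone.meta is not tag.meta

# The exact-type test excludes it, whereas isinstance() would not
assert type(tag) is not str and isinstance(tag, str)
```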

Performance tests

Test file: https://gist.github.com/DavidCEllis/a2c2ceeeeda2d1ac509fb8877e5fb60d

Results from my development machine (not a perfectly stable test machine, but the differences are large enough to be meaningful).
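The test file itself is not reproduced here. A minimal sketch of a comparable measurement with timeit, assuming a hypothetical Basic dataclass similar in spirit to the "basic types" case (not the gist's actual classes), could be run on each build to compare:

```python
import dataclasses
import timeit


@dataclasses.dataclass
class Basic:
    # Hypothetical stand-in for the "basic types" case, not the gist's class
    a: int = 1
    b: float = 2.0
    c: str = "three"
    d: bool = True
    e: None = None


def time_case(func, obj, number=10_000):
    """Return the total seconds taken to call func(obj) `number` times."""
    return timeit.timeit(lambda: func(obj), number=number)


obj = Basic()
for name, func in (("asdict", dataclasses.asdict), ("astuple", dataclasses.astuple)):
    print(f"Basic types {name}: {time_case(func, obj):.2f}s")
```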


Current main Python branch:

Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: main
Test Iterations: 10000
List of Int case asdict: 5.80s

Test Iterations: 1000
List of Decimal case asdict: 0.65s

Test Iterations: 1000000
Basic types case asdict: 3.76s
Basic types astuple: 3.48s

Test Iterations: 100000
Opaque types asdict: 2.15s
Opaque types astuple: 2.11s

Test Iterations: 100
Mixed containers asdict: 3.66s
Mixed containers astuple: 3.28s


Modified Branch:

Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: faster_dataclasses_serialize
Test Iterations: 10000
List of Int case asdict: 0.53s

Test Iterations: 1000
List of Decimal case asdict: 0.68s

Test Iterations: 1000000
Basic types case asdict: 1.33s
Basic types astuple: 1.28s

Test Iterations: 100000
Opaque types asdict: 2.14s
Opaque types astuple: 2.13s

Test Iterations: 100
Mixed containers asdict: 1.99s
Mixed containers astuple: 1.84s
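For a quick comparison, the speedup (main time divided by modified time) for each case can be computed directly from the numbers above. Roughly: about 11x for the list of int case, about 2.7x to 2.8x for basic types, about 1.8x for mixed containers, and essentially unchanged (0.96x to 1.00x) for the Decimal and opaque types cases.

```python
# Timings in seconds copied from the two runs above: (main, modified)
results = {
    "List of Int case asdict": (5.80, 0.53),
    "List of Decimal case asdict": (0.65, 0.68),
    "Basic types case asdict": (3.76, 1.33),
    "Basic types astuple": (3.48, 1.28),
    "Opaque types asdict": (2.15, 2.14),
    "Opaque types astuple": (2.11, 2.13),
    "Mixed containers asdict": (3.66, 1.99),
    "Mixed containers astuple": (3.28, 1.84),
}

for case, (main, modified) in results.items():
    # A speedup above 1 means the modified branch is faster
    print(f"{case}: {main / modified:.2f}x")
```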


Metadata



Labels: 3.12 (only security fixes), performance (Performance or resource usage), stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)
