Improve performance of dataclasses.asdict by caching field names #138232

@eendebakpt

Feature or enhancement

Proposal:

About 8 years ago, @ericvsmith asked in a code comment:

# Might it be worth caching this, per class?

Caching the field names of a dataclass on the class itself improves performance:

asdict: Mean +- std dev: [main] 2.33 us +- 0.14 us -> [pr] 1.65 us +- 0.08 us: 1.41x faster
astuple: Mean +- std dev: [main] 2.77 us +- 0.15 us -> [pr] 2.15 us +- 0.09 us: 1.29x faster
f.__getstate__(): Mean +- std dev: [main] 941 ns +- 64 ns -> [pr] 360 ns +- 18 ns: 2.61x faster

Benchmark hidden because not significant (1): instance creation

Geometric mean: 1.47x faster
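
As a rough sanity check on the summary line, the geometric mean can be recomputed from the three reported speedups. This is only a sketch: it assumes the hidden "instance creation" benchmark is still counted in the mean with a ratio close to 1.00x, which the output above does not state explicitly.

```python
from statistics import geometric_mean

# Speedups reported above; the last entry is an ASSUMED ~1.00x for the
# hidden "instance creation" benchmark (its exact ratio is not shown).
speedups = {
    "asdict": 1.41,
    "astuple": 1.29,
    "f.__getstate__()": 2.61,
    "instance creation (assumed)": 1.00,
}

overall = geometric_mean(speedups.values())
visible_only = geometric_mean([1.41, 1.29, 2.61])

print(f"all four benchmarks: {overall:.2f}x")
print(f"three visible benchmarks only: {visible_only:.2f}x")
```

Under that assumption the result lands within rounding distance of the reported 1.47x, whereas averaging only the three visible benchmarks gives a noticeably higher figure.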

Test script (executed on a non-PGO build):

import pyperf

setup = """
from dataclasses import dataclass, asdict, astuple
from pickle import dumps

@dataclass
class Simple:
    i: int
    s: str
    l: list

s = Simple(10, 'hi', [3, 1, 4, 1])

# Frozen, slotted variant: its dataclass-generated __getstate__ reads every field
@dataclass(frozen=True, slots=True)
class Frozen:
    i: int
    s: str
    l: list

f = Frozen(10, 'hi', [3, 1, 4, 1])
f.__getstate__()
"""

runner = pyperf.Runner()
runner.timeit(name="instance creation", stmt="Simple(10, 'hi', [3, 1, 4, 1])", setup=setup)
runner.timeit(name="asdict", stmt="asdict(s)", setup=setup)
runner.timeit(name="astuple", stmt="astuple(s)", setup=setup)
runner.timeit(name="f.__getstate__()", stmt="f.__getstate__()", setup=setup)

The main downside of caching the field names is one additional private attribute on each dataclass (per class, not per instance), holding a list of strings.
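
To make the trade-off concrete, here is a minimal sketch of per-class caching of field names. It is not the code from the PR: the attribute name `_cached_field_names` and the helpers `cached_field_names` and `shallow_asdict` are hypothetical, and the sketch stores a tuple rather than a list.

```python
from dataclasses import dataclass, fields

# Hypothetical attribute name, chosen for this sketch only.
_CACHE_ATTR = "_cached_field_names"


def cached_field_names(cls):
    """Return the field names of dataclass `cls`, computing them at most once per class."""
    # Look in cls.__dict__ instead of using getattr(), so a subclass does not
    # reuse the cache of its parent (the subclass may define extra fields).
    names = cls.__dict__.get(_CACHE_ATTR)
    if names is None:
        names = tuple(f.name for f in fields(cls))
        setattr(cls, _CACHE_ATTR, names)
    return names


def shallow_asdict(obj):
    """Shallow conversion driven by the cached names (no recursion, no copying)."""
    return {name: getattr(obj, name) for name in cached_field_names(type(obj))}


@dataclass
class Simple:
    i: int
    s: str
    l: list


@dataclass
class Extended(Simple):
    flag: bool = False


s = Simple(10, "hi", [3, 1, 4, 1])
print(shallow_asdict(s))
print(cached_field_names(Extended))
```

The cost is exactly the one described above: each dataclass that is converted at least once carries one extra attribute, while instances are unaffected.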

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response
