Skip to content

Use freelist for range object, iterator objects and other often used objects #126703

Open
@eendebakpt

Description

@eendebakpt

Feature or enhancement

Proposal:

We can add freelists for the range object and various iter objects to improve performance. Using the new methods from #121934 the amount of code needed for adding a freelist is quite small. A prototype implementation is here:

main...eendebakpt:cpython:iter_freelists
main...eendebakpt:cpython:range_freelist

@markshannon In faster-cpython/ideas#132 (comment) you noted that freelists should be used for all types. Is adding freelists in a similar style to the already existing freelists a good approach? We could also try to make a more generic construction (for example inside PyObject_New/PyObject_Free), but that would have a much larger impact and I cannot oversee all consequences.

The freelists from the prototype improve performance of range in microbenchmarks:

range(1): Mean +- std dev: [rp_main] 50.0 ns +- 0.4 ns -> [rp_pr] 40.6 ns +- 0.5 ns: 1.23x faster
iter(range(1)): Mean +- std dev: [rp_main] 91.7 ns +- 2.4 ns -> [rp_pr] 68.5 ns +- 0.7 ns: 1.34x faster
list(range(1)): Mean +- std dev: [rp_main] 165 ns +- 1 ns -> [rp_pr] 144 ns +- 1 ns: 1.15x faster
list(iter(range(1))): Mean +- std dev: [rp_main] 240 ns +- 17 ns -> [rp_pr] 229 ns +- 13 ns: 1.05x faster
range(2, 1): Mean +- std dev: [rp_main] 61.6 ns +- 0.6 ns -> [rp_pr] 46.7 ns +- 0.4 ns: 1.32x faster
iter(range(2, 1)): Mean +- std dev: [rp_main] 100 ns +- 4 ns -> [rp_pr] 79.6 ns +- 4.0 ns: 1.26x faster
list(iter(range(2, 1))): Mean +- std dev: [rp_main] 227 ns +- 3 ns -> [rp_pr] 209 ns +- 4 ns: 1.08x faster
for loop length 1: Mean +- std dev: [rp_main] 146 ns +- 4 ns -> [rp_pr] 126 ns +- 6 ns: 1.16x faster
range(10): Mean +- std dev: [rp_main] 51.7 ns +- 2.4 ns -> [rp_pr] 41.0 ns +- 0.9 ns: 1.26x faster
iter(range(10)): Mean +- std dev: [rp_main] 92.7 ns +- 3.5 ns -> [rp_pr] 68.3 ns +- 2.2 ns: 1.36x faster
list(range(10)): Mean +- std dev: [rp_main] 205 ns +- 4 ns -> [rp_pr] 184 ns +- 9 ns: 1.11x faster
list(iter(range(10))): Mean +- std dev: [rp_main] 279 ns +- 7 ns -> [rp_pr] 261 ns +- 5 ns: 1.07x faster
range(2, 10): Mean +- std dev: [rp_main] 57.7 ns +- 1.5 ns -> [rp_pr] 48.5 ns +- 2.2 ns: 1.19x faster
iter(range(2, 10)): Mean +- std dev: [rp_main] 101 ns +- 3 ns -> [rp_pr] 78.1 ns +- 1.1 ns: 1.29x faster
list(iter(range(2, 10))): Mean +- std dev: [rp_main] 284 ns +- 17 ns -> [rp_pr] 262 ns +- 10 ns: 1.08x faster
for loop length 10: Mean +- std dev: [rp_main] 329 ns +- 7 ns -> [rp_pr] 295 ns +- 5 ns: 1.12x faster
range(100): Mean +- std dev: [rp_main] 52.3 ns +- 2.2 ns -> [rp_pr] 41.1 ns +- 1.3 ns: 1.27x faster
iter(range(100)): Mean +- std dev: [rp_main] 91.7 ns +- 3.5 ns -> [rp_pr] 69.6 ns +- 2.1 ns: 1.32x faster
list(range(100)): Mean +- std dev: [rp_main] 653 ns +- 27 ns -> [rp_pr] 603 ns +- 29 ns: 1.08x faster
list(iter(range(100))): Mean +- std dev: [rp_main] 701 ns +- 17 ns -> [rp_pr] 671 ns +- 17 ns: 1.04x faster
range(2, 100): Mean +- std dev: [rp_main] 58.5 ns +- 2.8 ns -> [rp_pr] 48.1 ns +- 2.3 ns: 1.22x faster
iter(range(2, 100)): Mean +- std dev: [rp_main] 99.6 ns +- 1.5 ns -> [rp_pr] 78.4 ns +- 0.8 ns: 1.27x faster
list(iter(range(2, 100))): Mean +- std dev: [rp_main] 710 ns +- 28 ns -> [rp_pr] 674 ns +- 16 ns: 1.05x faster
for loop length 100: Mean +- std dev: [rp_main] 2.06 us +- 0.07 us -> [rp_pr] 1.98 us +- 0.11 us: 1.04x faster
range(400): Mean +- std dev: [rp_main] 71.3 ns +- 2.2 ns -> [rp_pr] 61.7 ns +- 5.1 ns: 1.16x faster
iter(range(400)): Mean +- std dev: [rp_main] 112 ns +- 5 ns -> [rp_pr] 89.5 ns +- 2.8 ns: 1.25x faster
list(iter(range(400))): Mean +- std dev: [rp_main] 3.88 us +- 0.18 us -> [rp_pr] 3.77 us +- 0.08 us: 1.03x faster
range(2, 400): Mean +- std dev: [rp_main] 79.3 ns +- 1.7 ns -> [rp_pr] 66.0 ns +- 3.7 ns: 1.20x faster
iter(range(2, 400)): Mean +- std dev: [rp_main] 119 ns +- 2 ns -> [rp_pr] 99.5 ns +- 3.1 ns: 1.19x faster
for loop length 400: Mean +- std dev: [rp_main] 12.1 us +- 0.7 us -> [rp_pr] 10.9 us +- 0.4 us: 1.11x faster

Benchmark hidden because not significant (2): list(range(400)), list(iter(range(2, 400)))

Geometric mean: 1.16x faster
Benchmark script
import pyperf
runner = pyperf.Runner()

loop = """
def g(n):
    x=0
    for ii in range(n):
        x += 1
"""
        
for s in [1, 10, 100, 400]:
	time = runner.timeit(name=f'range({s})', stmt=f"range({s})")
	time = runner.timeit(name=f'iter(range({s}))', stmt=f"iter(range({s}))")
	time = runner.timeit(name=f'list(range({s}))', stmt=f"list(range({s}))")
	time = runner.timeit(name=f'list(iter(range({s})))', stmt=f"list(iter(range({s})))")

	time = runner.timeit(name=f'range(2, {s})', stmt=f"range(2, {s})")
	time = runner.timeit(name=f'iter(range(2, {s}))', stmt=f"iter(range(2, {s}))")
	time = runner.timeit(name=f'list(iter(range(2, {s})))', stmt=f"list(iter(range(2, {s})))")

	time = runner.timeit(name=f'for loop length {s}', stmt=f"g({s})", setup=loop)

``
</details>

On the pyperformance test suite (actually, a subset of the suite, not all benchmarks run on my system) shows the percentage of successfull freelist allocations increases about 1%.

Main:

Allocations from freelist 2,004,971,371 39.8%
Frees to freelist 2,005,350,418
Allocations 3,034,877,938 60.2%
Allocations to 512 bytes 3,008,791,812 59.7%
Allocations to 4 kbytes 18,648,072 0.4%
Allocations over 4 kbytes 7,438,054 0.1%
Frees 3,142,033,922

PR

Allocations from freelist 2,064,126,652 40.8%
Frees to freelist 2,064,499,538
Allocations 2,989,239,063 59.2%
Allocations to 512 bytes 2,963,207,194 58.6%
Allocations to 4 kbytes 18,596,157 0.4%
Allocations over 4 kbytes 7,435,712 0.1%
Frees 3,096,349,170

(Some quick testing shows that most of the allocations not via freelists are due to python int objects, so that is another candidate to use freelists for)



### Has this already been discussed elsewhere?

No response given

### Links to previous discussion of this feature:

Some references to discussions on freelists:

* https://github.com/python/cpython/pull/121934
* https://github.com/faster-cpython/ideas/discussions/132
* https://github.com/python/cpython/issues/91912
* https://github.com/python/cpython/pull/101453

<!-- gh-linked-prs -->
### Linked PRs
* gh-128368
* gh-128592
* gh-128594
* gh-128619
* gh-128692
* gh-129638
* gh-129921
* gh-132319
* gh-135233
<!-- /gh-linked-prs -->

Metadata

Metadata

Assignees

No one assigned

    Labels

    interpreter-core(Objects, Python, Grammar, and Parser dirs)performancePerformance or resource usagetype-featureA feature request or enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions