
bpo-47012: speed up iteration of bytes and bytearray #31867

Merged (7 commits, Mar 23, 2022)

Conversation

kumaraditya303
Contributor

@kumaraditya303 kumaraditya303 commented Mar 14, 2022

Benchmark:

from pyperf import Runner, perf_counter

def bench_bytes(loops, length):
    src = b'helloworld' * length
    t0 = perf_counter()
    for _ in range(loops):
        for i in src:
            pass
    return perf_counter() - t0

def bench_bytearray(loops, length):
    src = bytearray(b'hello' * length)
    t0 = perf_counter()
    for _ in range(loops):
        for i in src:
            pass
    return perf_counter() - t0

runner = Runner()
for n in [10_000, 100_000]:
    runner.bench_time_func(f"bytes {n}", bench_bytes, n)
    runner.bench_time_func(f"bytearray {n}", bench_bytearray, n)

Results:

bytes 10000: Mean +- std dev: [base] 829 us +- 38 us -> [patch] 677 us +- 44 us: 1.23x faster
bytearray 10000: Mean +- std dev: [base] 523 us +- 34 us -> [patch] 360 us +- 19 us: 1.45x faster
bytes 100000: Mean +- std dev: [base] 8.33 ms +- 0.38 ms -> [patch] 6.89 ms +- 0.75 ms: 1.21x faster
bytearray 100000: Mean +- std dev: [base] 5.19 ms +- 0.23 ms -> [patch] 3.61 ms +- 0.23 ms: 1.44x faster

Geometric mean: 1.33x faster
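The win comes from handing out the cached small-int objects rather than allocating a fresh int for every byte, which the later discussion of `_PY_NSMALLPOSINTS` confirms. That cache is observable from pure Python (a sketch relying on a CPython implementation detail, not guaranteed by the language):

```python
# Iterating bytes yields ints in range(256). In CPython these are cached
# singletons, so repeated passes hand back the very same objects.
data = b"hello\x00\xff"
first = list(data)
second = list(data)

# CPython-specific: identity (not just equality) holds across passes.
print(all(a is b for a, b in zip(first, second)))
```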

https://bugs.python.org/issue47012

@kumaraditya303 kumaraditya303 changed the title speed up iteration of bytes and bytearray bpo-47012: speed up iteration of bytes and bytearray Mar 14, 2022
@kumaraditya303 kumaraditya303 marked this pull request as ready for review March 14, 2022 11:14
@sweeneyde
Member

I agree with @animalize that it would be safest to include a preprocessor directive, whether that's with separate #if _PY_NSMALLPOSINTS > 255/#else code, or with a #error directive.

I'm not sure why anyone would compile with fewer than 256 cached small ints. Here, @vstinner added the code

// _PyLong_GetZero() and _PyLong_GetOne() must always be available
#if _PY_NSMALLPOSINTS < 2
#  error "_PY_NSMALLPOSINTS must be greater than 1"
#endif

@vstinner, would there be any downside to requiring all of (0, 1, ..., 255) be in the small int cache?

@vstinner
Member

I don't think that anyone has ever tuned _PY_NSMALLPOSINTS; the value should stay hardcoded. But just for sanity, you can add a static_assert() in code that makes assumptions about its value, in case someone changes _PY_NSMALLPOSINTS in the future. For example, I added assertions to ensure that the 0 and 1 singletons always exist.
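The property such a guard protects, that every value a byte can take is an interned singleton, can be illustrated from pure Python (a sketch; `fresh` is a hypothetical helper that builds each int through a runtime path so the compiler cannot reuse a folded constant):

```python
def fresh(i: int) -> int:
    # Construct the int at runtime via a string round-trip so constant
    # folding cannot hand back a compile-time object.
    return int(str(i))

# In CPython, two independent constructions of any value in 0..255 return
# the same cached object, which is what the patch's fast path relies on.
print(all(fresh(i) is fresh(i) for i in range(256)))
```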

@kumaraditya303
Contributor Author

@sweeneyde I have added the compilation guard. FYI, if _PY_NSMALLPOSINTS were changed, it would break the deepfreeze and module-freezing infrastructure, so it is not really configurable; it is also declared in an internal header.

6 participants