Contributor

@P403n1x87 P403n1x87 commented Oct 17, 2025

We use sets for quick lookups and cache the results of some repetitive operations to speed things up. We also remove a good number of asserts embedded in the code that added extra overhead. Where appropriate, these should be replaced with actual tests.

The current change squeezes roughly 3-4% more performance out of a decompile/recompile loop.
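
To make the caching idea concrete, here is a minimal sketch. It is not code from this PR: the helper name `cached_stack_effect` is hypothetical, and it only assumes the standard-library `dis.stack_effect` and `functools.lru_cache`. The point is that a pure function of small hashable arguments can be memoized so repeated calls skip the real computation.

```python
import dis
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=None)
def cached_stack_effect(opcode: int, oparg: Optional[int] = None) -> int:
    # Hypothetical helper: memoize the result for each (opcode, oparg) pair.
    return dis.stack_effect(opcode, oparg)


NOP = dis.opmap["NOP"]
first = cached_stack_effect(NOP)   # computed by dis.stack_effect
second = cached_stack_effect(NOP)  # served from the cache
info = cached_stack_effect.cache_info()
print(first, second, info.hits, info.misses)
```

An unbounded cache is reasonable here only because the space of opcode/argument pairs seen in practice is small; a bounded `maxsize` would be the safer default elsewhere.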

Benchmarking

The following round-trip script was used to benchmark the proposed changes:

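from time import monotonic

from bytecode import Bytecode

# Wall-clock time of each of the 30 runs, in seconds
n = []
for _ in range(30):
    start = monotonic()
    # One run: 2000 decompile/recompile round trips of the same code object
    for _ in range(2000):
        Bytecode.from_code(Bytecode.from_code.__code__).to_code()
    n.append(monotonic() - start)


# Mean and population standard deviation across the runs
avg = sum(n) / len(n)
sdev = (sum(_**2 for _ in n) / len(n) - avg**2) ** 0.5
print(f"{avg:0.2f} +/- {sdev:0.2f}")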

Baseline: 1.10 +/- 0.02
This PR: 1.03 +/- 0.02
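
For reference, the mean and spread printed by the script can also be obtained from the standard-library `statistics` module. The sketch below uses made-up sample timings (not measurements from this PR) and a hypothetical `summarize` helper, and checks that `fmean`/`pstdev` agree with the manual formula used in the script.

```python
from statistics import fmean, pstdev


def summarize(samples):
    # Mean and population standard deviation, as in the benchmark script.
    avg = fmean(samples)
    return avg, pstdev(samples, mu=avg)


# Made-up timings in seconds, for illustration only.
samples = [1.10, 1.08, 1.12, 1.09, 1.11]

avg, sdev = summarize(samples)

# Same quantities computed with the formula from the script.
manual_avg = sum(samples) / len(samples)
manual_sdev = (sum(x**2 for x in samples) / len(samples) - manual_avg**2) ** 0.5

print(f"{avg:0.2f} +/- {sdev:0.2f}")
```

The `statistics` functions avoid the subtraction of two nearly equal numbers in the manual formula, which can lose precision when the spread is tiny compared with the mean.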

@P403n1x87 P403n1x87 force-pushed the perf/improve-performance branch from d1eec34 to abd3d1d Compare October 17, 2025 11:30

codecov-commenter commented Oct 17, 2025

Codecov Report

❌ Patch coverage is 82.89474% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.49%. Comparing base (7f3ed3e) to head (d5a0515).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/bytecode/concrete.py | 83.45% | 16 Missing and 6 partials ⚠️ |
| src/bytecode/instr.py | 83.33% | 3 Missing ⚠️ |
| src/bytecode/cfg.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #172      +/-   ##
==========================================
- Coverage   95.43%   91.49%   -3.94%     
==========================================
  Files           7        7              
  Lines        2147     2164      +17     
  Branches      481      489       +8     
==========================================
- Hits         2049     1980      -69     
- Misses         56      122      +66     
- Partials       42       62      +20     

Owner

@MatthieuDartiailh MatthieuDartiailh left a comment

Thanks for starting this. Just one small question. Also, Mypy may need some convincing to stay happy without all the assertions.
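
As an illustration of the Mypy point, one common way to keep type narrowing after dropping a runtime `assert` is `typing.cast`. This is only a sketch with hypothetical function names, not what the PR does. Note that `cast` is still a (cheap) function call at runtime, so it is not entirely free either.

```python
from typing import Optional, cast


def next_offset_with_assert(offset: Optional[int]) -> int:
    # The assert narrows Optional[int] to int for Mypy, at a runtime cost.
    assert offset is not None
    return offset + 2


def next_offset_with_cast(offset: Optional[int]) -> int:
    # cast() tells Mypy the value is an int without checking it at runtime.
    return cast(int, offset) + 2


print(next_offset_with_assert(10), next_offset_with_cast(10))
```

The trade-off is that `cast` silently trusts the caller, which is why moving the removed checks into the test suite matters.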

Comment on lines -38 to +39
- BITFLAG2_OPCODES = (_opcode.opmap["LOAD_SUPER_ATTR"],) if PY312 else ()
+ BITFLAG2_OPCODES = {_opcode.opmap["LOAD_SUPER_ATTR"]} if PY312 else set()

Out of curiosity, do you have a micro-benchmark for this? It is not obvious to me that a set is faster for a single element.
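
A micro-benchmark along these lines could answer the question. This is a sketch only: the opcode value is an arbitrary stand-in, the `bench` helper is hypothetical, and no timings are claimed here since they depend on the interpreter and machine.

```python
from timeit import timeit

OPCODE = 141  # arbitrary stand-in value, not a specific real opcode
OTHER = 1     # a value that is in neither container

AS_TUPLE = (OPCODE,)
AS_SET = {OPCODE}


def bench(container, needle, number=200_000):
    # Total time in seconds for `number` membership tests.
    return timeit(lambda: needle in container, number=number)


results = {
    "tuple, hit": bench(AS_TUPLE, OPCODE),
    "tuple, miss": bench(AS_TUPLE, OTHER),
    "set, hit": bench(AS_SET, OPCODE),
    "set, miss": bench(AS_SET, OTHER),
}

for label, seconds in results.items():
    print(f"{label:12s} {seconds:.4f} s")
```

The lambda call overhead is included in every measurement, so only the differences between rows are meaningful.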

Contributor Author

The core of the performance gains comes from caching and from avoiding some of the more expensive asserts. These small sets are more for type coherence than anything else. I have the feeling that all the branching done at runtime is also a main "offender" performance-wise. It would be great if sys.version_info checks could be done just once at compile time, possibly at the cost of copy-pasting the same code. I'll see if there is any other low-hanging fruit that we can pick.
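
Python has no true compile-time specialization, but a version check can be hoisted to import time by defining the function differently per version, so the branch is not re-evaluated on every call. The sketch below assumes a hypothetical helper `needs_second_bitflag`; only the `LOAD_SUPER_ATTR` / 3.12 relationship comes from the diff under review.

```python
import sys

PY312 = sys.version_info >= (3, 12)


def needs_second_bitflag_branching(opname: str) -> bool:
    # Version check evaluated on every call.
    if PY312:
        return opname == "LOAD_SUPER_ATTR"
    return False


# Version check evaluated once, when the module is imported.
if sys.version_info >= (3, 12):

    def needs_second_bitflag(opname: str) -> bool:
        return opname == "LOAD_SUPER_ATTR"

else:

    def needs_second_bitflag(opname: str) -> bool:
        return False


print(needs_second_bitflag("LOAD_SUPER_ATTR"), needs_second_bitflag("NOP"))
```

The cost is the duplicated function body mentioned above; the benefit is that each call runs only the code relevant to the running interpreter.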

Contributor Author

In fact, I've got a pretty decent improvement from turning HASJREL etc. into sets. I'll post some benchmarking notes in the description.
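
A sketch of what such a conversion might look like, assuming the collection is derived from the standard library's `dis.hasjrel` list (how the library actually builds `HASJREL` is not shown in this thread, and `is_relative_jump` is a hypothetical helper):

```python
import dis

# dis.hasjrel is a plain list, so `in` scans it element by element.
HASJREL_LIST = list(dis.hasjrel)

# A frozenset gives hash-based membership tests and cannot be mutated by accident.
HASJREL_SET = frozenset(dis.hasjrel)


def is_relative_jump(opcode: int) -> bool:
    return opcode in HASJREL_SET


# Both containers answer membership queries identically for every known opcode.
agree = all(
    (op in HASJREL_LIST) == is_relative_jump(op) for op in dis.opmap.values()
)
print(agree)
```

Unlike the single-element case, these collections hold several opcodes, so replacing a linear scan with a hash lookup has more room to pay off.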

I agree that I would love to have the equivalent of compile time specialization.

@P403n1x87 P403n1x87 force-pushed the perf/improve-performance branch from abd3d1d to 7ac2792 Compare October 17, 2025 14:15