Specialize FOR_ITER via GET_ITER #32340

Closed (5 commits, not merged)

@sweeneyde
Member Author

Microbenchmarks show essentially no change for range iteration (1.16 ms -> 1.15 ms, within noise), but a decent (~1.13x) speedup for list iteration (562 us -> 495 us).

sween@DESKTOP-3GGN2FL:~/before_getiter2$ ./python -m pyperf timeit -s "R = range(10**5)" "for i in R: pass"
.....................
Mean +- std dev: 1.16 ms +- 0.03 ms
sween@DESKTOP-3GGN2FL:~/before_getiter2$ ./python -m pyperf timeit -s "R = list(range(10**5))" "for i in R: pass"
.....................
Mean +- std dev: 562 us +- 7 us


sween@DESKTOP-3GGN2FL:~/get_iter2$ ./python -m pyperf timeit -s "R = range(10**5)" "for i in R: pass"
.....................
Mean +- std dev: 1.15 ms +- 0.03 ms
sween@DESKTOP-3GGN2FL:~/get_iter2$ ./python -m pyperf timeit -s "R = list(range(10**5))" "for i in R: pass"
.....................
Mean +- std dev: 495 us +- 7 us
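
As a quick check of the ratios quoted above, the following minimal sketch recomputes the speedups from the reported means. The helper name `speedup` and the dictionary layout are illustrative assumptions; only the four timing values come from the pyperf output in this comment.

```python
# Reported pyperf means, converted to microseconds.
# "before" is the base build, "after" is the build with this PR applied.
REPORTED_MEANS_US = {
    "range": {"before": 1160.0, "after": 1150.0},  # 1.16 ms -> 1.15 ms
    "list": {"before": 562.0, "after": 495.0},     # 562 us -> 495 us
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster 'after' is than 'before' (hypothetical helper)."""
    return before / after


range_speedup = speedup(**REPORTED_MEANS_US["range"])
list_speedup = speedup(**REPORTED_MEANS_US["list"])

if __name__ == "__main__":
    print(f"range: {range_speedup:.3f}x")
    print(f"list:  {list_speedup:.3f}x")
```

The range difference (0.01 ms) is smaller than the reported standard deviation (0.03 ms), which is why it is treated as no change, while the list difference (67 us) is well outside the 7 us standard deviation.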

@sweeneyde
Member Author

sweeneyde commented Apr 6, 2022

pyperformance results (Geometric mean: 1.00x faster):

Slower (18):
- scimark_sparse_mat_mult: 5.17 ms +- 0.28 ms -> 5.55 ms +- 0.20 ms: 1.07x slower
- logging_format: 7.20 us +- 0.13 us -> 7.69 us +- 0.57 us: 1.07x slower
- logging_simple: 6.49 us +- 0.14 us -> 6.82 us +- 0.27 us: 1.05x slower
- mako: 10.1 ms +- 0.1 ms -> 10.6 ms +- 0.7 ms: 1.05x slower
- fannkuch: 396 ms +- 11 ms -> 412 ms +- 10 ms: 1.04x slower
- pickle_list: 4.44 us +- 0.14 us -> 4.62 us +- 0.17 us: 1.04x slower
- unpack_sequence: 43.0 ns +- 0.7 ns -> 44.5 ns +- 2.1 ns: 1.03x slower
- scimark_monte_carlo: 67.9 ms +- 1.9 ms -> 70.0 ms +- 5.6 ms: 1.03x slower
- nbody: 92.9 ms +- 3.6 ms -> 95.6 ms +- 3.2 ms: 1.03x slower
- pickle_dict: 28.7 us +- 0.8 us -> 29.5 us +- 0.6 us: 1.03x slower
- pidigits: 180 ms +- 2 ms -> 185 ms +- 2 ms: 1.03x slower
- telco: 6.52 ms +- 0.12 ms -> 6.65 ms +- 0.16 ms: 1.02x slower
- unpickle_pure_python: 233 us +- 8 us -> 237 us +- 8 us: 1.02x slower
- raytrace: 312 ms +- 5 ms -> 317 ms +- 15 ms: 1.02x slower
- pathlib: 19.2 ms +- 0.2 ms -> 19.5 ms +- 0.4 ms: 1.01x slower
- pickle_pure_python: 315 us +- 11 us -> 318 us +- 7 us: 1.01x slower
- meteor_contest: 109 ms +- 2 ms -> 109 ms +- 2 ms: 1.01x slower
- scimark_lu: 106 ms +- 2 ms -> 107 ms +- 2 ms: 1.01x slower

Faster (16):
- json_loads: 29.8 us +- 0.8 us -> 27.0 us +- 1.4 us: 1.11x faster
- richards: 55.8 ms +- 3.8 ms -> 52.5 ms +- 3.8 ms: 1.06x faster
- django_template: 40.4 ms +- 1.6 ms -> 38.1 ms +- 1.7 ms: 1.06x faster
- python_startup_no_site: 6.81 ms +- 0.08 ms -> 6.51 ms +- 0.05 ms: 1.05x faster
- pyflate: 456 ms +- 33 ms -> 439 ms +- 11 ms: 1.04x faster
- unpickle_list: 4.97 us +- 0.16 us -> 4.80 us +- 0.15 us: 1.04x faster
- go: 147 ms +- 4 ms -> 142 ms +- 4 ms: 1.04x faster
- regex_v8: 24.5 ms +- 1.8 ms -> 23.9 ms +- 1.3 ms: 1.03x faster
- regex_dna: 218 ms +- 3 ms -> 212 ms +- 10 ms: 1.02x faster
- regex_compile: 143 ms +- 2 ms -> 140 ms +- 4 ms: 1.02x faster
- deltablue: 3.91 ms +- 0.13 ms -> 3.83 ms +- 0.12 ms: 1.02x faster
- html5lib: 67.9 ms +- 3.0 ms -> 66.5 ms +- 2.7 ms: 1.02x faster
- chaos: 75.0 ms +- 2.3 ms -> 74.1 ms +- 1.8 ms: 1.01x faster
- scimark_fft: 357 ms +- 10 ms -> 354 ms +- 6 ms: 1.01x faster
- xml_etree_iterparse: 105 ms +- 1 ms -> 104 ms +- 2 ms: 1.01x faster
- sympy_expand: 522 ms +- 9 ms -> 518 ms +- 7 ms: 1.01x faster

Benchmark hidden because not significant (25): 2to3, chameleon, crypto_pyaes, dulwich_log, float, hexiom, json_dumps, logging_silent, nqueens, pickle, python_startup, regex_effbot, scimark_sor, spectral_norm, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, sympy_integrate, sympy_sum, sympy_str, tornado_http, unpickle, xml_etree_parse, xml_etree_generate, xml_etree_process

Geometric mean: 1.00x faster
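
To see why the summary lands at 1.00x despite 34 benchmarks moving in one direction or the other, here is a rough sketch that recomputes the geometric mean from the rounded ratios listed above. It is an approximation under stated assumptions: pyperf works from unrounded timings, and the 25 hidden benchmarks are treated here as exactly 1.00x, which the output does not state.

```python
from statistics import geometric_mean

# Rounded "N.NNx slower" factors from the Slower (18) list above.
SLOWER = [1.07, 1.07, 1.05, 1.05, 1.04, 1.04,
          1.03, 1.03, 1.03, 1.03, 1.03,
          1.02, 1.02, 1.02,
          1.01, 1.01, 1.01, 1.01]

# Rounded "N.NNx faster" factors from the Faster (16) list above.
FASTER = [1.11, 1.06, 1.06, 1.05, 1.04, 1.04, 1.04, 1.03,
          1.02, 1.02, 1.02, 1.02,
          1.01, 1.01, 1.01, 1.01]

# Assumption: benchmarks hidden as "not significant" count as no change.
NOT_SIGNIFICANT_COUNT = 25


def time_ratios() -> list[float]:
    """Return after/before time ratios (values above 1.0 mean slower)."""
    return SLOWER + [1.0 / f for f in FASTER] + [1.0] * NOT_SIGNIFICANT_COUNT


overall = geometric_mean(time_ratios())

if __name__ == "__main__":
    print(f"benchmarks: {len(time_ratios())}")
    print(f"approximate geometric mean of time ratios: {overall:.2f}")
```

The slowdowns and speedups nearly cancel in log space, so the overall ratio rounds to 1.00 under these assumptions.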

@Fidget-Spinner
Member

Can I review this or is it still WIP?

@sweeneyde
Member Author

Feedback/reviews/discussions are appreciated! I'm just not yet sure if this is a good approach at a high level.

@sweeneyde sweeneyde marked this pull request as ready for review April 6, 2022 14:27
@sweeneyde sweeneyde requested review from a team and markshannon as code owners April 6, 2022 14:27
}
}

TARGET(FOR_ITER_RANGE) {

Just a random thought:

I noticed that the two instructions are similar to the tp_iternext functions of their respective types. Would it make more sense (and save code) to just call those functions here and let PGO inline them? I'm not sure about the perf impact.

@sweeneyde
Member Author

Closing this, as it is more complex and conceptually different from other specializations (type checks are cheap anyway).

@sweeneyde sweeneyde closed this Jun 11, 2022