-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-43754: Eliminate bindings for partial pattern matches #25229
Conversation
@pablogsal, @markshannon, @gvanrossum: how do you feel about this change? I plan on cleaning this up more in the coming weeks, but I'd really like to get this initial rewrite into 3.10, if possible. Two main reasons:
I know it's a big ask this late in the release cycle (I needed to wait for the AST redesign to finish), but I don't want to merge this without at least one approving review. I'd really appreciate any comments, or even just your thoughts on whether this is a good idea or not. |
Also... since this replaces the
|
Hummm, still the standard deviation is quite high: 6% slower is at the very least suspicious. I will try to reproduce this in the speed.python.org server and my benchmark machine today or tomorrow. |
These are my results:
This looks more reasonable (they are executed with CPU isolation so I assume that's why they are more stable). |
Thanks @pablogsal! That's interesting... I used CPU isolation too. I wonder why mine was so much more unstable. |
Unfortunately, I don't know if I will have time to go through this PR before next week because everything is slighly on fire. It would be great if @markshannon or @serhiy-storchaka could give it a look. I checked and at least we are refleak free:
|
No problem @pablogsal. Feel free to nosy me on anything I might be able to help with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the general idea. My concerns are with the bytecode changes.
Please leave ROT_TWO
, ROT_THREE
and ROT_FOUR
.
Fixed sized stack manipulations are a lot more efficient: https://godbolt.org/z/97qYzY9Pj
The vast majority of ROTATE
s can be translated into one of the above.
Python/compile.c
Outdated
|
||
// Build all of the fail_pop blocks and reset fail_pop. | ||
static int | ||
cleanup_fail_pop(struct compiler *c, pattern_context *pc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd break this into two, emitting and freeing. cleanup
is ambiguous here, it could be emitting cleanup code or cleaning up the allocated memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since the callers allocate the memory, maybe just do the emitting here, and let the callers free the memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, not sure what you mean by "the callers allocate the memory". Allocation of fail_pop
happens in ensure_fail_pop
, which itself is almost always called by jump_to_fail_pop
. jump_to_fail_pop
typically gets called many times, often very far from where cleanup_fail_pop
is called.
I'd prefer to keep these steps (emitting and freeing the fail_pop
blocks) together. They are very tightly coupled: we never want to free them without emitting them first, and we never want to emit them without freeing them immediately after. Plus, the freeing part is just two lines that cannot fail.
Regarding the name, how about emit_and_reset_fail_pop
? (I like "reset" better than "free" since we're also zeroing fail_pop_size
).
Thanks for the review @markshannon! I've:
I also made a tweak to no longer store names in reverse order, since I noticed that this interacted badly with sequence patterns: match ...:
case a, b, c:
pass Before: 2 0 LOAD_CONST 0 (Ellipsis)
3 2 MATCH_SEQUENCE
4 POP_JUMP_IF_FALSE 15 (to 30)
6 GET_LEN
8 LOAD_CONST 1 (3)
10 COMPARE_OP 2 (==)
12 POP_JUMP_IF_FALSE 15 (to 30)
14 UNPACK_SEQUENCE 3
16 ROT_THREE
18 ROT_TWO
20 STORE_NAME 0 (c)
22 STORE_NAME 1 (b)
24 STORE_NAME 2 (a)
4 26 LOAD_CONST 2 (None)
28 RETURN_VALUE
>> 30 POP_TOP
32 LOAD_CONST 2 (None)
34 RETURN_VALUE After: 2 0 LOAD_CONST 0 (Ellipsis)
3 2 MATCH_SEQUENCE
4 POP_JUMP_IF_FALSE 13 (to 26)
6 GET_LEN
8 LOAD_CONST 1 (3)
10 COMPARE_OP 2 (==)
12 POP_JUMP_IF_FALSE 13 (to 26)
14 UNPACK_SEQUENCE 3
16 STORE_NAME 0 (a)
18 STORE_NAME 1 (b)
20 STORE_NAME 2 (c)
4 22 LOAD_CONST 2 (None)
24 RETURN_VALUE
>> 26 POP_TOP
28 LOAD_CONST 2 (None)
30 RETURN_VALUE |
If nobody raises any other concerns with this PR before noon tomorrow (PT), I'll go ahead and land this. (I didn't see any refleaks locally, but I'm going to sic the buildbots on this anyways.) |
🤖 New build scheduled with the buildbot fleet by @brandtbucher for commit 99f0ec6 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
👍 for merging. |
The three s390x refleak buildbot failures look unrelated ( |
This draft PR contains a couple of pattern matching improvements I'm experimenting with. Most significantly, it implements behavior that several people requested during the PEP 634 feedback process: no more name bindings for partial matches.
(One caveat: all names are bound before evaluating the corresponding guard. If the guard fails, these bindings won't be rolled back before matching the next case.)
In short, this is accomplished by storing captured objects on the stack underneath the more immediately important stuff, popping them all on failure or storing them all only once the entire pattern for that case matches. The control flow needs to be greatly simplified in order to make this work correctly, so the patch replaces the current "push bools and jump if false a bunch of times" flow with something much more efficient: jumping straight into a run of
POP_TOP
s on failure. We already do something very similar to this inpattern_helper_sequence_unpack
(the comments in the current code explain it well).Parts of the new code are a bit more complex than before (especially or-patterns, which need to do a bit of juggling on the stack to reorder captured items and avoid popping important data on failure), and other parts can certainly still be improved... but I personally consider it a win to have more intuitive partial match behavior with no performance penalty.
https://bugs.python.org/issue43754