-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cut tricky goto
that isn't needed, in _PyBytes_DecodeEscape.
#15825
Conversation
This is the sort of `goto` that requires the reader to stare hard at the code to unpick what it's doing. On doing so, the answer is... not very much! * It jumps from the bottom of the loop to almost the top; the effect is to bypass the loop condition `s < end` and also the `if`-condition `*s != '\\'`, acting as if both are true. * We've just decremented `s`, after incrementing it in the `switch` condition. So it has the same value as when `s == end` failed. Before that was another increment... and before that we had `s < end`. So `s < end` true, then increment, then `s == end` false... that means `s < end` is still true. * Also this means `s` points to the same character as it did for the `switch` condition. And there was a `case '\\'`, which we didn't hit -- so `*s != '\\'` is also true. * That means this has no effect on the behavior! The most it might do is an optimization -- we get to skip those two checks, because (as just proven above) we know they're true. * But gosh, this is the *invalid escape sequence* path. This does not seem like the kind of code path that calls for extreme optimization tricks. So, take the `goto` and the label out. Perhaps the compiler will notice the exact same facts we showed above, and generate identical code. Or perhaps it won't! That'll be OK. But then, crucially, if some future edit to this loop causes the reasoning above to *stop* holding true... the compiler will adjust this jump accordingly. One of us fallible humans might not.
Out of curiosity, how did you manage to find this? I can follow the logic provided, but I definitely would not have noticed this myself. |
We usually reject pull requests that improve code style/readability without fixing actual bug or improving something. |
Yeah, (My understanding is that there is a sequel that does more.) |
Thanks @benjaminp for the merge!
Indeed -- just sent #16013 , which cuts a bunch of tricky code here that hasn't been used since early Python 3 development in 2007. @asvetlov , I hope you'll agree in the case of that PR that it improves something, even if the something is in reading the code and not running it. (I would say the same is true of this one, but the improvement is certainly much smaller than #16013 .) |
Good question! The general answer is that I was reading a lot of code in the codebase. I came across this function and was curious what it did, so I read through it and looked at its callers. This Very concretely, IIRC I came across this function from exploring where |
…nGH-15825) This is the sort of `goto` that requires the reader to stare hard at the code to unpick what it's doing. On doing so, the answer is... not very much! * It jumps from the bottom of the loop to almost the top; the effect is to bypass the loop condition `s < end` and also the `if`-condition `*s != '\\'`, acting as if both are true. * We've just decremented `s`, after incrementing it in the `switch` condition. So it has the same value as when `s == end` failed. Before that was another increment... and before that we had `s < end`. So `s < end` true, then increment, then `s == end` false... that means `s < end` is still true. * Also this means `s` points to the same character as it did for the `switch` condition. And there was a `case '\\'`, which we didn't hit -- so `*s != '\\'` is also true. * That means this has no effect on the behavior! The most it might do is an optimization -- we get to skip those two checks, because (as just proven above) we know they're true. * But gosh, this is the *invalid escape sequence* path. This does not seem like the kind of code path that calls for extreme optimization tricks. So, take the `goto` and the label out. Perhaps the compiler will notice the exact same facts we showed above, and generate identical code. Or perhaps it won't! That'll be OK. But then, crucially, if some future edit to this loop causes the reasoning above to *stop* holding true... the compiler will adjust this jump accordingly. One of us fallible humans might not.
This is the sort of
goto
that requires the reader to stare hard atthe code to unpick what it's doing.
On doing so, the answer is... not very much!
It jumps from the bottom of the loop to almost the top; the effect
is to bypass the loop condition
s < end
and also theif
-condition*s != '\\'
, acting as if both are true.We've just decremented
s
, after incrementing it in theswitch
condition. So it has the same value as when
s == end
failed.Before that was another increment... and before that we had
s < end
. Sos < end
true, then increment, thens == end
false... that means
s < end
is still true.Also this means
s
points to the same character as it did for theswitch
condition. And there was acase '\\'
, which we didn'thit -- so
*s != '\\'
is also true.That means this has no effect on the behavior! The most it might do
is an optimization -- we get to skip those two checks, because (as
just proven above) we know they're true.
But gosh, this is the invalid escape sequence path. This does not
seem like the kind of code path that calls for extreme optimization
tricks.
So, take the
goto
and the label out.Perhaps the compiler will notice the exact same facts we showed above,
and generate identical code. Or perhaps it won't! That'll be OK.
But then, crucially, if some future edit to this loop causes the
reasoning above to stop holding true... the compiler will adjust
this jump accordingly. One of us fallible humans might not.