-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-118036: Fix a bug with CALL_STAT_INC #117933
Conversation
We were under-counting calls in `_PyEvalFramePushAndInit` because the `CALL_STAT_INC` macro was redefined to a no-op for the Tier 2 interpreter. The fix is a little convoluted. This ought to result in ~37% more "Frames pushed" reported under "Call stats". The new count is the correct one (I presume).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand how this is broken and why this change fixes it, so I'm marking as approved, but I agree it's "weird" / more convoluted than it needs to be. Moving the order of functions in ceval.c
would result in less weirdness (probably just moving _PyEval_EvalFrameDefault
to the bottom since that's where all the macro updates happen) but that would create a lot of churn.
Alternatively, what if you rename REAL_CALL_STAT_INC
to CALL_STAT_INC_ALWAYS
and then just use CALL_STAT_INC_ALWAYS
directly from _PyEvalFramePushAndInit
(which I think is the only call site impacted by this change). That would get rid of the weird dance of #undef
and restoring just CALL_STAT_INC
(but admittedly that would just replace it with another subtlety in _PyEvalFramePushAndInit
).
Let's try a different fix instead.
This should fix the issue with CALL_STAT_INC in a cleaner way (even if the diff is much larger).
Here's another version, where I moved the interpreter loop (and a small assortment of related stuff) to the end of the file. I ran Pystats on a single loop of the Richards benchmark, and found that most stats are approximately the same (not completely) except that "Frames pushed" is about 10% larger, which indicates that the fix works. (The diff is much more annoying to review, but I promise I just moved the stuff from the original lines 606 - 1120 to the bottom of the file, except I had to move |
Benchmark says speed and memory unchanged. |
I think the correct fix for In executor_cases.c.h |
Okay, I'll try that next. |
I'm going to try yet another approach.
The proof will be in the pudding. I'll fire off two benchmark runs, with pystats, one plain, one with Tier 2. (The JIT pystats ought to be similar but I don't want to wait.) |
Benchmark using Tier 1 only shows 36.8% more frames pushed, which is very close to what I measured with the first version of this PR, so I think that suggests this fixes that issue. Everything else I looked at is basically unchanged, suggesting I'm not breaking anything. Still waiting for the Tier 2 benchmark, will update when I see those numbers. |
Benchmark with Tier 2 poses a bit of a mystery, at least the pystats diff. Go to Call stats and open the details box.
@markshannon, @mdboom, @brandtbucher -- could there be some kind of double counting going on? Or is this an expected result? The only change from main is now that we don't undefine |
I think the change is correct, even though the stats are probably still wrong. |
We were under-counting calls in
_PyEvalFramePushAndInit
because theCALL_STAT_INC
macro was redefined to a no-op for the Tier 2 interpreter. The fix is a little convoluted (I had wanted to move the code around, but that would require moving something else around, and in the end I figured it was easier to tweak the macros @markshannon might disagree though?). This ought to result in ~37% more "Frames pushed" reported under "Call stats". The new count is the correct one (I presume).@mdboom can you review? This is one commit from my experiment about removing Tier 2 entirely (gh-117908).
To see the effect, look at these pystat diffs.