Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-118036: Fix a bug with CALL_STAT_INC #117933

Merged
merged 5 commits into from
Apr 18, 2024
Merged

Conversation

gvanrossum
Copy link
Member

@gvanrossum gvanrossum commented Apr 16, 2024

We were under-counting calls in _PyEvalFramePushAndInit because the CALL_STAT_INC macro was redefined to a no-op for the Tier 2 interpreter. The fix is a little convoluted (I had wanted to move the code around, but that would require moving something else around, and in the end I figured it was easier to tweak the macros @markshannon might disagree though?). This ought to result in ~37% more "Frames pushed" reported under "Call stats". The new count is the correct one (I presume).

@mdboom can you review? This is one commit from my experiment about removing Tier 2 entirely (gh-117908).

To see the effect, look at these pystat diffs.

We were under-counting calls in `_PyEvalFramePushAndInit`
because the `CALL_STAT_INC` macro was redefined to a no-op
for the Tier 2 interpreter. The fix is a little convoluted.
This ought to result in ~37% more "Frames pushed" reported
under "Call stats". The new count is the correct one
(I presume).
Copy link
Contributor

@mdboom mdboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand how this is broken and why this change fixes it, so I'm marking as approved, but I agree it's "weird" / more convoluted than it needs to be. Moving the order of functions in ceval.c would result in less weirdness (probably just moving _PyEval_EvalFrameDefault to the bottom since that's where all the macro updates happen) but that would create a lot of churn.

Alternatively, what if you rename REAL_CALL_STAT_INC to CALL_STAT_INC_ALWAYS and then just use CALL_STAT_INC_ALWAYS directly from _PyEvalFramePushAndInit (which I think is the only call site impacted by this change). That would get rid of the weird dance of #undef and restoring just CALL_STAT_INC (but admittedly that would just replace it with another subtlety in _PyEvalFramePushAndInit).

Let's try a different fix instead.
This should fix the issue with CALL_STAT_INC in a cleaner way
 (even if the diff is much larger).
@gvanrossum
Copy link
Member Author

gvanrossum commented Apr 16, 2024

Here's another version, where I moved the interpreter loop (and a small assortment of related stuff) to the end of the file.

I ran Pystats on a single loop of the Richards benchmark, and found that most stats are approximately the same (not completely) except that "Frames pushed" is about 10% larger, which indicates that the fix works. (The diff is much more annoying to review, but I promise I just moved the stuff from the original lines 606 - 1120 to the bottom of the file, except I had to move #include "ceval_macros" to nearly the top (but after LLTRACE is defined.)

@gvanrossum
Copy link
Member Author

Benchmark says speed and memory unchanged.

@markshannon
Copy link
Member

I think the correct fix for CALL_STAT_INC is to not #undefine it at all.
It doesn't matter whether it happens in tier 1 or tier 2. A call is a call.

In executor_cases.c.h CALL_STAT_INC occurs only once in _PUSH_FRAME and that should be counted.

@gvanrossum
Copy link
Member Author

I think the correct fix for CALL_STAT_INC is to not #undefine it at all.

Okay, I'll try that next.

@gvanrossum
Copy link
Member Author

The proof will be in the pudding. I'll fire off two benchmark runs, with pystats, one plain, one with Tier 2. (The JIT pystats ought to be similar but I don't want to wait.)

@gvanrossum
Copy link
Member Author

Benchmark using Tier 1 only shows 36.8% more frames pushed, which is very close to what I measured with the first version of this PR, so I think that suggests this fixes that issue. Everything else I looked at is basically unchanged, suggesting I'm not breaking anything.

Still waiting for the Tier 2 benchmark, will update when I see those numbers.

@gvanrossum
Copy link
Member Author

Benchmark with Tier 2 poses a bit of a mystery, at least the pystats diff.

Go to Call stats and open the details box.

  • Frames pushed is 40% higher
  • Calls to Python functions inlined is 30% higher (it wasn't in the Tier 1 pystats diff)

@markshannon, @mdboom, @brandtbucher -- could there be some kind of double counting going on? Or is this an expected result? The only change from main is now that we don't undefine CALL_STAT_INC, which means that where that macro is called from Tier 2 it actually updates the count.

@markshannon markshannon changed the title Fix a bug with CALL_STAT_INC GH-118036: Fix a bug with CALL_STAT_INC Apr 18, 2024
@markshannon
Copy link
Member

markshannon commented Apr 18, 2024

I think the change is correct, even though the stats are probably still wrong.
I've created an issue for this (#118036).

@gvanrossum gvanrossum merged commit 40f4d64 into python:main Apr 18, 2024
37 checks passed
@gvanrossum gvanrossum deleted the call-stat-inc branch April 18, 2024 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants