-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Fix underestimation of temps size #58969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It was possible for us to underestimate the size of temps required. This happened because we used the state of the frame layout to check if the size of temps was computed and that check was wrong. This could lead us to make different decisions about which registers needed to be saved in the prolog and epilog on ARM32. In particular, if the frame size was around the size where a stack probe is necessary, this could be possible. There are also other comments that suggest that this could result in failure while encoding instructions that reference the stack locals. Fix dotnet#58293
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsIt was possible for us to underestimate the size of temps required. This There are also other comments that suggest that this could result in Fix #58293 cc @dotnet/jit-contrib
|
Some sizable ARM32 diffs. The diffs look to be because the different frame size seen by RA leads to some different spilling choices. coreclr_tests.pmi.Linux.arm.checked.mch:
Detail diffs
libraries.crossgen2.Linux.arm.checked.mch:
Detail diffs
libraries.pmi.Linux.arm.checked.mch:
Detail diffs
libraries_tests.pmi.Linux.arm.checked.mch:
Detail diffs
|
In the issue you state,
Who increases the size of compLclFrameSize? Presumably it happens when runtime/src/coreclr/jit/codegencommon.cpp Lines 6716 to 6720 in 88eb71e
Your change seems fine, but presumably it is causing diffs because it is being conservative in some dimension even when it ends up not being required sometime? |
Correct, it happens because LSRA also does a frame layout, and during this frame layout when we call
Not totally sure I understand the suggested alternative. Indeed this might be conservative. But there are several comments that reference that it may be problematic to underestimate the size of temps during LSRA, e.g. runtime/src/coreclr/jit/compiler.h Lines 3652 to 3659 in 9035d94
runtime/src/coreclr/jit/compiler.cpp Lines 4092 to 4099 in 9035d94
Currently, we are always allocating 0 bytes for temps during the frame layout in LSRA, so any time we need temps we underestimate the size. By the way, the reason I think it usually does not cause problems is that lvaFrameSize used in compRsvdRegCheck is already conservative enough to normally make up for the underestimated size of temps, since it assumes we will need to store all non-volatile registers which is almost never the case. It might be possible to make a case where we do return false from compRsvdRegCheck when actually we will need it if enough registers are in use and enough spill temps are needed. I think the same check is the cause of the diffs, since we are now returning true from compRsvdRegCheck more often than before.
|
I looked at this a bit more, but I think |
I don't see the segfault in preview 7 with the original example, but it's very likely that there just is some slightly different stack usage that causes the edge case to not be hit. The code was last changed in #48199 but that seems to just be a refactoring, before that it has been a while. So this might not be a .NET 6 regression. I'm not sure whether this needs to be backported or not. We could potentially add the temps size where we check for the stack probe. I'm not totally comfortable with this since the code tries to take special care to handle temps during frame layout with comments about why, but it would at least resolve the regressions. |
@BruceForstall @AndyAyersMS @kunalspathak PTAL ASAP before RC2 snap. |
@JulieLeeMSFT I don't see the segfault on .NET 5 with the original program but it is likely that the underlying issue is there and just needs a slightly different program to be hit. So I can't confirm whether or not this is a .NET 6 regression, but I would assume we have been using the frame size this way to check stack probes for a long time and that it therefore is not a regression. |
OK. Then I am marking this as .NET7. |
The comments you reference above in compiler.h: runtime/src/coreclr/jit/compiler.h Lines 3652 to 3659 in 9035d94
refer to the pre-RyuJIT register predictor, which had very different spill temp behavior. In particular, it couldn't guarantee the number of spill temps required before codegen started, whereas LSRA does make that guarantee, so now, final frame layout knows the exact number and size. I guess we still need to be conservative before that point, as the comments for lvaGetMaxSpillTempSize indicate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The failure is a Mono job so does not seem related. |
It was possible for us to underestimate the size of temps required. This
happened because we used the state of the frame layout to check if the
size of temps was computed and that check was wrong. This could lead us
to make different decisions about which registers needed to be saved in
the prolog and epilog on ARM32. In particular, if the frame size was
around the size where a stack probe is necessary, this could be
possible.
There are also other comments that suggest that this could result in
failure while encoding instructions that reference the stack locals.
Fix #58293
cc @dotnet/jit-contrib