Fix LSRA handling of arm32 double registers and spilling #94947
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Azure Pipelines successfully started running 3 pipeline(s).

/azp run runtime-coreclr superpmi-diffs

Azure Pipelines successfully started running 1 pipeline(s).

All the test failures are known or infra.

fyi, I found this testing

@kunalspathak PTAL
src/coreclr/jit/lsra.cpp
If busyRegs does not contain both registers of the pair, a more appropriate fix would be to find the place where we miss setting one of the registers and fix it. When we set a regMask in regsBusyUntilKill or regsInUseThisLocation, we mostly use getRegMask, which takes the "double register" into account as well.
runtime/src/coreclr/jit/lsra.h
Lines 1740 to 1746 in 740ecd0
```cpp
#ifdef TARGET_ARM
    if (regType == TYP_DOUBLE)
    {
        assert(genIsValidDoubleReg(reg));
        regMask |= (regMask << 1);
    }
#endif // TARGET_ARM
```
The model used for arm32 doubles is possibly inconsistent: it seems that in some (most) cases only the low register of the pair is set, whereas getRegMask sets both (IMO, that function is poorly named, given that genRegMask exists). Also, it's not clear when LSRA should use genRegMask versus getRegMask.
However, here busyRegs is correct: only one of the two registers of a double pair is busy. But because we're asking for a double pair, we need to avoid the pair if it is not fully available.
For example, when we're allocating a double here, the candidates are all the even registers (the odd registers are not candidates, although allocating an even register implicitly also allocates the odd one). So, say f2 is a candidate and f3 is a busy register: we need to remove f2 from the candidate set because f3 is busy.
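A minimal sketch of that candidate filtering, assuming for illustration only that float register fN occupies bit 16 + N of a 64-bit mask. `removeBusyDoublePairs` and `kOddFloatRegs` are hypothetical names, not the code in this PR: an even candidate is dropped if it is busy itself or if its odd partner is busy.

```cpp
#include <cassert>
#include <cstdint>

using regMaskTP = std::uint64_t;

// Assumed layout for illustration only: float register fN is bit (16 + N).
// The odd float registers f1, f3, ..., f31 are therefore bits 17, 19, ..., 47.
constexpr regMaskTP kOddFloatRegs = 0xAAAAAAAA0000ULL;

// Hypothetical helper. doubleCandidates holds only even registers, each one
// standing for an even/odd pair. An even register is removed if it is busy
// itself or if its odd partner (the next bit up) is busy.
regMaskTP removeBusyDoublePairs(regMaskTP doubleCandidates, regMaskTP busyRegs)
{
    // Double candidates must never name an odd register.
    assert((doubleCandidates & kOddFloatRegs) == 0);

    // Shift each busy odd register down onto its even partner (f3 -> f2).
    regMaskTP evenPartnersOfBusyOdds = (busyRegs & kOddFloatRegs) >> 1;
    return doubleCandidates & ~busyRegs & ~evenPartnersOfBusyOdds;
}
```

The shift by one works only because every double candidate is the even (low) half of its pair, so the odd half is always the next bit up.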
> `genRegMask` versus `getRegMask`

I think we should use the getRegMask(reg, var_type) version and delete genRegMask(reg, var_type). They do the same thing.

> So, say f2 is a candidate, and f3 is a busy register. we need to remove f2 from the candidate set due to f3 being busy.

Correct, that's what I expect, but it should happen at the place that added f3 as a busy register. Do you know in which of regsBusyUntilKill or regsInUseThisLocation it gets marked as busy? The change looks correct, but I want to make sure we do not accidentally mark as busy a register (probably in JitStressRegs) that is not part of the current even/odd pair.
The case was regsBusyUntilKill.
> but it should happen at place that added f3 as busy register
If f3 is marked as busy because it is a float (single-precision) type, then you wouldn't want to mark f2 busy at that point. It's only when we're looking for candidates for a double that we need to mask off both f2 and f3 if f3 is already marked as busy.
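To make the float-versus-double distinction concrete, here is a minimal sketch of a type-aware busy check. It assumes, purely for illustration, that float register fN occupies bit 16 + N of a 64-bit mask; `sketchIsRegBusy`, `floatRegBit`, and the two-value `var_types` enum are hypothetical stand-ins, not the JIT's `isRegBusy` or its real types.

```cpp
#include <cassert>
#include <cstdint>

using regMaskTP = std::uint64_t;
enum var_types { TYP_FLOAT, TYP_DOUBLE }; // simplified stand-in

// Assumed layout for illustration only: float register fN is bit (16 + N).
regMaskTP floatRegBit(int n)
{
    return regMaskTP{1} << (16 + n);
}

// Hypothetical type-aware busy check: a float needs only its own register,
// while a double that starts at an even register needs both halves of the
// even/odd pair to be free.
bool sketchIsRegBusy(int floatRegNum, var_types regType, regMaskTP busyRegs)
{
    regMaskTP needed = floatRegBit(floatRegNum);
    if (regType == TYP_DOUBLE)
    {
        assert(floatRegNum % 2 == 0); // doubles start on an even register
        needed |= floatRegBit(floatRegNum + 1);
    }
    return (busyRegs & needed) != 0;
}
```

With only f3 busy, f2 is still free for a float but not for a double, which is why the pair is masked off at double-allocation time rather than when f3 is marked busy.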
Do you mind sharing the repro or the JitDump file?
Sent a link to JitDump offline.
You can also repro locally using:
py src\coreclr\scripts\superpmi.py replay -f coreclr_tests -arch x86 -target_arch arm -target_os linux -jitoption JitOptRepeat#* -jitoption JitOptRepeatCount#2 -jitoption JitStressRegs#3
Then look for:
Assertion failed '!isRegBusy(physRegRecord->regNum, current->registerType)'
(for me, MC#459399)
(I’ve fixed the other asserts in #94250)
I tried to repro it and see how f3 ends up in regsBusyUntilKill. I am OK with this fix, but can we add assert((busyRegs & 0x555555550000) == RBM_NONE), which would at least confirm that we never have f2 in this mask, and hence that we don't accidentally remove f1 from the mask?
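Assuming float registers f0..f31 occupy bits 16..47 of the mask (a layout consistent with the literal, stated here as an assumption), 0x555555550000 is exactly the set of even float registers f0, f2, ..., f30. The sketch below derives that constant and wraps the suggested check in a hypothetical helper, `busyRegsHaveNoEvenFloatRegs`; it illustrates the idea and is not the assertion as landed.

```cpp
#include <cassert>
#include <cstdint>

using regMaskTP = std::uint64_t;

// Assumed layout: float registers f0..f31 occupy bits 16..47.
constexpr int kFirstFloatBit = 16;
constexpr int kNumFloatRegs = 32;

// Mask of the even float registers f0, f2, ..., f30 (requires C++14).
constexpr regMaskTP evenFloatRegMask()
{
    regMaskTP mask = 0;
    for (int n = 0; n < kNumFloatRegs; n += 2)
    {
        mask |= regMaskTP{1} << (kFirstFloatBit + n);
    }
    return mask;
}

static_assert(evenFloatRegMask() == 0x555555550000ULL,
              "0x555555550000 is f0, f2, ..., f30 under the assumed layout");

// Hypothetical helper mirroring the suggested assertion: true when the busy
// set holds no even float register, so shifting it right by one cannot land
// on an odd float register such as f1.
bool busyRegsHaveNoEvenFloatRegs(regMaskTP busyRegs)
{
    return (busyRegs & evenFloatRegMask()) == 0;
}
```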
@kunalspathak I updated the comment

can you also explain the implication of
src/coreclr/jit/lsra.cpp
To make this concept easier to understand, should we create a function or macro, something like:
#define RBM_ALLDOUBLE_EVEN(mask) (((mask) & RBM_ALLDOUBLE_HIGH) >> 1)
Hmm... that seems very confusing to me. RBM_ALLDOUBLE already is the set of even ("low") registers of the pairs.
This change fixes a bug I noticed just by inspection, and it leads to some diffs. I'm still investigating, but I also found some pre-existing corruption in the disasm output for double registers, so I'm continuing down the rabbit hole...
If LSRA uses `regsBusyUntilKill` to eliminate registers from consideration for spilling, then when spilling arm32 double-type registers it must also remove the corresponding even registers from the candidates. For example, if `regsBusyUntilKill` contains `f3`, `f2` must also be removed from the candidates, since double register candidates contain only the even registers, each representing the register pair of an arm32 double.
register part of ARM double registers even/odd pairs.
76f0dff to ef25ee6
I fixed a bug with the disasm output of arm double registers d0-d15. There are register allocation diffs due to the

@kunalspathak PTAL
kunalspathak left a comment
LGTM