Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update handling of limited register during consecutive registers allocation #84588

Merged
merged 12 commits into from
Apr 11, 2023

Conversation

kunalspathak
Copy link
Member

Earlier, I added a limit mask to include few registers in case we are allocating during consecutive registers. However, the mask was not sufficient, and we were still running into situation where we won't have register to allocate to one of the consecutive registers or a refPosition that is live at the same location. Updated the handling of such cases:

  1. Fix issue where we might have more than one busy register range available to allocate and in prevRegOpt heuristic, teach it to handle the free registers.
  2. If we see that consecutive registers are not available in JitStressRegs mode, add back a register range so it can allocate for it.
  3. Likewise, if a refposition that is alive at the same location where consecutive registers are live, most of the registers are busy and hence, just do not further limit the registers for such refpositions.
  4. While calculating minRegCount, take into account the refpositions for consecutive registers.
  5. With all of that, we do not need LsraLimitFPSetForConsecutive.

Fixes: #84536

If we find out that there are no candidates free/busy for refPositions
that need consecutive registers, have at least one range of registers
in the candidates such that allocation is possible.
Intially, we were just returning RBM_NONE if we don't find any freeCandidates,
but instead should try if we can find out if there are any busy candidates
that we should try them out.
…osition

If consecutive registers are being allocated, other refpositions that are live
at the same location might not have enough registers left to be assigned because
all registers are busy. As such, introduce a way to track if we are assigning
at the location of consecutive registers, and if yes, do not take jitstressregs
limit into account.
For consecutive register, also include the register count needed for "minimum
register requirement" when limiting the registers.
With other conditions in place, no need to have LsraLimitFPSetForConsecutive.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 10, 2023
@ghost ghost assigned kunalspathak Apr 10, 2023
@ghost
Copy link

ghost commented Apr 10, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Earlier, I added a limit mask to include few registers in case we are allocating during consecutive registers. However, the mask was not sufficient, and we were still running into situation where we won't have register to allocate to one of the consecutive registers or a refPosition that is live at the same location. Updated the handling of such cases:

  1. Fix issue where we might have more than one busy register range available to allocate and in prevRegOpt heuristic, teach it to handle the free registers.
  2. If we see that consecutive registers are not available in JitStressRegs mode, add back a register range so it can allocate for it.
  3. Likewise, if a refposition that is alive at the same location where consecutive registers are live, most of the registers are busy and hence, just do not further limit the registers for such refpositions.
  4. While calculating minRegCount, take into account the refpositions for consecutive registers.
  5. With all of that, we do not need LsraLimitFPSetForConsecutive.

Fixes: #84536

Author: kunalspathak
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr

Milestone: -

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple questions

@@ -12195,6 +12241,10 @@ regMaskTP LinearScan::RegisterSelection::select(Interval* currentInterval,
regMaskTP busyRegs = linearScan->regsBusyUntilKill | linearScan->regsInUseThisLocation;
candidates &= ~busyRegs;

#ifdef DEBUG
inUseOrBusyRegsMask |= ~busyRegs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be:

Suggested change
inUseOrBusyRegsMask |= ~busyRegs;
inUseOrBusyRegsMask |= busyRegs;

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below, I am using this as registerAssignment & inUseOrBusyRegsMask.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a separate one for just inUseRegsMask to help differentiate?

It's confusing to see clearing all the busy regs here given the local name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have incorporated Bruce's feedback.

@@ -12210,6 +12260,9 @@ regMaskTP LinearScan::RegisterSelection::select(Interval* currentInterval,
(refPosition->delayRegFree && (checkConflictLocation == (refPosition->nodeLocation + 1))))
{
candidates &= ~checkConflictBit;
#ifdef DEBUG
inUseOrBusyRegsMask |= ~checkConflictBit;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inUseOrBusyRegsMask |= ~checkConflictBit;
inUseOrBusyRegsMask |= checkConflictBit;

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below, I am using this as registerAssignment & inUseOrBusyRegsMask.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if you | together two different conflict bits? Then all the bits will be set and registerAssignment & inUseOrBusyRegsMask will do nothing. Is that right? Don't you need to | together all the bits first and then registerAssignment & ~inUseOrBusyRegsMask? At the very least, inUseOrBusyRegsMask as a name doesn't make sense since in your usage it's (kind of) the opposite of that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true.

src/coreclr/jit/lsra.cpp Outdated Show resolved Hide resolved
@kunalspathak
Copy link
Member Author

Seems build failures were fixed by #84595.

@kunalspathak
Copy link
Member Author

kunalspathak commented Apr 11, 2023

superpmi errors is are #84536 and #84631

@tannergooding
Copy link
Member

superpmi errors are #84536

Isn't this PR supposed to fix this failure in particular?

@kunalspathak
Copy link
Member Author

superpmi errors are #84536

Isn't this PR supposed to fix this failure in particular?

Ah, meant to have this comment for the other PR. Will update my comment.

@tannergooding
Copy link
Member

-- As with #84634 (comment), it looks like the SPMI asserts that are still present are from \base\.

This PR, compared to a previous PR has "half" the asserts with older PRs listing failures for both \base\ and \diff\.

@kunalspathak kunalspathak merged commit 7a57f6d into dotnet:main Apr 11, 2023
@kunalspathak kunalspathak deleted the consecutive_jitstressregs branch April 11, 2023 23:54
@ghost ghost locked as resolved and limited conversation to collaborators May 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SPMI replay failure for Arm64 VectorTableLookup
3 participants