Skip to content

Adding additional 8 eGPR. #113988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/coreclr/jit/compiler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3200,7 +3200,7 @@ void Compiler::compInitOptions(JitFlags* jitFlags)
rbmAllInt |= RBM_HIGHINT;
rbmIntCalleeTrash |= RBM_HIGHINT;
cntCalleeTrashInt += CNT_CALLEE_TRASH_HIGHINT;
regIntLast = REG_R23;
regIntLast = REG_R31;
}
#endif // TARGET_AMD64

Expand Down
14 changes: 10 additions & 4 deletions src/coreclr/jit/emit.h
Original file line number Diff line number Diff line change
Expand Up @@ -791,13 +791,15 @@ class emitter
// Note that we use the _idReg1 and _idReg2 fields to hold
// the live gcrefReg mask for the call instructions on x86/x64
//
#if !defined(TARGET_AMD64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you move this only for x64?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Space taken up to here" comment, below, is now wrong and needs to be updated, as do the "Space taken up to here" comments that follow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you move this only for x64?

It's a bit of a weird problem. REGNUM_BITS changing from 6 to 7 results in _idReg1 and _idReg2 taking up an extra bit each. This throws off the instrDesc size for x64 and increases it significantly due to padding. My solution was to move around the position of _idReg1 and _idReg2 to a position for x64 which did not cause addition of as many padding bits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why it matters where _idReg1/_idReg2 are placed to determine padding. All type types here have a base type of unsigned which means that bitfields of them should all be able bit packable. It might make sense for performance reasons to avoid splitting _idReg1/_idReg2 across 32-bit (or maybe even byte) boundaries, but that's not why you did it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference

  • In main with no changes
    image

  • If I just add new registers with no change in order for x64
    image

  • With the change highlighted by you in the PR
    image

Copy link
Contributor Author

@DeepakRajendrakumaran DeepakRajendrakumaran Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i leave it as was, _idReg1 cannot stay within 32 bit boundary since it starts at Byte 3, Bit 2(It was not a problem earlier when it was 6 bits). So, it gets moved to Byte 4, Bit 0

Edit - that's my interpretation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see: a bit field can't extend beyond the limit of the base type (here, unsigned).

btw, is the tool your using to show field offsets public?

Copy link
Contributor Author

@DeepakRajendrakumaran DeepakRajendrakumaran Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The screenshot I used here is from Visual Studio. Click on memory layout from below

image

The memory viewer in Visual Studio Code isn't as intuitive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the "Space taken up to here" comments

regNumber _idReg1 : REGNUM_BITS; // register num
regNumber _idReg2 : REGNUM_BITS;
#endif

////////////////////////////////////////////////////////////////////////
// Space taken up to here:
// x86: 38 bits
// amd64: 38 bits
// amd64: 26 bits
// arm: 32 bits
// arm64: 46 bits
// loongarch64: 28 bits
Expand All @@ -815,6 +817,10 @@ class emitter
unsigned _idCustom1 : 1;
unsigned _idCustom2 : 1;
unsigned _idCustom3 : 1;
#if defined(TARGET_AMD64)
regNumber _idReg1 : REGNUM_BITS; // register num
regNumber _idReg2 : REGNUM_BITS;
#endif

#define _idBound _idCustom1 /* jump target / frame offset bound */
#define _idTlsGD _idCustom2 /* Used to store information related to TLS GD access on linux */
Expand Down Expand Up @@ -882,7 +888,7 @@ class emitter
////////////////////////////////////////////////////////////////////////
// Space taken up to here:
// x86: 49 bits
// amd64: 49 bits
// amd64: 51 bits
// arm: 48 bits
// arm64: 55 bits
// loongarch64: 46 bits
Expand All @@ -900,7 +906,7 @@ class emitter
#elif defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
#define ID_EXTRA_BITFIELD_BITS (14)
#elif defined(TARGET_XARCH)
#define ID_EXTRA_BITFIELD_BITS (17)
#define ID_EXTRA_BITFIELD_BITS (19)
#else
#error Unsupported or unset target architecture
#endif
Expand Down Expand Up @@ -935,7 +941,7 @@ class emitter
////////////////////////////////////////////////////////////////////////
// Space taken up to here (with/without prev offset, assuming host==target):
// x86: 55/51 bits
// amd64: 56/51 bits
// amd64: 58/53 bits
// arm: 54/50 bits
// arm64: 62/57 bits
// loongarch64: 53/48 bits
Expand Down
6 changes: 3 additions & 3 deletions src/coreclr/jit/emitxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2529,7 +2529,7 @@ regNumber AbsRegNumber(regNumber reg)
bool IsExtendedReg(regNumber reg)
{
#ifdef TARGET_AMD64
return ((reg >= REG_R8) && (reg <= REG_R23)) || ((reg >= REG_XMM8) && (reg <= REG_XMM31));
return ((reg >= REG_R8) && (reg <= REG_R31)) || ((reg >= REG_XMM8) && (reg <= REG_XMM31));
#else
// X86 JIT operates in 32-bit mode and hence extended reg are not available.
return false;
Expand Down Expand Up @@ -5214,7 +5214,7 @@ inline UNATIVE_OFFSET emitter::emitInsSizeSV(instrDesc* id, code_t code, int var
static bool baseRegisterRequiresSibByte(regNumber base)
{
#ifdef TARGET_AMD64
return base == REG_ESP || base == REG_R12 || base == REG_R20;
return base == REG_ESP || base == REG_R12 || base == REG_R20 || base == REG_R28;
#else
return base == REG_ESP;
#endif
Expand All @@ -5223,7 +5223,7 @@ static bool baseRegisterRequiresSibByte(regNumber base)
static bool baseRegisterRequiresDisplacement(regNumber base)
{
#ifdef TARGET_AMD64
return base == REG_EBP || base == REG_R13 || base == REG_R21;
return base == REG_EBP || base == REG_R13 || base == REG_R21 || base == REG_R29;
#else
return base == REG_EBP;
#endif
Expand Down
4 changes: 1 addition & 3 deletions src/coreclr/jit/emitxarch.h
Original file line number Diff line number Diff line change
Expand Up @@ -41,9 +41,7 @@ inline static bool isHighSimdReg(regNumber reg)
inline static bool isHighGPReg(regNumber reg)
{
#ifdef TARGET_AMD64
// TODO-apx: the definition here is incorrect, we will need to revisit this after we extend the register definition.
// for now, we can simply use REX2 as REX.
return ((reg >= REG_R16) && (reg <= REG_R23));
return ((reg >= REG_R16) && (reg <= REG_R31));
#else
// X86 JIT operates in 32-bit mode and hence extended regs are not available.
return false;
Expand Down
49 changes: 39 additions & 10 deletions src/coreclr/jit/lsra.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -572,7 +572,8 @@ static const regMaskTP LsraLimitUpperSimdSet =
(RBM_XMM16 | RBM_XMM17 | RBM_XMM18 | RBM_XMM19 | RBM_XMM20 | RBM_XMM21 | RBM_XMM22 | RBM_XMM23 | RBM_XMM24 |
RBM_XMM25 | RBM_XMM26 | RBM_XMM27 | RBM_XMM28 | RBM_XMM29 | RBM_XMM30 | RBM_XMM31);
static const regMaskTP LsraLimitExtGprSet =
(RBM_R16 | RBM_R17 | RBM_R18 | RBM_R19 | RBM_R20 | RBM_R21 | RBM_R22 | RBM_R23 | RBM_ETW_FRAMED_EBP);
(RBM_R16 | RBM_R17 | RBM_R18 | RBM_R19 | RBM_R20 | RBM_R21 | RBM_R22 | RBM_R23 | RBM_R24 | RBM_R25 | RBM_R26 |
RBM_R27 | RBM_R28 | RBM_R29 | RBM_R30 | RBM_R31 | RBM_ETW_FRAMED_EBP);
#elif defined(TARGET_ARM)
// On ARM, we may need two registers to set up the target register for a virtual call, so we need
// to have at least the maximum number of arg registers, plus 2.
Expand Down Expand Up @@ -836,6 +837,27 @@ LinearScan::LinearScan(Compiler* theCompiler)
rbmIntCalleeTrash = compiler->rbmIntCalleeTrash;
regIntLast = compiler->regIntLast;
isApxSupported = compiler->canUseApxEncoding();
if (isApxSupported)
{
int size = (int)ACTUAL_REG_COUNT + 1;
regIndices = theCompiler->getAllocator(CMK_LSRA).allocate<regNumber>(size);
for (int i = 0; i < size; i++)
{
regIndices[i] = static_cast<regNumber>(i);
}
}
else
{
regIndices =
new regNumber[]{REG_RAX, REG_RCX, REG_RDX, REG_RBX, REG_RSP, REG_RBP, REG_RSI, REG_RDI,
REG_R8, REG_R9, REG_R10, REG_R11, REG_R12, REG_R13, REG_R14, REG_R15,
REG_XMM0, REG_XMM1, REG_XMM2, REG_XMM3, REG_XMM4, REG_XMM5, REG_XMM6, REG_XMM7,
REG_XMM8, REG_XMM9, REG_XMM10, REG_XMM11, REG_XMM12, REG_XMM13, REG_XMM14, REG_XMM15,
REG_XMM16, REG_XMM17, REG_XMM18, REG_XMM19, REG_XMM20, REG_XMM21, REG_XMM22, REG_XMM23,
REG_XMM24, REG_XMM25, REG_XMM26, REG_XMM27, REG_XMM28, REG_XMM29, REG_XMM30, REG_XMM31,
REG_K0, REG_K1, REG_K2, REG_K3, REG_K4, REG_K5, REG_K6, REG_K7,
REG_COUNT};
}
#endif // TARGET_AMD64

#if defined(TARGET_XARCH)
Expand Down Expand Up @@ -4284,8 +4306,8 @@ void LinearScan::resetAllRegistersState()
resetAvailableRegs();
clearAllNextIntervalRef();
clearAllSpillCost();

for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
int regIndex = REG_FIRST;
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; NEXT_REGISTER(reg, regIndex))
{
RegRecord* physRegRecord = getRegisterRecord(reg);
#ifdef DEBUG
Expand Down Expand Up @@ -4895,7 +4917,8 @@ void LinearScan::allocateRegistersMinimal()
clearAllNextIntervalRef();
clearAllSpillCost();

for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
int regIndex = REG_FIRST;
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; NEXT_REGISTER(reg, regIndex))
{
RegRecord* physRegRecord = getRegisterRecord(reg);
physRegRecord->recentRefPosition = nullptr;
Expand Down Expand Up @@ -5558,7 +5581,8 @@ void LinearScan::allocateRegisters()
#endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE

resetRegState();
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
int regIndex = REG_FIRST;
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; NEXT_REGISTER(reg, regIndex))
{
RegRecord* physRegRecord = getRegisterRecord(reg);
physRegRecord->recentRefPosition = nullptr;
Expand Down Expand Up @@ -7877,7 +7901,8 @@ void LinearScan::resolveRegisters()
// are encountered.
if (localVarsEnregistered)
{
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
int regIndex = REG_FIRST;
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; NEXT_REGISTER(reg, regIndex))
{
RegRecord* physRegRecord = getRegisterRecord(reg);
Interval* assignedInterval = physRegRecord->assignedInterval;
Expand Down Expand Up @@ -11802,7 +11827,8 @@ bool LinearScan::IsResolutionNode(LIR::Range& containingRange, GenTree* node)
//
void LinearScan::verifyFreeRegisters(regMaskTP regsToFree)
{
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
int regIndex = REG_FIRST;
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; NEXT_REGISTER(reg, regIndex))
{
regMaskTP regMask = genRegMask(reg);
// If this isn't available or if it's still waiting to be freed (i.e. it was in
Expand Down Expand Up @@ -11922,7 +11948,8 @@ void LinearScan::verifyFinalAllocation()
}

// Clear register assignments.
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
int regIndex = REG_FIRST;
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; NEXT_REGISTER(reg, regIndex))
{
RegRecord* physRegRecord = getRegisterRecord(reg);
physRegRecord->assignedInterval = nullptr;
Expand Down Expand Up @@ -12025,7 +12052,8 @@ void LinearScan::verifyFinalAllocation()
}

// Clear register assignments.
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
int regIndex = REG_FIRST;
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; NEXT_REGISTER(reg, regIndex))
{
RegRecord* physRegRecord = getRegisterRecord(reg);
physRegRecord->assignedInterval = nullptr;
Expand Down Expand Up @@ -12350,7 +12378,8 @@ void LinearScan::verifyFinalAllocation()
}

// Clear register assignments.
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
int regIndex = REG_FIRST;
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; NEXT_REGISTER(reg, regIndex))
{
RegRecord* physRegRecord = getRegisterRecord(reg);
physRegRecord->assignedInterval = nullptr;
Expand Down
10 changes: 9 additions & 1 deletion src/coreclr/jit/lsra.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,13 @@ typedef var_types RegisterType;
#define FloatRegisterType TYP_FLOAT
#define MaskRegisterType TYP_MASK

// NEXT_REGISTER : update reg to next active registers.
#ifdef TARGET_AMD64
#define NEXT_REGISTER(reg, index) index++, reg = regIndices[index]
#else
#define NEXT_REGISTER(reg, index) reg = REG_NEXT(reg)
#endif

//------------------------------------------------------------------------
// regType: Return the RegisterType to use for a given type
//
Expand Down Expand Up @@ -2132,7 +2139,8 @@ class LinearScan : public LinearScanInterface
}
#endif // TARGET_XARCH

unsigned availableRegCount;
unsigned availableRegCount;
regNumber* regIndices;

FORCEINLINE unsigned get_AVAILABLE_REG_COUNT() const
{
Expand Down
79 changes: 52 additions & 27 deletions src/coreclr/jit/register.h
Original file line number Diff line number Diff line change
Expand Up @@ -36,32 +36,41 @@ REGALIAS(RDI, EDI)

#else // !defined(TARGET_X86)

#define GPRMASK(x) (1ULL << (x))
/*
REGDEF(name, rnum, mask, sname) */
REGDEF(RAX, 0, 0x00000001, "rax" )
REGDEF(RCX, 1, 0x00000002, "rcx" )
REGDEF(RDX, 2, 0x00000004, "rdx" )
REGDEF(RBX, 3, 0x00000008, "rbx" )
REGDEF(RSP, 4, 0x00000010, "rsp" )
REGDEF(RBP, 5, 0x00000020, "rbp" )
REGDEF(RSI, 6, 0x00000040, "rsi" )
REGDEF(RDI, 7, 0x00000080, "rdi" )
REGDEF(R8, 8, 0x00000100, "r8" )
REGDEF(R9, 9, 0x00000200, "r9" )
REGDEF(R10, 10, 0x00000400, "r10" )
REGDEF(R11, 11, 0x00000800, "r11" )
REGDEF(R12, 12, 0x00001000, "r12" )
REGDEF(R13, 13, 0x00002000, "r13" )
REGDEF(R14, 14, 0x00004000, "r14" )
REGDEF(R15, 15, 0x00008000, "r15" )
REGDEF(R16, 16, 0x00010000, "r16" )
REGDEF(R17, 17, 0x00020000, "r17" )
REGDEF(R18, 18, 0x00040000, "r18" )
REGDEF(R19, 19, 0x00080000, "r19" )
REGDEF(R20, 20, 0x00100000, "r20" )
REGDEF(R21, 21, 0x00200000, "r21" )
REGDEF(R22, 22, 0x00400000, "r22" )
REGDEF(R23, 23, 0x00800000, "r23" )
REGDEF(RAX, 0, GPRMASK(0), "rax" )
REGDEF(RCX, 1, GPRMASK(1), "rcx" )
REGDEF(RDX, 2, GPRMASK(2), "rdx" )
REGDEF(RBX, 3, GPRMASK(3), "rbx" )
REGDEF(RSP, 4, GPRMASK(4), "rsp" )
REGDEF(RBP, 5, GPRMASK(5), "rbp" )
REGDEF(RSI, 6, GPRMASK(6), "rsi" )
REGDEF(RDI, 7, GPRMASK(7), "rdi" )
REGDEF(R8, 8, GPRMASK(8), "r8" )
REGDEF(R9, 9, GPRMASK(9), "r9" )
REGDEF(R10, 10, GPRMASK(10), "r10" )
REGDEF(R11, 11, GPRMASK(11), "r11" )
REGDEF(R12, 12, GPRMASK(12), "r12" )
REGDEF(R13, 13, GPRMASK(13), "r13" )
REGDEF(R14, 14, GPRMASK(14), "r14" )
REGDEF(R15, 15, GPRMASK(15), "r15" )
REGDEF(R16, 16, GPRMASK(16), "r16" )
REGDEF(R17, 17, GPRMASK(17), "r17" )
REGDEF(R18, 18, GPRMASK(18), "r18" )
REGDEF(R19, 19, GPRMASK(19), "r19" )
REGDEF(R20, 20, GPRMASK(20), "r20" )
REGDEF(R21, 21, GPRMASK(21), "r21" )
REGDEF(R22, 22, GPRMASK(22), "r22" )
REGDEF(R23, 23, GPRMASK(23), "r23" )
REGDEF(R24, 24, GPRMASK(24), "r24" )
REGDEF(R25, 25, GPRMASK(25), "r25" )
REGDEF(R26, 26, GPRMASK(26), "r26" )
REGDEF(R27, 27, GPRMASK(27), "r27" )
REGDEF(R28, 28, GPRMASK(28), "r28" )
REGDEF(R29, 29, GPRMASK(29), "r29" )
REGDEF(R30, 30, GPRMASK(30), "r30" )
REGDEF(R31, 31, GPRMASK(31), "r31" )

REGALIAS(EAX, RAX)
REGALIAS(ECX, RCX)
Expand All @@ -75,11 +84,11 @@ REGALIAS(EDI, RDI)
#endif // !defined(TARGET_X86)

#ifdef TARGET_AMD64
#define XMMBASE 24
#define XMMBASE 32
#define XMMMASK(x) (1ULL << ((x)+XMMBASE))

#define KBASE 56
#define KMASK(x) (1ULL << ((x)+KBASE))
#define KBASE 64
#define KMASK(x) (1ULL << ((x)))

#else // !TARGET_AMD64
#define XMMBASE 8
Expand Down Expand Up @@ -224,6 +233,22 @@ REGDEF(STK, 8+KBASE, 0x0000, "STK" )
#define REG_R22 JITREG_R22
#undef REG_R23
#define REG_R23 JITREG_R23
#undef REG_R24
#define REG_R24 JITREG_R24
#undef REG_R25
#define REG_R25 JITREG_R25
#undef REG_R26
#define REG_R26 JITREG_R26
#undef REG_R27
#define REG_R27 JITREG_R27
#undef REG_R28
#define REG_R28 JITREG_R28
#undef REG_R29
#define REG_R29 JITREG_R29
#undef REG_R30
#define REG_R30 JITREG_R30
#undef REG_R31
#define REG_R31 JITREG_R31
#undef REG_EAX
#define REG_EAX JITREG_EAX
#undef REG_ECX
Expand Down
4 changes: 2 additions & 2 deletions src/coreclr/jit/target.h
Original file line number Diff line number Diff line change
Expand Up @@ -235,9 +235,9 @@ typedef uint64_t regMaskSmall;
#define REG_MASK_ALL_FMT "%016llX"
#endif

#ifdef TARGET_ARM64
#if defined(TARGET_ARM64) || defined(TARGET_AMD64)
#define HAS_MORE_THAN_64_REGISTERS 1
#endif // TARGET_ARM64
#endif // TARGET_ARM64 || TARGET_AMD64

#define REG_LOW_BASE 0
#ifdef HAS_MORE_THAN_64_REGISTERS
Expand Down
Loading
Loading