Skip to content

[AMDGPU] misched: avoid subregister dependencies #140255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 70 additions & 0 deletions llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -535,6 +535,71 @@ unsigned GCNSubtarget::getMaxNumVGPRs(const MachineFunction &MF) const {
return getBaseMaxNumVGPRs(F, MFI.getWavesPerEU());
}

bool GCNSubtarget::isRealSchedDependency(MachineInstr *DefI, int DefOpIdx,
MachineInstr *UseI,
int UseOpIdx) const {
// From the (gfx942, for example) ISA:
// "Packed 32-bit instructions operate on 2 dwords at a time and those
// operands must be two-dword aligned (i.e. an even VGPR address). Output
// modifiers are not supported for these instructions. OPSEL and OPSEL_HI work
// to select the first or second DWORD for each source."
// -> We can save dependencies on VGPRs by analyzing the operand selection.
// See also
// https://llvm.org/docs/AMDGPUModifierSyntax.html#amdgpu-synid-op-sel

if (!InstrInfo.isVOP3P(*UseI))
return true;
MachineOperand &DefOp = DefI->getOperand(DefOpIdx);
if (!DefOp.isReg() || !DefOp.getReg().isPhysical())
return true;
Comment on lines +553 to +554
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't understand the isPhysical check. Why would this only apply during post ra scheduling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regunits, subregs, regsOverlap, etc. are mostly defined on MCRegister and Register::asMCReg() requires a the register to be physical. That's why I thought that most of the stuff I need is only available post-ra (?)

MachineOperand &UseOp = UseI->getOperand(UseOpIdx);
if (!UseOp.isReg() || !UseOp.getReg().isPhysical())
return true;

AMDGPU::OpName UseModName;
if (AMDGPU::getNamedOperandIdx(UseI->getOpcode(), AMDGPU::OpName::src0) ==
UseOpIdx)
UseModName = AMDGPU::OpName::src0_modifiers;
else if (AMDGPU::getNamedOperandIdx(UseI->getOpcode(),
AMDGPU::OpName::src1) == UseOpIdx)
UseModName = AMDGPU::OpName::src1_modifiers;
else if (AMDGPU::getNamedOperandIdx(UseI->getOpcode(),
AMDGPU::OpName::src2) == UseOpIdx)
UseModName = AMDGPU::OpName::src2_modifiers;
else
return true;
MachineOperand *UseOpMod = InstrInfo.getNamedOperand(*UseI, UseModName);
if (!UseOpMod)
return true;
// Check whether all parts of the register are being used (= op_sel and
// op_sel_hi differ). In that case we can return early.
int64_t OpSel = UseOpMod->getImm() & SISrcMods::OP_SEL_0;
int64_t OpSelHi = UseOpMod->getImm() & SISrcMods::OP_SEL_1;
if ((!OpSel || !OpSelHi) && (OpSel || OpSelHi))
return true;

const SIRegisterInfo *TRI = getRegisterInfo();
const MachineRegisterInfo &MRI = UseI->getParent()->getParent()->getRegInfo();
MCRegister DefReg = DefOp.getReg().asMCReg();
MCRegister UseReg = UseOp.getReg().asMCReg();
// We specifically look for a packed 32bit Use and smaller Def.
if (TRI->getRegSizeInBits(UseReg, MRI) != 64 ||
TRI->getRegSizeInBits(DefReg, MRI) > 32)
return true;
Comment on lines +585 to +588
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need size checks, it's implied by the regunits you will check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any documentation on whether it's guaranteed that regunits for the amdgpu backend are 16bit, that's why I was unsure whether this is sufficient. Is this a premise?

SmallVector<MCRegUnit, 2> DefRegUnits(TRI->regunits(DefReg));
assert(DefRegUnits.size() <= 2 && "unexpected number of register units");
SmallVector<MCRegUnit, 4> UseRegUnits(TRI->regunits(UseReg));
assert(UseRegUnits.size() == 4 && "unexpected number of register units");

auto FindRegunit = [&DefRegUnits](MCRegUnit A, MCRegUnit B) {
return llvm::find_if(DefRegUnits, [A, B](MCRegUnit RU) {
return RU == A || RU == B;
}) != DefRegUnits.end();
};
return OpSel ? FindRegunit(UseRegUnits[2], UseRegUnits[3])
: FindRegunit(UseRegUnits[0], UseRegUnits[1]);
}
Comment on lines +589 to +601
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessarily cumbersome. It's odd to directly operate on regunits like this, can you query a read of the subregister index


void GCNSubtarget::adjustSchedDependency(
SUnit *Def, int DefOpIdx, SUnit *Use, int UseOpIdx, SDep &Dep,
const TargetSchedModel *SchedModel) const {
Expand All @@ -545,6 +610,11 @@ void GCNSubtarget::adjustSchedDependency(
MachineInstr *DefI = Def->getInstr();
MachineInstr *UseI = Use->getInstr();

if (!isRealSchedDependency(DefI, DefOpIdx, UseI, UseOpIdx)) {
Dep = SDep(Def, SDep::Artificial);
return; // this is not a data dependency anymore
}

if (DefI->isBundle()) {
const SIRegisterInfo *TRI = getRegisterInfo();
auto Reg = Dep.getReg();
Expand Down
6 changes: 6 additions & 0 deletions llvm/lib/Target/AMDGPU/GCNSubtarget.h
Original file line number Diff line number Diff line change
Expand Up @@ -272,6 +272,12 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
SITargetLowering TLInfo;
SIFrameLowering FrameLowering;

/// Check whether there is a real dependency between the definition and the
/// use. The definition might only affect a subregister that is not actually
/// used.
bool isRealSchedDependency(MachineInstr *DefI, int DefOpIdx,
MachineInstr *UseI, int UseOpIdx) const;

public:
GCNSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
const GCNTargetMachine &TM);
Expand Down
Loading