Skip to content

Commit

Permalink
[MachineCopyPropagation] Check CrossCopyRegClass for cross-class copys
Browse files Browse the repository at this point in the history
On some AMDGPU subtargets, copying to and from AGPR registers using another
AGPR register is not possible. A intermediate VGPR register is needed for AGPR
to AGPR copy. This is an issue when machine copy propagation forwards a
COPY $agpr, replacing a COPY $vgpr which results in $agpr = COPY $agpr. It is
removing a cross class copy that may have been optimized by previous passes and
potentially creating an unoptimized cross class copy later on.

To avoid this issue, check CrossCopyRegClass if a different register class will
be needed for the copy. If so then avoid forwarding the copy when the
destination does not match the desired register class and if the original copy
already matches the desired register class.

Issue seen while attempting to optimize another AGPR to AGPR issue:

Live-ins: $agpr0
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $vgpr0
$agpr3 = COPY $vgpr0
$agpr4 = COPY $vgpr0

After machine-cp:

$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $agpr0
$agpr3 = COPY $agpr0
$agpr4 = COPY $agpr0

Machine-cp propagated COPY $agpr0 to replace $vgpr0 creating 3 AGPR to AGPR
copys. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each
copy when the prior VGPR->AGPR copy was already optimal.

Reviewed By: lkail, rampitec

Differential Revision: https://reviews.llvm.org/D108011
  • Loading branch information
vangthao95 committed Aug 25, 2021
1 parent 2a35d59 commit 549f6a8
Show file tree
Hide file tree
Showing 4 changed files with 110 additions and 3 deletions.
28 changes: 25 additions & 3 deletions llvm/lib/CodeGen/MachineCopyPropagation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -414,6 +414,31 @@ bool MachineCopyPropagation::isForwardableRegClassCopy(const MachineInstr &Copy,
if (!UseI.isCopy())
return false;

const TargetRegisterClass *CopySrcRC =
TRI->getMinimalPhysRegClass(CopySrcReg);
const TargetRegisterClass *UseDstRC =
TRI->getMinimalPhysRegClass(UseI.getOperand(0).getReg());
const TargetRegisterClass *CrossCopyRC = TRI->getCrossCopyRegClass(CopySrcRC);

// If cross copy register class is not the same as copy source register class
// then it is not possible to copy the register directly and requires a cross
// register class copy. Fowarding this copy without checking register class of
// UseDst may create additional cross register copies when expanding the copy
// instruction in later passes.
if (CopySrcRC != CrossCopyRC) {
const TargetRegisterClass *CopyDstRC =
TRI->getMinimalPhysRegClass(Copy.getOperand(0).getReg());

// Check if UseDstRC matches the necessary register class to copy from
// CopySrc's register class. If so then forwarding the copy will not
// introduce any cross-class copys. Else if CopyDstRC matches then keep the
// copy and do not forward. If neither UseDstRC or CopyDstRC matches then
// we may need a cross register copy later but we do not worry about it
// here.
if (UseDstRC != CrossCopyRC && CopyDstRC == CrossCopyRC)
return false;
}

/// COPYs don't have register class constraints, so if the user instruction
/// is a COPY, we just try to avoid introducing additional cross-class
/// COPYs. For example:
Expand All @@ -430,9 +455,6 @@ bool MachineCopyPropagation::isForwardableRegClassCopy(const MachineInstr &Copy,
///
/// so we have reduced the number of cross-class COPYs and potentially
/// introduced a nop COPY that can be removed.
const TargetRegisterClass *UseDstRC =
TRI->getMinimalPhysRegClass(UseI.getOperand(0).getReg());

const TargetRegisterClass *SuperRC = UseDstRC;
for (TargetRegisterClass::sc_iterator SuperRCI = UseDstRC->getSuperClasses();
SuperRC; SuperRC = *SuperRCI++)
Expand Down
8 changes: 8 additions & 0 deletions llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -801,6 +801,14 @@ const TargetRegisterClass *SIRegisterInfo::getPointerRegClass(
return &AMDGPU::VGPR_32RegClass;
}

const TargetRegisterClass *
SIRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
if (isAGPRClass(RC) && !ST.hasGFX90AInsts())
return getEquivalentVGPRClass(RC);

return RC;
}

static unsigned getNumSubRegsForSpillOp(unsigned Op) {

switch (Op) {
Expand Down
7 changes: 7 additions & 0 deletions llvm/lib/Target/AMDGPU/SIRegisterInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,13 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
const TargetRegisterClass *getPointerRegClass(
const MachineFunction &MF, unsigned Kind = 0) const override;

/// Returns a legal register class to copy a register in the specified class
/// to or from. If it is possible to copy the register directly without using
/// a cross register class copy, return the specified RC. Returns NULL if it
/// is not possible to copy between two registers of the specified class.
const TargetRegisterClass *
getCrossCopyRegClass(const TargetRegisterClass *RC) const override;

void buildVGPRSpillLoadStore(SGPRSpillBuilder &SB, int Index, int Offset,
bool IsLoad, bool IsKill = true) const;

Expand Down
70 changes: 70 additions & 0 deletions llvm/test/CodeGen/AMDGPU/agpr-copy-propagation.mir
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -march=amdgcn -mcpu=gfx908 %s -o - -run-pass machine-cp -verify-machineinstrs | FileCheck -check-prefix=GFX908 %s
# RUN: llc -march=amdgcn -mcpu=gfx90a %s -o - -run-pass machine-cp -verify-machineinstrs | FileCheck -check-prefix=GFX90A %s

---
name: do_not_propagate_agpr_to_agpr
body: |
bb.0:
successors:
liveins: $agpr0
; GFX908-LABEL: name: do_not_propagate_agpr_to_agpr
; GFX908: renamable $vgpr0 = COPY renamable $agpr0, implicit $exec
; GFX908: renamable $agpr1 = COPY renamable $vgpr0, implicit $exec
; GFX908: renamable $agpr2 = COPY renamable $vgpr0, implicit $exec
; GFX908: S_ENDPGM 0, implicit $vgpr0, implicit $agpr1, implicit $agpr2
; GFX90A-LABEL: name: do_not_propagate_agpr_to_agpr
; GFX90A: renamable $vgpr0 = COPY renamable $agpr0, implicit $exec
; GFX90A: renamable $agpr1 = COPY $agpr0, implicit $exec
; GFX90A: renamable $agpr2 = COPY $agpr0, implicit $exec
; GFX90A: S_ENDPGM 0, implicit $vgpr0, implicit $agpr1, implicit $agpr2
renamable $vgpr0 = COPY renamable $agpr0, implicit $exec
renamable $agpr1 = COPY renamable $vgpr0, implicit $exec
renamable $agpr2 = COPY renamable $vgpr0, implicit $exec
S_ENDPGM 0, implicit $vgpr0, implicit $agpr1, implicit $agpr2
...
---
name: propagate_vgpr_to_agpr
body: |
bb.0:
successors:
liveins: $vgpr0
; GFX908-LABEL: name: propagate_vgpr_to_agpr
; GFX908: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
; GFX908: renamable $agpr1 = COPY $vgpr0, implicit $exec
; GFX908: renamable $agpr2 = COPY $vgpr0, implicit $exec
; GFX908: S_ENDPGM 0, implicit $agpr0, implicit $agpr1, implicit $agpr2
; GFX90A-LABEL: name: propagate_vgpr_to_agpr
; GFX90A: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
; GFX90A: renamable $agpr1 = COPY $vgpr0, implicit $exec
; GFX90A: renamable $agpr2 = COPY $vgpr0, implicit $exec
; GFX90A: S_ENDPGM 0, implicit $agpr0, implicit $agpr1, implicit $agpr2
renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
renamable $agpr1 = COPY renamable $agpr0, implicit $exec
renamable $agpr2 = COPY renamable $agpr0, implicit $exec
S_ENDPGM 0, implicit $agpr0, implicit $agpr1, implicit $agpr2
...
---
name: propagate_agpr_to_vgpr
body: |
bb.0:
successors:
liveins: $agpr0
; GFX908-LABEL: name: propagate_agpr_to_vgpr
; GFX908: renamable $vgpr0 = COPY renamable $agpr0, implicit $exec
; GFX908: renamable $vgpr1 = COPY $agpr0, implicit $exec
; GFX908: renamable $vgpr2 = COPY $agpr0, implicit $exec
; GFX908: S_ENDPGM 0, implicit $vgpr0, implicit $vgpr1, implicit $vgpr2
; GFX90A-LABEL: name: propagate_agpr_to_vgpr
; GFX90A: renamable $vgpr0 = COPY renamable $agpr0, implicit $exec
; GFX90A: renamable $vgpr1 = COPY $agpr0, implicit $exec
; GFX90A: renamable $vgpr2 = COPY $agpr0, implicit $exec
; GFX90A: S_ENDPGM 0, implicit $vgpr0, implicit $vgpr1, implicit $vgpr2
renamable $vgpr0 = COPY renamable $agpr0, implicit $exec
renamable $vgpr1 = COPY renamable $vgpr0, implicit $exec
renamable $vgpr2 = COPY renamable $vgpr0, implicit $exec
S_ENDPGM 0, implicit $vgpr0, implicit $vgpr1, implicit $vgpr2
...

0 comments on commit 549f6a8

Please sign in to comment.