AMDGPU: Stop introducing v_accvgpr_write_b32 for reg-to-reg copy #129059
Conversation
@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

This was trying to hack around the intermediate VGPR requirement to copy to AGPRs on gfx908. We should still use a copy for all reg-to-reg cases. This should matter less these days, as we reserve a VGPR to handle it when required (and no end to end tests need updating).

This was also an obstacle to handling this fold for input registers which are larger than 32-bits.

Full diff: https://github.com/llvm/llvm-project/pull/129059.diff

2 Files Affected:
- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
- llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 6cb6863068b5f..eb9aabf8b6317 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -1573,9 +1573,8 @@ bool SIFoldOperandsImpl::foldCopyToAGPRRegSequence(MachineInstr *CopyMI) const {
Vgpr = MRI->createVirtualRegister(&AMDGPU::VGPR_32RegClass);
BuildMI(MBB, CopyMI, DL, TII->get(AMDGPU::COPY), Vgpr).add(*Def);
}
- auto Tmp = MRI->createVirtualRegister(&AMDGPU::AGPR_32RegClass);
- BuildMI(MBB, CopyMI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32_e64), Tmp)
- .addReg(Vgpr);
+ Register Tmp = MRI->createVirtualRegister(&AMDGPU::AGPR_32RegClass);
+ BuildMI(MBB, CopyMI, DL, TII->get(AMDGPU::COPY), Tmp).addReg(Vgpr);
B.addReg(Tmp);
}
diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir b/llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir
index 95112826b7112..493138c933686 100644
--- a/llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir
@@ -206,11 +206,11 @@ body: |
; CHECK-LABEL: name: s_mov_b32_999_splat_sgpr_128_copy_vgpr_copy_agpr
; CHECK: [[S_MOV_B32_:%[0-9]+]]:sgpr_32 = S_MOV_B32 999
; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
- ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
- ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_1:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
- ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_2:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
- ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_3:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
- ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128 = REG_SEQUENCE [[V_ACCVGPR_WRITE_B32_e64_]], %subreg.sub0, [[V_ACCVGPR_WRITE_B32_e64_1]], %subreg.sub1, [[V_ACCVGPR_WRITE_B32_e64_2]], %subreg.sub2, [[V_ACCVGPR_WRITE_B32_e64_3]], %subreg.sub3
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+ ; CHECK-NEXT: [[COPY4:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+ ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3
; CHECK-NEXT: $agpr0_agpr1_agpr2_agpr3 = COPY [[REG_SEQUENCE]]
; CHECK-NEXT: S_ENDPGM 0, implicit $agpr0_agpr1_agpr2_agpr3
%0:sgpr_32 = S_MOV_B32 999
@@ -232,10 +232,10 @@ body: |
; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sgpr_32 = S_MOV_B32 1
; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 1, implicit $exec
; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
- ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_1:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
- ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_2:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
- ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_3:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 1, implicit $exec
- ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128 = REG_SEQUENCE [[V_ACCVGPR_WRITE_B32_e64_]], %subreg.sub0, [[V_ACCVGPR_WRITE_B32_e64_1]], %subreg.sub1, [[V_ACCVGPR_WRITE_B32_e64_2]], %subreg.sub2, [[V_ACCVGPR_WRITE_B32_e64_3]], %subreg.sub3
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+ ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_1:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 1, implicit $exec
+ ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128 = REG_SEQUENCE [[V_ACCVGPR_WRITE_B32_e64_]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[V_ACCVGPR_WRITE_B32_e64_1]], %subreg.sub3
; CHECK-NEXT: $agpr0_agpr1_agpr2_agpr3 = COPY [[REG_SEQUENCE]]
; CHECK-NEXT: S_ENDPGM 0, implicit $agpr0_agpr1_agpr2_agpr3
%0:sgpr_32 = S_MOV_B32 999
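For reference, a minimal C++ sketch of the post-patch emission shown in the diff above. The helper name, standalone framing, and include set are assumptions for illustration and are meant to be read against the AMDGPU backend sources rather than compiled on their own; only the two statements that build `Tmp` correspond to the actual change in `foldCopyToAGPRRegSequence`.

```cpp
// Illustrative restatement of the changed per-element emission (helper name
// and surrounding plumbing are assumptions, not the actual pass structure).
#include "SIInstrInfo.h"                       // AMDGPU target-internal header
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

// Emit one AGPR_32 element for the rebuilt REG_SEQUENCE. Before this patch an
// explicit V_ACCVGPR_WRITE_B32_e64 was created here; now a generic COPY is
// used, and later copy lowering deals with gfx908's VGPR-intermediate rule.
static Register emitAGPRElt(MachineBasicBlock &MBB, MachineInstr *CopyMI,
                            const DebugLoc &DL, MachineRegisterInfo *MRI,
                            const SIInstrInfo *TII, Register Vgpr) {
  Register Tmp = MRI->createVirtualRegister(&AMDGPU::AGPR_32RegClass);
  BuildMI(MBB, CopyMI, DL, TII->get(AMDGPU::COPY), Tmp).addReg(Vgpr);
  return Tmp;
}
```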
LGTM
AMDGPU: Stop introducing v_accvgpr_write_b32 for reg-to-reg copy (llvm#129059)
This was trying to hack around the intermediate VGPR requirement
to copy to AGPRs on gfx908. We should still use a copy for all
reg-to-reg cases. This should matter less these days, as we
reserve a VGPR to handle it when required (and no end to end tests
need updating).
This was also an obstacle to handling this fold for input registers
which are larger than 32-bits.
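The "intermediate VGPR requirement" the description and commit message refer to comes from gfx908's AGPR model: AGPRs can only be written from VGPRs (or inline constants) via v_accvgpr_write_b32 and read back via v_accvgpr_read_b32, so copies into or between AGPRs may need a temporary VGPR. The following standalone sketch is illustrative only (not LLVM code; the enum, function, and temporary-register naming are assumptions) and approximates how such copies expand on a gfx908-like target.

```cpp
// Illustrative only: approximate expansion of a 32-bit reg-to-reg copy on a
// gfx908-like target, where AGPRs are reachable solely through
// v_accvgpr_read_b32 / v_accvgpr_write_b32.
#include <cstdio>
#include <string>
#include <vector>

enum class RC { SGPR, VGPR, AGPR };

std::vector<std::string> expandCopy(RC Dst, RC Src) {
  if (Dst == RC::AGPR && Src == RC::VGPR)
    return {"v_accvgpr_write_b32 dst, src"};
  if (Dst == RC::VGPR && Src == RC::AGPR)
    return {"v_accvgpr_read_b32 dst, src"};
  if (Dst == RC::AGPR && Src == RC::AGPR)
    // No direct AGPR-to-AGPR move on gfx908: bounce through a temporary
    // VGPR (the backend reserves one for this when required).
    return {"v_accvgpr_read_b32 vtmp, src", "v_accvgpr_write_b32 dst, vtmp"};
  if (Dst == RC::AGPR && Src == RC::SGPR)
    // Scalar sources are first moved into a VGPR, as in the S_MOV_B32 999
    // test cases above.
    return {"v_mov_b32 vtmp, src", "v_accvgpr_write_b32 dst, vtmp"};
  // Plain SGPR/VGPR copies use ordinary s_mov_b32/v_mov_b32 (simplified here).
  return {"mov dst, src"};
}

int main() {
  // Prints the two-instruction VGPR bounce needed for an AGPR-to-AGPR copy.
  for (const std::string &I : expandCopy(RC::AGPR, RC::AGPR))
    std::puts(I.c_str());
}
```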