Skip to content

AMDGPU: Stop introducing v_accvgpr_write_b32 for reg-to-reg copy #129059

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Feb 27, 2025

This was trying to hack around the intermediate VGPR requirement
to copy to AGPRs on gfx908. We should still use a copy for all
reg-to-reg cases. This should matter less these days, as we
reserve a VGPR to handle it when required (and no end to end tests
need updating).

This was also an obstacle to handling this fold for input registers
which are larger than 32-bits.

@arsenm arsenm marked this pull request as ready for review February 27, 2025 13:48
@llvmbot
Copy link
Member

llvmbot commented Feb 27, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

This was trying to hack around the intermediate VGPR requirement
to copy to AGPRs on gfx908. We should still use a copy for all
reg-to-reg cases. This should matter less these days, as we
reserve a VGPR to handle it when required (and no end to end tests
need updating).

This was also an obstacle to handling this fold for input registers
which are larger than 32-bits.


Full diff: https://github.com/llvm/llvm-project/pull/129059.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+2-3)
  • (modified) llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir (+9-9)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 6cb6863068b5f..eb9aabf8b6317 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -1573,9 +1573,8 @@ bool SIFoldOperandsImpl::foldCopyToAGPRRegSequence(MachineInstr *CopyMI) const {
         Vgpr = MRI->createVirtualRegister(&AMDGPU::VGPR_32RegClass);
         BuildMI(MBB, CopyMI, DL, TII->get(AMDGPU::COPY), Vgpr).add(*Def);
       }
-      auto Tmp = MRI->createVirtualRegister(&AMDGPU::AGPR_32RegClass);
-      BuildMI(MBB, CopyMI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32_e64), Tmp)
-          .addReg(Vgpr);
+      Register Tmp = MRI->createVirtualRegister(&AMDGPU::AGPR_32RegClass);
+      BuildMI(MBB, CopyMI, DL, TII->get(AMDGPU::COPY), Tmp).addReg(Vgpr);
       B.addReg(Tmp);
     }
 
diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir b/llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir
index 95112826b7112..493138c933686 100644
--- a/llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-operands-agpr-copy-reg-sequence.mir
@@ -206,11 +206,11 @@ body:             |
     ; CHECK-LABEL: name: s_mov_b32_999_splat_sgpr_128_copy_vgpr_copy_agpr
     ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sgpr_32 = S_MOV_B32 999
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
-    ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
-    ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_1:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
-    ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_2:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
-    ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_3:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
-    ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128 = REG_SEQUENCE [[V_ACCVGPR_WRITE_B32_e64_]], %subreg.sub0, [[V_ACCVGPR_WRITE_B32_e64_1]], %subreg.sub1, [[V_ACCVGPR_WRITE_B32_e64_2]], %subreg.sub2, [[V_ACCVGPR_WRITE_B32_e64_3]], %subreg.sub3
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+    ; CHECK-NEXT: [[COPY4:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+    ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY2]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3
     ; CHECK-NEXT: $agpr0_agpr1_agpr2_agpr3 = COPY [[REG_SEQUENCE]]
     ; CHECK-NEXT: S_ENDPGM 0, implicit $agpr0_agpr1_agpr2_agpr3
     %0:sgpr_32 = S_MOV_B32 999
@@ -232,10 +232,10 @@ body:             |
     ; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sgpr_32 = S_MOV_B32 1
     ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 1, implicit $exec
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
-    ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_1:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
-    ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_2:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 [[COPY]], implicit $exec
-    ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_3:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 1, implicit $exec
-    ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128 = REG_SEQUENCE [[V_ACCVGPR_WRITE_B32_e64_]], %subreg.sub0, [[V_ACCVGPR_WRITE_B32_e64_1]], %subreg.sub1, [[V_ACCVGPR_WRITE_B32_e64_2]], %subreg.sub2, [[V_ACCVGPR_WRITE_B32_e64_3]], %subreg.sub3
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:agpr_32 = COPY [[COPY]]
+    ; CHECK-NEXT: [[V_ACCVGPR_WRITE_B32_e64_1:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 1, implicit $exec
+    ; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128 = REG_SEQUENCE [[V_ACCVGPR_WRITE_B32_e64_]], %subreg.sub0, [[COPY1]], %subreg.sub1, [[COPY2]], %subreg.sub2, [[V_ACCVGPR_WRITE_B32_e64_1]], %subreg.sub3
     ; CHECK-NEXT: $agpr0_agpr1_agpr2_agpr3 = COPY [[REG_SEQUENCE]]
     ; CHECK-NEXT: S_ENDPGM 0, implicit $agpr0_agpr1_agpr2_agpr3
     %0:sgpr_32 = S_MOV_B32 999

@arsenm arsenm force-pushed the users/arsenm/amdgpu/stop-introducing-accvpr-write-for-reg-copy branch from 12e4b8a to b221f64 Compare February 27, 2025 15:38
@arsenm arsenm force-pushed the users/arsenm/add-tests-agpr-reg-sequence-fold branch from 3fe0c48 to a20c838 Compare February 27, 2025 15:39
Copy link
Collaborator

@rampitec rampitec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arsenm arsenm force-pushed the users/arsenm/add-tests-agpr-reg-sequence-fold branch from a20c838 to 60ac34e Compare March 3, 2025 03:02
@arsenm arsenm force-pushed the users/arsenm/amdgpu/stop-introducing-accvpr-write-for-reg-copy branch from b221f64 to 2340ede Compare March 3, 2025 03:02
Copy link
Contributor Author

arsenm commented Mar 3, 2025

Merge activity

  • Mar 3, 4:13 AM EST: A user started a stack merge that includes this pull request via Graphite.
  • Mar 3, 4:19 AM EST: Graphite rebased this pull request as part of a merge.
  • Mar 3, 4:22 AM EST: A user merged this pull request with Graphite.

@arsenm arsenm force-pushed the users/arsenm/add-tests-agpr-reg-sequence-fold branch from 60ac34e to da35f84 Compare March 3, 2025 09:15
Base automatically changed from users/arsenm/add-tests-agpr-reg-sequence-fold to main March 3, 2025 09:19
This was trying to hack around the intermediate VGPR requirement
to copy to AGPRs on gfx908. We should still use a copy for all
reg-to-reg cases. This should matter less these days, as we
reserve a VGPR to handle it when required (and no end to end tests
need updating).

This was also an obstacle to handling this fold for input registers
which are larger than 32-bits.
@arsenm arsenm force-pushed the users/arsenm/amdgpu/stop-introducing-accvpr-write-for-reg-copy branch from 2340ede to f3d76df Compare March 3, 2025 09:19
@arsenm arsenm merged commit 49a533a into main Mar 3, 2025
6 of 10 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu/stop-introducing-accvpr-write-for-reg-copy branch March 3, 2025 09:22
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
…m#129059)

This was trying to hack around the intermediate VGPR requirement
to copy to AGPRs on gfx908. We should still use a copy for all
reg-to-reg cases. This should matter less these days, as we
reserve a VGPR to handle it when required (and no end to end tests
need updating).

This was also an obstacle to handling this fold for input registers
which are larger than 32-bits.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants