-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[AMDGPU] Fix SIFoldOperandsImpl::canUseImmWithOpSel() for VOP3 packed {B}F16 imms. #142142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… {B}F16 imms. VOP 3 instructions ignore opsel source modifiers, so a constant that contains two different {B}F16 imms cannot be encoded into instruction with an src opsel. E.g. without the fix the following instructions s_mov_b32 s0, 0x40003c00 // <half 1.0, half 2.0> v_cvt_scalef32_pk_fp8_f16 v0, s0, v2 loose 2.0 imm and are folded into v_cvt_scalef32_pk_fp8_f16 v1, 1.0, 1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses an issue with VOP3 packed instructions where distinct half-precision immediate values were improperly folded, leading to incorrect constant encodings.
- Added multiple test cases for different conversion instructions to verify correct handling of immediate values.
- Updated SIFoldOperandsImpl::canUseImmWithOpSel to reject immediates with differing high and low half-words.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description |
---|---|
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll | Added test functions to verify immediate encoding in various scenarios. |
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp | Modified operand folding logic to ensure distinct half values in immediates are not erroneously folded. |
Comments suppressed due to low confidence (1)
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:380
- Consider adding explicit parentheses around the shift operation for clarity (e.g., static_cast<uint16_t>(Fold.ImmToFold >> 16)) to ensure the intended operator precedence is clear to future maintainers.
static_cast<uint16_t>(Fold.ImmToFold) != static_cast<uint16_t>(Fold.ImmToFold >> 16)
@llvm/pr-subscribers-backend-amdgpu Author: Daniil Fukalov (dfukalov) ChangesVOP3 instructions ignore opsel source modifiers, so a constant that contains two different {B}F16 imms cannot be encoded into instruction with an src opsel. E.g. without the fix the following instructions
lose
Fixes SWDEV-531672 Full diff: https://github.com/llvm/llvm-project/pull/142142.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index cc18d6b4aba10..4f51febe8f92e 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -374,6 +374,12 @@ bool SIFoldOperandsImpl::canUseImmWithOpSel(FoldCandidate &Fold) const {
case AMDGPU::OPERAND_REG_INLINE_C_V2FP16:
case AMDGPU::OPERAND_REG_INLINE_C_V2BF16:
case AMDGPU::OPERAND_REG_INLINE_C_V2INT16:
+ // VOP3 packed instructions ignore op_sel source modifiers, we cannot encode
+ // two different constants.
+ if ((TSFlags & SIInstrFlags::VOP3) && !(TSFlags & SIInstrFlags::VOP3P) &&
+ static_cast<uint16_t>(Fold.ImmToFold) !=
+ static_cast<uint16_t>(Fold.ImmToFold >> 16))
+ return false;
break;
}
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll
index 1ab27337632b6..4b113d80dd0e9 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll
@@ -601,6 +601,34 @@ define <2 x i16> @test_cvt_scalef32_pk_fp8_f16_word1(<2 x i16> %old, <2 x half>
ret <2 x i16> %ret
}
+define <2 x i16> @test_cvt_scalef32_pk_fp8_f16_imm1(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_fp8_f16_imm1:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cvt_scalef32_pk_fp8_f16 v0, 4.0, v1
+; GCN-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.fp8.f16(<2 x i16> %old, <2 x half> <half 4.0, half 4.0>, float %scale, i1 false)
+ ret <2 x i16> %ret
+}
+
+define <2 x i16> @test_cvt_scalef32_pk_fp8_f16_imm2(<2 x i16> %old, float %scale) {
+; GFX950-SDAG-LABEL: test_cvt_scalef32_pk_fp8_f16_imm2:
+; GFX950-SDAG: ; %bb.0:
+; GFX950-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-SDAG-NEXT: s_mov_b32 s0, 0x40004400
+; GFX950-SDAG-NEXT: v_cvt_scalef32_pk_fp8_f16 v0, s0, v1
+; GFX950-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX950-GISEL-LABEL: test_cvt_scalef32_pk_fp8_f16_imm2:
+; GFX950-GISEL: ; %bb.0:
+; GFX950-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-GISEL-NEXT: v_mov_b32_e32 v2, 0x40004400
+; GFX950-GISEL-NEXT: v_cvt_scalef32_pk_fp8_f16 v0, v2, v1
+; GFX950-GISEL-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.fp8.f16(<2 x i16> %old, <2 x half> <half 4.0, half 2.0>, float %scale, i1 false)
+ ret <2 x i16> %ret
+}
+
define <2 x i16> @test_cvt_scalef32_pk_fp8_bf16_word0(<2 x i16> %old, <2 x bfloat> %src, float %scale) {
; GCN-LABEL: test_cvt_scalef32_pk_fp8_bf16_word0:
; GCN: ; %bb.0:
@@ -621,6 +649,27 @@ define <2 x i16> @test_cvt_scalef32_pk_fp8_bf16_word1(<2 x i16> %old, <2 x bfloa
ret <2 x i16> %ret
}
+define <2 x i16> @test_cvt_scalef32_pk_fp8_bf16_imm1(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_fp8_bf16_imm1:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cvt_scalef32_pk_fp8_bf16 v0, 4.0, v1
+; GCN-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.fp8.bf16(<2 x i16> %old, <2 x bfloat> <bfloat 4.0, bfloat 4.0>, float %scale, i1 false)
+ ret <2 x i16> %ret
+}
+
+define <2 x i16> @test_cvt_scalef32_pk_fp8_bf16_imm2(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_fp8_bf16_imm2:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: s_mov_b32 s0, 0x40004080
+; GCN-NEXT: v_cvt_scalef32_pk_fp8_bf16 v0, s0, v1
+; GCN-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.fp8.bf16(<2 x i16> %old, <2 x bfloat> <bfloat 4.0, bfloat 2.0>, float %scale, i1 false)
+ ret <2 x i16> %ret
+}
+
define <2 x i16> @test_cvt_scalef32_pk_bf8_f16_word0(<2 x i16> %old, <2 x half> %src, float %scale) {
; GCN-LABEL: test_cvt_scalef32_pk_bf8_f16_word0:
; GCN: ; %bb.0:
@@ -641,6 +690,34 @@ define <2 x i16> @test_cvt_scalef32_pk_bf8_f16_word1(<2 x i16> %old, <2 x half>
ret <2 x i16> %ret
}
+define <2 x i16> @test_cvt_scalef32_pk_bf8_f16_imm1(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_bf8_f16_imm1:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cvt_scalef32_pk_bf8_f16 v0, 4.0, v1
+; GCN-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.bf8.f16(<2 x i16> %old, <2 x half> <half 4.0, half 4.0>, float %scale, i1 false)
+ ret <2 x i16> %ret
+}
+
+define <2 x i16> @test_cvt_scalef32_pk_bf8_f16_imm2(<2 x i16> %old, float %scale) {
+; GFX950-SDAG-LABEL: test_cvt_scalef32_pk_bf8_f16_imm2:
+; GFX950-SDAG: ; %bb.0:
+; GFX950-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-SDAG-NEXT: s_mov_b32 s0, 0x40004400
+; GFX950-SDAG-NEXT: v_cvt_scalef32_pk_bf8_f16 v0, s0, v1
+; GFX950-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX950-GISEL-LABEL: test_cvt_scalef32_pk_bf8_f16_imm2:
+; GFX950-GISEL: ; %bb.0:
+; GFX950-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-GISEL-NEXT: v_mov_b32_e32 v2, 0x40004400
+; GFX950-GISEL-NEXT: v_cvt_scalef32_pk_bf8_f16 v0, v2, v1
+; GFX950-GISEL-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.bf8.f16(<2 x i16> %old, <2 x half> <half 4.0, half 2.0>, float %scale, i1 false)
+ ret <2 x i16> %ret
+}
+
define <2 x i16> @test_cvt_scalef32_pk_bf8_bf16_word0(<2 x i16> %old, <2 x bfloat> %src, float %scale) {
; GCN-LABEL: test_cvt_scalef32_pk_bf8_bf16_word0:
; GCN: ; %bb.0:
@@ -661,6 +738,27 @@ define <2 x i16> @test_cvt_scalef32_pk_bf8_bf16_word1(<2 x i16> %old, <2 x bfloa
ret <2 x i16> %ret
}
+define <2 x i16> @test_cvt_scalef32_pk_bf8_bf16_imm1(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_bf8_bf16_imm1:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cvt_scalef32_pk_bf8_bf16 v0, 4.0, v1
+; GCN-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.bf8.bf16(<2 x i16> %old, <2 x bfloat> <bfloat 4.0, bfloat 4.0>, float %scale, i1 false)
+ ret <2 x i16> %ret
+}
+
+define <2 x i16> @test_cvt_scalef32_pk_bf8_bf16_imm2(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_bf8_bf16_imm2:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: s_mov_b32 s0, 0x40004080
+; GCN-NEXT: v_cvt_scalef32_pk_bf8_bf16 v0, s0, v1
+; GCN-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.bf8.bf16(<2 x i16> %old, <2 x bfloat> <bfloat 4.0, bfloat 2.0>, float %scale, i1 false)
+ ret <2 x i16> %ret
+}
+
define <2 x float> @test_cvt_scale_f32_fp4_byte0(i32 %src, float %scale) {
; GCN-LABEL: test_cvt_scale_f32_fp4_byte0:
; GCN: ; %bb.0:
@@ -1236,6 +1334,37 @@ define i32 @test_cvt_scalef32_fp4_f16_byte3(<2 x half> %src0, float %scale, i32
ret i32 %ret
}
+define i32 @test_cvt_scalef32_fp4_f16_imm1(float %scale, i32 %old) {
+; GCN-LABEL: test_cvt_scalef32_fp4_f16_imm1:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cvt_scalef32_pk_fp4_f16 v1, 4.0, v0
+; GCN-NEXT: v_mov_b32_e32 v0, v1
+; GCN-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call i32 @llvm.amdgcn.cvt.scalef32.pk.fp4.f16(i32 %old, <2 x half> <half 4.0, half 4.0>, float %scale, i32 0)
+ ret i32 %ret
+}
+
+define i32 @test_cvt_scalef32_fp4_f16_imm2(float %scale, i32 %old) {
+; GFX950-SDAG-LABEL: test_cvt_scalef32_fp4_f16_imm2:
+; GFX950-SDAG: ; %bb.0:
+; GFX950-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-SDAG-NEXT: s_mov_b32 s0, 0x40004400
+; GFX950-SDAG-NEXT: v_cvt_scalef32_pk_fp4_f16 v1, s0, v0
+; GFX950-SDAG-NEXT: v_mov_b32_e32 v0, v1
+; GFX950-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX950-GISEL-LABEL: test_cvt_scalef32_fp4_f16_imm2:
+; GFX950-GISEL: ; %bb.0:
+; GFX950-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-GISEL-NEXT: v_mov_b32_e32 v2, 0x40004400
+; GFX950-GISEL-NEXT: v_cvt_scalef32_pk_fp4_f16 v1, v2, v0
+; GFX950-GISEL-NEXT: v_mov_b32_e32 v0, v1
+; GFX950-GISEL-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call i32 @llvm.amdgcn.cvt.scalef32.pk.fp4.f16(i32 %old, <2 x half> <half 4.0, half 2.0>, float %scale, i32 0)
+ ret i32 %ret
+}
+
define i32 @test_cvt_scalef32_fp4_bf16_byte0(<2 x bfloat> %src0, float %scale, i32 %old) {
; GCN-LABEL: test_cvt_scalef32_fp4_bf16_byte0:
; GCN: ; %bb.0:
@@ -1283,6 +1412,29 @@ define i32 @test_cvt_scalef32_fp4_bf16_byte3(<2 x bfloat> %src0, float %scale, i
ret i32 %ret
}
+define i32 @test_cvt_scalef32_fp4_bf16_imm1(float %scale, i32 %old) {
+; GCN-LABEL: test_cvt_scalef32_fp4_bf16_imm1:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cvt_scalef32_pk_fp4_bf16 v1, 4.0, v0
+; GCN-NEXT: v_mov_b32_e32 v0, v1
+; GCN-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call i32 @llvm.amdgcn.cvt.scalef32.pk.fp4.bf16(i32 %old, <2 x bfloat> <bfloat 4.0, bfloat 4.0>, float %scale, i32 0)
+ ret i32 %ret
+}
+
+define i32 @test_cvt_scalef32_fp4_bf16_imm2(float %scale, i32 %old) {
+; GCN-LABEL: test_cvt_scalef32_fp4_bf16_imm2:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: s_mov_b32 s0, 0x40004080
+; GCN-NEXT: v_cvt_scalef32_pk_fp4_bf16 v1, s0, v0
+; GCN-NEXT: v_mov_b32_e32 v0, v1
+; GCN-NEXT: s_setpc_b64 s[30:31]
+ %ret = tail call i32 @llvm.amdgcn.cvt.scalef32.pk.fp4.bf16(i32 %old, <2 x bfloat> <bfloat 4.0, bfloat 2.0>, float %scale, i32 0)
+ ret i32 %ret
+}
+
define amdgpu_ps void @test_scalef32_pk32_fp6_f32_vv_inreg_src(<16 x float> inreg %src, float %scale, ptr addrspace(1) %out) {
; GFX950-SDAG-LABEL: test_scalef32_pk32_fp6_f32_vv_inreg_src:
; GFX950-SDAG: ; %bb.0:
|
VOP3 instructions ignore opsel source modifiers, so a constant that contains two different {B}F16 imms cannot be encoded into instruction with an src opsel.
E.g. without the fix the following instructions
s_mov_b32 s0, 0x40003c00 // <half 1.0, half 2.0>
v_cvt_scalef32_pk_fp8_f16 v0, s0, v2
lose
2.0
imm and are folded intov_cvt_scalef32_pk_fp8_f16 v1, 1.0, 1.0
Fixes SWDEV-531672