[AMDGPU] Fix SIFoldOperandsImpl::canUseImmWithOpSel() for VOP3 packed {B}F16 imms. #142142


Merged

merged 1 commit into llvm:main on May 30, 2025

Conversation

dfukalov
Collaborator

VOP3 instructions ignore opsel source modifiers, so a constant that contains two different {B}F16 imms cannot be encoded into an instruction with a src opsel.

E.g. without the fix the following instructions

s_mov_b32 s0, 0x40003c00 // <half 1.0, half 2.0>
v_cvt_scalef32_pk_fp8_f16 v0, s0, v2

lose the 2.0 imm and are folded into

v_cvt_scalef32_pk_fp8_f16 v1, 1.0, 1.0

Fixes SWDEV-531672

… {B}F16 imms.

VOP3 instructions ignore opsel source modifiers, so a constant that contains
two different {B}F16 imms cannot be encoded into an instruction with a src opsel.

E.g. without the fix the following instructions

  s_mov_b32 s0, 0x40003c00 // <half 1.0, half 2.0>
  v_cvt_scalef32_pk_fp8_f16 v0, s0, v2

lose the 2.0 imm and are folded into

  v_cvt_scalef32_pk_fp8_f16 v1, 1.0, 1.0
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR addresses an issue with VOP3 packed instructions where distinct half-precision immediate values were improperly folded, leading to incorrect constant encodings.

  • Added multiple test cases for different conversion instructions to verify correct handling of immediate values.
  • Updated SIFoldOperandsImpl::canUseImmWithOpSel to reject immediates with differing high and low half-words.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll Added test functions to verify immediate encoding in various scenarios.
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp Modified operand folding logic to ensure distinct half values in immediates are not erroneously folded.
Comments suppressed due to low confidence (1)

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:380

  • Consider adding explicit parentheses around the shift operation for clarity (e.g., static_cast<uint16_t>(Fold.ImmToFold >> 16)) to ensure the intended operator precedence is clear to future maintainers.
static_cast<uint16_t>(Fold.ImmToFold) != static_cast<uint16_t>(Fold.ImmToFold >> 16)

@llvmbot
Member

llvmbot commented May 30, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Daniil Fukalov (dfukalov)

Changes

VOP3 instructions ignore opsel source modifiers, so a constant that contains two different {B}F16 imms cannot be encoded into an instruction with a src opsel.

E.g. without the fix the following instructions

s_mov_b32 s0, 0x40003c00 // <half 1.0, half 2.0>
v_cvt_scalef32_pk_fp8_f16 v0, s0, v2

lose the 2.0 imm and are folded into

v_cvt_scalef32_pk_fp8_f16 v1, 1.0, 1.0

Fixes SWDEV-531672


Full diff: https://github.com/llvm/llvm-project/pull/142142.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+6)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll (+152)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index cc18d6b4aba10..4f51febe8f92e 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -374,6 +374,12 @@ bool SIFoldOperandsImpl::canUseImmWithOpSel(FoldCandidate &Fold) const {
   case AMDGPU::OPERAND_REG_INLINE_C_V2FP16:
   case AMDGPU::OPERAND_REG_INLINE_C_V2BF16:
   case AMDGPU::OPERAND_REG_INLINE_C_V2INT16:
+    // VOP3 packed instructions ignore op_sel source modifiers, we cannot encode
+    // two different constants.
+    if ((TSFlags & SIInstrFlags::VOP3) && !(TSFlags & SIInstrFlags::VOP3P) &&
+        static_cast<uint16_t>(Fold.ImmToFold) !=
+            static_cast<uint16_t>(Fold.ImmToFold >> 16))
+      return false;
     break;
   }
 
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll
index 1ab27337632b6..4b113d80dd0e9 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll
@@ -601,6 +601,34 @@ define <2 x i16> @test_cvt_scalef32_pk_fp8_f16_word1(<2 x i16> %old, <2 x half>
   ret <2 x i16> %ret
 }
 
+define <2 x i16> @test_cvt_scalef32_pk_fp8_f16_imm1(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_fp8_f16_imm1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cvt_scalef32_pk_fp8_f16 v0, 4.0, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.fp8.f16(<2 x i16> %old, <2 x half> <half 4.0, half 4.0>, float %scale, i1 false)
+  ret <2 x i16> %ret
+}
+
+define <2 x i16> @test_cvt_scalef32_pk_fp8_f16_imm2(<2 x i16> %old, float %scale) {
+; GFX950-SDAG-LABEL: test_cvt_scalef32_pk_fp8_f16_imm2:
+; GFX950-SDAG:       ; %bb.0:
+; GFX950-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-SDAG-NEXT:    s_mov_b32 s0, 0x40004400
+; GFX950-SDAG-NEXT:    v_cvt_scalef32_pk_fp8_f16 v0, s0, v1
+; GFX950-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX950-GISEL-LABEL: test_cvt_scalef32_pk_fp8_f16_imm2:
+; GFX950-GISEL:       ; %bb.0:
+; GFX950-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-GISEL-NEXT:    v_mov_b32_e32 v2, 0x40004400
+; GFX950-GISEL-NEXT:    v_cvt_scalef32_pk_fp8_f16 v0, v2, v1
+; GFX950-GISEL-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.fp8.f16(<2 x i16> %old, <2 x half> <half 4.0, half 2.0>, float %scale, i1 false)
+  ret <2 x i16> %ret
+}
+
 define <2 x i16> @test_cvt_scalef32_pk_fp8_bf16_word0(<2 x i16> %old, <2 x bfloat> %src, float %scale) {
 ; GCN-LABEL: test_cvt_scalef32_pk_fp8_bf16_word0:
 ; GCN:       ; %bb.0:
@@ -621,6 +649,27 @@ define <2 x i16> @test_cvt_scalef32_pk_fp8_bf16_word1(<2 x i16> %old, <2 x bfloa
   ret <2 x i16> %ret
 }
 
+define <2 x i16> @test_cvt_scalef32_pk_fp8_bf16_imm1(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_fp8_bf16_imm1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cvt_scalef32_pk_fp8_bf16 v0, 4.0, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.fp8.bf16(<2 x i16> %old, <2 x bfloat> <bfloat 4.0, bfloat 4.0>, float %scale, i1 false)
+  ret <2 x i16> %ret
+}
+
+define <2 x i16> @test_cvt_scalef32_pk_fp8_bf16_imm2(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_fp8_bf16_imm2:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    s_mov_b32 s0, 0x40004080
+; GCN-NEXT:    v_cvt_scalef32_pk_fp8_bf16 v0, s0, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.fp8.bf16(<2 x i16> %old, <2 x bfloat> <bfloat 4.0, bfloat 2.0>, float %scale, i1 false)
+  ret <2 x i16> %ret
+}
+
 define <2 x i16> @test_cvt_scalef32_pk_bf8_f16_word0(<2 x i16> %old, <2 x half> %src, float %scale) {
 ; GCN-LABEL: test_cvt_scalef32_pk_bf8_f16_word0:
 ; GCN:       ; %bb.0:
@@ -641,6 +690,34 @@ define <2 x i16> @test_cvt_scalef32_pk_bf8_f16_word1(<2 x i16> %old, <2 x half>
   ret <2 x i16> %ret
 }
 
+define <2 x i16> @test_cvt_scalef32_pk_bf8_f16_imm1(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_bf8_f16_imm1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cvt_scalef32_pk_bf8_f16 v0, 4.0, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.bf8.f16(<2 x i16> %old, <2 x half> <half 4.0, half 4.0>, float %scale, i1 false)
+  ret <2 x i16> %ret
+}
+
+define <2 x i16> @test_cvt_scalef32_pk_bf8_f16_imm2(<2 x i16> %old, float %scale) {
+; GFX950-SDAG-LABEL: test_cvt_scalef32_pk_bf8_f16_imm2:
+; GFX950-SDAG:       ; %bb.0:
+; GFX950-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-SDAG-NEXT:    s_mov_b32 s0, 0x40004400
+; GFX950-SDAG-NEXT:    v_cvt_scalef32_pk_bf8_f16 v0, s0, v1
+; GFX950-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX950-GISEL-LABEL: test_cvt_scalef32_pk_bf8_f16_imm2:
+; GFX950-GISEL:       ; %bb.0:
+; GFX950-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-GISEL-NEXT:    v_mov_b32_e32 v2, 0x40004400
+; GFX950-GISEL-NEXT:    v_cvt_scalef32_pk_bf8_f16 v0, v2, v1
+; GFX950-GISEL-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.bf8.f16(<2 x i16> %old, <2 x half> <half 4.0, half 2.0>, float %scale, i1 false)
+  ret <2 x i16> %ret
+}
+
 define <2 x i16> @test_cvt_scalef32_pk_bf8_bf16_word0(<2 x i16> %old, <2 x bfloat> %src, float %scale) {
 ; GCN-LABEL: test_cvt_scalef32_pk_bf8_bf16_word0:
 ; GCN:       ; %bb.0:
@@ -661,6 +738,27 @@ define <2 x i16> @test_cvt_scalef32_pk_bf8_bf16_word1(<2 x i16> %old, <2 x bfloa
   ret <2 x i16> %ret
 }
 
+define <2 x i16> @test_cvt_scalef32_pk_bf8_bf16_imm1(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_bf8_bf16_imm1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cvt_scalef32_pk_bf8_bf16 v0, 4.0, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.bf8.bf16(<2 x i16> %old, <2 x bfloat> <bfloat 4.0, bfloat 4.0>, float %scale, i1 false)
+  ret <2 x i16> %ret
+}
+
+define <2 x i16> @test_cvt_scalef32_pk_bf8_bf16_imm2(<2 x i16> %old, float %scale) {
+; GCN-LABEL: test_cvt_scalef32_pk_bf8_bf16_imm2:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    s_mov_b32 s0, 0x40004080
+; GCN-NEXT:    v_cvt_scalef32_pk_bf8_bf16 v0, s0, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call <2 x i16> @llvm.amdgcn.cvt.scalef32.pk.bf8.bf16(<2 x i16> %old, <2 x bfloat> <bfloat 4.0, bfloat 2.0>, float %scale, i1 false)
+  ret <2 x i16> %ret
+}
+
 define <2 x float> @test_cvt_scale_f32_fp4_byte0(i32 %src, float %scale) {
 ; GCN-LABEL: test_cvt_scale_f32_fp4_byte0:
 ; GCN:       ; %bb.0:
@@ -1236,6 +1334,37 @@ define i32 @test_cvt_scalef32_fp4_f16_byte3(<2 x half> %src0, float %scale, i32
   ret i32 %ret
 }
 
+define i32 @test_cvt_scalef32_fp4_f16_imm1(float %scale, i32 %old) {
+; GCN-LABEL: test_cvt_scalef32_fp4_f16_imm1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cvt_scalef32_pk_fp4_f16 v1, 4.0, v0
+; GCN-NEXT:    v_mov_b32_e32 v0, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call i32 @llvm.amdgcn.cvt.scalef32.pk.fp4.f16(i32 %old, <2 x half> <half 4.0, half 4.0>, float %scale, i32 0)
+  ret i32 %ret
+}
+
+define i32 @test_cvt_scalef32_fp4_f16_imm2(float %scale, i32 %old) {
+; GFX950-SDAG-LABEL: test_cvt_scalef32_fp4_f16_imm2:
+; GFX950-SDAG:       ; %bb.0:
+; GFX950-SDAG-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-SDAG-NEXT:    s_mov_b32 s0, 0x40004400
+; GFX950-SDAG-NEXT:    v_cvt_scalef32_pk_fp4_f16 v1, s0, v0
+; GFX950-SDAG-NEXT:    v_mov_b32_e32 v0, v1
+; GFX950-SDAG-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX950-GISEL-LABEL: test_cvt_scalef32_fp4_f16_imm2:
+; GFX950-GISEL:       ; %bb.0:
+; GFX950-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX950-GISEL-NEXT:    v_mov_b32_e32 v2, 0x40004400
+; GFX950-GISEL-NEXT:    v_cvt_scalef32_pk_fp4_f16 v1, v2, v0
+; GFX950-GISEL-NEXT:    v_mov_b32_e32 v0, v1
+; GFX950-GISEL-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call i32 @llvm.amdgcn.cvt.scalef32.pk.fp4.f16(i32 %old, <2 x half> <half 4.0, half 2.0>, float %scale, i32 0)
+  ret i32 %ret
+}
+
 define i32 @test_cvt_scalef32_fp4_bf16_byte0(<2 x bfloat> %src0, float %scale, i32 %old) {
 ; GCN-LABEL: test_cvt_scalef32_fp4_bf16_byte0:
 ; GCN:       ; %bb.0:
@@ -1283,6 +1412,29 @@ define i32 @test_cvt_scalef32_fp4_bf16_byte3(<2 x bfloat> %src0, float %scale, i
   ret i32 %ret
 }
 
+define i32 @test_cvt_scalef32_fp4_bf16_imm1(float %scale, i32 %old) {
+; GCN-LABEL: test_cvt_scalef32_fp4_bf16_imm1:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    v_cvt_scalef32_pk_fp4_bf16 v1, 4.0, v0
+; GCN-NEXT:    v_mov_b32_e32 v0, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call i32 @llvm.amdgcn.cvt.scalef32.pk.fp4.bf16(i32 %old, <2 x bfloat> <bfloat 4.0, bfloat 4.0>, float %scale, i32 0)
+  ret i32 %ret
+}
+
+define i32 @test_cvt_scalef32_fp4_bf16_imm2(float %scale, i32 %old) {
+; GCN-LABEL: test_cvt_scalef32_fp4_bf16_imm2:
+; GCN:       ; %bb.0:
+; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT:    s_mov_b32 s0, 0x40004080
+; GCN-NEXT:    v_cvt_scalef32_pk_fp4_bf16 v1, s0, v0
+; GCN-NEXT:    v_mov_b32_e32 v0, v1
+; GCN-NEXT:    s_setpc_b64 s[30:31]
+  %ret = tail call i32 @llvm.amdgcn.cvt.scalef32.pk.fp4.bf16(i32 %old, <2 x bfloat> <bfloat 4.0, bfloat 2.0>, float %scale, i32 0)
+  ret i32 %ret
+}
+
 define amdgpu_ps void @test_scalef32_pk32_fp6_f32_vv_inreg_src(<16 x float> inreg %src, float %scale, ptr addrspace(1) %out) {
 ; GFX950-SDAG-LABEL: test_scalef32_pk32_fp6_f32_vv_inreg_src:
 ; GFX950-SDAG:       ; %bb.0:

@dfukalov dfukalov merged commit 5208f72 into llvm:main May 30, 2025
13 checks passed