[AMDGPU][MC] Allow dpp in v_dot2_f32_bf16 for GFX11 and 12 #142451

jwanggit86 · 2025-06-02T17:59:35Z

Allow the DPP forms (DPP16 and DPP8) of v_dot2_f32_bf16 on GFX11 and GFX12.

llvmbot · 2025-06-02T17:59:59Z

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-amdgpu

Author: Jun Wang (jwanggit86)

Changes

Allow the DPP forms (DPP16 and DPP8) of v_dot2_f32_bf16 on GFX11 and GFX12.

Full diff: https://github.com/llvm/llvm-project/pull/142451.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/VOP3PInstructions.td (+3-1)
(modified) llvm/test/MC/AMDGPU/gfx11_asm_vop3p.s (+9)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop3p.s (+9)

diff --git a/llvm/lib/Target/AMDGPU/VOP3PInstructions.td b/llvm/lib/Target/AMDGPU/VOP3PInstructions.td
index 06ee41acf41ac..ab30dda8bf23e 100644
--- a/llvm/lib/Target/AMDGPU/VOP3PInstructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3PInstructions.td
@@ -1878,6 +1878,8 @@ defm V_DOT4_F32_BF8_FP8 : VOP3P_Realtriple<GFX12Gen, 0x25>;
 defm V_DOT4_F32_FP8_FP8 : VOP3P_Realtriple<GFX12Gen, 0x26>;
 defm V_DOT4_F32_BF8_BF8 : VOP3P_Realtriple<GFX12Gen, 0x27>;
 
+defm V_DOT2_F32_BF16 : VOP3P_Realtriple<GFX12Gen, 0x1a>;
+
 //===----------------------------------------------------------------------===//
 // GFX11
 //===----------------------------------------------------------------------===//
@@ -1887,7 +1889,7 @@ multiclass VOP3P_Real_gfx11_gfx12<bits<8> op> :
 
 defm V_DOT4_I32_IU8  : VOP3P_Real_gfx11_gfx12<0x16>;
 defm V_DOT8_I32_IU4  : VOP3P_Real_gfx11_gfx12<0x18>;
-defm V_DOT2_F32_BF16 : VOP3P_Real_gfx11_gfx12<0x1a>;
+defm V_DOT2_F32_BF16 : VOP3P_Realtriple<GFX11Gen, 0x1a>;
 
 let AssemblerPredicate = isGFX11Plus in {
   def : AMDGPUMnemonicAlias<"v_dot4_i32_i8", "v_dot4_i32_iu8">;
diff --git a/llvm/test/MC/AMDGPU/gfx11_asm_vop3p.s b/llvm/test/MC/AMDGPU/gfx11_asm_vop3p.s
index 829b0ebb8d8ac..49da767b881f9 100644
--- a/llvm/test/MC/AMDGPU/gfx11_asm_vop3p.s
+++ b/llvm/test/MC/AMDGPU/gfx11_asm_vop3p.s
@@ -45,6 +45,15 @@ v_dot2_f32_bf16 v5, src_scc, vcc_lo, src_scc neg_lo:[1,0,0] neg_hi:[1,0,0]
 v_dot2_f32_bf16 v255, 0xfe0b, vcc_hi, 0.5 neg_lo:[0,1,0] neg_hi:[0,1,0] clamp
 // GFX11: [0xff,0xc2,0x1a,0xcc,0xff,0xd6,0xc0,0x5b,0x0b,0xfe,0x00,0x00]
 
+v_dot2_f32_bf16_e64_dpp v1, v2, v3, v4 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf
+// GFX11: [0x01,0x40,0x1a,0xcc,0xfa,0x06,0x12,0x1c,0x02,0xe4,0x00,0xff]
+
+v_dot2_f32_bf16_e64_dpp v1, v2, v3, v4 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
+// GFX11: [0x01,0x40,0x1a,0xcc,0xfa,0x06,0x12,0x1c,0x02,0xe4,0x00,0x00]
+
+v_dot2_f32_bf16_e64_dpp v1, v2, v3, v4 dpp8:[7,6,5,4,3,2,1,0]
+// GFX11: [0x01,0x40,0x1a,0xcc,0xe9,0x06,0x12,0x1c,0x02,0x77,0x39,0x05]
+
 v_dot2_f32_f16 v5, v1, v2, s3
 // GFX11: [0x05,0x40,0x13,0xcc,0x01,0x05,0x0e,0x18]
 
diff --git a/llvm/test/MC/AMDGPU/gfx12_asm_vop3p.s b/llvm/test/MC/AMDGPU/gfx12_asm_vop3p.s
index db9ad3d2a8418..3287251f1183c 100644
--- a/llvm/test/MC/AMDGPU/gfx12_asm_vop3p.s
+++ b/llvm/test/MC/AMDGPU/gfx12_asm_vop3p.s
@@ -45,6 +45,15 @@ v_dot2_f32_bf16 v5, src_scc, vcc_lo, src_scc neg_lo:[1,0,0] neg_hi:[1,0,0]
 v_dot2_f32_bf16 v255, 0xfe0b, vcc_hi, 0.5 neg_lo:[0,0,0] neg_hi:[0,0,0] clamp
 // GFX12: [0xff,0xc0,0x1a,0xcc,0xff,0xd6,0xc0,0x1b,0x0b,0xfe,0x00,0x00]
 
+v_dot2_f32_bf16_e64_dpp v1, v2, v3, v4 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf
+// GFX11: [0x01,0x40,0x1a,0xcc,0xfa,0x06,0x12,0x1c,0x02,0xe4,0x00,0xff]
+
+v_dot2_f32_bf16_e64_dpp v1, v2, v3, v4 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
+// GFX11: [0x01,0x40,0x1a,0xcc,0xfa,0x06,0x12,0x1c,0x02,0xe4,0x00,0x00]
+
+v_dot2_f32_bf16_e64_dpp v1, v2, v3, v4 dpp8:[7,6,5,4,3,2,1,0]
+// GFX11: [0x01,0x40,0x1a,0xcc,0xe9,0x06,0x12,0x1c,0x02,0x77,0x39,0x05]
+
 v_dot2_f32_f16 v5, v1, v2, s3
 // GFX12: [0x05,0x40,0x13,0xcc,0x01,0x05,0x0e,0x18]
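The last four bytes of each quad_perm test encoding above hold src0 together with the DPP16 controls. The following is a minimal sketch, assuming the GFX11/GFX12 DPP16 layout: src0 in bits 7:0, dpp_ctrl in bits 16:8, bank_mask in bits 27:24, and row_mask in bits 31:28. Under that assumption, the hypothetical helpers `pack_quad_perm` and `pack_dpp16_dword` reproduce the trailing bytes of both quad_perm lines. In particular, quad_perm:[0,1,2,3] packs four 2-bit lane selectors into 0xe4.

```python
# Minimal sketch (not LLVM code): rebuild the DPP16 dword of the
# v_dot2_f32_bf16_e64_dpp quad_perm tests. The bit layout is an assumption
# based on the GFX11/GFX12 DPP16 encoding; fi, bound_ctrl and the src
# neg/abs bits are left at zero, as in the test lines.


def pack_quad_perm(perm):
    """Pack four 2-bit lane selectors into the dpp_ctrl value (0x00-0xff)."""
    assert len(perm) == 4 and all(0 <= p < 4 for p in perm)
    value = 0
    for lane, sel in enumerate(perm):
        value |= sel << (2 * lane)
    return value


def pack_dpp16_dword(src0_vgpr, quad_perm, row_mask, bank_mask):
    """Return the DPP16 dword as a list of little-endian bytes."""
    word = src0_vgpr & 0xFF                     # bits 7:0   src0 VGPR index
    word |= pack_quad_perm(quad_perm) << 8      # bits 16:8  dpp_ctrl
    word |= (bank_mask & 0xF) << 24             # bits 27:24 bank_mask
    word |= (row_mask & 0xF) << 28              # bits 31:28 row_mask
    return list(word.to_bytes(4, "little"))


if __name__ == "__main__":
    # Trailing bytes of the two quad_perm lines in gfx11_asm_vop3p.s.
    full_masks = pack_dpp16_dword(2, [0, 1, 2, 3], row_mask=0xF, bank_mask=0xF)
    zero_masks = pack_dpp16_dword(2, [0, 1, 2, 3], row_mask=0x0, bank_mask=0x0)
    assert full_masks == [0x02, 0xE4, 0x00, 0xFF]
    assert zero_masks == [0x02, 0xE4, 0x00, 0x00]
    print([f"{b:#04x}" for b in full_masks])
```

A different permutation changes only the dpp_ctrl byte. For example, quad_perm:[3,2,1,0] packs to 0x1b.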

rampitec

Missing dpp8 and disasm tests.

rampitec · 2025-06-02T19:33:35Z

llvm/lib/Target/AMDGPU/VOP3PInstructions.td

@@ -1878,6 +1878,8 @@ defm V_DOT4_F32_BF8_FP8 : VOP3P_Realtriple<GFX12Gen, 0x25>;
 defm V_DOT4_F32_FP8_FP8 : VOP3P_Realtriple<GFX12Gen, 0x26>;
 defm V_DOT4_F32_BF8_BF8 : VOP3P_Realtriple<GFX12Gen, 0x27>;

+defm V_DOT2_F32_BF16 : VOP3P_Realtriple<GFX12Gen, 0x1a>;


There is VOP3P_Realtriple_gfx11_gfx12 already.

Where is it? I couldn't find it.

Oh, I see. Take it from downstream, added 2 weeks ago by Mirko.

Included in latest commit.

rampitec · 2025-06-02T19:35:18Z

llvm/test/MC/AMDGPU/gfx11_asm_vop3p.s

@@ -45,6 +45,15 @@ v_dot2_f32_bf16 v5, src_scc, vcc_lo, src_scc neg_lo:[1,0,0] neg_hi:[1,0,0]
 v_dot2_f32_bf16 v255, 0xfe0b, vcc_hi, 0.5 neg_lo:[0,1,0] neg_hi:[0,1,0] clamp
 // GFX11: [0xff,0xc2,0x1a,0xcc,0xff,0xd6,0xc0,0x5b,0x0b,0xfe,0x00,0x00]

+v_dot2_f32_bf16_e64_dpp v1, v2, v3, v4 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf


Should go into gfx11_asm_vop3p_dpp16.s.
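This split follows the encodings themselves. In the VOP3P test bytes above, the 9-bit src0 field of the second dword is 0xfa on the quad_perm (DPP16) lines and 0xe9 on the dpp8 line. The plain clamp test, by contrast, carries a real operand there (0xff, a literal). The hypothetical classifier below is a sketch that assumes the GFX11/GFX12 VOP3P field positions: opcode in bits 22:16 of the first dword, and src0, src1 and src2 as consecutive 9-bit fields of the second dword, with values 256 to 511 naming VGPRs.

```python
# Minimal sketch (not LLVM code): tell DPP16, DPP8 and plain VOP3P encodings
# apart by the src0 field. Field positions are assumptions based on the
# GFX11/GFX12 VOP3P layout; byte lists are copied from the test lines above.

DPP16_SRC0 = 0xFA  # src0 marker seen in the quad_perm tests
DPP8_SRC0 = 0xE9   # src0 marker seen in the dpp8 test


def operand_name(value):
    """Name a 9-bit VOP3 source operand; only VGPRs are decoded here."""
    return f"v{value - 256}" if value >= 256 else f"<{value:#x}>"


def classify_vop3p(encoding):
    """Return (opcode, kind, src1, src2) for a VOP3P encoding."""
    dword0 = int.from_bytes(bytes(encoding[0:4]), "little")
    dword1 = int.from_bytes(bytes(encoding[4:8]), "little")
    opcode = (dword0 >> 16) & 0x7F
    src0 = dword1 & 0x1FF
    src1 = operand_name((dword1 >> 9) & 0x1FF)
    src2 = operand_name((dword1 >> 18) & 0x1FF)
    kind = {DPP16_SRC0: "dpp16", DPP8_SRC0: "dpp8"}.get(src0, "plain")
    return opcode, kind, src1, src2


if __name__ == "__main__":
    samples = {
        "quad_perm": [0x01, 0x40, 0x1A, 0xCC, 0xFA, 0x06, 0x12, 0x1C,
                      0x02, 0xE4, 0x00, 0xFF],
        "dpp8": [0x01, 0x40, 0x1A, 0xCC, 0xE9, 0x06, 0x12, 0x1C,
                 0x02, 0x77, 0x39, 0x05],
        "literal": [0xFF, 0xC2, 0x1A, 0xCC, 0xFF, 0xD6, 0xC0, 0x5B,
                    0x0B, 0xFE, 0x00, 0x00],
    }
    for name, enc in samples.items():
        print(name, classify_vop3p(enc))
```

All three samples decode to opcode 0x1a. Only the src0 marker distinguishes the DPP16, DPP8 and plain forms.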

rampitec · 2025-06-02T19:35:35Z

llvm/test/MC/AMDGPU/gfx12_asm_vop3p.s

@@ -45,6 +45,15 @@ v_dot2_f32_bf16 v5, src_scc, vcc_lo, src_scc neg_lo:[1,0,0] neg_hi:[1,0,0]
 v_dot2_f32_bf16 v255, 0xfe0b, vcc_hi, 0.5 neg_lo:[0,0,0] neg_hi:[0,0,0] clamp
 // GFX12: [0xff,0xc0,0x1a,0xcc,0xff,0xd6,0xc0,0x1b,0x0b,0xfe,0x00,0x00]

+v_dot2_f32_bf16_e64_dpp v1, v2, v3, v4 quad_perm:[0,1,2,3] row_mask:0xf bank_mask:0xf


Should go into gfx12_asm_vop3p_dpp16.s.

(1) created VOP3P_Realtriple_gfx11_gfx12 (2) updated asm tests (3) created disasm tests.

jwanggit86 · 2025-06-03T18:00:19Z

Missing dpp8 and disasm tests.

Added.
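The dpp8 test line exercises the other DPP form. This minimal sketch assumes the DPP8 dword stores src0 in bits 7:0, followed by eight 3-bit lane selectors. Under that assumption, the hypothetical helpers `encode_dpp8` and `decode_dpp8` encode dpp8:[7,6,5,4,3,2,1,0] to the test bytes 0x02,0x77,0x39,0x05 and decode them back. This is the same round trip that pairing an assembler test with a disassembler test checks.

```python
# Minimal sketch (not LLVM code): round-trip the DPP8 dword used by the
# v_dot2_f32_bf16_e64_dpp dpp8 test. Assumed layout: src0 VGPR index in
# bits 7:0, then lane selector i in bits (8 + 3*i) .. (10 + 3*i).


def encode_dpp8(src0_vgpr, selectors):
    """Pack src0 and eight 3-bit lane selectors into little-endian bytes."""
    assert len(selectors) == 8 and all(0 <= s < 8 for s in selectors)
    word = src0_vgpr & 0xFF
    for lane, sel in enumerate(selectors):
        word |= sel << (8 + 3 * lane)
    return list(word.to_bytes(4, "little"))


def decode_dpp8(dword_bytes):
    """Inverse of encode_dpp8: recover src0 and the lane selectors."""
    word = int.from_bytes(bytes(dword_bytes), "little")
    src0 = word & 0xFF
    selectors = [(word >> (8 + 3 * lane)) & 0x7 for lane in range(8)]
    return src0, selectors


if __name__ == "__main__":
    encoded = encode_dpp8(2, [7, 6, 5, 4, 3, 2, 1, 0])
    assert encoded == [0x02, 0x77, 0x39, 0x05]   # trailing bytes in the test
    src0, sels = decode_dpp8(encoded)
    print(f"src0 = v{src0}, dpp8:[{','.join(map(str, sels))}]")
```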

rampitec

LGTM

[AMDGPU][MC] Allow dpp in v_dot2_f32_bf16 for GFX11 and 12

1cbabd4

Allowing the dpp operand in v_dot2_f32_bf16 for GFX11 and 12.

jwanggit86 requested review from kosarev, Sisyph and rampitec June 2, 2025 17:59

jwanggit86 added the backend:AMDGPU and mc (Machine (object) code) labels Jun 2, 2025

rampitec reviewed Jun 2, 2025


rampitec requested a review from mbrkusanin June 3, 2025 07:35

This commit:

af512a6

(1) created VOP3P_Realtriple_gfx11_gfx12 (2) updated asm tests (3) created disasm tests.

jwanggit86 requested a review from rampitec June 3, 2025 18:00

rampitec approved these changes Jun 3, 2025


Empty commit to trigger build

1b4c538

jwanggit86 force-pushed the allow-dpp-in-v-dot2-f32-bf16-in-gfx11 branch from df4fd57 to 1b4c538 June 4, 2025 19:06

jwanggit86 merged commit 64c094b into llvm:main Jun 5, 2025
11 checks passed
