Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86][AVX10.2] Include changes for COMX and VGETEXP from rev. 2 #132824

Merged
merged 1 commit into from
Mar 25, 2025

Conversation

e-kud
Copy link
Contributor

@e-kud e-kud commented Mar 24, 2025

Address missing changes:

  • V[,U]COMXSD need to have XD (F3.0F –> F2.0F)
  • V[,U]COMXS[S,H] need to have XS (F2.[0F,MAP5] -> F3.[0F,MAP5])
  • VGETEXPBF16 needs to have T_MAP6 and NP (66.MAP5 -> NP.MAP6)

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

Address missing changes:
- V[,U]COMXSD need to have XD (F3.0F –> F2.0F)
- V[,U]COMXS[S,H] need to have XS (F2.[0F,MAP5] -> F3.[OF,MAP5])
- VGETEXP need to have T_MAP6 and NP (66.MAP5 -> NP.MAP6)

https://cdrdv2-public.intel.com/844829/361050-intel-avx10.2-spec-jan2025-2.pdf
@e-kud e-kud requested review from RKSimon and phoebewang March 24, 2025 20:30
@llvmbot llvmbot added backend:X86 mc Machine (object) code labels Mar 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 24, 2025

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-x86

Author: Evgenii Kudriashov (e-kud)

Changes

Address missing changes:

  • V[,U]COMXSD need to have XD (F3.0F –> F2.0F)
  • V[,U]COMXS[S,H] need to have XS (F2.[0F,MAP5] -> F3.[OF,MAP5])
  • VGETEXP needs to have T_MAP6 and NP (66.MAP5 -> NP.MAP6)

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965


Patch is 106.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132824.diff

15 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrAVX10.td (+10-10)
  • (modified) llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll (+10-10)
  • (modified) llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt (+27-27)
  • (modified) llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt (+27-27)
  • (modified) llvm/test/MC/Disassembler/X86/avx10.2-com-ef-32.txt (+48-48)
  • (modified) llvm/test/MC/Disassembler/X86/avx10.2-com-ef-64.txt (+48-48)
  • (modified) llvm/test/MC/X86/avx10.2-bf16-32-att.s (+27-27)
  • (modified) llvm/test/MC/X86/avx10.2-bf16-32-intel.s (+27-27)
  • (modified) llvm/test/MC/X86/avx10.2-bf16-64-att.s (+27-27)
  • (modified) llvm/test/MC/X86/avx10.2-bf16-64-intel.s (+27-27)
  • (modified) llvm/test/MC/X86/avx10.2-com-ef-32-att.s (+48-48)
  • (modified) llvm/test/MC/X86/avx10.2-com-ef-32-intel.s (+48-48)
  • (modified) llvm/test/MC/X86/avx10.2-com-ef-64-att.s (+48-48)
  • (modified) llvm/test/MC/X86/avx10.2-com-ef-64-intel.s (+48-48)
diff --git a/llvm/lib/Target/X86/X86InstrAVX10.td b/llvm/lib/Target/X86/X86InstrAVX10.td
index 7cbdcbc939eec..2d2bf1f6c725e 100644
--- a/llvm/lib/Target/X86/X86InstrAVX10.td
+++ b/llvm/lib/Target/X86/X86InstrAVX10.td
@@ -1141,7 +1141,7 @@ defm VRSQRT  : avx10_fp14_bf16<0x4E, "vrsqrt", X86rsqrt14, SchedWriteFRsqrt>,
 defm VRCP    : avx10_fp14_bf16<0x4C, "vrcp", X86rcp14, SchedWriteFRcp>,
                                 T_MAP6, PS, EVEX_CD8<16, CD8VF>;
 defm VGETEXP : avx10_fp14_bf16<0x42, "vgetexp", X86fgetexp, SchedWriteFRnd>,
-                                T_MAP5, EVEX_CD8<16, CD8VF>;
+                                T_MAP6, PS, EVEX_CD8<16, CD8VF>;
 
 // VSCALEFBF16
 multiclass avx10_fp_scalef_bf16<bits<8> opc, string OpcodeStr,
@@ -1338,31 +1338,31 @@ multiclass avx10_com_ef_int<bits<8> Opc, X86VectorVTInfo _, SDNode OpNode,
 let Defs = [EFLAGS], Uses = [MXCSR], Predicates = [HasAVX10_2] in {
   defm VUCOMXSDZ  :  avx10_com_ef<0x2e, FR64X, f64, X86ucomi512,
                                   "vucomxsd", f64mem, loadf64, SSEPackedDouble>,
-                                  TB, XS, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
+                                  TB, XD, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
   defm VUCOMXSHZ  :  avx10_com_ef<0x2e, FR16X, f16, X86ucomi512,
                                   "vucomxsh", f16mem, loadf16, SSEPackedSingle>,
-                                  T_MAP5, XD, EVEX_CD8<16, CD8VT1>;
+                                  T_MAP5, XS, EVEX_CD8<16, CD8VT1>;
   defm VUCOMXSSZ  :  avx10_com_ef<0x2e, FR32X, f32, X86ucomi512,
                                   "vucomxss", f32mem, loadf32, SSEPackedSingle>,
-                                  TB, XD, VEX_LIG, EVEX_CD8<32, CD8VT1>;
+                                  TB, XS, VEX_LIG, EVEX_CD8<32, CD8VT1>;
   defm VCOMXSDZ   :  avx10_com_ef_int<0x2f, v2f64x_info, X86comi512,
                                       "vcomxsd", SSEPackedDouble>,
-                                      TB, XS, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
+                                      TB, XD, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
   defm VCOMXSHZ   :  avx10_com_ef_int<0x2f, v8f16x_info, X86comi512,
                                       "vcomxsh", SSEPackedSingle>,
-                                      T_MAP5, XD, EVEX_CD8<16, CD8VT1>;
+                                      T_MAP5, XS, EVEX_CD8<16, CD8VT1>;
   defm VCOMXSSZ   :  avx10_com_ef_int<0x2f, v4f32x_info, X86comi512,
                                       "vcomxss", SSEPackedSingle>,
-                                      TB, XD, VEX_LIG, EVEX_CD8<32, CD8VT1>;
+                                      TB, XS, VEX_LIG, EVEX_CD8<32, CD8VT1>;
   defm VUCOMXSDZ  :  avx10_com_ef_int<0x2e, v2f64x_info, X86ucomi512,
                                       "vucomxsd", SSEPackedDouble>,
-                                      TB, XS, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
+                                      TB, XD, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
   defm VUCOMXSHZ  :  avx10_com_ef_int<0x2e, v8f16x_info, X86ucomi512,
                                       "vucomxsh", SSEPackedSingle>,
-                                      T_MAP5, XD, EVEX_CD8<16, CD8VT1>;
+                                      T_MAP5, XS, EVEX_CD8<16, CD8VT1>;
   defm VUCOMXSSZ  :  avx10_com_ef_int<0x2e, v4f32x_info, X86ucomi512,
                                       "vucomxss", SSEPackedSingle>,
-                                      TB, XD, VEX_LIG, EVEX_CD8<32, CD8VT1>;
+                                      TB, XS, VEX_LIG, EVEX_CD8<32, CD8VT1>;
 }
 
 //-------------------------------------------------
diff --git a/llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll b/llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll
index da17b995afedf..cbac76e9de273 100644
--- a/llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll
@@ -164,7 +164,7 @@ define <32 x bfloat>@test_int_x86_avx512_mask_getexp_bf16_512(<32 x bfloat> %x0,
 ; X64-LABEL: test_int_x86_avx512_mask_getexp_bf16_512:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %zmm0, %zmm0 # encoding: [0x62,0xf5,0x7d,0x48,0x42,0xc0]
+; X64-NEXT:    vgetexpbf16 %zmm0, %zmm0 # encoding: [0x62,0xf6,0x7c,0x48,0x42,0xc0]
 ; X64-NEXT:    vmovdqu16 %zmm0, %zmm1 {%k1} # encoding: [0x62,0xf1,0xff,0x49,0x6f,0xc8]
 ; X64-NEXT:    vaddbf16 %zmm0, %zmm1, %zmm0 # encoding: [0x62,0xf5,0x75,0x48,0x58,0xc0]
 ; X64-NEXT:    retq # encoding: [0xc3]
@@ -172,7 +172,7 @@ define <32 x bfloat>@test_int_x86_avx512_mask_getexp_bf16_512(<32 x bfloat> %x0,
 ; X86-LABEL: test_int_x86_avx512_mask_getexp_bf16_512:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovd {{[0-9]+}}(%esp), %k1 # encoding: [0xc4,0xe1,0xf9,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %zmm0, %zmm0 # encoding: [0x62,0xf5,0x7d,0x48,0x42,0xc0]
+; X86-NEXT:    vgetexpbf16 %zmm0, %zmm0 # encoding: [0x62,0xf6,0x7c,0x48,0x42,0xc0]
 ; X86-NEXT:    vmovdqu16 %zmm0, %zmm1 {%k1} # encoding: [0x62,0xf1,0xff,0x49,0x6f,0xc8]
 ; X86-NEXT:    vaddbf16 %zmm0, %zmm1, %zmm0 # encoding: [0x62,0xf5,0x75,0x48,0x58,0xc0]
 ; X86-NEXT:    retl # encoding: [0xc3]
diff --git a/llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll b/llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll
index 06875dbe7cd23..ba32b2adc7999 100644
--- a/llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll
@@ -333,7 +333,7 @@ declare <16 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.256(<16 x bfloat>, <16 x
 define <8 x bfloat>@test_int_x86_avx512_getexp_bf16_128(<8 x bfloat> %x0) {
 ; CHECK-LABEL: test_int_x86_avx512_getexp_bf16_128:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vgetexpbf16 %xmm0, %xmm0 # encoding: [0x62,0xf5,0x7d,0x08,0x42,0xc0]
+; CHECK-NEXT:    vgetexpbf16 %xmm0, %xmm0 # encoding: [0x62,0xf6,0x7c,0x08,0x42,0xc0]
 ; CHECK-NEXT:    ret{{[l|q]}} # encoding: [0xc3]
   %res = call <8 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.128(<8 x bfloat> %x0, <8 x bfloat> zeroinitializer, i8 -1)
   ret <8 x bfloat> %res
@@ -343,14 +343,14 @@ define <8 x bfloat>@test_int_x86_avx512_mask_getexp_bf16_128(<8 x bfloat> %x0, <
 ; X64-LABEL: test_int_x86_avx512_mask_getexp_bf16_128:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %xmm0, %xmm1 {%k1} # encoding: [0x62,0xf5,0x7d,0x09,0x42,0xc8]
+; X64-NEXT:    vgetexpbf16 %xmm0, %xmm1 {%k1} # encoding: [0x62,0xf6,0x7c,0x09,0x42,0xc8]
 ; X64-NEXT:    vmovaps %xmm1, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf8,0x28,0xc1]
 ; X64-NEXT:    retq # encoding: [0xc3]
 ;
 ; X86-LABEL: test_int_x86_avx512_mask_getexp_bf16_128:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovb {{[0-9]+}}(%esp), %k1 # encoding: [0xc5,0xf9,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %xmm0, %xmm1 {%k1} # encoding: [0x62,0xf5,0x7d,0x09,0x42,0xc8]
+; X86-NEXT:    vgetexpbf16 %xmm0, %xmm1 {%k1} # encoding: [0x62,0xf6,0x7c,0x09,0x42,0xc8]
 ; X86-NEXT:    vmovaps %xmm1, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf8,0x28,0xc1]
 ; X86-NEXT:    retl # encoding: [0xc3]
   %res = call <8 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.128(<8 x bfloat> %x0, <8 x bfloat> %x1, i8 %x2)
@@ -361,13 +361,13 @@ define <8 x bfloat>@test_int_x86_avx512_maskz_getexp_bf16_128(<8 x bfloat> %x0,
 ; X64-LABEL: test_int_x86_avx512_maskz_getexp_bf16_128:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %xmm0, %xmm0 {%k1} {z} # encoding: [0x62,0xf5,0x7d,0x89,0x42,0xc0]
+; X64-NEXT:    vgetexpbf16 %xmm0, %xmm0 {%k1} {z} # encoding: [0x62,0xf6,0x7c,0x89,0x42,0xc0]
 ; X64-NEXT:    retq # encoding: [0xc3]
 ;
 ; X86-LABEL: test_int_x86_avx512_maskz_getexp_bf16_128:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovb {{[0-9]+}}(%esp), %k1 # encoding: [0xc5,0xf9,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %xmm0, %xmm0 {%k1} {z} # encoding: [0x62,0xf5,0x7d,0x89,0x42,0xc0]
+; X86-NEXT:    vgetexpbf16 %xmm0, %xmm0 {%k1} {z} # encoding: [0x62,0xf6,0x7c,0x89,0x42,0xc0]
 ; X86-NEXT:    retl # encoding: [0xc3]
   %res = call <8 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.128(<8 x bfloat> %x0, <8 x bfloat> zeroinitializer, i8 %x2)
   ret <8 x bfloat> %res
@@ -376,7 +376,7 @@ define <8 x bfloat>@test_int_x86_avx512_maskz_getexp_bf16_128(<8 x bfloat> %x0,
 define <16 x bfloat>@test_int_x86_avx512_getexp_bf16_256(<16 x bfloat> %x0) {
 ; CHECK-LABEL: test_int_x86_avx512_getexp_bf16_256:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vgetexpbf16 %ymm0, %ymm0 # encoding: [0x62,0xf5,0x7d,0x28,0x42,0xc0]
+; CHECK-NEXT:    vgetexpbf16 %ymm0, %ymm0 # encoding: [0x62,0xf6,0x7c,0x28,0x42,0xc0]
 ; CHECK-NEXT:    ret{{[l|q]}} # encoding: [0xc3]
   %res = call <16 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.256(<16 x bfloat> %x0, <16 x bfloat> zeroinitializer, i16 -1)
   ret <16 x bfloat> %res
@@ -386,14 +386,14 @@ define <16 x bfloat>@test_int_x86_avx512_mask_getexp_bf16_256(<16 x bfloat> %x0,
 ; X64-LABEL: test_int_x86_avx512_mask_getexp_bf16_256:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %ymm0, %ymm1 {%k1} # encoding: [0x62,0xf5,0x7d,0x29,0x42,0xc8]
+; X64-NEXT:    vgetexpbf16 %ymm0, %ymm1 {%k1} # encoding: [0x62,0xf6,0x7c,0x29,0x42,0xc8]
 ; X64-NEXT:    vmovaps %ymm1, %ymm0 # EVEX TO VEX Compression encoding: [0xc5,0xfc,0x28,0xc1]
 ; X64-NEXT:    retq # encoding: [0xc3]
 ;
 ; X86-LABEL: test_int_x86_avx512_mask_getexp_bf16_256:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovw {{[0-9]+}}(%esp), %k1 # encoding: [0xc5,0xf8,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %ymm0, %ymm1 {%k1} # encoding: [0x62,0xf5,0x7d,0x29,0x42,0xc8]
+; X86-NEXT:    vgetexpbf16 %ymm0, %ymm1 {%k1} # encoding: [0x62,0xf6,0x7c,0x29,0x42,0xc8]
 ; X86-NEXT:    vmovaps %ymm1, %ymm0 # EVEX TO VEX Compression encoding: [0xc5,0xfc,0x28,0xc1]
 ; X86-NEXT:    retl # encoding: [0xc3]
   %res = call <16 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.256(<16 x bfloat> %x0, <16 x bfloat> %x1, i16 %x2)
@@ -404,13 +404,13 @@ define <16 x bfloat>@test_int_x86_avx512_maskz_getexp_bf16_256(<16 x bfloat> %x0
 ; X64-LABEL: test_int_x86_avx512_maskz_getexp_bf16_256:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf5,0x7d,0xa9,0x42,0xc0]
+; X64-NEXT:    vgetexpbf16 %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf6,0x7c,0xa9,0x42,0xc0]
 ; X64-NEXT:    retq # encoding: [0xc3]
 ;
 ; X86-LABEL: test_int_x86_avx512_maskz_getexp_bf16_256:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovw {{[0-9]+}}(%esp), %k1 # encoding: [0xc5,0xf8,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf5,0x7d,0xa9,0x42,0xc0]
+; X86-NEXT:    vgetexpbf16 %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf6,0x7c,0xa9,0x42,0xc0]
 ; X86-NEXT:    retl # encoding: [0xc3]
   %res = call <16 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.256(<16 x bfloat> %x0, <16 x bfloat> zeroinitializer, i16 %x2)
   ret <16 x bfloat> %res
diff --git a/llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt b/llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt
index a32e55e20e6b7..0db70d290e565 100644
--- a/llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt
+++ b/llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt
@@ -1719,111 +1719,111 @@
 
 # ATT:   vgetexpbf16 %xmm3, %xmm2
 # INTEL: vgetexpbf16 xmm2, xmm3
-0x62,0xf5,0x7d,0x08,0x42,0xd3
+0x62,0xf6,0x7c,0x08,0x42,0xd3
 
 # ATT:   vgetexpbf16 %xmm3, %xmm2 {%k7}
 # INTEL: vgetexpbf16 xmm2 {k7}, xmm3
-0x62,0xf5,0x7d,0x0f,0x42,0xd3
+0x62,0xf6,0x7c,0x0f,0x42,0xd3
 
 # ATT:   vgetexpbf16 %xmm3, %xmm2 {%k7} {z}
 # INTEL: vgetexpbf16 xmm2 {k7} {z}, xmm3
-0x62,0xf5,0x7d,0x8f,0x42,0xd3
+0x62,0xf6,0x7c,0x8f,0x42,0xd3
 
 # ATT:   vgetexpbf16 %zmm3, %zmm2
 # INTEL: vgetexpbf16 zmm2, zmm3
-0x62,0xf5,0x7d,0x48,0x42,0xd3
+0x62,0xf6,0x7c,0x48,0x42,0xd3
 
 # ATT:   vgetexpbf16 %zmm3, %zmm2 {%k7}
 # INTEL: vgetexpbf16 zmm2 {k7}, zmm3
-0x62,0xf5,0x7d,0x4f,0x42,0xd3
+0x62,0xf6,0x7c,0x4f,0x42,0xd3
 
 # ATT:   vgetexpbf16 %zmm3, %zmm2 {%k7} {z}
 # INTEL: vgetexpbf16 zmm2 {k7} {z}, zmm3
-0x62,0xf5,0x7d,0xcf,0x42,0xd3
+0x62,0xf6,0x7c,0xcf,0x42,0xd3
 
 # ATT:   vgetexpbf16 %ymm3, %ymm2
 # INTEL: vgetexpbf16 ymm2, ymm3
-0x62,0xf5,0x7d,0x28,0x42,0xd3
+0x62,0xf6,0x7c,0x28,0x42,0xd3
 
 # ATT:   vgetexpbf16 %ymm3, %ymm2 {%k7}
 # INTEL: vgetexpbf16 ymm2 {k7}, ymm3
-0x62,0xf5,0x7d,0x2f,0x42,0xd3
+0x62,0xf6,0x7c,0x2f,0x42,0xd3
 
 # ATT:   vgetexpbf16 %ymm3, %ymm2 {%k7} {z}
 # INTEL: vgetexpbf16 ymm2 {k7} {z}, ymm3
-0x62,0xf5,0x7d,0xaf,0x42,0xd3
+0x62,0xf6,0x7c,0xaf,0x42,0xd3
 
 # ATT:   vgetexpbf16  268435456(%esp,%esi,8), %xmm2
 # INTEL: vgetexpbf16 xmm2, xmmword ptr [esp + 8*esi + 268435456]
-0x62,0xf5,0x7d,0x08,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
+0x62,0xf6,0x7c,0x08,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%edi,%eax,4), %xmm2 {%k7}
 # INTEL: vgetexpbf16 xmm2 {k7}, xmmword ptr [edi + 4*eax + 291]
-0x62,0xf5,0x7d,0x0f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
+0x62,0xf6,0x7c,0x0f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%eax){1to8}, %xmm2
 # INTEL: vgetexpbf16 xmm2, word ptr [eax]{1to8}
-0x62,0xf5,0x7d,0x18,0x42,0x10
+0x62,0xf6,0x7c,0x18,0x42,0x10
 
 # ATT:   vgetexpbf16  -512(,%ebp,2), %xmm2
 # INTEL: vgetexpbf16 xmm2, xmmword ptr [2*ebp - 512]
-0x62,0xf5,0x7d,0x08,0x42,0x14,0x6d,0x00,0xfe,0xff,0xff
+0x62,0xf6,0x7c,0x08,0x42,0x14,0x6d,0x00,0xfe,0xff,0xff
 
 # ATT:   vgetexpbf16  2032(%ecx), %xmm2 {%k7} {z}
 # INTEL: vgetexpbf16 xmm2 {k7} {z}, xmmword ptr [ecx + 2032]
-0x62,0xf5,0x7d,0x8f,0x42,0x51,0x7f
+0x62,0xf6,0x7c,0x8f,0x42,0x51,0x7f
 
 # ATT:   vgetexpbf16  -256(%edx){1to8}, %xmm2 {%k7} {z}
 # INTEL: vgetexpbf16 xmm2 {k7} {z}, word ptr [edx - 256]{1to8}
-0x62,0xf5,0x7d,0x9f,0x42,0x52,0x80
+0x62,0xf6,0x7c,0x9f,0x42,0x52,0x80
 
 # ATT:   vgetexpbf16  268435456(%esp,%esi,8), %ymm2
 # INTEL: vgetexpbf16 ymm2, ymmword ptr [esp + 8*esi + 268435456]
-0x62,0xf5,0x7d,0x28,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
+0x62,0xf6,0x7c,0x28,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%edi,%eax,4), %ymm2 {%k7}
 # INTEL: vgetexpbf16 ymm2 {k7}, ymmword ptr [edi + 4*eax + 291]
-0x62,0xf5,0x7d,0x2f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
+0x62,0xf6,0x7c,0x2f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%eax){1to16}, %ymm2
 # INTEL: vgetexpbf16 ymm2, word ptr [eax]{1to16}
-0x62,0xf5,0x7d,0x38,0x42,0x10
+0x62,0xf6,0x7c,0x38,0x42,0x10
 
 # ATT:   vgetexpbf16  -1024(,%ebp,2), %ymm2
 # INTEL: vgetexpbf16 ymm2, ymmword ptr [2*ebp - 1024]
-0x62,0xf5,0x7d,0x28,0x42,0x14,0x6d,0x00,0xfc,0xff,0xff
+0x62,0xf6,0x7c,0x28,0x42,0x14,0x6d,0x00,0xfc,0xff,0xff
 
 # ATT:   vgetexpbf16  4064(%ecx), %ymm2 {%k7} {z}
 # INTEL: vgetexpbf16 ymm2 {k7} {z}, ymmword ptr [ecx + 4064]
-0x62,0xf5,0x7d,0xaf,0x42,0x51,0x7f
+0x62,0xf6,0x7c,0xaf,0x42,0x51,0x7f
 
 # ATT:   vgetexpbf16  -256(%edx){1to16}, %ymm2 {%k7} {z}
 # INTEL: vgetexpbf16 ymm2 {k7} {z}, word ptr [edx - 256]{1to16}
-0x62,0xf5,0x7d,0xbf,0x42,0x52,0x80
+0x62,0xf6,0x7c,0xbf,0x42,0x52,0x80
 
 # ATT:   vgetexpbf16  268435456(%esp,%esi,8), %zmm2
 # INTEL: vgetexpbf16 zmm2, zmmword ptr [esp + 8*esi + 268435456]
-0x62,0xf5,0x7d,0x48,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
+0x62,0xf6,0x7c,0x48,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%edi,%eax,4), %zmm2 {%k7}
 # INTEL: vgetexpbf16 zmm2 {k7}, zmmword ptr [edi + 4*eax + 291]
-0x62,0xf5,0x7d,0x4f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
+0x62,0xf6,0x7c,0x4f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%eax){1to32}, %zmm2
 # INTEL: vgetexpbf16 zmm2, word ptr [eax]{1to32}
-0x62,0xf5,0x7d,0x58,0x42,0x10
+0x62,0xf6,0x7c,0x58,0x42,0x10
 
 # ATT:   vgetexpbf16  -2048(,%ebp,2), %zmm2
 # INTEL: vgetexpbf16 zmm2, zmmword ptr [2*ebp - 2048]
-0x62,0xf5,0x7d,0x48,0x42,0x14,0x6d,0x00,0xf8,0xff,0xff
+0x62,0xf6,0x7c,0x48,0x42,0x14,0x6d,0x00,0xf8,0xff,0xff
 
 # ATT:   vgetexpbf16  8128(%ecx), %zmm2 {%k7} {z}
 # INTEL: vgetexpbf16 zmm2 {k7} {z}, zmmword ptr [ecx + 8128]
-0x62,0xf5,0x7d,0xcf,0x42,0x51,0x7f
+0x62,0xf6,0x7c,0xcf,0x42,0x51,0x7f
 
 # ATT:   vgetexpbf16  -256(%edx){1to32}, %zmm2 {%k7} {z}
 # INTEL: vgetexpbf16 zmm2 {k7} {z}, word ptr [edx - 256]{1to32}
-0x62,0xf5,0x7d,0xdf,0x42,0x52,0x80
+0x62,0xf6,0x7c,0xdf,0x42,0x52,0x80
 
 # ATT:   vgetmantbf16 $123, %zmm3, %zmm2
 # INTEL: vgetmantbf16 zmm2, zmm3, 123
diff --git a/llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt b/llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt
index 1319c5cbd0362..197415e5ba329 100644
--- a/llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt
+++ b/llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt
@@ -1719,111 +1719,111 @@
 
 # ATT:   vgetexpbf16 %xmm23, %xmm22
 # INTEL: vgetexpbf16 xmm22, xmm23
-0x62,0xa5,0x7d,0x08,0x42,0xf7
+0x62,0xa6,0x7c,0x08,0x42,0xf7
 
 # ATT:   vgetexpbf16 %xmm23, %xmm22 {%k7}
 # INTEL: vgetexpbf16 xmm22 {k7}, xmm23
-0x62,0xa5,0x7d,0x0f,0x42,0xf7
+0x62,0xa6,0x7c,0x0f,0x42,0xf7
 
 # ATT:   vgetexpbf16 %xmm23, %xmm22 {%k7} {z}
 # INTEL: vgetexpbf16 xmm22 {k7} {z}, xmm23
-0x62,0xa5,0x7d,0x8f,0x42,0xf7
+0x62,0xa6,0x7c,0x8f,0x42,0xf7
 
 # ATT:   vgetexpbf16 %zmm23, %zmm22
 # INTEL: vgetexpbf16 zmm22, zmm23
-0x62,0xa5,0x7d,0x48,0x42,0xf7
+0x62,0xa6,0x7c,0x48,0x42,0xf7
 
 # ATT:   vgetexpbf16 %zmm23, %zmm22 {%k7}
 # INTEL: vgetexpbf16 zmm22 {k7}, zmm23
-0x62,0xa5,0x7d,0x4f,0x42,0xf7
+0x62,0xa6,0x7c,0x4f,0x42,0xf7
 
 # ATT:   vgetexpbf16 %zmm23, %zmm22 {%k7} {z}
 # INTEL: vgetexpbf16 zmm22 {k7} {z}, zmm23
-0x62,0xa5,0x7d,0xcf,0x42,0xf7
+0x62,0xa6,0x7c,0xcf,0x42,0xf7
 
 # ATT:   vgetexpbf16 %ymm23, %ymm22
 # INTEL: vgetexpbf16 ymm22, ymm23
-0x62,0xa5,0x7d,0x28,0x42,0xf7
+0x62,0xa6,0x7c,0x28,0x42,0xf7
 
 # ATT:   vgetexpbf16 %ymm23, %ymm22 {%k7}
 # INTEL: vgetexpbf16 ymm22 {k7}, ymm23
-0x62,0xa5,0x7d,0x2f,0x42,0xf7
+0x62,0xa6,0x7c,0x2f,0x42,0xf7
 
 # ATT:   vgetexpbf16 %ymm23, %ymm22 {%k7} {z}
 # INTEL: vgetexpbf16 ymm22 {k7} {z}, ymm23
-0x62,0xa5,0x7d,0xaf,0x42,0xf7
+0x62,0xa6,0x7c,0xaf,0x42,0xf7
 
 # ATT:   vgetexpbf16  268435456(%rbp,%r14,8), %xmm22
 # INTEL: vgetexpbf16 xmm22, xmmword ptr [rbp + 8*r14 + 268435456]
-0x62,0xa5,0x7d,0x08,0x42,0xb4,0xf5,0x00,0x00,0x00,0x10
+0x62,0xa6,0x7c,0x08,0x42,0xb4,0xf5,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%r8,%rax,4), %xmm22 {%k7}
 # INTEL: vgetexpbf16 xmm22 {k7}, xmmword ptr [r8 + 4*rax + 291]
-0x62,0xc5,0x7d,0x0f,0x42,0xb4,0x80,0x23,0x01,0x00,0x00
+0x62,0xc6,0x7c,0x0f,0x42,0xb4,0x80,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%rip){1to8}, %xmm22
 # INTEL: vgetexpbf16 xmm22, word ptr [rip]{1to8}
-0x62,0xe5,0x7d,0x18,0x42,0x35,0x00,0x00,0x00,0x00
+0x62,0xe6,0x7c,0x18,0x42,0x35,0x00,0x00,0x00,0x00
 
 # ATT:   vgetexpbf16  -512(,%rbp,2), %xmm22
 # INTEL: vgetexpbf16 xmm22, xmmword ptr [2*rbp - 512]
-0x62,0xe5,0x7d,0x08,0x42,0x34,0x6d,0x00,0xfe,0xff,0xff
+0x62,0xe6,0x7c,0x08,0x42,0x34,0x6d,0x00,0xfe,0xff,0xff
 
 # ATT:   vgetexpbf16  2032(%rcx), %xmm22 {%k7} {z}
 # INTEL: vgetexpbf16 xmm22 {k7} {z}, xmmword ptr [rcx + 2032]
-0x62,0xe5,0x7d,0x8f,0x42,0x71,0x7f
+0x62,0xe6,0x7c,0x8f,0x42,0x71,0x7f
 
 # ATT:   vgetexpbf16  -256(%rdx){1to8}, %xmm22 {%k7} {z}
 # INTEL: vgetexpbf16 xmm22 {k7} {z}, word ptr [rdx - 256]{1to8}
-0x62,0xe5,0x7d,0x9f,0x42,0x72,0x80
+0x62,0xe6,0x7c,0x9f,0x42,0x72,0x80
 
 # ATT:   vgetexpbf16  268435456(%rbp,%r14,8), %ymm22
 # INTEL: vgetexpbf16 ymm22, ymmword ptr [rbp + 8*r14 + 268435456]
-0x62,0xa5,0x7d,0x28,0x42,0xb4,0xf5,0x00,0x00,0x00,0x10
+0x62,0xa6,0x7c,0x28,0x42,0xb4,0xf5,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%r8,%rax,4), %ymm22 {%k7}
 # INTEL: vgetexpbf16 ymm22 {k7}, ymmword ptr [r8 + 4*rax + 291]
-0x62,0xc5,0x7d,0x2f,0x42,0xb4,0x80,0x23,0x01,0x00,0x00
+0x62,0xc6,0x7c,0x2f,0x42,0xb4,0x80,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%rip){1to16}, %ymm22
 # INTEL: vgetexpbf16 ymm22, word ptr [rip]{1to16}
-0x62,0xe5,0x7d,0x38,0x42,0x35,0x00,0x00,0x00,0x00
+0x62,0xe6,0x7c,0x38,0x42,0x35,0x00,0x00,0x00,0x00
 
 # ATT:   vgetexpbf16  -1024(,%rbp,2), %ymm22
 # INTEL: vgetexpbf16 ymm22, ymmword ptr [2*rbp - 1024]
-0x62,0xe5,0x7d,0x28,0x42,0x34,0x6d,0x00,0xfc,0xff,0xff
+0x62,0xe6,0x7c,0x28,0x42,0x34,0x6d,0x00,0xfc,0xff,0xff
 
 # ATT:   vgetexpbf16  4064(%rcx), %ymm22 {%k7} {z}
 # INTEL: vgetexpbf16 ymm22 {k7} {z}, ymmword ptr [rcx + 4064]
-0x62,0xe5,0x7d,0xaf,0x42,0x71,0x7f
+0x62,0xe6,0x7c,0xaf,0x42,0x71,0x7f
 
 # ATT:   vgetexpbf16  -256(%rdx){1to16}, %ymm22 {%k7} {z}
 # INTEL: vgetexpbf16 ymm22 {k7} {z}, word ptr [rd...
[truncated]

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Amir429206
Copy link

LGTM, thanks!

@Amir429206
Copy link

Address

@e-kud e-kud merged commit 975c208 into llvm:main Mar 25, 2025
14 checks passed
@phoebewang phoebewang added this to the LLVM 20.X Release milestone Mar 25, 2025
@phoebewang
Copy link
Contributor

/cherry-pick 975c208

@llvmbot
Copy link
Member

llvmbot commented Mar 25, 2025

/pull-request #132932

swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Mar 25, 2025
…#132824)

Address missing changes:
- V[,U]COMXSD need to have XD (F3.0F –> F2.0F)
- V[,U]COMXS[S,H] need to have XS (F2.[0F,MAP5] -> F3.[0F,MAP5])
- VGETEXPBF16 needs to have T_MAP6 and NP (66.MAP5 -> NP.MAP6)

 Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

(cherry picked from commit 975c208)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 mc Machine (object) code
Projects
Development

Successfully merging this pull request may close these issues.

5 participants