Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release/20.x: [X86][AVX10.2] Include changes for COMX and VGETEXP from rev. 2 (#132824) #132932

Merged
merged 1 commit into from
Mar 25, 2025

Conversation

llvmbot
Copy link
Member

@llvmbot llvmbot commented Mar 25, 2025

Backport 975c208

Requested by: @phoebewang

…#132824)

Address missing changes:
- V[,U]COMXSD need to have XD (F3.0F –> F2.0F)
- V[,U]COMXS[S,H] need to have XS (F2.[0F,MAP5] -> F3.[0F,MAP5])
- VGETEXPBF16 needs to have T_MAP6 and NP (66.MAP5 -> NP.MAP6)

 Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965

(cherry picked from commit 975c208)
@llvmbot llvmbot added this to the LLVM 20.X Release milestone Mar 25, 2025
@llvmbot
Copy link
Member Author

llvmbot commented Mar 25, 2025

@RKSimon What do you think about merging this PR to the release branch?

@llvmbot
Copy link
Member Author

llvmbot commented Mar 25, 2025

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-mc

Author: None (llvmbot)

Changes

Backport 975c208

Requested by: @phoebewang


Patch is 106.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132932.diff

15 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrAVX10.td (+10-10)
  • (modified) llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll (+10-10)
  • (modified) llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt (+27-27)
  • (modified) llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt (+27-27)
  • (modified) llvm/test/MC/Disassembler/X86/avx10.2-com-ef-32.txt (+48-48)
  • (modified) llvm/test/MC/Disassembler/X86/avx10.2-com-ef-64.txt (+48-48)
  • (modified) llvm/test/MC/X86/avx10.2-bf16-32-att.s (+27-27)
  • (modified) llvm/test/MC/X86/avx10.2-bf16-32-intel.s (+27-27)
  • (modified) llvm/test/MC/X86/avx10.2-bf16-64-att.s (+27-27)
  • (modified) llvm/test/MC/X86/avx10.2-bf16-64-intel.s (+27-27)
  • (modified) llvm/test/MC/X86/avx10.2-com-ef-32-att.s (+48-48)
  • (modified) llvm/test/MC/X86/avx10.2-com-ef-32-intel.s (+48-48)
  • (modified) llvm/test/MC/X86/avx10.2-com-ef-64-att.s (+48-48)
  • (modified) llvm/test/MC/X86/avx10.2-com-ef-64-intel.s (+48-48)
diff --git a/llvm/lib/Target/X86/X86InstrAVX10.td b/llvm/lib/Target/X86/X86InstrAVX10.td
index 9bb3e364f7c62..37d3b0a67cd33 100644
--- a/llvm/lib/Target/X86/X86InstrAVX10.td
+++ b/llvm/lib/Target/X86/X86InstrAVX10.td
@@ -1468,7 +1468,7 @@ defm VRSQRT  : avx10_fp14_bf16<0x4E, "vrsqrt", X86rsqrt14, SchedWriteFRsqrt>,
 defm VRCP    : avx10_fp14_bf16<0x4C, "vrcp", X86rcp14, SchedWriteFRcp>,
                                 T_MAP6, PS, EVEX_CD8<16, CD8VF>;
 defm VGETEXP : avx10_fp14_bf16<0x42, "vgetexp", X86fgetexp, SchedWriteFRnd>,
-                                T_MAP5, EVEX_CD8<16, CD8VF>;
+                                T_MAP6, PS, EVEX_CD8<16, CD8VF>;
 
 // VSCALEFBF16
 multiclass avx10_fp_scalef_bf16<bits<8> opc, string OpcodeStr,
@@ -1665,31 +1665,31 @@ multiclass avx10_com_ef_int<bits<8> Opc, X86VectorVTInfo _, SDNode OpNode,
 let Defs = [EFLAGS], Uses = [MXCSR], Predicates = [HasAVX10_2] in {
   defm VUCOMXSDZ  :  avx10_com_ef<0x2e, FR64X, f64, X86ucomi512,
                                   "vucomxsd", f64mem, loadf64, SSEPackedDouble>,
-                                  TB, XS, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
+                                  TB, XD, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
   defm VUCOMXSHZ  :  avx10_com_ef<0x2e, FR16X, f16, X86ucomi512,
                                   "vucomxsh", f16mem, loadf16, SSEPackedSingle>,
-                                  T_MAP5, XD, EVEX_CD8<16, CD8VT1>;
+                                  T_MAP5, XS, EVEX_CD8<16, CD8VT1>;
   defm VUCOMXSSZ  :  avx10_com_ef<0x2e, FR32X, f32, X86ucomi512,
                                   "vucomxss", f32mem, loadf32, SSEPackedSingle>,
-                                  TB, XD, VEX_LIG, EVEX_CD8<32, CD8VT1>;
+                                  TB, XS, VEX_LIG, EVEX_CD8<32, CD8VT1>;
   defm VCOMXSDZ   :  avx10_com_ef_int<0x2f, v2f64x_info, X86comi512,
                                       "vcomxsd", SSEPackedDouble>,
-                                      TB, XS, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
+                                      TB, XD, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
   defm VCOMXSHZ   :  avx10_com_ef_int<0x2f, v8f16x_info, X86comi512,
                                       "vcomxsh", SSEPackedSingle>,
-                                      T_MAP5, XD, EVEX_CD8<16, CD8VT1>;
+                                      T_MAP5, XS, EVEX_CD8<16, CD8VT1>;
   defm VCOMXSSZ   :  avx10_com_ef_int<0x2f, v4f32x_info, X86comi512,
                                       "vcomxss", SSEPackedSingle>,
-                                      TB, XD, VEX_LIG, EVEX_CD8<32, CD8VT1>;
+                                      TB, XS, VEX_LIG, EVEX_CD8<32, CD8VT1>;
   defm VUCOMXSDZ  :  avx10_com_ef_int<0x2e, v2f64x_info, X86ucomi512,
                                       "vucomxsd", SSEPackedDouble>,
-                                      TB, XS, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
+                                      TB, XD, VEX_LIG, REX_W, EVEX_CD8<64, CD8VT1>;
   defm VUCOMXSHZ  :  avx10_com_ef_int<0x2e, v8f16x_info, X86ucomi512,
                                       "vucomxsh", SSEPackedSingle>,
-                                      T_MAP5, XD, EVEX_CD8<16, CD8VT1>;
+                                      T_MAP5, XS, EVEX_CD8<16, CD8VT1>;
   defm VUCOMXSSZ  :  avx10_com_ef_int<0x2e, v4f32x_info, X86ucomi512,
                                       "vucomxss", SSEPackedSingle>,
-                                      TB, XD, VEX_LIG, EVEX_CD8<32, CD8VT1>;
+                                      TB, XS, VEX_LIG, EVEX_CD8<32, CD8VT1>;
 }
 
 //-------------------------------------------------
diff --git a/llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll b/llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll
index da17b995afedf..cbac76e9de273 100644
--- a/llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/avx10_2_512bf16-intrinsics.ll
@@ -164,7 +164,7 @@ define <32 x bfloat>@test_int_x86_avx512_mask_getexp_bf16_512(<32 x bfloat> %x0,
 ; X64-LABEL: test_int_x86_avx512_mask_getexp_bf16_512:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %zmm0, %zmm0 # encoding: [0x62,0xf5,0x7d,0x48,0x42,0xc0]
+; X64-NEXT:    vgetexpbf16 %zmm0, %zmm0 # encoding: [0x62,0xf6,0x7c,0x48,0x42,0xc0]
 ; X64-NEXT:    vmovdqu16 %zmm0, %zmm1 {%k1} # encoding: [0x62,0xf1,0xff,0x49,0x6f,0xc8]
 ; X64-NEXT:    vaddbf16 %zmm0, %zmm1, %zmm0 # encoding: [0x62,0xf5,0x75,0x48,0x58,0xc0]
 ; X64-NEXT:    retq # encoding: [0xc3]
@@ -172,7 +172,7 @@ define <32 x bfloat>@test_int_x86_avx512_mask_getexp_bf16_512(<32 x bfloat> %x0,
 ; X86-LABEL: test_int_x86_avx512_mask_getexp_bf16_512:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovd {{[0-9]+}}(%esp), %k1 # encoding: [0xc4,0xe1,0xf9,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %zmm0, %zmm0 # encoding: [0x62,0xf5,0x7d,0x48,0x42,0xc0]
+; X86-NEXT:    vgetexpbf16 %zmm0, %zmm0 # encoding: [0x62,0xf6,0x7c,0x48,0x42,0xc0]
 ; X86-NEXT:    vmovdqu16 %zmm0, %zmm1 {%k1} # encoding: [0x62,0xf1,0xff,0x49,0x6f,0xc8]
 ; X86-NEXT:    vaddbf16 %zmm0, %zmm1, %zmm0 # encoding: [0x62,0xf5,0x75,0x48,0x58,0xc0]
 ; X86-NEXT:    retl # encoding: [0xc3]
diff --git a/llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll b/llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll
index 06875dbe7cd23..ba32b2adc7999 100644
--- a/llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/avx10_2bf16-intrinsics.ll
@@ -333,7 +333,7 @@ declare <16 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.256(<16 x bfloat>, <16 x
 define <8 x bfloat>@test_int_x86_avx512_getexp_bf16_128(<8 x bfloat> %x0) {
 ; CHECK-LABEL: test_int_x86_avx512_getexp_bf16_128:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vgetexpbf16 %xmm0, %xmm0 # encoding: [0x62,0xf5,0x7d,0x08,0x42,0xc0]
+; CHECK-NEXT:    vgetexpbf16 %xmm0, %xmm0 # encoding: [0x62,0xf6,0x7c,0x08,0x42,0xc0]
 ; CHECK-NEXT:    ret{{[l|q]}} # encoding: [0xc3]
   %res = call <8 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.128(<8 x bfloat> %x0, <8 x bfloat> zeroinitializer, i8 -1)
   ret <8 x bfloat> %res
@@ -343,14 +343,14 @@ define <8 x bfloat>@test_int_x86_avx512_mask_getexp_bf16_128(<8 x bfloat> %x0, <
 ; X64-LABEL: test_int_x86_avx512_mask_getexp_bf16_128:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %xmm0, %xmm1 {%k1} # encoding: [0x62,0xf5,0x7d,0x09,0x42,0xc8]
+; X64-NEXT:    vgetexpbf16 %xmm0, %xmm1 {%k1} # encoding: [0x62,0xf6,0x7c,0x09,0x42,0xc8]
 ; X64-NEXT:    vmovaps %xmm1, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf8,0x28,0xc1]
 ; X64-NEXT:    retq # encoding: [0xc3]
 ;
 ; X86-LABEL: test_int_x86_avx512_mask_getexp_bf16_128:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovb {{[0-9]+}}(%esp), %k1 # encoding: [0xc5,0xf9,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %xmm0, %xmm1 {%k1} # encoding: [0x62,0xf5,0x7d,0x09,0x42,0xc8]
+; X86-NEXT:    vgetexpbf16 %xmm0, %xmm1 {%k1} # encoding: [0x62,0xf6,0x7c,0x09,0x42,0xc8]
 ; X86-NEXT:    vmovaps %xmm1, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf8,0x28,0xc1]
 ; X86-NEXT:    retl # encoding: [0xc3]
   %res = call <8 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.128(<8 x bfloat> %x0, <8 x bfloat> %x1, i8 %x2)
@@ -361,13 +361,13 @@ define <8 x bfloat>@test_int_x86_avx512_maskz_getexp_bf16_128(<8 x bfloat> %x0,
 ; X64-LABEL: test_int_x86_avx512_maskz_getexp_bf16_128:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %xmm0, %xmm0 {%k1} {z} # encoding: [0x62,0xf5,0x7d,0x89,0x42,0xc0]
+; X64-NEXT:    vgetexpbf16 %xmm0, %xmm0 {%k1} {z} # encoding: [0x62,0xf6,0x7c,0x89,0x42,0xc0]
 ; X64-NEXT:    retq # encoding: [0xc3]
 ;
 ; X86-LABEL: test_int_x86_avx512_maskz_getexp_bf16_128:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovb {{[0-9]+}}(%esp), %k1 # encoding: [0xc5,0xf9,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %xmm0, %xmm0 {%k1} {z} # encoding: [0x62,0xf5,0x7d,0x89,0x42,0xc0]
+; X86-NEXT:    vgetexpbf16 %xmm0, %xmm0 {%k1} {z} # encoding: [0x62,0xf6,0x7c,0x89,0x42,0xc0]
 ; X86-NEXT:    retl # encoding: [0xc3]
   %res = call <8 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.128(<8 x bfloat> %x0, <8 x bfloat> zeroinitializer, i8 %x2)
   ret <8 x bfloat> %res
@@ -376,7 +376,7 @@ define <8 x bfloat>@test_int_x86_avx512_maskz_getexp_bf16_128(<8 x bfloat> %x0,
 define <16 x bfloat>@test_int_x86_avx512_getexp_bf16_256(<16 x bfloat> %x0) {
 ; CHECK-LABEL: test_int_x86_avx512_getexp_bf16_256:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vgetexpbf16 %ymm0, %ymm0 # encoding: [0x62,0xf5,0x7d,0x28,0x42,0xc0]
+; CHECK-NEXT:    vgetexpbf16 %ymm0, %ymm0 # encoding: [0x62,0xf6,0x7c,0x28,0x42,0xc0]
 ; CHECK-NEXT:    ret{{[l|q]}} # encoding: [0xc3]
   %res = call <16 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.256(<16 x bfloat> %x0, <16 x bfloat> zeroinitializer, i16 -1)
   ret <16 x bfloat> %res
@@ -386,14 +386,14 @@ define <16 x bfloat>@test_int_x86_avx512_mask_getexp_bf16_256(<16 x bfloat> %x0,
 ; X64-LABEL: test_int_x86_avx512_mask_getexp_bf16_256:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %ymm0, %ymm1 {%k1} # encoding: [0x62,0xf5,0x7d,0x29,0x42,0xc8]
+; X64-NEXT:    vgetexpbf16 %ymm0, %ymm1 {%k1} # encoding: [0x62,0xf6,0x7c,0x29,0x42,0xc8]
 ; X64-NEXT:    vmovaps %ymm1, %ymm0 # EVEX TO VEX Compression encoding: [0xc5,0xfc,0x28,0xc1]
 ; X64-NEXT:    retq # encoding: [0xc3]
 ;
 ; X86-LABEL: test_int_x86_avx512_mask_getexp_bf16_256:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovw {{[0-9]+}}(%esp), %k1 # encoding: [0xc5,0xf8,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %ymm0, %ymm1 {%k1} # encoding: [0x62,0xf5,0x7d,0x29,0x42,0xc8]
+; X86-NEXT:    vgetexpbf16 %ymm0, %ymm1 {%k1} # encoding: [0x62,0xf6,0x7c,0x29,0x42,0xc8]
 ; X86-NEXT:    vmovaps %ymm1, %ymm0 # EVEX TO VEX Compression encoding: [0xc5,0xfc,0x28,0xc1]
 ; X86-NEXT:    retl # encoding: [0xc3]
   %res = call <16 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.256(<16 x bfloat> %x0, <16 x bfloat> %x1, i16 %x2)
@@ -404,13 +404,13 @@ define <16 x bfloat>@test_int_x86_avx512_maskz_getexp_bf16_256(<16 x bfloat> %x0
 ; X64-LABEL: test_int_x86_avx512_maskz_getexp_bf16_256:
 ; X64:       # %bb.0:
 ; X64-NEXT:    kmovd %edi, %k1 # encoding: [0xc5,0xfb,0x92,0xcf]
-; X64-NEXT:    vgetexpbf16 %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf5,0x7d,0xa9,0x42,0xc0]
+; X64-NEXT:    vgetexpbf16 %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf6,0x7c,0xa9,0x42,0xc0]
 ; X64-NEXT:    retq # encoding: [0xc3]
 ;
 ; X86-LABEL: test_int_x86_avx512_maskz_getexp_bf16_256:
 ; X86:       # %bb.0:
 ; X86-NEXT:    kmovw {{[0-9]+}}(%esp), %k1 # encoding: [0xc5,0xf8,0x90,0x4c,0x24,0x04]
-; X86-NEXT:    vgetexpbf16 %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf5,0x7d,0xa9,0x42,0xc0]
+; X86-NEXT:    vgetexpbf16 %ymm0, %ymm0 {%k1} {z} # encoding: [0x62,0xf6,0x7c,0xa9,0x42,0xc0]
 ; X86-NEXT:    retl # encoding: [0xc3]
   %res = call <16 x bfloat> @llvm.x86.avx10.mask.getexp.bf16.256(<16 x bfloat> %x0, <16 x bfloat> zeroinitializer, i16 %x2)
   ret <16 x bfloat> %res
diff --git a/llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt b/llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt
index a32e55e20e6b7..0db70d290e565 100644
--- a/llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt
+++ b/llvm/test/MC/Disassembler/X86/avx10.2-bf16-32.txt
@@ -1719,111 +1719,111 @@
 
 # ATT:   vgetexpbf16 %xmm3, %xmm2
 # INTEL: vgetexpbf16 xmm2, xmm3
-0x62,0xf5,0x7d,0x08,0x42,0xd3
+0x62,0xf6,0x7c,0x08,0x42,0xd3
 
 # ATT:   vgetexpbf16 %xmm3, %xmm2 {%k7}
 # INTEL: vgetexpbf16 xmm2 {k7}, xmm3
-0x62,0xf5,0x7d,0x0f,0x42,0xd3
+0x62,0xf6,0x7c,0x0f,0x42,0xd3
 
 # ATT:   vgetexpbf16 %xmm3, %xmm2 {%k7} {z}
 # INTEL: vgetexpbf16 xmm2 {k7} {z}, xmm3
-0x62,0xf5,0x7d,0x8f,0x42,0xd3
+0x62,0xf6,0x7c,0x8f,0x42,0xd3
 
 # ATT:   vgetexpbf16 %zmm3, %zmm2
 # INTEL: vgetexpbf16 zmm2, zmm3
-0x62,0xf5,0x7d,0x48,0x42,0xd3
+0x62,0xf6,0x7c,0x48,0x42,0xd3
 
 # ATT:   vgetexpbf16 %zmm3, %zmm2 {%k7}
 # INTEL: vgetexpbf16 zmm2 {k7}, zmm3
-0x62,0xf5,0x7d,0x4f,0x42,0xd3
+0x62,0xf6,0x7c,0x4f,0x42,0xd3
 
 # ATT:   vgetexpbf16 %zmm3, %zmm2 {%k7} {z}
 # INTEL: vgetexpbf16 zmm2 {k7} {z}, zmm3
-0x62,0xf5,0x7d,0xcf,0x42,0xd3
+0x62,0xf6,0x7c,0xcf,0x42,0xd3
 
 # ATT:   vgetexpbf16 %ymm3, %ymm2
 # INTEL: vgetexpbf16 ymm2, ymm3
-0x62,0xf5,0x7d,0x28,0x42,0xd3
+0x62,0xf6,0x7c,0x28,0x42,0xd3
 
 # ATT:   vgetexpbf16 %ymm3, %ymm2 {%k7}
 # INTEL: vgetexpbf16 ymm2 {k7}, ymm3
-0x62,0xf5,0x7d,0x2f,0x42,0xd3
+0x62,0xf6,0x7c,0x2f,0x42,0xd3
 
 # ATT:   vgetexpbf16 %ymm3, %ymm2 {%k7} {z}
 # INTEL: vgetexpbf16 ymm2 {k7} {z}, ymm3
-0x62,0xf5,0x7d,0xaf,0x42,0xd3
+0x62,0xf6,0x7c,0xaf,0x42,0xd3
 
 # ATT:   vgetexpbf16  268435456(%esp,%esi,8), %xmm2
 # INTEL: vgetexpbf16 xmm2, xmmword ptr [esp + 8*esi + 268435456]
-0x62,0xf5,0x7d,0x08,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
+0x62,0xf6,0x7c,0x08,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%edi,%eax,4), %xmm2 {%k7}
 # INTEL: vgetexpbf16 xmm2 {k7}, xmmword ptr [edi + 4*eax + 291]
-0x62,0xf5,0x7d,0x0f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
+0x62,0xf6,0x7c,0x0f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%eax){1to8}, %xmm2
 # INTEL: vgetexpbf16 xmm2, word ptr [eax]{1to8}
-0x62,0xf5,0x7d,0x18,0x42,0x10
+0x62,0xf6,0x7c,0x18,0x42,0x10
 
 # ATT:   vgetexpbf16  -512(,%ebp,2), %xmm2
 # INTEL: vgetexpbf16 xmm2, xmmword ptr [2*ebp - 512]
-0x62,0xf5,0x7d,0x08,0x42,0x14,0x6d,0x00,0xfe,0xff,0xff
+0x62,0xf6,0x7c,0x08,0x42,0x14,0x6d,0x00,0xfe,0xff,0xff
 
 # ATT:   vgetexpbf16  2032(%ecx), %xmm2 {%k7} {z}
 # INTEL: vgetexpbf16 xmm2 {k7} {z}, xmmword ptr [ecx + 2032]
-0x62,0xf5,0x7d,0x8f,0x42,0x51,0x7f
+0x62,0xf6,0x7c,0x8f,0x42,0x51,0x7f
 
 # ATT:   vgetexpbf16  -256(%edx){1to8}, %xmm2 {%k7} {z}
 # INTEL: vgetexpbf16 xmm2 {k7} {z}, word ptr [edx - 256]{1to8}
-0x62,0xf5,0x7d,0x9f,0x42,0x52,0x80
+0x62,0xf6,0x7c,0x9f,0x42,0x52,0x80
 
 # ATT:   vgetexpbf16  268435456(%esp,%esi,8), %ymm2
 # INTEL: vgetexpbf16 ymm2, ymmword ptr [esp + 8*esi + 268435456]
-0x62,0xf5,0x7d,0x28,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
+0x62,0xf6,0x7c,0x28,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%edi,%eax,4), %ymm2 {%k7}
 # INTEL: vgetexpbf16 ymm2 {k7}, ymmword ptr [edi + 4*eax + 291]
-0x62,0xf5,0x7d,0x2f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
+0x62,0xf6,0x7c,0x2f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%eax){1to16}, %ymm2
 # INTEL: vgetexpbf16 ymm2, word ptr [eax]{1to16}
-0x62,0xf5,0x7d,0x38,0x42,0x10
+0x62,0xf6,0x7c,0x38,0x42,0x10
 
 # ATT:   vgetexpbf16  -1024(,%ebp,2), %ymm2
 # INTEL: vgetexpbf16 ymm2, ymmword ptr [2*ebp - 1024]
-0x62,0xf5,0x7d,0x28,0x42,0x14,0x6d,0x00,0xfc,0xff,0xff
+0x62,0xf6,0x7c,0x28,0x42,0x14,0x6d,0x00,0xfc,0xff,0xff
 
 # ATT:   vgetexpbf16  4064(%ecx), %ymm2 {%k7} {z}
 # INTEL: vgetexpbf16 ymm2 {k7} {z}, ymmword ptr [ecx + 4064]
-0x62,0xf5,0x7d,0xaf,0x42,0x51,0x7f
+0x62,0xf6,0x7c,0xaf,0x42,0x51,0x7f
 
 # ATT:   vgetexpbf16  -256(%edx){1to16}, %ymm2 {%k7} {z}
 # INTEL: vgetexpbf16 ymm2 {k7} {z}, word ptr [edx - 256]{1to16}
-0x62,0xf5,0x7d,0xbf,0x42,0x52,0x80
+0x62,0xf6,0x7c,0xbf,0x42,0x52,0x80
 
 # ATT:   vgetexpbf16  268435456(%esp,%esi,8), %zmm2
 # INTEL: vgetexpbf16 zmm2, zmmword ptr [esp + 8*esi + 268435456]
-0x62,0xf5,0x7d,0x48,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
+0x62,0xf6,0x7c,0x48,0x42,0x94,0xf4,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%edi,%eax,4), %zmm2 {%k7}
 # INTEL: vgetexpbf16 zmm2 {k7}, zmmword ptr [edi + 4*eax + 291]
-0x62,0xf5,0x7d,0x4f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
+0x62,0xf6,0x7c,0x4f,0x42,0x94,0x87,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%eax){1to32}, %zmm2
 # INTEL: vgetexpbf16 zmm2, word ptr [eax]{1to32}
-0x62,0xf5,0x7d,0x58,0x42,0x10
+0x62,0xf6,0x7c,0x58,0x42,0x10
 
 # ATT:   vgetexpbf16  -2048(,%ebp,2), %zmm2
 # INTEL: vgetexpbf16 zmm2, zmmword ptr [2*ebp - 2048]
-0x62,0xf5,0x7d,0x48,0x42,0x14,0x6d,0x00,0xf8,0xff,0xff
+0x62,0xf6,0x7c,0x48,0x42,0x14,0x6d,0x00,0xf8,0xff,0xff
 
 # ATT:   vgetexpbf16  8128(%ecx), %zmm2 {%k7} {z}
 # INTEL: vgetexpbf16 zmm2 {k7} {z}, zmmword ptr [ecx + 8128]
-0x62,0xf5,0x7d,0xcf,0x42,0x51,0x7f
+0x62,0xf6,0x7c,0xcf,0x42,0x51,0x7f
 
 # ATT:   vgetexpbf16  -256(%edx){1to32}, %zmm2 {%k7} {z}
 # INTEL: vgetexpbf16 zmm2 {k7} {z}, word ptr [edx - 256]{1to32}
-0x62,0xf5,0x7d,0xdf,0x42,0x52,0x80
+0x62,0xf6,0x7c,0xdf,0x42,0x52,0x80
 
 # ATT:   vgetmantbf16 $123, %zmm3, %zmm2
 # INTEL: vgetmantbf16 zmm2, zmm3, 123
diff --git a/llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt b/llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt
index 1319c5cbd0362..197415e5ba329 100644
--- a/llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt
+++ b/llvm/test/MC/Disassembler/X86/avx10.2-bf16-64.txt
@@ -1719,111 +1719,111 @@
 
 # ATT:   vgetexpbf16 %xmm23, %xmm22
 # INTEL: vgetexpbf16 xmm22, xmm23
-0x62,0xa5,0x7d,0x08,0x42,0xf7
+0x62,0xa6,0x7c,0x08,0x42,0xf7
 
 # ATT:   vgetexpbf16 %xmm23, %xmm22 {%k7}
 # INTEL: vgetexpbf16 xmm22 {k7}, xmm23
-0x62,0xa5,0x7d,0x0f,0x42,0xf7
+0x62,0xa6,0x7c,0x0f,0x42,0xf7
 
 # ATT:   vgetexpbf16 %xmm23, %xmm22 {%k7} {z}
 # INTEL: vgetexpbf16 xmm22 {k7} {z}, xmm23
-0x62,0xa5,0x7d,0x8f,0x42,0xf7
+0x62,0xa6,0x7c,0x8f,0x42,0xf7
 
 # ATT:   vgetexpbf16 %zmm23, %zmm22
 # INTEL: vgetexpbf16 zmm22, zmm23
-0x62,0xa5,0x7d,0x48,0x42,0xf7
+0x62,0xa6,0x7c,0x48,0x42,0xf7
 
 # ATT:   vgetexpbf16 %zmm23, %zmm22 {%k7}
 # INTEL: vgetexpbf16 zmm22 {k7}, zmm23
-0x62,0xa5,0x7d,0x4f,0x42,0xf7
+0x62,0xa6,0x7c,0x4f,0x42,0xf7
 
 # ATT:   vgetexpbf16 %zmm23, %zmm22 {%k7} {z}
 # INTEL: vgetexpbf16 zmm22 {k7} {z}, zmm23
-0x62,0xa5,0x7d,0xcf,0x42,0xf7
+0x62,0xa6,0x7c,0xcf,0x42,0xf7
 
 # ATT:   vgetexpbf16 %ymm23, %ymm22
 # INTEL: vgetexpbf16 ymm22, ymm23
-0x62,0xa5,0x7d,0x28,0x42,0xf7
+0x62,0xa6,0x7c,0x28,0x42,0xf7
 
 # ATT:   vgetexpbf16 %ymm23, %ymm22 {%k7}
 # INTEL: vgetexpbf16 ymm22 {k7}, ymm23
-0x62,0xa5,0x7d,0x2f,0x42,0xf7
+0x62,0xa6,0x7c,0x2f,0x42,0xf7
 
 # ATT:   vgetexpbf16 %ymm23, %ymm22 {%k7} {z}
 # INTEL: vgetexpbf16 ymm22 {k7} {z}, ymm23
-0x62,0xa5,0x7d,0xaf,0x42,0xf7
+0x62,0xa6,0x7c,0xaf,0x42,0xf7
 
 # ATT:   vgetexpbf16  268435456(%rbp,%r14,8), %xmm22
 # INTEL: vgetexpbf16 xmm22, xmmword ptr [rbp + 8*r14 + 268435456]
-0x62,0xa5,0x7d,0x08,0x42,0xb4,0xf5,0x00,0x00,0x00,0x10
+0x62,0xa6,0x7c,0x08,0x42,0xb4,0xf5,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%r8,%rax,4), %xmm22 {%k7}
 # INTEL: vgetexpbf16 xmm22 {k7}, xmmword ptr [r8 + 4*rax + 291]
-0x62,0xc5,0x7d,0x0f,0x42,0xb4,0x80,0x23,0x01,0x00,0x00
+0x62,0xc6,0x7c,0x0f,0x42,0xb4,0x80,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%rip){1to8}, %xmm22
 # INTEL: vgetexpbf16 xmm22, word ptr [rip]{1to8}
-0x62,0xe5,0x7d,0x18,0x42,0x35,0x00,0x00,0x00,0x00
+0x62,0xe6,0x7c,0x18,0x42,0x35,0x00,0x00,0x00,0x00
 
 # ATT:   vgetexpbf16  -512(,%rbp,2), %xmm22
 # INTEL: vgetexpbf16 xmm22, xmmword ptr [2*rbp - 512]
-0x62,0xe5,0x7d,0x08,0x42,0x34,0x6d,0x00,0xfe,0xff,0xff
+0x62,0xe6,0x7c,0x08,0x42,0x34,0x6d,0x00,0xfe,0xff,0xff
 
 # ATT:   vgetexpbf16  2032(%rcx), %xmm22 {%k7} {z}
 # INTEL: vgetexpbf16 xmm22 {k7} {z}, xmmword ptr [rcx + 2032]
-0x62,0xe5,0x7d,0x8f,0x42,0x71,0x7f
+0x62,0xe6,0x7c,0x8f,0x42,0x71,0x7f
 
 # ATT:   vgetexpbf16  -256(%rdx){1to8}, %xmm22 {%k7} {z}
 # INTEL: vgetexpbf16 xmm22 {k7} {z}, word ptr [rdx - 256]{1to8}
-0x62,0xe5,0x7d,0x9f,0x42,0x72,0x80
+0x62,0xe6,0x7c,0x9f,0x42,0x72,0x80
 
 # ATT:   vgetexpbf16  268435456(%rbp,%r14,8), %ymm22
 # INTEL: vgetexpbf16 ymm22, ymmword ptr [rbp + 8*r14 + 268435456]
-0x62,0xa5,0x7d,0x28,0x42,0xb4,0xf5,0x00,0x00,0x00,0x10
+0x62,0xa6,0x7c,0x28,0x42,0xb4,0xf5,0x00,0x00,0x00,0x10
 
 # ATT:   vgetexpbf16  291(%r8,%rax,4), %ymm22 {%k7}
 # INTEL: vgetexpbf16 ymm22 {k7}, ymmword ptr [r8 + 4*rax + 291]
-0x62,0xc5,0x7d,0x2f,0x42,0xb4,0x80,0x23,0x01,0x00,0x00
+0x62,0xc6,0x7c,0x2f,0x42,0xb4,0x80,0x23,0x01,0x00,0x00
 
 # ATT:   vgetexpbf16  (%rip){1to16}, %ymm22
 # INTEL: vgetexpbf16 ymm22, word ptr [rip]{1to16}
-0x62,0xe5,0x7d,0x38,0x42,0x35,0x00,0x00,0x00,0x00
+0x62,0xe6,0x7c,0x38,0x42,0x35,0x00,0x00,0x00,0x00
 
 # ATT:   vgetexpbf16  -1024(,%rbp,2), %ymm22
 # INTEL: vgetexpbf16 ymm22, ymmword ptr [2*rbp - 1024]
-0x62,0xe5,0x7d,0x28,0x42,0x34,0x6d,0x00,0xfc,0xff,0xff
+0x62,0xe6,0x7c,0x28,0x42,0x34,0x6d,0x00,0xfc,0xff,0xff
 
 # ATT:   vgetexpbf16  4064(%rcx), %ymm22 {%k7} {z}
 # INTEL: vgetexpbf16 ymm22 {k7} {z}, ymmword ptr [rcx + 4064]
-0x62,0xe5,0x7d,0xaf,0x42,0x71,0x7f
+0x62,0xe6,0x7c,0xaf,0x42,0x71,0x7f
 
 # ATT:   vgetexpbf16  -256(%rdx){1to16}, %ymm22 {%k7} {z}
 # INTEL: vgetexpbf16 ymm22 {k7} {z}, word ptr [rd...
[truncated]

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tstellar tstellar merged commit 9710e99 into llvm:release/20.x Mar 25, 2025
18 checks passed
Copy link

@phoebewang (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 mc Machine (object) code
Projects
Development

Successfully merging this pull request may close these issues.

4 participants