Skip to content

[AArch64] Generate zeroing forms of certain SVE2.2 instructions (3/11) #116829

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

momchil-velikov
Copy link
Collaborator

SVE2.2 introduces instructions with predicated forms with zeroing of
the inactive lanes. This allows in some cases to save a movprfx or
a mov instruction when emitting code for _x or _z variants of
intrinsics.

This patch adds support for emitting the zeroing forms of certain
FCVTLT, and FCVTX instructions.

@llvmbot
Copy link
Member

llvmbot commented Nov 19, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Momchil Velikov (momchil-velikov)

Changes

SVE2.2 introduces instructions with predicated forms with zeroing of
the inactive lanes. This allows in some cases to save a movprfx or
a mov instruction when emitting code for _x or _z variants of
intrinsics.

This patch adds support for emitting the zeroing forms of certain
FCVTLT, and FCVTX instructions.


Patch is 59.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/116829.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+3)
  • (modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+16-8)
  • (modified) llvm/lib/Target/AArch64/SVEInstrFormats.td (+67-21)
  • (added) llvm/test/CodeGen/AArch64/zeroing-forms-abs-neg.ll (+666)
  • (added) llvm/test/CodeGen/AArch64/zeroing-forms-fcvt-bfcvt.ll (+330)
  • (added) llvm/test/CodeGen/AArch64/zeroing-forms-fcvtlt-fcvtx.ll (+147)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index e4ad27d4bcfc00..f7121373593fbd 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -381,6 +381,9 @@ def NoUseScalarIncVL : Predicate<"!Subtarget->useScalarIncVL()">;
 
 def UseSVEFPLD1R : Predicate<"!Subtarget->noSVEFPLD1R()">;
 
+def UseUnaryUndefPseudos
+  : Predicate<"!(Subtarget->isSVEorStreamingSVEAvailable() && (Subtarget->hasSVE2p2() || Subtarget->hasSME2p2()))">;
+
 def AArch64LocalRecover : SDNode<"ISD::LOCAL_RECOVER",
                                   SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>,
                                                        SDTCisInt<1>]>>;
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 4f146b3ee59e9a..07320d040c0c10 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -664,6 +664,14 @@ let Predicates = [HasSVEorSME] in {
   defm FABS_ZPmZ : sve_int_un_pred_arit_bitwise_fp<0b100, "fabs", AArch64fabs_mt>;
   defm FNEG_ZPmZ : sve_int_un_pred_arit_bitwise_fp<0b101, "fneg", AArch64fneg_mt>;
 
+  let Predicates = [HasSVEorSME, UseUnaryUndefPseudos] in {
+    defm FABS_ZPmZ : sve_fp_un_pred_arit_hsd<AArch64fabs_mt>;
+    defm FNEG_ZPmZ : sve_fp_un_pred_arit_hsd<AArch64fneg_mt>;
+
+    defm ABS_ZPmZ : sve_int_un_pred_arit_bhsd<AArch64abs_mt>;
+    defm NEG_ZPmZ : sve_int_un_pred_arit_bhsd<AArch64neg_mt>;
+  }
+
   foreach VT = [nxv2bf16, nxv4bf16, nxv8bf16] in {
     // No dedicated instruction, so just clear the sign bit.
     def : Pat<(VT (fabs VT:$op)),
@@ -4246,21 +4254,21 @@ defm TBLQ_ZZZ  : sve2p1_tblq<"tblq", int_aarch64_sve_tblq>;
 //===----------------------------------------------------------------------===//
 let Predicates = [HasSVE2p2orSME2p2] in {
   // SVE Floating-point convert precision, zeroing predicate
-  defm FCVT_ZPzZ : sve_fp_z2op_p_zd_b_0<"fcvt">;
+  defm FCVT_ZPzZ : sve_fp_z2op_p_zd_b_0<"fcvt", "int_aarch64_sve_fcvt">;
 
   // SVE2p2 floating-point convert precision down (placing odd), zeroing predicate
   defm FCVTNT_ZPzZ      : sve_fp_fcvtntz<"fcvtnt">;
   def FCVTXNT_ZPzZ_DtoS : sve_fp_fcvt2z<0b0010, "fcvtxnt", ZPR32, ZPR64>;
   // Placing even
-  def FCVTX_ZPzZ_DtoS   : sve_fp_z2op_p_zd<0b0001010, "fcvtx", ZPR64, ZPR32>;
+  defm FCVTX_ZPzZ       : sve_fp_z2op_p_zd<"fcvtx", int_aarch64_sve_fcvtx_f32f64>;
 
   // SVE2p2 floating-point convert precision up, zeroing predicate
-  defm FCVTLT_ZPzZ      : sve_fp_fcvtltz<"fcvtlt">;
+  defm FCVTLT_ZPzZ      : sve_fp_fcvtltz<"fcvtlt", "int_aarch64_sve_fcvtlt">;
 
   // SVE2p2 floating-point convert single-to-bf (placing odd), zeroing predicate
   def BFCVTNT_ZPzZ      : sve_fp_fcvt2z<0b1010, "bfcvtnt", ZPR16, ZPR32>;
   // Placing corresponding
-  def BFCVT_ZPzZ_StoH   : sve_fp_z2op_p_zd<0b1001010, "bfcvt", ZPR32, ZPR16>;
+  defm BFCVT_ZPzZ_StoH  : sve_fp_z2op_p_zd_bfcvt<0b1001010, "bfcvt", int_aarch64_sve_fcvt_bf16f32_v2>;
 
   // Floating-point convert to integer, zeroing predicate
   defm FCVTZS_ZPzZ : sve_fp_z2op_p_zd_d<0b0, "fcvtzs">;
@@ -4310,16 +4318,16 @@ let Predicates = [HasSVE2p2orSME2p2] in {
   defm NOT_ZPzZ  : sve_int_un_pred_arit_bitwise_z<0b110, "not">;
 
   // floating point
-  defm FABS_ZPzZ : sve_int_un_pred_arit_bitwise_fp_z<0b100, "fabs">;
-  defm FNEG_ZPzZ : sve_int_un_pred_arit_bitwise_fp_z<0b101, "fneg">;
+  defm FABS_ZPzZ : sve_int_un_pred_arit_bitwise_fp_z<0b100, "fabs", AArch64fabs_mt>;
+  defm FNEG_ZPzZ : sve_int_un_pred_arit_bitwise_fp_z<0b101, "fneg", AArch64fneg_mt>;
 
   // SVE2p2 integer unary arithmetic, zeroing predicate
   defm SXTB_ZPzZ  : sve_int_un_pred_arit_h_z<0b000, "sxtb">;
   defm UXTB_ZPzZ  : sve_int_un_pred_arit_h_z<0b001, "uxtb">;
   defm SXTH_ZPzZ  : sve_int_un_pred_arit_w_z<0b010, "sxth">;
   defm UXTH_ZPzZ  : sve_int_un_pred_arit_w_z<0b011, "uxth">;
-  defm ABS_ZPzZ   : sve_int_un_pred_arit_z<  0b110, "abs">;
-  defm NEG_ZPzZ   : sve_int_un_pred_arit_z<  0b111, "neg">;
+  defm ABS_ZPzZ   : sve_int_un_pred_arit_z<  0b110, "abs", AArch64abs_mt>;
+  defm NEG_ZPzZ   : sve_int_un_pred_arit_z<  0b111, "neg", AArch64neg_mt>;
   def SXTW_ZPzZ_D : sve_int_un_pred_arit_z<0b11, 0b1000, "sxtw", ZPR64>;
   def UXTW_ZPzZ_D : sve_int_un_pred_arit_z<0b11, 0b1010, "uxtw", ZPR64>;
 
diff --git a/llvm/lib/Target/AArch64/SVEInstrFormats.td b/llvm/lib/Target/AArch64/SVEInstrFormats.td
index 6de6aed3b2a816..067b0724b46139 100644
--- a/llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -482,6 +482,8 @@ let Predicates = [HasSVEorSME] in {
 //===----------------------------------------------------------------------===//
 // SVE pattern match helpers.
 //===----------------------------------------------------------------------===//
+def SVEDup0 : ComplexPattern<vAny, 0, "SelectDupZero", []>;
+def SVEDup0Undef : ComplexPattern<vAny, 0, "SelectDupZeroOrUndef", []>;
 
 class SVE_1_Op_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
                    Instruction inst>
@@ -502,6 +504,11 @@ multiclass SVE_1_Op_PassthruUndef_Pat<ValueType vtd, SDPatternOperator op, Value
             (inst $Op3, $Op1, $Op2)>;
 }
 
+class SVE_1_Op_PassthruUndefZero_Pat<ValueType vtd, SDPatternOperator op, ValueType pg,
+                   ValueType vts, Instruction inst>
+  : Pat<(vtd (op pg:$Op1, vts:$Op2, (vtd (SVEDup0Undef)))),
+        (inst $Op1, $Op2)>;
+
 // Used to match FP_ROUND_MERGE_PASSTHRU, which has an additional flag for the
 // type of rounding. This is matched by timm0_1 in pattern below and ignored.
 class SVE_1_Op_Passthru_Round_Pat<ValueType vtd, SDPatternOperator op, ValueType pg,
@@ -517,8 +524,6 @@ multiclass SVE_1_Op_PassthruUndef_Round_Pat<ValueType vtd, SDPatternOperator op,
             (inst $Op3, $Op1, $Op2)>;
 }
 
-def SVEDup0 : ComplexPattern<vAny, 0, "SelectDupZero", []>;
-
 class SVE_1_Op_PassthruZero_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
                    ValueType vt2, Instruction inst>
    : Pat<(vtd (op (vtd (SVEDup0)), vt1:$Op1, vt2:$Op2)),
@@ -571,6 +576,11 @@ multiclass SVE_3_Op_Undef_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1
             (inst $Op1, $Op2, $Op3)>;
 }
 
+class SVE_3_Op_UndefZero_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
+                             ValueType vt2, ValueType vt3, Instruction inst>
+  : Pat<(vtd (op (vt1 (SVEDup0Undef)), vt2:$Op1, vt3:$Op2)),
+        (inst $Op1, $Op2)>;
+
 class SVE_4_Op_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
                    ValueType vt2, ValueType vt3, ValueType vt4,
                    Instruction inst>
@@ -606,8 +616,6 @@ class SVE_4_Op_Imm_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
 : Pat<(vtd (op vt1:$Op1, vt2:$Op2, vt3:$Op3, (vt4 ImmTy:$Op4))),
       (inst $Op1, $Op2, $Op3, ImmTy:$Op4)>;
 
-def SVEDup0Undef : ComplexPattern<vAny, 0, "SelectDupZeroOrUndef", []>;
-
 let AddedComplexity = 1 in {
 class SVE_3_Op_Pat_SelZero<ValueType vtd, SDPatternOperator op, ValueType vt1,
                    ValueType vt2, ValueType vt3, Instruction inst>
@@ -2850,9 +2858,12 @@ multiclass sve_fp_fcvtntz<string asm> {
   def _DtoS : sve_fp_fcvt2z<0b1110, asm,  ZPR32, ZPR64>;
 }
 
-multiclass sve_fp_fcvtltz<string asm> {
+multiclass sve_fp_fcvtltz<string asm, string op> {
   def _HtoS  : sve_fp_fcvt2z<0b1001, asm,  ZPR32, ZPR16>;
   def _StoD  : sve_fp_fcvt2z<0b1111, asm,  ZPR64, ZPR32>;
+
+  def : SVE_3_Op_UndefZero_Pat<nxv4f32, !cast<SDPatternOperator>(op # _f32f16), nxv4f32, nxv4i1, nxv8f16, !cast<Instruction>(NAME # _HtoS)>;
+  def : SVE_3_Op_UndefZero_Pat<nxv2f64, !cast<SDPatternOperator>(op # _f64f32), nxv2f64, nxv2i1, nxv4f32, !cast<Instruction>(NAME # _StoD)>;
 }
 
 //===----------------------------------------------------------------------===//
@@ -3259,6 +3270,12 @@ class sve_fp_z2op_p_zd<bits<7> opc,string asm, RegisterOperand i_zprtype,
   let mayRaiseFPException = 1;
 }
 
+multiclass sve_fp_z2op_p_zd<string asm, SDPatternOperator op> {
+  def _DtoS : sve_fp_z2op_p_zd<0b0001010, asm, ZPR64, ZPR32>;
+
+  def : SVE_3_Op_UndefZero_Pat<nxv4f32, op, nxv4f32, nxv2i1, nxv2f64, !cast<Instruction>(NAME # _DtoS)>;
+}
+
 multiclass sve_fp_z2op_p_zd_hsd<bits<5> opc, string asm> {
   def _H : sve_fp_z2op_p_zd<{ 0b01, opc }, asm, ZPR16, ZPR16>;
   def _S : sve_fp_z2op_p_zd<{ 0b10, opc }, asm, ZPR32, ZPR32>;
@@ -3270,6 +3287,12 @@ multiclass sve_fp_z2op_p_zd_frint<bits<2> opc, string asm> {
   def _D : sve_fp_z2op_p_zd<{ 0b0010, opc{1}, 1, opc{0} }, asm, ZPR64, ZPR64>;
 }
 
+multiclass sve_fp_z2op_p_zd_bfcvt<bits<7> opc, string asm, SDPatternOperator op> {
+  def _StoH : sve_fp_z2op_p_zd<opc, asm, ZPR32, ZPR16>;
+
+  def : SVE_3_Op_UndefZero_Pat<nxv8bf16, op, nxv8bf16, nxv4i1, nxv4f32, !cast<Instruction>(NAME # _StoH)>;
+}
+
 multiclass sve_fp_z2op_p_zd_d<bit U, string asm> {
   def _HtoH : sve_fp_z2op_p_zd<{ 0b011101, U }, asm, ZPR16, ZPR16>;
   def _HtoS : sve_fp_z2op_p_zd<{ 0b011110, U }, asm, ZPR16, ZPR32>;
@@ -3296,13 +3319,20 @@ multiclass sve_fp_z2op_p_zd_d_flogb<string asm> {
   def _D : sve_fp_z2op_p_zd<0b0011011, asm, ZPR64, ZPR64>;
 }
 
-multiclass sve_fp_z2op_p_zd_b_0<string asm> {
+multiclass sve_fp_z2op_p_zd_b_0<string asm, string op> {
   def _StoH : sve_fp_z2op_p_zd<0b1001000, asm, ZPR32, ZPR16>;
   def _HtoS : sve_fp_z2op_p_zd<0b1001001, asm, ZPR16, ZPR32>;
   def _DtoH : sve_fp_z2op_p_zd<0b1101000, asm, ZPR64, ZPR16>;
   def _HtoD : sve_fp_z2op_p_zd<0b1101001, asm, ZPR16, ZPR64>;
   def _DtoS : sve_fp_z2op_p_zd<0b1101010, asm, ZPR64, ZPR32>;
   def _StoD : sve_fp_z2op_p_zd<0b1101011, asm, ZPR32, ZPR64>;
+
+  def : SVE_3_Op_UndefZero_Pat<nxv8f16, !cast<SDPatternOperator>(op # _f16f32), nxv8f16, nxv4i1, nxv4f32, !cast<Instruction>(NAME # _StoH)>;
+  def : SVE_3_Op_UndefZero_Pat<nxv8f16, !cast<SDPatternOperator>(op # _f16f64), nxv8f16, nxv2i1, nxv2f64, !cast<Instruction>(NAME # _DtoH)>;
+  def : SVE_3_Op_UndefZero_Pat<nxv4f32, !cast<SDPatternOperator>(op # _f32f64), nxv4f32, nxv2i1, nxv2f64, !cast<Instruction>(NAME # _DtoS)>;
+  def : SVE_3_Op_UndefZero_Pat<nxv4f32, !cast<SDPatternOperator>(op # _f32f16), nxv4f32, nxv4i1, nxv8f16, !cast<Instruction>(NAME # _HtoS)>;
+  def : SVE_3_Op_UndefZero_Pat<nxv2f64, !cast<SDPatternOperator>(op # _f64f16), nxv2f64, nxv2i1, nxv8f16, !cast<Instruction>(NAME # _HtoD)>;
+  def : SVE_3_Op_UndefZero_Pat<nxv2f64, !cast<SDPatternOperator>(op # _f64f32), nxv2f64, nxv2i1, nxv4f32, !cast<Instruction>(NAME # _StoD)>;
 }
 
 //===----------------------------------------------------------------------===//
@@ -4820,23 +4850,18 @@ multiclass sve_int_un_pred_arit<bits<3> opc, string asm,
   def : SVE_1_Op_Passthru_Pat<nxv8i16, op, nxv8i1,  nxv8i16, !cast<Instruction>(NAME # _H)>;
   def : SVE_1_Op_Passthru_Pat<nxv4i32, op, nxv4i1,  nxv4i32, !cast<Instruction>(NAME # _S)>;
   def : SVE_1_Op_Passthru_Pat<nxv2i64, op, nxv2i1,  nxv2i64, !cast<Instruction>(NAME # _D)>;
-
-  def _B_UNDEF : PredOneOpPassthruPseudo<NAME # _B, ZPR8>;
-  def _H_UNDEF : PredOneOpPassthruPseudo<NAME # _H, ZPR16>;
-  def _S_UNDEF : PredOneOpPassthruPseudo<NAME # _S, ZPR32>;
-  def _D_UNDEF : PredOneOpPassthruPseudo<NAME # _D, ZPR64>;
-
-  defm : SVE_1_Op_PassthruUndef_Pat<nxv16i8, op, nxv16i1, nxv16i8, !cast<Pseudo>(NAME # _B_UNDEF)>;
-  defm : SVE_1_Op_PassthruUndef_Pat<nxv8i16, op, nxv8i1,  nxv8i16, !cast<Pseudo>(NAME # _H_UNDEF)>;
-  defm : SVE_1_Op_PassthruUndef_Pat<nxv4i32, op, nxv4i1,  nxv4i32, !cast<Pseudo>(NAME # _S_UNDEF)>;
-  defm : SVE_1_Op_PassthruUndef_Pat<nxv2i64, op, nxv2i1,  nxv2i64, !cast<Pseudo>(NAME # _D_UNDEF)>;
 }
 
-multiclass sve_int_un_pred_arit_z<bits<3> opc, string asm> {
+multiclass sve_int_un_pred_arit_z<bits<3> opc, string asm, SDPatternOperator op> {
   def _B : sve_int_un_pred_arit_z<0b00, { opc, 0b0 }, asm, ZPR8>;
   def _H : sve_int_un_pred_arit_z<0b01, { opc, 0b0 }, asm, ZPR16>;
   def _S : sve_int_un_pred_arit_z<0b10, { opc, 0b0 }, asm, ZPR32>;
   def _D : sve_int_un_pred_arit_z<0b11, { opc, 0b0 }, asm, ZPR64>;
+
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv16i8, op, nxv16i1, nxv16i8, !cast<Instruction>(NAME # _B)>;
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv8i16, op, nxv8i1,  nxv8i16, !cast<Instruction>(NAME # _H)>;
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv4i32, op, nxv4i1,  nxv4i32, !cast<Instruction>(NAME # _S)>;
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv2i64, op, nxv2i1,  nxv2i64, !cast<Instruction>(NAME # _D)>;
 }
 
 multiclass sve_int_un_pred_arit_h<bits<3> opc, string asm,
@@ -4950,7 +4975,22 @@ multiclass sve_int_un_pred_arit_bitwise_fp<bits<3> opc, string asm,
   def : SVE_1_Op_Passthru_Pat<nxv4f32, op, nxv4i1, nxv4f32, !cast<Instruction>(NAME # _S)>;
   def : SVE_1_Op_Passthru_Pat<nxv2f32, op, nxv2i1, nxv2f32, !cast<Instruction>(NAME # _S)>;
   def : SVE_1_Op_Passthru_Pat<nxv2f64, op, nxv2i1, nxv2f64, !cast<Instruction>(NAME # _D)>;
+}
+
+multiclass sve_int_un_pred_arit_bitwise_fp_z<bits<3> opc, string asm, SDPatternOperator op> {
+  def _H : sve_int_un_pred_arit_z<0b01, { opc, 0b1 }, asm, ZPR16>;
+  def _S : sve_int_un_pred_arit_z<0b10, { opc, 0b1 }, asm, ZPR32>;
+  def _D : sve_int_un_pred_arit_z<0b11, { opc, 0b1 }, asm, ZPR64>;
 
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv8f16, op, nxv8i1, nxv8f16, !cast<Instruction>(NAME # _H)>;
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv4f16, op, nxv4i1, nxv4f16, !cast<Instruction>(NAME # _H)>;
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv2f16, op, nxv2i1, nxv2f16, !cast<Instruction>(NAME # _H)>;
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv4f32, op, nxv4i1, nxv4f32, !cast<Instruction>(NAME # _S)>;
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv2f32, op, nxv2i1, nxv2f32, !cast<Instruction>(NAME # _S)>;
+  def : SVE_1_Op_PassthruUndefZero_Pat<nxv2f64, op, nxv2i1, nxv2f64, !cast<Instruction>(NAME # _D)>;
+}
+
+multiclass sve_fp_un_pred_arit_hsd<SDPatternOperator op> {
   def _H_UNDEF : PredOneOpPassthruPseudo<NAME # _H, ZPR16>;
   def _S_UNDEF : PredOneOpPassthruPseudo<NAME # _S, ZPR32>;
   def _D_UNDEF : PredOneOpPassthruPseudo<NAME # _D, ZPR64>;
@@ -4963,10 +5003,16 @@ multiclass sve_int_un_pred_arit_bitwise_fp<bits<3> opc, string asm,
   defm : SVE_1_Op_PassthruUndef_Pat<nxv2f64, op, nxv2i1, nxv2f64, !cast<Pseudo>(NAME # _D_UNDEF)>;
 }
 
-multiclass sve_int_un_pred_arit_bitwise_fp_z<bits<3> opc, string asm> {
-  def _H : sve_int_un_pred_arit_z<0b01, { opc, 0b1 }, asm, ZPR16>;
-  def _S : sve_int_un_pred_arit_z<0b10, { opc, 0b1 }, asm, ZPR32>;
-  def _D : sve_int_un_pred_arit_z<0b11, { opc, 0b1 }, asm, ZPR64>;
+multiclass sve_int_un_pred_arit_bhsd<SDPatternOperator op> {
+  def _B_UNDEF : PredOneOpPassthruPseudo<NAME # _B, ZPR8>;
+  def _H_UNDEF : PredOneOpPassthruPseudo<NAME # _H, ZPR16>;
+  def _S_UNDEF : PredOneOpPassthruPseudo<NAME # _S, ZPR32>;
+  def _D_UNDEF : PredOneOpPassthruPseudo<NAME # _D, ZPR64>;
+
+  defm : SVE_1_Op_PassthruUndef_Pat<nxv16i8, op, nxv16i1, nxv16i8, !cast<Pseudo>(NAME # _B_UNDEF)>;
+  defm : SVE_1_Op_PassthruUndef_Pat<nxv8i16, op, nxv8i1,  nxv8i16, !cast<Pseudo>(NAME # _H_UNDEF)>;
+  defm : SVE_1_Op_PassthruUndef_Pat<nxv4i32, op, nxv4i1,  nxv4i32, !cast<Pseudo>(NAME # _S_UNDEF)>;
+  defm : SVE_1_Op_PassthruUndef_Pat<nxv2i64, op, nxv2i1,  nxv2i64, !cast<Pseudo>(NAME # _D_UNDEF)>;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/AArch64/zeroing-forms-abs-neg.ll b/llvm/test/CodeGen/AArch64/zeroing-forms-abs-neg.ll
new file mode 100644
index 00000000000000..1caee994220f05
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/zeroing-forms-abs-neg.ll
@@ -0,0 +1,666 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mattr=+sve    < %s | FileCheck %s
+; RUN: llc -mattr=+sve2p2 < %s | FileCheck %s -check-prefix CHECK-2p2
+
+; RUN: llc -mattr=+sme    -force-streaming < %s | FileCheck %s
+; RUN: llc -mattr=+sme2p2 -force-streaming < %s | FileCheck %s -check-prefix CHECK-2p2
+
+target triple = "aarch64-linux"
+
+define <vscale x 2 x double> @test_svabs_f64_x_1(<vscale x 2 x i1> %pg, <vscale x 2 x double> %x) {
+; CHECK-LABEL: test_svabs_f64_x_1:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    fabs z0.d, p0/m, z0.d
+; CHECK-NEXT:    ret
+;
+; CHECK-2p2-LABEL: test_svabs_f64_x_1:
+; CHECK-2p2:       // %bb.0: // %entry
+; CHECK-2p2-NEXT:    fabs z0.d, p0/z, z0.d
+; CHECK-2p2-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 2 x double> @llvm.aarch64.sve.fabs.nxv2f64(<vscale x 2 x double> undef, <vscale x 2 x i1> %pg, <vscale x 2 x double> %x)
+  ret <vscale x 2 x double> %0
+}
+
+define <vscale x 2 x double> @test_svabs_f64_x_2(<vscale x 2 x i1> %pg, double %z0, <vscale x 2 x double> %x) {
+; CHECK-LABEL: test_svabs_f64_x_2:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    movprfx z0, z1
+; CHECK-NEXT:    fabs z0.d, p0/m, z1.d
+; CHECK-NEXT:    ret
+;
+; CHECK-2p2-LABEL: test_svabs_f64_x_2:
+; CHECK-2p2:       // %bb.0: // %entry
+; CHECK-2p2-NEXT:    fabs z0.d, p0/z, z1.d
+; CHECK-2p2-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 2 x double> @llvm.aarch64.sve.fabs.nxv2f64(<vscale x 2 x double> undef, <vscale x 2 x i1> %pg, <vscale x 2 x double> %x)
+  ret <vscale x 2 x double> %0
+}
+
+define <vscale x 2 x double> @test_svabs_f64_z(<vscale x 2 x i1> %pg, double %z0, <vscale x 2 x double> %x) {
+; CHECK-LABEL: test_svabs_f64_z:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov z0.d, #0 // =0x0
+; CHECK-NEXT:    fabs z0.d, p0/m, z1.d
+; CHECK-NEXT:    ret
+;
+; CHECK-2p2-LABEL: test_svabs_f64_z:
+; CHECK-2p2:       // %bb.0: // %entry
+; CHECK-2p2-NEXT:    fabs z0.d, p0/z, z1.d
+; CHECK-2p2-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 2 x double> @llvm.aarch64.sve.fabs.nxv2f64(<vscale x 2 x double> zeroinitializer, <vscale x 2 x i1> %pg, <vscale x 2 x double> %x)
+  ret <vscale x 2 x double> %0
+}
+
+define <vscale x 4 x float> @test_svabs_f32_x_1(<vscale x 4 x i1> %pg, <vscale x 4 x float> %x) {
+; CHECK-LABEL: test_svabs_f32_x_1:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    fabs z0.s, p0/m, z0.s
+; CHECK-NEXT:    ret
+;
+; CHECK-2p2-LABEL: test_svabs_f32_x_1:
+; CHECK-2p2:       // %bb.0: // %entry
+; CHECK-2p2-NEXT:    fabs z0.s, p0/z, z0.s
+; CHECK-2p2-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x float> @llvm.aarch64.sve.fabs.nxv4f32(<vscale x 4 x float> undef, <vscale x 4 x i1> %pg, <vscale x 4 x float> %x)
+  ret <vscale x 4 x float> %0
+}
+
+define <vscale x 4 x float> @test_svabs_f32_x_2(<vscale x 4 x i1> %pg, double %z0, <vscale x 4 x float> %x) {
+; CHECK-LABEL: test_svabs_f32_x_2:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    movprfx z0, z1
+; CHECK-NEXT:    fabs z0.s, p0/m, z1.s
+; CHECK-NEXT:    ret
+;
+; CHECK-2p2-LABEL: test_svabs_f32_x_2:
+; CHECK-2p2:       // %bb.0: // %entry
+; CHECK-2p2-NEXT:    fabs z0.s, p0/z, z1.s
+; CHECK-2p2-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x float> @llvm.aarch64.sve.fabs.nxv4f32(<vscale x 4 x float> undef, <vscale x 4 x i1> %pg, <vscale x 4 x float> %x)
+  ret <vscale x 4 x float> %0
+}
+
+define <vscale x 4 x float> @test_svabs_f32_z(<vscale x 4 x i1> %pg, double %z0, <vscale x 4 x float> %x) {
+; CHECK-LABEL: test_svabs_f32_z:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov z0.s, #0 // =0x0
+; CHECK-NEXT:    fabs z0.s, p0/m, z1.s
+; CHECK-NEXT:    ret
+;
+; CHECK-2p2-LABEL: test_svabs_f32_z:
+; CHECK-2p2:       // %bb.0: // %entry
+; CHECK-2p2-NEXT:    fabs z0.s, p0/z, z1.s
+; CHECK-2p2-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x float> @llvm.aarch64.sve.fabs.nxv4f32(<vscale x 4 x float> zeroinitializer, <vscale x 4 x i1> %pg, <vscale x 4 x float> %x)
+  ret <vscale x 4 x float> %0
+}
+
+define <vscale x 8 x half> @test_svabs_f16_x_1(<vscale x 8 x i1> %pg, <vscale x 8 x half> %x) {
+; CHECK-LABEL: test_svabs_f16_x_1:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    fabs z0.h, p0/m, z0.h
+; CHECK-NEXT:    ret
+;
+; CHECK-2p2-LABEL: test_svabs_f16_x_1:
+; CHECK-2p2:       // %bb.0: // %entry
+; CHECK-2p2-NEXT:    fabs z0.h, p0/z, z0.h
+; CHECK-2p2-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 8 x half> @llvm.aarch64.sve.fabs.nxv8f16(<vscale x 8 x half> undef, <vscale x 8 x i1> %pg, <vscale x 8 x half> %x)
+  ret <vscale x 8 x half> %0
+}
+
+define <vscale x 8 x half> @test_svabs_f16_x_2(<vscale x 8 x i1> %pg, double %z0, <vscale x 8 x half> %x) {
+; CHECK-LABEL: test_svabs_f16_x_2:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    m...
[truncated]

@momchil-velikov momchil-velikov force-pushed the zeroing-03.1-fcvtlt-fcvtx branch from ca7e545 to 9b0d3b2 Compare December 2, 2024 12:44
@momchil-velikov momchil-velikov merged commit 2474cf7 into llvm:main Dec 2, 2024
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Dec 2, 2024

LLVM Buildbot has detected a new failure on builder libc-x86_64-debian-gcc-fullbuild-dbg running on libc-x86_64-debian-fullbuild while building llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/131/builds/11494

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
[       OK ] LlvmLibcRemapFilePagesTest.ErrorInvalidAddress (4 us)
Ran 3 tests.  PASS: 3  FAIL: 0
[936/1102] Running unit test libc.test.src.sys.mman.linux.shm_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcShmTest.Basic
[       OK ] LlvmLibcShmTest.Basic (83 us)
[ RUN      ] LlvmLibcShmTest.NameConversion
[       OK ] LlvmLibcShmTest.NameConversion (19 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[937/1102] Running unit test libc.test.src.sys.mman.linux.process_mrelease_test
FAILED: projects/libc/test/src/sys/mman/linux/CMakeFiles/libc.test.src.sys.mman.linux.process_mrelease_test /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-gcc-fullbuild-dbg/build/projects/libc/test/src/sys/mman/linux/CMakeFiles/libc.test.src.sys.mman.linux.process_mrelease_test 
cd /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-gcc-fullbuild-dbg/build/projects/libc/test/src/sys/mman/linux && /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-gcc-fullbuild-dbg/build/projects/libc/test/src/sys/mman/linux/libc.test.src.sys.mman.linux.process_mrelease_test.__build__
[==========] Running 3 tests from 1 test suite.
[ RUN      ] LlvmLibcProcessMReleaseTest.NoError
/home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-gcc-fullbuild-dbg/llvm-project/libc/test/src/sys/mman/linux/process_mrelease_test.cpp:44: FAILURE
Failed to match LIBC_NAMESPACE::process_mrelease(pidfd, 0) against Succeeds().
Expected return value to be equal to 0 but got -1.
Expected errno to be equal to "Success" but got "No such process".
[  FAILED  ] LlvmLibcProcessMReleaseTest.NoError
[ RUN      ] LlvmLibcProcessMReleaseTest.ErrorNotKilled
[       OK ] LlvmLibcProcessMReleaseTest.ErrorNotKilled (161 us)
[ RUN      ] LlvmLibcProcessMReleaseTest.ErrorNonExistingPidfd
[       OK ] LlvmLibcProcessMReleaseTest.ErrorNonExistingPidfd (5 us)
Ran 3 tests.  PASS: 2  FAIL: 1
[938/1102] Running unit test libc.test.src.sys.random.linux.getrandom_test
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag
[       OK ] LlvmLibcGetRandomTest.InvalidFlag (4 us)
[ RUN      ] LlvmLibcGetRandomTest.InvalidBuffer
[       OK ] LlvmLibcGetRandomTest.InvalidBuffer (6 us)
[ RUN      ] LlvmLibcGetRandomTest.ReturnsSize
[       OK ] LlvmLibcGetRandomTest.ReturnsSize (17 us)
[ RUN      ] LlvmLibcGetRandomTest.CheckValue
[       OK ] LlvmLibcGetRandomTest.CheckValue (30 us)
Ran 4 tests.  PASS: 4  FAIL: 0
[939/1102] Running unit test libc.test.src.sys.random.linux.getrandom_test.__NO_MISC_MATH_BASIC_OPS_OPT
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag
[       OK ] LlvmLibcGetRandomTest.InvalidFlag (3 us)
[ RUN      ] LlvmLibcGetRandomTest.InvalidBuffer
[       OK ] LlvmLibcGetRandomTest.InvalidBuffer (5 us)
[ RUN      ] LlvmLibcGetRandomTest.ReturnsSize
[       OK ] LlvmLibcGetRandomTest.ReturnsSize (16 us)
[ RUN      ] LlvmLibcGetRandomTest.CheckValue
[       OK ] LlvmLibcGetRandomTest.CheckValue (5 us)
Ran 4 tests.  PASS: 4  FAIL: 0
[940/1102] Running unit test libc.test.src.sys.random.linux.getrandom_test.__NO_FMA_OPT
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag
Step 8 (libc-unit-tests) failure: libc-unit-tests (failure)
...
[       OK ] LlvmLibcRemapFilePagesTest.ErrorInvalidAddress (4 us)
Ran 3 tests.  PASS: 3  FAIL: 0
[936/1102] Running unit test libc.test.src.sys.mman.linux.shm_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcShmTest.Basic
[       OK ] LlvmLibcShmTest.Basic (83 us)
[ RUN      ] LlvmLibcShmTest.NameConversion
[       OK ] LlvmLibcShmTest.NameConversion (19 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[937/1102] Running unit test libc.test.src.sys.mman.linux.process_mrelease_test
FAILED: projects/libc/test/src/sys/mman/linux/CMakeFiles/libc.test.src.sys.mman.linux.process_mrelease_test /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-gcc-fullbuild-dbg/build/projects/libc/test/src/sys/mman/linux/CMakeFiles/libc.test.src.sys.mman.linux.process_mrelease_test 
cd /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-gcc-fullbuild-dbg/build/projects/libc/test/src/sys/mman/linux && /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-gcc-fullbuild-dbg/build/projects/libc/test/src/sys/mman/linux/libc.test.src.sys.mman.linux.process_mrelease_test.__build__
[==========] Running 3 tests from 1 test suite.
[ RUN      ] LlvmLibcProcessMReleaseTest.NoError
/home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-gcc-fullbuild-dbg/llvm-project/libc/test/src/sys/mman/linux/process_mrelease_test.cpp:44: FAILURE
Failed to match LIBC_NAMESPACE::process_mrelease(pidfd, 0) against Succeeds().
Expected return value to be equal to 0 but got -1.
Expected errno to be equal to "Success" but got "No such process".
[  FAILED  ] LlvmLibcProcessMReleaseTest.NoError
[ RUN      ] LlvmLibcProcessMReleaseTest.ErrorNotKilled
[       OK ] LlvmLibcProcessMReleaseTest.ErrorNotKilled (161 us)
[ RUN      ] LlvmLibcProcessMReleaseTest.ErrorNonExistingPidfd
[       OK ] LlvmLibcProcessMReleaseTest.ErrorNonExistingPidfd (5 us)
Ran 3 tests.  PASS: 2  FAIL: 1
[938/1102] Running unit test libc.test.src.sys.random.linux.getrandom_test
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag
[       OK ] LlvmLibcGetRandomTest.InvalidFlag (4 us)
[ RUN      ] LlvmLibcGetRandomTest.InvalidBuffer
[       OK ] LlvmLibcGetRandomTest.InvalidBuffer (6 us)
[ RUN      ] LlvmLibcGetRandomTest.ReturnsSize
[       OK ] LlvmLibcGetRandomTest.ReturnsSize (17 us)
[ RUN      ] LlvmLibcGetRandomTest.CheckValue
[       OK ] LlvmLibcGetRandomTest.CheckValue (30 us)
Ran 4 tests.  PASS: 4  FAIL: 0
[939/1102] Running unit test libc.test.src.sys.random.linux.getrandom_test.__NO_MISC_MATH_BASIC_OPS_OPT
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag
[       OK ] LlvmLibcGetRandomTest.InvalidFlag (3 us)
[ RUN      ] LlvmLibcGetRandomTest.InvalidBuffer
[       OK ] LlvmLibcGetRandomTest.InvalidBuffer (5 us)
[ RUN      ] LlvmLibcGetRandomTest.ReturnsSize
[       OK ] LlvmLibcGetRandomTest.ReturnsSize (16 us)
[ RUN      ] LlvmLibcGetRandomTest.CheckValue
[       OK ] LlvmLibcGetRandomTest.CheckValue (5 us)
Ran 4 tests.  PASS: 4  FAIL: 0
[940/1102] Running unit test libc.test.src.sys.random.linux.getrandom_test.__NO_FMA_OPT
[==========] Running 4 tests from 1 test suite.
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag

@llvm-ci
Copy link
Collaborator

llvm-ci commented Dec 2, 2024

LLVM Buildbot has detected a new failure on builder flang-aarch64-out-of-tree running on linaro-flang-aarch64-out-of-tree while building llvm at step 7 "build-flang-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/53/builds/8800

Here is the relevant piece of the build log for the reference
Step 7 (build-flang-unified-tree) failure: build (failure)
...
3.062 [23/18/4] Linking CXX executable unittests/Evaluate/logical.test
3.088 [23/17/5] Linking CXX executable unittests/Evaluate/integer.test
3.130 [23/16/6] Linking CXX executable unittests/Evaluate/real.test
4.164 [23/15/7] Linking CXX executable bin/fir-lsp-server
4.177 [23/14/8] Linking CXX executable bin/f18-parse-demo
6.774 [23/13/9] Linking CXX executable unittests/Evaluate/folding.test
7.152 [23/12/10] Linking CXX executable unittests/Evaluate/expression.test
24.198 [23/11/11] Building CXX object lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/Target.cpp.o
37.552 [23/10/12] Building CXX object lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/TypeConverter.cpp.o
38.132 [23/9/13] Building CXX object lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/TargetRewrite.cpp.o
FAILED: lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/TargetRewrite.cpp.o 
/usr/local/bin/c++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/build_flang/lib/Optimizer/CodeGen -I/home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/llvm-project/flang/lib/Optimizer/CodeGen -I/home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/llvm-project/flang/include -I/home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/build_flang/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/llvm-project/mlir/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/build_llvm/tools/mlir/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/llvm-project/llvm/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/build_llvm/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/llvm-project/clang/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/build_llvm/tools/clang/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17   -D_GNU_SOURCE -D_DEBUG -D_GLIBCXX_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/TargetRewrite.cpp.o -MF lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/TargetRewrite.cpp.o.d -o lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/TargetRewrite.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-out-of-tree/llvm-project/flang/lib/Optimizer/CodeGen/TargetRewrite.cpp
../llvm-project/flang/lib/Optimizer/CodeGen/TargetRewrite.cpp:848:22: error: unused variable 'index' [-Werror,-Wunused-variable]
  848 |                 auto index = e.index();
      |                      ^~~~~
1 error generated.
40.409 [23/8/14] Building CXX object lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/FIROpPatterns.cpp.o
41.794 [23/7/15] Building CXX object lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/DebugTypeGenerator.cpp.o
44.612 [23/6/16] Building CXX object lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/CodeGenOpenMP.cpp.o
44.873 [23/5/17] Building CXX object lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/CUFAddConstructor.cpp.o
45.775 [23/4/18] Building CXX object lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/CUFGPUToLLVMConversion.cpp.o
45.861 [23/3/19] Building CXX object lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/CUFOpConversion.cpp.o
49.221 [23/2/20] Building CXX object lib/Optimizer/Transforms/CMakeFiles/FIRTransforms.dir/AddDebugInfo.cpp.o
145.308 [23/1/21] Building CXX object lib/Optimizer/CodeGen/CMakeFiles/FIRCodeGen.dir/CodeGen.cpp.o
ninja: build stopped: subcommand failed.

@DavidSpickett
Copy link
Collaborator

LLVM Buildbot has detected a new failure on builder flang-aarch64-out-of-tree

This bot is a bit strange because it triggers on changes to LLVM and Flang but doesn't list the flang/ changes for some reason. So you can definitely ignore this.

momchil-velikov added a commit to momchil-velikov/llvm-project that referenced this pull request Dec 20, 2024
… all-true predicate

When the predicate of a destructive operation is known to be all-true,
for example

   fabs z0.s, p0/m, z1.s

then the entire output register is written and we can use a zeroing
(instead of a merging) form of the instruction, for example

   fabs z0.s, p0/z, z1.s

thus eliminate the dependency on the input-output destination register
without the need to insert a `movprfx`.

This patch complements (and in the case of 2b3266c, fixes a regression)
the following:

  7f4414b [AArch64] Generate zeroing forms of certain SVE2.2 instructions (4/11)  (llvm#116830)
  2474cf7 [AArch64] Generate zeroing forms of certain SVE2.2 instructions (3/11) (llvm#116829)
  6f285d3 [AArch64] Generate zeroing forms of certain SVE2.2 instructions (2/11) (llvm#116828)
  2b3266c [AArch64] Generate zeroing forms of certain SVE2.2 instructions (1/11) (llvm#116259)
momchil-velikov added a commit that referenced this pull request Dec 24, 2024
… all-true predicate (#120595)

When the predicate of a destructive operation is known to be all-true,
for example

    fabs z0.s, p0/m, z1.s

then the entire output register is written and we can use a zeroing
(instead of a merging) form of the instruction, for example

    fabs z0.s, p0/z, z1.s

thus eliminate the dependency on the input-output destination register
without the need to insert a `movprfx`.

This patch complements (and in the case of
2b3266c,
fixes a regression) the following:

7f4414b
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (4/11)
(#116830)

2474cf7
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (3/11)
(#116829)

6f285d3
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (2/11)
(#116828)

2b3266c
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (1/11)
(#116259)
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 10, 2025
…ons with an all-true predicate (#120595)

When the predicate of a destructive operation is known to be all-true,
for example

    fabs z0.s, p0/m, z1.s

then the entire output register is written and we can use a zeroing
(instead of a merging) form of the instruction, for example

    fabs z0.s, p0/z, z1.s

thus eliminate the dependency on the input-output destination register
without the need to insert a `movprfx`.

This patch complements (and in the case of
llvm/llvm-project@2b3266c,
fixes a regression) the following:

llvm/llvm-project@7f4414b
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (4/11)
(llvm/llvm-project#116830)

llvm/llvm-project@2474cf7
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (3/11)
(llvm/llvm-project#116829)

llvm/llvm-project@6f285d3
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (2/11)
(llvm/llvm-project#116828)

llvm/llvm-project@2b3266c
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (1/11)
(llvm/llvm-project#116259)
@momchil-velikov momchil-velikov deleted the zeroing-03.1-fcvtlt-fcvtx branch January 29, 2025 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants