[LegalizeVectorTypes] Always widen fabs #111298

Merged
merged 2 commits into from
Oct 7, 2024


lukel97
Contributor

@lukel97 lukel97 commented Oct 6, 2024

fabs and fneg are similar nodes in that they can always be expanded to integer ops, but currently they diverge when widened.

If the widened vector fabs is marked as expand (and the corresponding scalar type is too), LegalizeVectorTypes thinks that it may be turned into a libcall and so will unroll it to avoid the overhead on the undef elements.

However, unlike the other ops in the list checked by unrollExpandedOp, such as fsin, fround and flog, an fabs marked as expand will never be legalized into a libcall. Like fneg, it can always be expanded into an integer op.
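To make the "integer op" point concrete, here is a minimal sketch on raw IEEE 754 half-precision bit patterns. The function names are made up for illustration and are not LLVM APIs; the sketch only assumes that bit 15 of an f16 is the sign bit.

```cpp
#include <cstdint>

// IEEE 754 binary16: bit 15 is the sign, bits 14..0 hold exponent and mantissa.
constexpr std::uint16_t kF16SignMask = 0x8000;

// fabs as an integer op: clear the sign bit (hypothetical helper name).
constexpr std::uint16_t fabs_f16_bits(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits & 0x7FFF);
}

// fneg as an integer op: flip the sign bit (hypothetical helper name).
constexpr std::uint16_t fneg_f16_bits(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits ^ kF16SignMask);
}
```

Neither operation needs any floating-point support on the target, which is why no libcall is ever involved.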

This moves it below unrollExpandedOp to bring it in line with fneg, which fixes an issue on RISC-V with f16 fabs being unexpectedly scalarized when there's no zfhmin.
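The decision described above can be modelled roughly as follows. This is a toy sketch, not the code in LegalizeVectorTypes.cpp: the enums and the `chooseStrategy` function are hypothetical, and the condition is only what this description states (expand on both the widened vector type and the scalar type).

```cpp
// Toy model only. All names here are hypothetical and not part of LLVM's API.
enum class Action { Legal, Expand };
enum class Strategy { Widen, Unroll };

// goesThroughUnrollCheck: whether the opcode sits in the group of cases that
// is checked for a possible libcall expansion before being widened.
Strategy chooseStrategy(bool goesThroughUnrollCheck, Action wideVectorAction,
                        Action scalarAction) {
  // Expand on both the widened vector type and the scalar type is taken as a
  // hint that the op may become one libcall per element, so it is unrolled to
  // avoid calls on the undef padding elements.
  if (goesThroughUnrollCheck && wideVectorAction == Action::Expand &&
      scalarAction == Action::Expand)
    return Strategy::Unroll;
  return Strategy::Widen;
}
```

In this model the patch amounts to passing `false` for fabs, as was already the case for fneg, so it is widened even when both actions are expand.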

@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Oct 6, 2024
@llvmbot
Collaborator

llvmbot commented Oct 6, 2024

@llvm/pr-subscribers-llvm-selectiondag

Author: Luke Lau (lukel97)


Full diff: https://github.com/llvm/llvm-project/pull/111298.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+1-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll (+9-253)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 0a22f06271984e..f268393b6140ca 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4679,7 +4679,6 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
     Res = WidenVecRes_XROUND(N);
     break;
 
-  case ISD::FABS:
   case ISD::FACOS:
   case ISD::FASIN:
   case ISD::FATAN:
@@ -4727,7 +4726,7 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
   case ISD::CTTZ_ZERO_UNDEF:
   case ISD::VP_CTTZ_ZERO_UNDEF:
   case ISD::FNEG: case ISD::VP_FNEG:
-  case ISD::VP_FABS:
+  case ISD::FABS: case ISD::VP_FABS:
   case ISD::VP_SQRT:
   case ISD::VP_FCEIL:
   case ISD::VP_FFLOOR:
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
index ea7829f2d6c658..0f1bfb5fdfd40f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
@@ -484,259 +484,15 @@ define void @fabs_v6f16(ptr %x) {
 ; ZVFH-NEXT:    vse16.v v8, (a0)
 ; ZVFH-NEXT:    ret
 ;
-; RV32-ZVFHMIN-LABEL: fabs_v6f16:
-; RV32-ZVFHMIN:       # %bb.0:
-; RV32-ZVFHMIN-NEXT:    addi sp, sp, -48
-; RV32-ZVFHMIN-NEXT:    .cfi_def_cfa_offset 48
-; RV32-ZVFHMIN-NEXT:    sw ra, 44(sp) # 4-byte Folded Spill
-; RV32-ZVFHMIN-NEXT:    sw s0, 40(sp) # 4-byte Folded Spill
-; RV32-ZVFHMIN-NEXT:    sw s1, 36(sp) # 4-byte Folded Spill
-; RV32-ZVFHMIN-NEXT:    fsd fs0, 24(sp) # 8-byte Folded Spill
-; RV32-ZVFHMIN-NEXT:    .cfi_offset ra, -4
-; RV32-ZVFHMIN-NEXT:    .cfi_offset s0, -8
-; RV32-ZVFHMIN-NEXT:    .cfi_offset s1, -12
-; RV32-ZVFHMIN-NEXT:    .cfi_offset fs0, -24
-; RV32-ZVFHMIN-NEXT:    csrr a1, vlenb
-; RV32-ZVFHMIN-NEXT:    slli a1, a1, 1
-; RV32-ZVFHMIN-NEXT:    sub sp, sp, a1
-; RV32-ZVFHMIN-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x30, 0x22, 0x11, 0x02, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 48 + 2 * vlenb
-; RV32-ZVFHMIN-NEXT:    mv s0, a0
-; RV32-ZVFHMIN-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT:    vle16.v v8, (a0)
-; RV32-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV32-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV32-ZVFHMIN-NEXT:    fmv.s fs0, fa0
-; RV32-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV32-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT:    vsetivli zero, 1, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 1
-; RV32-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV32-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV32-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV32-ZVFHMIN-NEXT:    fabs.s fa5, fs0
-; RV32-ZVFHMIN-NEXT:    fmv.x.w s1, fa0
-; RV32-ZVFHMIN-NEXT:    fmv.s fa0, fa5
-; RV32-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV32-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT:    vmv.v.x v8, a0
-; RV32-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, s1
-; RV32-ZVFHMIN-NEXT:    addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV32-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 2
-; RV32-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV32-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV32-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV32-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT:    addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT:    vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT:    addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV32-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 3
-; RV32-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV32-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV32-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV32-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT:    addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT:    vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT:    addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV32-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 4
-; RV32-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV32-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV32-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV32-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT:    addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT:    vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT:    addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV32-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 5
-; RV32-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV32-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV32-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV32-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT:    addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT:    vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 2
-; RV32-ZVFHMIN-NEXT:    vse16.v v8, (s0)
-; RV32-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT:    slli a0, a0, 1
-; RV32-ZVFHMIN-NEXT:    add sp, sp, a0
-; RV32-ZVFHMIN-NEXT:    lw ra, 44(sp) # 4-byte Folded Reload
-; RV32-ZVFHMIN-NEXT:    lw s0, 40(sp) # 4-byte Folded Reload
-; RV32-ZVFHMIN-NEXT:    lw s1, 36(sp) # 4-byte Folded Reload
-; RV32-ZVFHMIN-NEXT:    fld fs0, 24(sp) # 8-byte Folded Reload
-; RV32-ZVFHMIN-NEXT:    addi sp, sp, 48
-; RV32-ZVFHMIN-NEXT:    ret
-;
-; RV64-ZVFHMIN-LABEL: fabs_v6f16:
-; RV64-ZVFHMIN:       # %bb.0:
-; RV64-ZVFHMIN-NEXT:    addi sp, sp, -48
-; RV64-ZVFHMIN-NEXT:    .cfi_def_cfa_offset 48
-; RV64-ZVFHMIN-NEXT:    sd ra, 40(sp) # 8-byte Folded Spill
-; RV64-ZVFHMIN-NEXT:    sd s0, 32(sp) # 8-byte Folded Spill
-; RV64-ZVFHMIN-NEXT:    sd s1, 24(sp) # 8-byte Folded Spill
-; RV64-ZVFHMIN-NEXT:    fsd fs0, 16(sp) # 8-byte Folded Spill
-; RV64-ZVFHMIN-NEXT:    .cfi_offset ra, -8
-; RV64-ZVFHMIN-NEXT:    .cfi_offset s0, -16
-; RV64-ZVFHMIN-NEXT:    .cfi_offset s1, -24
-; RV64-ZVFHMIN-NEXT:    .cfi_offset fs0, -32
-; RV64-ZVFHMIN-NEXT:    csrr a1, vlenb
-; RV64-ZVFHMIN-NEXT:    slli a1, a1, 1
-; RV64-ZVFHMIN-NEXT:    sub sp, sp, a1
-; RV64-ZVFHMIN-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x30, 0x22, 0x11, 0x02, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 48 + 2 * vlenb
-; RV64-ZVFHMIN-NEXT:    mv s0, a0
-; RV64-ZVFHMIN-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT:    vle16.v v8, (a0)
-; RV64-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV64-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV64-ZVFHMIN-NEXT:    fmv.s fs0, fa0
-; RV64-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV64-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT:    vsetivli zero, 1, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 1
-; RV64-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV64-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV64-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV64-ZVFHMIN-NEXT:    fabs.s fa5, fs0
-; RV64-ZVFHMIN-NEXT:    fmv.x.w s1, fa0
-; RV64-ZVFHMIN-NEXT:    fmv.s fa0, fa5
-; RV64-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV64-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT:    vmv.v.x v8, a0
-; RV64-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, s1
-; RV64-ZVFHMIN-NEXT:    addi a0, sp, 16
-; RV64-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV64-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 2
-; RV64-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV64-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV64-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV64-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT:    addi a1, sp, 16
-; RV64-ZVFHMIN-NEXT:    vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, a0
-; RV64-ZVFHMIN-NEXT:    addi a0, sp, 16
-; RV64-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV64-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 3
-; RV64-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV64-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV64-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV64-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT:    addi a1, sp, 16
-; RV64-ZVFHMIN-NEXT:    vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, a0
-; RV64-ZVFHMIN-NEXT:    addi a0, sp, 16
-; RV64-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV64-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 4
-; RV64-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV64-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV64-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV64-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT:    addi a1, sp, 16
-; RV64-ZVFHMIN-NEXT:    vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, a0
-; RV64-ZVFHMIN-NEXT:    addi a0, sp, 16
-; RV64-ZVFHMIN-NEXT:    vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT:    add a0, sp, a0
-; RV64-ZVFHMIN-NEXT:    addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT:    vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 5
-; RV64-ZVFHMIN-NEXT:    vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT:    fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT:    call __extendhfsf2
-; RV64-ZVFHMIN-NEXT:    fabs.s fa0, fa0
-; RV64-ZVFHMIN-NEXT:    call __truncsfhf2
-; RV64-ZVFHMIN-NEXT:    fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT:    addi a1, sp, 16
-; RV64-ZVFHMIN-NEXT:    vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT:    vslide1down.vx v8, v8, a0
-; RV64-ZVFHMIN-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT:    vslidedown.vi v8, v8, 2
-; RV64-ZVFHMIN-NEXT:    vse16.v v8, (s0)
-; RV64-ZVFHMIN-NEXT:    csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT:    slli a0, a0, 1
-; RV64-ZVFHMIN-NEXT:    add sp, sp, a0
-; RV64-ZVFHMIN-NEXT:    ld ra, 40(sp) # 8-byte Folded Reload
-; RV64-ZVFHMIN-NEXT:    ld s0, 32(sp) # 8-byte Folded Reload
-; RV64-ZVFHMIN-NEXT:    ld s1, 24(sp) # 8-byte Folded Reload
-; RV64-ZVFHMIN-NEXT:    fld fs0, 16(sp) # 8-byte Folded Reload
-; RV64-ZVFHMIN-NEXT:    addi sp, sp, 48
-; RV64-ZVFHMIN-NEXT:    ret
+; ZVFHMIN-LABEL: fabs_v6f16:
+; ZVFHMIN:       # %bb.0:
+; ZVFHMIN-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
+; ZVFHMIN-NEXT:    vle16.v v8, (a0)
+; ZVFHMIN-NEXT:    lui a1, 8
+; ZVFHMIN-NEXT:    addi a1, a1, -1
+; ZVFHMIN-NEXT:    vand.vx v8, v8, a1
+; ZVFHMIN-NEXT:    vse16.v v8, (a0)
+; ZVFHMIN-NEXT:    ret
   %a = load <6 x half>, ptr %x
   %b = call <6 x half> @llvm.fabs.v6f16(<6 x half> %a)
   store <6 x half> %b, ptr %x
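As a rough illustration of the new ZVFHMIN check lines: `lui a1, 8` followed by `addi a1, a1, -1` builds the mask 0x7FFF, and `vand.vx` applies it to each 16-bit lane. The sketch below models that on plain arrays. The function name and the 8-lane widened array are assumptions for illustration only, and the padding lanes are zero here, whereas in the DAG they are undef.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// `lui a1, 8` yields 8 << 12 = 0x8000; `addi a1, a1, -1` then yields 0x7FFF.
constexpr std::uint16_t kAbsMask = (8u << 12) - 1;

constexpr std::size_t kOrigLanes = 6;  // <6 x half> in the test
constexpr std::size_t kWideLanes = 8;  // assumed widened lane count

// Hypothetical model: widen to 8 lanes, AND every lane with the mask, and
// keep only the 6 original lanes of the result.
std::array<std::uint16_t, kOrigLanes>
fabs_v6f16_bits(const std::array<std::uint16_t, kOrigLanes>& in) {
  std::array<std::uint16_t, kWideLanes> wide{};  // padding lanes start as 0
  for (std::size_t i = 0; i < kOrigLanes; ++i)
    wide[i] = in[i];
  for (auto& lane : wide)
    lane &= kAbsMask;  // clears the sign bit; harmless on the padding lanes
  std::array<std::uint16_t, kOrigLanes> out{};
  for (std::size_t i = 0; i < kOrigLanes; ++i)
    out[i] = wide[i];
  return out;
}
```

Processing the padding lanes costs nothing extra with a single vector AND, in contrast to the removed check lines, which extracted each element and called `__extendhfsf2` and `__truncsfhf2` per lane.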


github-actions bot commented Oct 6, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 18d3a5d558c3739cfe1d382e9976400d292e9b63 f11f82c10ecb5e977c09ef4c70b91b583b1e8448 --extensions cpp -- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index ab734ffb25..7ddd7dcbec 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4726,7 +4726,8 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
   case ISD::CTTZ_ZERO_UNDEF:
   case ISD::VP_CTTZ_ZERO_UNDEF:
   case ISD::FNEG: case ISD::VP_FNEG:
-  case ISD::FABS: case ISD::VP_FABS:
+  case ISD::FABS:
+  case ISD::VP_FABS:
   case ISD::VP_SQRT:
   case ISD::VP_FCEIL:
   case ISD::VP_FFLOOR:

@lukel97 lukel97 merged commit c98e41f into llvm:main Oct 7, 2024
6 of 9 checks passed
@llvm-ci
Collaborator

llvm-ci commented Oct 7, 2024

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/5017

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: lang/cpp/incomplete-stl-types/TestStlIncompleteTypes.py (794 of 2809)
PASS: lldb-api :: lang/cpp/keywords_enabled/TestCppKeywordsEnabled.py (795 of 2809)
PASS: lldb-api :: lang/cpp/inlines/TestInlines.py (796 of 2809)
PASS: lldb-api :: lang/cpp/incomplete-types/TestCppIncompleteTypes.py (797 of 2809)
PASS: lldb-api :: lang/cpp/lambdas/TestLambdas.py (798 of 2809)
PASS: lldb-api :: lang/cpp/llvm-style/TestLLVMStyle.py (799 of 2809)
UNSUPPORTED: lldb-api :: lang/cpp/modules-import/TestCXXModulesImport.py (800 of 2809)
PASS: lldb-api :: lang/cpp/limit-debug-info/TestWithLimitDebugInfo.py (801 of 2809)
PASS: lldb-api :: lang/cpp/multiple-inheritance/TestCppMultipleInheritance.py (802 of 2809)
PASS: lldb-api :: lang/cpp/member-and-local-vars-with-same-name/TestMembersAndLocalsWithSameName.py (803 of 2809)
FAIL: lldb-api :: lang/c/shared_lib_stripped_symbols/TestSharedLibStrippedSymbols.py (804 of 2809)
******************** TEST 'lldb-api :: lang/c/shared_lib_stripped_symbols/TestSharedLibStrippedSymbols.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env ARCHIVER=/usr/local/bin/llvm-ar --env OBJCOPY=/usr/bin/llvm-objcopy --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/lang/c/shared_lib_stripped_symbols -p TestSharedLibStrippedSymbols.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 20.0.0git (https://github.com/llvm/llvm-project.git revision c98e41f8586bc43033d29ef3ec0f9a2f79b3ec32)
  clang revision c98e41f8586bc43033d29ef3ec0f9a2f79b3ec32
  llvm revision c98e41f8586bc43033d29ef3ec0f9a2f79b3ec32
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_expr_dsym (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase) (test case does not fall in any category of interest for this run) 
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_expr_dwarf (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
PASS: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_expr_dwo (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
UNSUPPORTED: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_frame_variable_dsym (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase) (test case does not fall in any category of interest for this run) 
XFAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_frame_variable_dwarf (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
XFAIL: LLDB (/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/bin/clang-arm) :: test_frame_variable_dwo (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
======================================================================
FAIL: test_expr_dwarf (TestSharedLibStrippedSymbols.SharedLibStrippedTestCase)
   Test that types work when defined in a shared library and forward-declared in the main executable
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1769, in test_method
    return attrvalue(self)
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/lang/c/shared_lib_stripped_symbols/TestSharedLibStrippedSymbols.py", line 24, in test_expr
    self.expect(
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2370, in expect
    self.runCmd(
  File "/home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1000, in runCmd
    self.assertTrue(self.res.Succeeded(), msg + output)
AssertionError: False is not true : Variable(s) displayed correctly
Error output:

Kyvangka1610 added a commit to Kyvangka1610/llvm-project that referenced this pull request Oct 7, 2024
* commit 'FETCH_HEAD':
  [X86] getIntImmCostInst - pull out repeated Imm.getBitWidth() calls. NFC.
  [X86] Add test coverage for llvm#111323
  [Driver] Use empty multilib file in another test (llvm#111352)
  [clang][OpenMP][test] Use x86_64-linux-gnu triple for test referencing avx512f feature (llvm#111337)
  [doc] Fix Kaleidoscope tutorial chapter 3 code snippet and full listing discrepancies (llvm#111289)
  [Flang][OpenMP] Improve entry block argument creation and binding (llvm#110267)
  [x86] combineMul - handle 0/-1 KnownBits cases before MUL_IMM logic (REAPPLIED)
  [llvm-dis] Fix non-deterministic disassembly across multiple inputs (llvm#110988)
  [lldb][test] TestDataFormatterLibcxxOptionalSimulator.py: change order of ifdefs
  [lldb][test] Add libcxx-simulators test for std::optional (llvm#111133)
  [x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat. (REAPPLIED)
  Reland "[lldb][test] TestDataFormatterLibcxxStringSimulator.py: add new padding layout" (llvm#111123)
  Revert "[x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat."
  update_test_checks: fix a simple regression  (llvm#111347)
  [LegalizeVectorTypes] Always widen fabs (llvm#111298)
  [lsan] Make ReportUnsuspendedThreads return bool also for Fuchsia
  [mlir][vector] Add more tests for ConvertVectorToLLVM (6/n) (llvm#111121)
  [bazel] port 9144fed
  [SystemZ] Remove inlining threshold multiplier. (llvm#106058)
  [LegalizeVectorTypes] When widening don't check for libcalls if promoted (llvm#111297)
  [clang][Driver] Improve multilib custom error reporting (llvm#110804)
  [clang][Driver] Rename "FatalError" key to "Error" in multilib.yaml (llvm#110804)
  [LLVM][Maintainers] Update release managers (llvm#111164)
  [Clang][Driver] Add option to provide path for multilib's YAML config file (llvm#109640)
  [LoopVectorize] Remove redundant code in emitSCEVChecks (llvm#111132)
  [AMDGPU] Only emit SCOPE_SYS global_wb (llvm#110636)
  [ELF] Change Ctx::target to unique_ptr (llvm#111260)
  [ELF] Pass Ctx & to some free functions
  [RISCV] Only disassemble fcvtmod.w.d if the rounding mode is rtz. (llvm#111308)
  [Clang] Remove the special-casing for RequiresExprBodyDecl in BuildResolvedCallExpr() after fd87d76 (llvm#111277)
  [ELF] Pass Ctx & to InputFile
  [clang-format] Add AlignFunctionDeclarations to AlignConsecutiveDeclarations (llvm#108241)
  [AMDGPU] Support preloading hidden kernel arguments (llvm#98861)
  [ELF] Move static nextGroupId isInGroup to LinkerDriver
  [clangd] Add ArgumentLists config option under Completion (llvm#111322)
  [ELF] Pass Ctx & to SyntheticSections
  [ELF] Pass Ctx & to Symbols
  [ELF] Pass Ctx & to Symbols
  [ELF] getRelocTargetVA: pass Ctx and Relocation. NFC
  [clang-tidy] Avoid capturing a local variable in a static lambda in UseRangesCheck (llvm#111282)
  [VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (llvm#106431)
  [clangd] Simplify ternary expressions with std::optional::value_or (NFC) (llvm#111309)
  [libc++][format][2/3] Optimizes c-string arguments. (llvm#101805)
  [RISCV] Combine RVBUnary and RVKUnary into classes that are more similar to ALU(W)_r(r/i). NFC (llvm#111279)
  [ELF] Pass Ctx & to InputFiles
  [libc] GPU RPC interface: add return value to `rpc_host_call` (llvm#111288)

Signed-off-by: kyvangka1610 <kyvangka2002@gmail.com>