
llvm.canonicalize in common code #142105


Open
dominik-steenken wants to merge 5 commits into main

Conversation

dominik-steenken
Contributor

@dominik-steenken dominik-steenken commented May 30, 2025

This PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use __builtin_canonicalize at least for f32 and f64 data types.

Canonicalization is implemented here as multiplication by 1.0, as suggested in the docs.
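In essence, the expansion added to SelectionDAGLegalize::ExpandNode (shown in full in the diff below) boils down to the following sketch, where Node is the FCANONICALIZE node being expanded:

SDValue Operand = Node->getOperand(0);                // the value x to canonicalize
EVT VT = Operand.getValueType();                      // its fp type (f32 or f64)
SDValue One = DAG.getConstantFP(1.0, dl, VT);         // 1.0 of the matching type
SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
                          {DAG.getEntryNode(), Operand, One});  // x * 1.0
Results.push_back(Mul);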

Moving the X86 implementation to common code was suggested in the reviews of #106370 by @arsenm.

@pawan-nirpal-031 I hope you don't mind that I went ahead with this to get it working on SystemZ.

I don't have a lot of experience with this part of the code base yet, so reviews are very welcome.

@llvmbot llvmbot added the backend:SystemZ and llvm:SelectionDAG labels May 30, 2025
@dominik-steenken
Contributor Author

@uweigand FYI

@llvmbot
Member

llvmbot commented May 30, 2025

@llvm/pr-subscribers-backend-loongarch
@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-selectiondag

Author: Dominik Steenken (dominik-steenken)

Changes

This PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use __builtin_canonicalize at least for f32 and f64 data types.

Canonicalization is implemented here as multiplication by 1.0, as suggested in the docs.

Moving the X86 implementation to common code was suggested in the reviews of #106370 by @arsenm.

@pawan-nirpal-031 I hope you don't mind that I went ahead with this to get it working on SystemZ.

I don't have a lot of experience with this part of the code base yet, so reviews are very welcome.


Full diff: https://github.com/llvm/llvm-project/pull/142105.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+17)
  • (modified) llvm/lib/CodeGen/TargetLoweringBase.cpp (+3)
  • (added) llvm/test/CodeGen/SystemZ/canonicalize-vars.ll (+141)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 528c07cc5549d..6599704aaf290 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -3356,6 +3356,23 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
     Results.push_back(Op);
     break;
   }
+  case ISD::FCANONICALIZE: {
+    // This implements llvm.canonicalize.f* by multiplication with 1.0,
+    // as suggested in https://llvm.org/docs/LangRef.html#id2335.
+    // Get operand x.
+    SDValue Operand = Node->getOperand(0);
+    // Get fp value type used.
+    EVT VT = Operand.getValueType();
+    // Produce appropriately-typed 1.0 constant.
+    SDValue One = DAG.getConstantFP(1.0, dl, VT);
+    // Produce multiplication node x * 1.0.
+    SDValue Chain = DAG.getEntryNode();
+    SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
+                              {Chain, Operand, One});
+
+    Results.push_back(Mul);
+    break;
+  }
   case ISD::SIGN_EXTEND_INREG: {
     EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();
     EVT VT = Node->getValueType(0);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 935afaf9dd550..ba765afd80bf1 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -870,6 +870,9 @@ void TargetLoweringBase::initActions() {
                       ISD::FATAN2},
                      {MVT::f32, MVT::f64, MVT::f128}, Expand);
 
+  // Insert custom handling default for llvm.canonicalize.*.
+  setOperationAction(ISD::FCANONICALIZE, {MVT::f32, MVT::f64}, Expand);
+
   // FIXME: Query RuntimeLibCalls to make the decision.
   setOperationAction({ISD::LRINT, ISD::LLRINT, ISD::LROUND, ISD::LLROUND},
                      {MVT::f32, MVT::f64, MVT::f128}, LibCall);
diff --git a/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
new file mode 100644
index 0000000000000..477f5e1547567
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
@@ -0,0 +1,141 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --default-march s390x-unknown-linux-gnu --version 5
+; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z16 < %s | FileCheck %s -check-prefixes=Z16
+
+define float @canonicalize_fp32(float %a) {
+; Z16-LABEL: canonicalize_fp32:
+; Z16:       # %bb.0:
+; Z16-NEXT:    vgmf %v1, 2, 8
+; Z16-NEXT:    meebr %f0, %f1
+; Z16-NEXT:    br %r14
+  %canonicalized = call float @llvm.canonicalize.f32(float %a)
+  ret float %canonicalized
+}
+
+define double @canonicalize_fp64(double %a) {
+; Z16-LABEL: canonicalize_fp64:
+; Z16:       # %bb.0:
+; Z16-NEXT:    vgmg %v1, 2, 11
+; Z16-NEXT:    mdbr %f0, %f1
+; Z16-NEXT:    br %r14
+  %canonicalized = call double @llvm.canonicalize.f64(double %a)
+  ret double %canonicalized
+}
+
+define void @canonicalize_ptr_f32(float * %out) {
+; Z16-LABEL: canonicalize_ptr_f32:
+; Z16:       # %bb.0:
+; Z16-NEXT:    vgmf %v0, 2, 8
+; Z16-NEXT:    meeb %f0, 0(%r2)
+; Z16-NEXT:    ste %f0, 0(%r2)
+; Z16-NEXT:    br %r14
+  %val = load float, float * %out
+  %canonicalized = call float @llvm.canonicalize.f32(float %val)
+  store float %canonicalized, float * %out
+  ret void
+}
+
+define void @canonicalize_ptr_f64(double * %out) {
+; Z16-LABEL: canonicalize_ptr_f64:
+; Z16:       # %bb.0:
+; Z16-NEXT:    vgmg %v0, 2, 11
+; Z16-NEXT:    mdb %f0, 0(%r2)
+; Z16-NEXT:    std %f0, 0(%r2)
+; Z16-NEXT:    br %r14
+  %val = load double, double * %out
+  %canonicalized = call double @llvm.canonicalize.f64(double %val)
+  store double %canonicalized, double * %out
+  ret void
+}
+
+define <4 x float> @canonicalize_v4f32(<4 x float> %a) {
+; Z16-LABEL: canonicalize_v4f32:
+; Z16:       # %bb.0:
+; Z16-NEXT:    vrepf %v0, %v24, 3
+; Z16-NEXT:    vgmf %v1, 2, 8
+; Z16-NEXT:    vrepf %v2, %v24, 2
+; Z16-NEXT:    meebr %f0, %f1
+; Z16-NEXT:    meebr %f2, %f1
+; Z16-NEXT:    vrepf %v3, %v24, 1
+; Z16-NEXT:    vmrhf %v0, %v2, %v0
+; Z16-NEXT:    wfmsb %f2, %v24, %f1
+; Z16-NEXT:    wfmsb %f1, %f3, %f1
+; Z16-NEXT:    vmrhf %v1, %v2, %v1
+; Z16-NEXT:    vmrhg %v24, %v1, %v0
+; Z16-NEXT:    br %r14
+  %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %a)
+  ret <4 x float> %canonicalized
+}
+
+define <4 x double> @canonicalize_v4f64(<4 x double> %a) {
+; Z16-LABEL: canonicalize_v4f64:
+; Z16:       # %bb.0:
+; Z16-NEXT:    vgmg %v0, 2, 11
+; Z16-NEXT:    vrepg %v2, %v24, 1
+; Z16-NEXT:    wfmdb %f1, %v24, %f0
+; Z16-NEXT:    mdbr %f2, %f0
+; Z16-NEXT:    vmrhg %v24, %v1, %v2
+; Z16-NEXT:    vrepg %v2, %v26, 1
+; Z16-NEXT:    wfmdb %f1, %v26, %f0
+; Z16-NEXT:    wfmdb %f0, %f2, %f0
+; Z16-NEXT:    vmrhg %v26, %v1, %v0
+; Z16-NEXT:    br %r14
+  %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %a)
+  ret <4 x double> %canonicalized
+}
+
+define void @canonicalize_ptr_v4f32(<4 x float> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f32:
+; Z16:       # %bb.0:
+; Z16-NEXT:    vl %v0, 0(%r2), 3
+; Z16-NEXT:    vrepf %v1, %v0, 3
+; Z16-NEXT:    vgmf %v2, 2, 8
+; Z16-NEXT:    vrepf %v3, %v0, 2
+; Z16-NEXT:    meebr %f1, %f2
+; Z16-NEXT:    meebr %f3, %f2
+; Z16-NEXT:    vmrhf %v1, %v3, %v1
+; Z16-NEXT:    wfmsb %f3, %f0, %f2
+; Z16-NEXT:    vrepf %v0, %v0, 1
+; Z16-NEXT:    meebr %f0, %f2
+; Z16-NEXT:    vmrhf %v0, %v3, %v0
+; Z16-NEXT:    vmrhg %v0, %v0, %v1
+; Z16-NEXT:    vst %v0, 0(%r2), 3
+; Z16-NEXT:    br %r14
+  %val = load <4 x float>, <4 x float> * %out
+  %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %val)
+  store <4 x float> %canonicalized, <4 x float> * %out
+  ret void
+}
+
+define void @canonicalize_ptr_v4f64(<4 x double> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f64:
+; Z16:       # %bb.0:
+; Z16-NEXT:    vl %v1, 16(%r2), 4
+; Z16-NEXT:    vgmg %v2, 2, 11
+; Z16-NEXT:    wfmdb %f3, %f1, %f2
+; Z16-NEXT:    vrepg %v1, %v1, 1
+; Z16-NEXT:    mdbr %f1, %f2
+; Z16-NEXT:    vl %v0, 0(%r2), 4
+; Z16-NEXT:    vmrhg %v1, %v3, %v1
+; Z16-NEXT:    wfmdb %f3, %f0, %f2
+; Z16-NEXT:    vrepg %v0, %v0, 1
+; Z16-NEXT:    mdbr %f0, %f2
+; Z16-NEXT:    vmrhg %v0, %v3, %v0
+; Z16-NEXT:    vst %v0, 0(%r2), 4
+; Z16-NEXT:    vst %v1, 16(%r2), 4
+; Z16-NEXT:    br %r14
+  %val = load <4 x double>, <4 x double> * %out
+  %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %val)
+  store <4 x double> %canonicalized, <4 x double> * %out
+  ret void
+}
+
+define void @canonicalize_undef(double * %out) {
+; Z16-LABEL: canonicalize_undef:
+; Z16:       # %bb.0:
+; Z16-NEXT:    llihh %r0, 32760
+; Z16-NEXT:    stg %r0, 0(%r2)
+; Z16-NEXT:    br %r14
+  %canonicalized = call double @llvm.canonicalize.f64(double undef)
+  store double %canonicalized, double * %out
+  ret void
+}

@llvmbot
Member

llvmbot commented May 30, 2025

@llvm/pr-subscribers-backend-systemz


github-actions bot commented May 30, 2025

✅ With the latest revision this PR passed the undef deprecator.

Comment on lines 3360 to 3361
// This implements llvm.canonicalize.f* by multiplication with 1.0,
// as suggested in https://llvm.org/docs/LangRef.html#id2335.
Contributor

This should probably elaborate on how this is hacking in strictfp operations in non-strict functions

Contributor Author

From what I understand, the point of the strictfp operation here is to prevent any optimization that would, for instance, recognize that 1.0 * x can be converted to just x. So would this suffice as an addition to the comment?

// We use `STRICT_FMUL` here to prevent this operation from being optimized out.
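To make the trade-off concrete, here is a minimal sketch of the two options in the same ExpandNode context (Operand, VT, dl and DAG as in the patch); the plain-FMUL variant is shown only for contrast and is not part of the patch:

// Plain FMUL: subject to the (fmul X, 1.0) -> X fold mentioned above, so the
// canonicalizing multiply could be optimized away again.
SDValue Folded = DAG.getNode(ISD::FMUL, dl, VT, Operand,
                             DAG.getConstantFP(1.0, dl, VT));

// STRICT_FMUL, as emitted by the patch: the node carries a chain and is kept
// out of the usual FP folds, so the multiply by 1.0 survives to instruction
// selection and provides the canonicalizing behaviour of the hardware multiply.
SDValue Kept = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
                           {DAG.getEntryNode(), Operand,
                            DAG.getConstantFP(1.0, dl, VT)});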

@pawan-nirpal-031
Contributor

pawan-nirpal-031 commented May 30, 2025

Hi @dominik-steenken, absolutely, please go ahead with this. I have been wanting to get back and finish these commitments in follow-up PRs, so this is very welcome. I will make some time to support canonicalization of denormals (I have a PR for it, I just haven't been able to attend to it; if you need it sooner, let me know and I'll give you the patch so you can go ahead with it). Thanks for your PR.

Previously, this intrinsic was implemented in the X86 backend. As
suggested in the comments for the PR that merged that implementation (llvm#106370),
this commit replicates that implementation (at least for `f32` and `f64`) in
common code. Canonicalization is implemented as multiplication with 1.0.
These tests include only the floating point types supported in common code.
Once a SystemZ-specific implementation for `f16` and `f128` is available, these
tests will be expanded to include those types as well.

The tests are heavily based on the equivalent tests for X86.
…r f32,f64

Since the default OperationAction for `ISD::FCANONICALIZE` is now `Expand`,
these targets need to explicitly set it to `Legal` in their backends in order
to preserve their functionality.
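As a rough illustration (the exact types and placement differ per backend), the per-target change amounts to a line like the following in the target's TargetLowering constructor, mirroring the call added in TargetLoweringBase.cpp:

// Keep selecting the target's own fcanonicalize patterns now that the
// common-code default action is Expand. (Types shown are illustrative.)
setOperationAction(ISD::FCANONICALIZE, {MVT::f32, MVT::f64}, Legal);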
@dominik-steenken
Contributor Author

OK, so I updated the comment as requested, removed the undef check to clear the error from the undef deprecator, and added a number of setOperationAction(..., Legal) calls to the various backends that had independently handled fcanonicalize via their TableGen files, in order to get the tests clean again. I would appreciate another review, especially from folks familiar with the backends I had to touch to fix the tests :).

@jayfoad
Contributor

jayfoad commented Jun 5, 2025

__builtin.canonicalize in common code

This patch is about the LLVM intrinsic, not the Clang builtin, so the title should say llvm.canonicalize

@dominik-steenken dominik-steenken changed the title from __builtin.canonicalize in common code to llvm.canonicalize in common code on Jun 6, 2025
5 participants