-
Notifications
You must be signed in to change notification settings - Fork 13.6k
llvm.canonicalize
in common code
#142105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
llvm.canonicalize
in common code
#142105
Conversation
@uweigand FYI |
@llvm/pr-subscribers-backend-loongarch @llvm/pr-subscribers-llvm-selectiondag Author: Dominik Steenken (dominik-steenken) ChangesThis PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use Canonicalization is implemented here as multiplication by Moving the X86 implementation to common code was suggested in the reviews of #106370 by @arsenm . @pawan-nirpal-031 I hope you don't mind that i went ahead with this to getit working on SystemZ. I don't have a lot of experience with this part of the code base yet, so reviews are very welcome. Full diff: https://github.com/llvm/llvm-project/pull/142105.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 528c07cc5549d..6599704aaf290 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -3356,6 +3356,23 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(Op);
break;
}
+ case ISD::FCANONICALIZE: {
+ // This implements llvm.canonicalize.f* by multiplication with 1.0,
+ // as suggested in https://llvm.org/docs/LangRef.html#id2335.
+ // Get operand x.
+ SDValue Operand = Node->getOperand(0);
+ // Get fp value type used.
+ EVT VT = Operand.getValueType();
+ // Produce appropriately-typed 1.0 constant.
+ SDValue One = DAG.getConstantFP(1.0, dl, VT);
+ // Produce multiplication node x * 1.0.
+ SDValue Chain = DAG.getEntryNode();
+ SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
+ {Chain, Operand, One});
+
+ Results.push_back(Mul);
+ break;
+ }
case ISD::SIGN_EXTEND_INREG: {
EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();
EVT VT = Node->getValueType(0);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 935afaf9dd550..ba765afd80bf1 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -870,6 +870,9 @@ void TargetLoweringBase::initActions() {
ISD::FATAN2},
{MVT::f32, MVT::f64, MVT::f128}, Expand);
+ // Insert custom handling default for llvm.canonicalize.*.
+ setOperationAction(ISD::FCANONICALIZE, {MVT::f32, MVT::f64}, Expand);
+
// FIXME: Query RuntimeLibCalls to make the decision.
setOperationAction({ISD::LRINT, ISD::LLRINT, ISD::LROUND, ISD::LLROUND},
{MVT::f32, MVT::f64, MVT::f128}, LibCall);
diff --git a/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
new file mode 100644
index 0000000000000..477f5e1547567
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
@@ -0,0 +1,141 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --default-march s390x-unknown-linux-gnu --version 5
+; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z16 < %s | FileCheck %s -check-prefixes=Z16
+
+define float @canonicalize_fp32(float %a) {
+; Z16-LABEL: canonicalize_fp32:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmf %v1, 2, 8
+; Z16-NEXT: meebr %f0, %f1
+; Z16-NEXT: br %r14
+ %canonicalized = call float @llvm.canonicalize.f32(float %a)
+ ret float %canonicalized
+}
+
+define double @canonicalize_fp64(double %a) {
+; Z16-LABEL: canonicalize_fp64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v1, 2, 11
+; Z16-NEXT: mdbr %f0, %f1
+; Z16-NEXT: br %r14
+ %canonicalized = call double @llvm.canonicalize.f64(double %a)
+ ret double %canonicalized
+}
+
+define void @canonicalize_ptr_f32(float * %out) {
+; Z16-LABEL: canonicalize_ptr_f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmf %v0, 2, 8
+; Z16-NEXT: meeb %f0, 0(%r2)
+; Z16-NEXT: ste %f0, 0(%r2)
+; Z16-NEXT: br %r14
+ %val = load float, float * %out
+ %canonicalized = call float @llvm.canonicalize.f32(float %val)
+ store float %canonicalized, float * %out
+ ret void
+}
+
+define void @canonicalize_ptr_f64(double * %out) {
+; Z16-LABEL: canonicalize_ptr_f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v0, 2, 11
+; Z16-NEXT: mdb %f0, 0(%r2)
+; Z16-NEXT: std %f0, 0(%r2)
+; Z16-NEXT: br %r14
+ %val = load double, double * %out
+ %canonicalized = call double @llvm.canonicalize.f64(double %val)
+ store double %canonicalized, double * %out
+ ret void
+}
+
+define <4 x float> @canonicalize_v4f32(<4 x float> %a) {
+; Z16-LABEL: canonicalize_v4f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vrepf %v0, %v24, 3
+; Z16-NEXT: vgmf %v1, 2, 8
+; Z16-NEXT: vrepf %v2, %v24, 2
+; Z16-NEXT: meebr %f0, %f1
+; Z16-NEXT: meebr %f2, %f1
+; Z16-NEXT: vrepf %v3, %v24, 1
+; Z16-NEXT: vmrhf %v0, %v2, %v0
+; Z16-NEXT: wfmsb %f2, %v24, %f1
+; Z16-NEXT: wfmsb %f1, %f3, %f1
+; Z16-NEXT: vmrhf %v1, %v2, %v1
+; Z16-NEXT: vmrhg %v24, %v1, %v0
+; Z16-NEXT: br %r14
+ %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %a)
+ ret <4 x float> %canonicalized
+}
+
+define <4 x double> @canonicalize_v4f64(<4 x double> %a) {
+; Z16-LABEL: canonicalize_v4f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v0, 2, 11
+; Z16-NEXT: vrepg %v2, %v24, 1
+; Z16-NEXT: wfmdb %f1, %v24, %f0
+; Z16-NEXT: mdbr %f2, %f0
+; Z16-NEXT: vmrhg %v24, %v1, %v2
+; Z16-NEXT: vrepg %v2, %v26, 1
+; Z16-NEXT: wfmdb %f1, %v26, %f0
+; Z16-NEXT: wfmdb %f0, %f2, %f0
+; Z16-NEXT: vmrhg %v26, %v1, %v0
+; Z16-NEXT: br %r14
+ %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %a)
+ ret <4 x double> %canonicalized
+}
+
+define void @canonicalize_ptr_v4f32(<4 x float> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vl %v0, 0(%r2), 3
+; Z16-NEXT: vrepf %v1, %v0, 3
+; Z16-NEXT: vgmf %v2, 2, 8
+; Z16-NEXT: vrepf %v3, %v0, 2
+; Z16-NEXT: meebr %f1, %f2
+; Z16-NEXT: meebr %f3, %f2
+; Z16-NEXT: vmrhf %v1, %v3, %v1
+; Z16-NEXT: wfmsb %f3, %f0, %f2
+; Z16-NEXT: vrepf %v0, %v0, 1
+; Z16-NEXT: meebr %f0, %f2
+; Z16-NEXT: vmrhf %v0, %v3, %v0
+; Z16-NEXT: vmrhg %v0, %v0, %v1
+; Z16-NEXT: vst %v0, 0(%r2), 3
+; Z16-NEXT: br %r14
+ %val = load <4 x float>, <4 x float> * %out
+ %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %val)
+ store <4 x float> %canonicalized, <4 x float> * %out
+ ret void
+}
+
+define void @canonicalize_ptr_v4f64(<4 x double> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vl %v1, 16(%r2), 4
+; Z16-NEXT: vgmg %v2, 2, 11
+; Z16-NEXT: wfmdb %f3, %f1, %f2
+; Z16-NEXT: vrepg %v1, %v1, 1
+; Z16-NEXT: mdbr %f1, %f2
+; Z16-NEXT: vl %v0, 0(%r2), 4
+; Z16-NEXT: vmrhg %v1, %v3, %v1
+; Z16-NEXT: wfmdb %f3, %f0, %f2
+; Z16-NEXT: vrepg %v0, %v0, 1
+; Z16-NEXT: mdbr %f0, %f2
+; Z16-NEXT: vmrhg %v0, %v3, %v0
+; Z16-NEXT: vst %v0, 0(%r2), 4
+; Z16-NEXT: vst %v1, 16(%r2), 4
+; Z16-NEXT: br %r14
+ %val = load <4 x double>, <4 x double> * %out
+ %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %val)
+ store <4 x double> %canonicalized, <4 x double> * %out
+ ret void
+}
+
+define void @canonicalize_undef(double * %out) {
+; Z16-LABEL: canonicalize_undef:
+; Z16: # %bb.0:
+; Z16-NEXT: llihh %r0, 32760
+; Z16-NEXT: stg %r0, 0(%r2)
+; Z16-NEXT: br %r14
+ %canonicalized = call double @llvm.canonicalize.f64(double undef)
+ store double %canonicalized, double * %out
+ ret void
+}
|
@llvm/pr-subscribers-backend-systemz Author: Dominik Steenken (dominik-steenken) ChangesThis PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use Canonicalization is implemented here as multiplication by Moving the X86 implementation to common code was suggested in the reviews of #106370 by @arsenm . @pawan-nirpal-031 I hope you don't mind that i went ahead with this to getit working on SystemZ. I don't have a lot of experience with this part of the code base yet, so reviews are very welcome. Full diff: https://github.com/llvm/llvm-project/pull/142105.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index 528c07cc5549d..6599704aaf290 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -3356,6 +3356,23 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(Op);
break;
}
+ case ISD::FCANONICALIZE: {
+ // This implements llvm.canonicalize.f* by multiplication with 1.0,
+ // as suggested in https://llvm.org/docs/LangRef.html#id2335.
+ // Get operand x.
+ SDValue Operand = Node->getOperand(0);
+ // Get fp value type used.
+ EVT VT = Operand.getValueType();
+ // Produce appropriately-typed 1.0 constant.
+ SDValue One = DAG.getConstantFP(1.0, dl, VT);
+ // Produce multiplication node x * 1.0.
+ SDValue Chain = DAG.getEntryNode();
+ SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
+ {Chain, Operand, One});
+
+ Results.push_back(Mul);
+ break;
+ }
case ISD::SIGN_EXTEND_INREG: {
EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();
EVT VT = Node->getValueType(0);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 935afaf9dd550..ba765afd80bf1 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -870,6 +870,9 @@ void TargetLoweringBase::initActions() {
ISD::FATAN2},
{MVT::f32, MVT::f64, MVT::f128}, Expand);
+ // Insert custom handling default for llvm.canonicalize.*.
+ setOperationAction(ISD::FCANONICALIZE, {MVT::f32, MVT::f64}, Expand);
+
// FIXME: Query RuntimeLibCalls to make the decision.
setOperationAction({ISD::LRINT, ISD::LLRINT, ISD::LROUND, ISD::LLROUND},
{MVT::f32, MVT::f64, MVT::f128}, LibCall);
diff --git a/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
new file mode 100644
index 0000000000000..477f5e1547567
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/canonicalize-vars.ll
@@ -0,0 +1,141 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --default-march s390x-unknown-linux-gnu --version 5
+; RUN: llc -mtriple=s390x-linux-gnu -mcpu=z16 < %s | FileCheck %s -check-prefixes=Z16
+
+define float @canonicalize_fp32(float %a) {
+; Z16-LABEL: canonicalize_fp32:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmf %v1, 2, 8
+; Z16-NEXT: meebr %f0, %f1
+; Z16-NEXT: br %r14
+ %canonicalized = call float @llvm.canonicalize.f32(float %a)
+ ret float %canonicalized
+}
+
+define double @canonicalize_fp64(double %a) {
+; Z16-LABEL: canonicalize_fp64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v1, 2, 11
+; Z16-NEXT: mdbr %f0, %f1
+; Z16-NEXT: br %r14
+ %canonicalized = call double @llvm.canonicalize.f64(double %a)
+ ret double %canonicalized
+}
+
+define void @canonicalize_ptr_f32(float * %out) {
+; Z16-LABEL: canonicalize_ptr_f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmf %v0, 2, 8
+; Z16-NEXT: meeb %f0, 0(%r2)
+; Z16-NEXT: ste %f0, 0(%r2)
+; Z16-NEXT: br %r14
+ %val = load float, float * %out
+ %canonicalized = call float @llvm.canonicalize.f32(float %val)
+ store float %canonicalized, float * %out
+ ret void
+}
+
+define void @canonicalize_ptr_f64(double * %out) {
+; Z16-LABEL: canonicalize_ptr_f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v0, 2, 11
+; Z16-NEXT: mdb %f0, 0(%r2)
+; Z16-NEXT: std %f0, 0(%r2)
+; Z16-NEXT: br %r14
+ %val = load double, double * %out
+ %canonicalized = call double @llvm.canonicalize.f64(double %val)
+ store double %canonicalized, double * %out
+ ret void
+}
+
+define <4 x float> @canonicalize_v4f32(<4 x float> %a) {
+; Z16-LABEL: canonicalize_v4f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vrepf %v0, %v24, 3
+; Z16-NEXT: vgmf %v1, 2, 8
+; Z16-NEXT: vrepf %v2, %v24, 2
+; Z16-NEXT: meebr %f0, %f1
+; Z16-NEXT: meebr %f2, %f1
+; Z16-NEXT: vrepf %v3, %v24, 1
+; Z16-NEXT: vmrhf %v0, %v2, %v0
+; Z16-NEXT: wfmsb %f2, %v24, %f1
+; Z16-NEXT: wfmsb %f1, %f3, %f1
+; Z16-NEXT: vmrhf %v1, %v2, %v1
+; Z16-NEXT: vmrhg %v24, %v1, %v0
+; Z16-NEXT: br %r14
+ %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %a)
+ ret <4 x float> %canonicalized
+}
+
+define <4 x double> @canonicalize_v4f64(<4 x double> %a) {
+; Z16-LABEL: canonicalize_v4f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vgmg %v0, 2, 11
+; Z16-NEXT: vrepg %v2, %v24, 1
+; Z16-NEXT: wfmdb %f1, %v24, %f0
+; Z16-NEXT: mdbr %f2, %f0
+; Z16-NEXT: vmrhg %v24, %v1, %v2
+; Z16-NEXT: vrepg %v2, %v26, 1
+; Z16-NEXT: wfmdb %f1, %v26, %f0
+; Z16-NEXT: wfmdb %f0, %f2, %f0
+; Z16-NEXT: vmrhg %v26, %v1, %v0
+; Z16-NEXT: br %r14
+ %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %a)
+ ret <4 x double> %canonicalized
+}
+
+define void @canonicalize_ptr_v4f32(<4 x float> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f32:
+; Z16: # %bb.0:
+; Z16-NEXT: vl %v0, 0(%r2), 3
+; Z16-NEXT: vrepf %v1, %v0, 3
+; Z16-NEXT: vgmf %v2, 2, 8
+; Z16-NEXT: vrepf %v3, %v0, 2
+; Z16-NEXT: meebr %f1, %f2
+; Z16-NEXT: meebr %f3, %f2
+; Z16-NEXT: vmrhf %v1, %v3, %v1
+; Z16-NEXT: wfmsb %f3, %f0, %f2
+; Z16-NEXT: vrepf %v0, %v0, 1
+; Z16-NEXT: meebr %f0, %f2
+; Z16-NEXT: vmrhf %v0, %v3, %v0
+; Z16-NEXT: vmrhg %v0, %v0, %v1
+; Z16-NEXT: vst %v0, 0(%r2), 3
+; Z16-NEXT: br %r14
+ %val = load <4 x float>, <4 x float> * %out
+ %canonicalized = call <4 x float> @llvm.canonicalize.v4f32(<4 x float> %val)
+ store <4 x float> %canonicalized, <4 x float> * %out
+ ret void
+}
+
+define void @canonicalize_ptr_v4f64(<4 x double> * %out) {
+; Z16-LABEL: canonicalize_ptr_v4f64:
+; Z16: # %bb.0:
+; Z16-NEXT: vl %v1, 16(%r2), 4
+; Z16-NEXT: vgmg %v2, 2, 11
+; Z16-NEXT: wfmdb %f3, %f1, %f2
+; Z16-NEXT: vrepg %v1, %v1, 1
+; Z16-NEXT: mdbr %f1, %f2
+; Z16-NEXT: vl %v0, 0(%r2), 4
+; Z16-NEXT: vmrhg %v1, %v3, %v1
+; Z16-NEXT: wfmdb %f3, %f0, %f2
+; Z16-NEXT: vrepg %v0, %v0, 1
+; Z16-NEXT: mdbr %f0, %f2
+; Z16-NEXT: vmrhg %v0, %v3, %v0
+; Z16-NEXT: vst %v0, 0(%r2), 4
+; Z16-NEXT: vst %v1, 16(%r2), 4
+; Z16-NEXT: br %r14
+ %val = load <4 x double>, <4 x double> * %out
+ %canonicalized = call <4 x double> @llvm.canonicalize.v4f64(<4 x double> %val)
+ store <4 x double> %canonicalized, <4 x double> * %out
+ ret void
+}
+
+define void @canonicalize_undef(double * %out) {
+; Z16-LABEL: canonicalize_undef:
+; Z16: # %bb.0:
+; Z16-NEXT: llihh %r0, 32760
+; Z16-NEXT: stg %r0, 0(%r2)
+; Z16-NEXT: br %r14
+ %canonicalized = call double @llvm.canonicalize.f64(double undef)
+ store double %canonicalized, double * %out
+ ret void
+}
|
✅ With the latest revision this PR passed the undef deprecator. |
// This implements llvm.canonicalize.f* by multiplication with 1.0, | ||
// as suggested in https://llvm.org/docs/LangRef.html#id2335. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably elaborate on how this is hacking in strictfp operations in non-strict functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what i understand, the point of the strictfp operation here is to prevent any optimization that would, for instance, recognize that 1.0 * x
can be converted to just x
. So would this suffice as an addition to the comment?
// We use `STRICT_FMUL` here to prevent this operation from being optimized out.
Hi @dominik-steenken absolutely, Please go ahead with this, I have been wanting to get back and finish on the commitments as following PRs, this is much welcome. I will make some time to support canonicalization of denormals ( I have a PR just haven't been able to attend to it, if you need it sooner please let me know, I'll give you the patch and you may go ahead with it ) Thanks for your PR. |
Previously, this intrinsic has been implemented in the X86 backend. As suggested in the comments for the PR that merges that implementation (llvm#106370), this commit replicates that implementation (at least for `f32` and `f64`) in common code. Canonicalization is implemented as multiplication with 1.0.
f08a196
to
ce1fea8
Compare
ce1fea8
to
524580c
Compare
These tests include only the floating point types supported in common code. Once a SystemZ-specific implementation for `f16` and `f128` is available, these tests will be expanded to include those types as well. The tests are heavily based on the equivalent tests for X86.
…r f32,f64 Since the default OperationAction for `ISD::FCANONICALIZE` is now `Expand`, these targets need to explicitly set it to `Legal` in their backends in order to preserve their functionality.
524580c
to
07f5741
Compare
Ok, so i updated the comment as requested, got rid of the |
This patch is about the LLVM intrinsic, not the Clang builtin, so the title should say |
__builtin.canonicalize
in common codellvm.canonicalize
in common code
This PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use
__builtin_canonicalize
at least forf32
andf64
data types.Canonicalization is implemented here as multiplication by
1.0
, as suggested in the docs.Moving the X86 implementation to common code was suggested in the reviews of #106370 by @arsenm .
@pawan-nirpal-031 I hope you don't mind that i went ahead with this to get it working on SystemZ.
I don't have a lot of experience with this part of the code base yet, so reviews are very welcome.