-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[AArch64][CodeGen] Fix crash when fptrunc returns fp16 with +nofp attr #81724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: Nashe Mncube (nasherm) ChangesWhen performing lowering of the fptrunc opcode returning fp16 with the +nofp flag enabled we could trigger a compiler crash. This is because we had no custom lowering implemented. This patch implements a custom lowering for the case in which we need to promote an fp16 return type for fptrunc when the +nofp attr is enabled. Full diff: https://github.com/llvm/llvm-project/pull/81724.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index a3b7e3128ac1a4..c58600ae386730 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -25008,6 +25008,15 @@ void AArch64TargetLowering::ReplaceNodeResults(
Results.push_back(
LowerToPredicatedOp(SDValue(N, 0), DAG, AArch64ISD::MULHU_PRED));
return;
+ case ISD::FP_ROUND:{
+ if (N->getValueType(0) == MVT::f16 && !Subtarget->hasFullFP16()) {
+ // Promote fp16 result to legal type
+ SDLoc DL(N);
+ EVT NVT = getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
+ Results.push_back(DAG.getNode(ISD::FP16_TO_FP, DL, NVT, N->getOperand(0)));
+ }
+ return;
+ }
case ISD::FP_TO_UINT:
case ISD::FP_TO_SINT:
case ISD::STRICT_FP_TO_SINT:
diff --git a/llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll b/llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll
new file mode 100644
index 00000000000000..03426579131a1d
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll
@@ -0,0 +1,21 @@
+; RUN: llc -mcpu=cortex-r82 -O1 -o - %s | FileCheck %s
+
+; Source used:
+; __fp16 f2h(float a) { return a; }
+; Compiled with: clang --target=aarch64-arm-none-eabi -march=armv8-r+nofp
+
+define hidden noundef nofpclass(nan inf) half @f2h(float noundef nofpclass(nan inf) %a) local_unnamed_addr #0 {
+;CHECK: f2h: // @f2h
+;CHECK-NEXT: // %bb.0: // %entry
+;CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
+;CHECK-NEXT: bl __gnu_h2f_ieee
+;CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+;CHECK-NEXT: ret
+entry:
+ %0 = fptrunc float %a to half
+ ret half %0
+}
+
+
+attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "denormal-fp-math"="preserve-sign,preserve-sign" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+crc,+lse,+pauth,+ras,+rcpc,+sb,+ssbs,+v8r,-complxnum,-dotprod,-fmv,-fp-armv8,-fp16fml,-fullfp16,-jsconv,-neon,-rdm" }
+
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
6e09677
to
d09478d
Compare
Investigating why new test is failing |
@@ -25008,6 +25008,16 @@ void AArch64TargetLowering::ReplaceNodeResults( | |||
Results.push_back( | |||
LowerToPredicatedOp(SDValue(N, 0), DAG, AArch64ISD::MULHU_PRED)); | |||
return; | |||
case ISD::FP_ROUND: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it is doing something that should be handled by the generic legalizer.
Would it be better to not mark the FP_RUND as custom instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm looking into not making this custom and offloading the work to the generic legalizer but it appears to cause another crash elsewhere. Currently debugging this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a generic legalizer we run into the following:
Initial selection DAG: %bb.0 'f2h:entry'
SelectionDAG has 11 nodes:
t0: ch,glue = EntryToken
t2: i32,ch = CopyFromReg t0, Register:i32 %0
t3: f32 = bitcast t2
t5: f16 = fp_round t3, TargetConstant:i64<0>
t6: i16 = bitcast t5
t7: i32 = any_extend t6
t9: ch,glue = CopyToReg t0, Register:i32 $w0, t7
t10: ch = AArch64ISD::RET_GLUE t9, Register:i32 $w0, t9:1
Optimized lowered selection DAG: %bb.0 'f2h:entry'
SelectionDAG has 11 nodes:
t0: ch,glue = EntryToken
t2: i32,ch = CopyFromReg t0, Register:i32 %0
t3: f32 = bitcast t2
t5: f16 = fp_round t3, TargetConstant:i64<0>
t6: i16 = bitcast t5
t7: i32 = any_extend t6
t9: ch,glue = CopyToReg t0, Register:i32 $w0, t7
t10: ch = AArch64ISD::RET_GLUE t9, Register:i32 $w0, t9:1
SoftenFloatResult #0: t15: f32 = INSERT_SUBREG undef:i32, t5, TargetConstant:i32<7>
LLVM ERROR: Do not know how to soften the result of this operator!
After investigating a bit it appears that INSERT_SUBREG is an MIR instruction that is used to implement any_extend
. This is confusing to me as I wouldn't expect the DAG to try and legalize an MIR instruction. Am I right in thinking this is a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is.. a very old bit of the code that probably would be better if it worked differently. But I'm not sure if it's easy to change.
I believe it comes from AArch64TargetLowering::LowerBITCAST? Could that function behave differently, or could the ISD::BITCAST be not marked Custom with no-fp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My newest patch tries to just default to the generic legalizer when attempting to bitcast with the operand being an fp16. Do let me know your thoughts on this approach
This looks like it's on the right track. I think it should be based on hasFPARMv8, not hasFullFP16. If I change this to only be custom when we have fp16:
It runs into problems with BITCAST, so If I do the same thing there for the i16/f16/bf16 types then the test compiles, but produces multiple calls that might not be expected. I didn't look too deeply into why that was happening, and there is a chance it might cause problems in other places, but we don't have a lot of tests for "no-fp" codegen so far. |
eba9ee8
to
02032f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This looks OK to me with a little bit of cleanup so long as others dont disagree.
Make sure you update the strictfp_f16_abi_promote.ll test.
This effectively includes my PR #80576, which is ready to merge except for an unresolved question about where and how to document the change. If I can get some guidance on what documentation is wanted there, I can amend that, push it, so that when we then rebase this one, it shows fewer changes? |
Oh I thought I had seen a patch like that, but didn't manage to find it in a quick search - only the Arm parts and the generic part, both of which I think have been committed. Thanks for the info, we can try and get that one in. |
When performing lowering of the fptrunc opcode returning fp16 with the +nofp flag enabled we could trigger a compiler crash. This is because we had no custom lowering implemented. This patch implements a custom lowering for the case in which we need to promote an fp16 return type for fptrunc when the +nofp attr is enabled. Change-Id: Ibea20a676d40fde3f25e1ade365620071f46ff2b
Change-Id: Icf71578773b4c44fe8d79edd984661ec79fe1b09
Change-Id: Ic64506bb76bcc8f059b1de9ea6041fe8c0093b9c
Change-Id: Ib18e0ceeaab18d31d3fc43daab838ea95d62c2c1
Change-Id: I92af1dc9413486d2c20d90aebf0e377b077ad428
Change-Id: Ic1b7f8774ae529680597b4e032c4a63416ac927a
Change-Id: Icdb7c3742dc30da0652701ffc6c7468b8504e416
02032f3
to
46a77e3
Compare
Rebasing onto @hvdijk merged PR updates updates strictfp_f16_abi_promote.ll and reduces the size of this PR. Hopefully addressed all outstanding comments |
Thanks. This looks good to me now. One last thing, can you uncomment the @f16_return test from strictfp_f16_abi_promote.ll, so that we test the strict-fp case too? With that this LGTM if @ostannard doesn't disagree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too, assuming the strict-fp test case @davemgreen mentioned passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
f2530cb
to
53d140a
Compare
Change-Id: I11779e4a7e4494b45c1a166ebc46ef5887b38770
53d140a
to
f7df5b2
Compare
When performing lowering of the fptrunc opcode returning fp16 with the +nofp flag enabled we could trigger a compiler crash. This is because we had no custom lowering implemented. This patch implements a custom lowering for the case in which we need to promote an fp16 return type for fptrunc when the +nofp attr is enabled.