Description
Opened on Aug 2, 2024
I have encountered an infinite recursion when doing f16 arithmetic on "no-f16-f128" platforms due to questionable LLVM optimizations.
My initial investigation suggests this might be due to LLVM being too clever: it recognizes the soft-float absolute value masking and replaces it with a hard-float absolute value operation:
define weak hidden noundef float @__extendhfsf2(half noundef %a) unnamed_addr #9 {
start:
%0 = tail call half @llvm.fabs.f16(half %a)
%_0.i7.i.i = bitcast half %0 to i16
%_0.i8.i.i = add nsw i16 %_0.i7.i.i, -1024
%_0.i5.i.i = icmp ult i16 %_0.i8.i.i, 30720
br i1 %_0.i5.i.i, label %bb8.i.i, label %bb14.i.i
; <snip...>
}
When this is lowered to machine code, the intrinsic function ends up calling either itself or __truncsfhf2, and __truncsfhf2 then calls __extendhfsf2 again, forming an infinite recursion.
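For reference, the constants -1024 and 30720 in the IR above appear to match the normal-range check in the soft-float extend code when it is instantiated for f16. The following is a minimal sketch of that check, assuming the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 significand bits). The constant and function names are hypothetical and are not taken from compiler-builtins.

```rust
// Assumed IEEE 754 binary16 layout: 1 sign, 5 exponent, 10 significand bits.
// All names here are hypothetical, chosen for illustration only.
const SIG_BITS: u32 = 10;
const ABS_MASK: u16 = 0x7FFF; // clears the sign bit
const MIN_NORMAL: u16 = 1 << SIG_BITS; // 0x0400 = 1024
const INFINITY: u16 = 0x7C00; // exponent all ones, significand zero = 31744

/// Returns true when the bit pattern encodes a normal value
/// (finite, non-zero, and not subnormal).
fn is_normal_bits(repr: u16) -> bool {
    // Integer masking of the sign bit: the pattern LLVM can recognize as fabs.
    let a_abs = repr & ABS_MASK;
    // Subtracting MIN_NORMAL corresponds to the "add i16 ..., -1024" in the IR;
    // the bound INFINITY - MIN_NORMAL is the constant 30720 in the "icmp ult".
    a_abs.wrapping_sub(MIN_NORMAL) < INFINITY.wrapping_sub(MIN_NORMAL)
}
```

Under these assumptions, the only step that touches floating point in the IR is the sign-bit masking, which is the part that ends up as @llvm.fabs.f16.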
My current idea is to:
diff --git a/src/float/extend.rs b/src/float/extend.rs
index 5560489..10a0d61 100644
--- a/src/float/extend.rs
+++ b/src/float/extend.rs
@@ -32,7 +32,7 @@ where
let sign_bits_delta = dst_sign_bits - src_sign_bits;
let exp_bias_delta = dst_exp_bias - src_exp_bias;
- let a_abs = a.repr() & src_abs_mask;
+ let a_abs = core::hint::black_box(a.repr()) & src_abs_mask;
let mut abs_result = R::Int::ZERO;
if a_abs.wrapping_sub(src_min_normal) < src_infinity.wrapping_sub(src_min_normal) {
... which will try to prevent LLVM from merging the absolute value masking into the @llvm.fabs.f16 LLVM intrinsic.
However, this idea introduces two extra memory operations (storing f16 and reading i16).
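To make the intent of the change concrete, here is a small standalone sketch of the two variants of the masking step, operating on raw u16 bit patterns instead of the generic types used in extend.rs. The function names and the SRC_ABS_MASK constant are hypothetical stand-ins. The sketch only shows that both variants compute the same value; it does not demonstrate what code LLVM generates for either one.

```rust
use core::hint::black_box;

// Hypothetical stand-in for the f16 absolute-value mask
// (every bit except the sign bit).
const SRC_ABS_MASK: u16 = 0x7FFF;

/// Plain masking: the optimizer is free to see through this and
/// treat it as a floating-point absolute value.
fn abs_bits_plain(repr: u16) -> u16 {
    repr & SRC_ABS_MASK
}

/// Masking behind black_box: the value is unchanged, but the optimizer
/// is asked to treat the input as opaque. black_box is a best-effort
/// hint, so this is not a guarantee.
fn abs_bits_opaque(repr: u16) -> u16 {
    black_box(repr) & SRC_ABS_MASK
}
```

For example, the f16 bit pattern 0xBC00 (-1.0) maps to 0x3C00 (1.0) with either function.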
I did not open a pull request because I hope someone could come up with a much better idea.