Infinite recursion in `__extendhfsf2` and `__truncsfhf2` on "no-f16-f128" platforms

I have encountered an infinite recursion when doing `f16` arithmetics on "no-f16-f128" platforms due to questionable LLVM optimizations.

My initial investigation shows this might be due to LLVM being too clever and optimizing the soft-float absolute value conversion to a hard-float one:

```llvm
define weak hidden noundef float @__extendhfsf2(half noundef %a) unnamed_addr #9 {
start:                                                                                                                                                                                       
%0 = tail call half @llvm.fabs.f16(half %a)                                                                                                                                                  
%_0.i7.i.i = bitcast half %0 to i16
%_0.i8.i.i = add nsw i16 %_0.i7.i.i, -1024
%_0.i5.i.i = icmp ult i16 %_0.i8.i.i, 30720
br i1 %_0.i5.i.i, label %bb8.i.i, label %bb14.i.i

; <snip...>
}
```

When lowering to the machine code, this causes the intrinsic function to call either itself or `__truncsfhf2`, and then `__truncsfhf2` will call `__extendhfsf2` again, forming an infinite recursion.

My current idea is to:

```diff
diff --git a/src/float/extend.rs b/src/float/extend.rs
index 5560489..10a0d61 100644
--- a/src/float/extend.rs
+++ b/src/float/extend.rs
@@ -32,7 +32,7 @@ where
 
     let sign_bits_delta = dst_sign_bits - src_sign_bits;
     let exp_bias_delta = dst_exp_bias - src_exp_bias;
-    let a_abs = a.repr() & src_abs_mask;
+    let a_abs = core::hint::black_box(a.repr()) & src_abs_mask;
     let mut abs_result = R::Int::ZERO;
 
     if a_abs.wrapping_sub(src_min_normal) < src_infinity.wrapping_sub(src_min_normal) {
```

... which will try to prevent LLVM from merging the absolute value masking into the `@llvm.fabs.f16` LLVM intrinsic.
However, this idea introduces two extra memory operations (storing `f16` and reading `i16`).

I did not open a pull request because I hope someone could come up with a much better idea.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Infinite recursion in `extendhfsf2` and `truncsfhf2` on "no-f16-f128" platforms #651

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Infinite recursion in __extendhfsf2 and __truncsfhf2 on "no-f16-f128" platforms #651

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Infinite recursion in `extendhfsf2` and `truncsfhf2` on "no-f16-f128" platforms #651