-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc][math][c23] Optimize fabsf16 on x86 with Clang #104869
base: main
Are you sure you want to change the base?
Conversation
Works around optimizations introduced in LLVM 17 and 18 that slow down `fputil::abs<float16>()` on x86.
@llvm/pr-subscribers-libc Author: OverMighty (overmighty) ChangesWorks around optimizations introduced in LLVM 17 and 18 that slow down Full diff: https://github.com/llvm/llvm-project/pull/104869.diff 2 Files Affected:
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index 350072f4b9649..7fa86f17269f2 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -563,10 +563,12 @@ add_entrypoint_object(
HDRS
../fabsf16.h
DEPENDS
- libc.src.__support.macros.properties.types
libc.src.__support.FPUtil.basic_operations
+ libc.src.__support.FPUtil.fp_bits
libc.src.__support.macros.properties.architectures
libc.src.__support.macros.properties.compiler
+ libc.src.__support.macros.properties.cpu_features
+ libc.src.__support.macros.properties.types
COMPILE_OPTIONS
-O3
FLAGS
diff --git a/libc/src/math/generic/fabsf16.cpp b/libc/src/math/generic/fabsf16.cpp
index 02e11330db718..a86aa0cb00a73 100644
--- a/libc/src/math/generic/fabsf16.cpp
+++ b/libc/src/math/generic/fabsf16.cpp
@@ -8,19 +8,30 @@
#include "src/math/fabsf16.h"
#include "src/__support/FPUtil/BasicOperations.h"
+#include "src/__support/FPUtil/FPBits.h"
#include "src/__support/common.h"
#include "src/__support/macros/config.h"
#include "src/__support/macros/properties/architectures.h"
#include "src/__support/macros/properties/compiler.h"
+#include "src/__support/macros/properties/cpu_features.h"
namespace LIBC_NAMESPACE_DECL {
LLVM_LIBC_FUNCTION(float16, fabsf16, (float16 x)) {
- // For x86, GCC generates better code from the generic implementation.
- // https://godbolt.org/z/K9orM4hTa
#if defined(__LIBC_MISC_MATH_BASIC_OPS_OPT) && \
- !(defined(LIBC_TARGET_ARCH_IS_X86) && defined(LIBC_COMPILER_IS_GCC))
+ defined(LIBC_TARGET_CPU_HAS_FAST_FLOAT16_OPS)
return __builtin_fabsf16(x);
+#elif defined(LIBC_TARGET_ARCH_IS_X86) && defined(LIBC_COMPILER_IS_CLANG)
+ // Prevent Clang from generating calls to slow soft-float conversion
+ // functions on x86. See https://godbolt.org/z/hvo6jbnGz.
+
+ using FPBits = fputil::FPBits<float16>;
+ using StorageType = typename FPBits::StorageType;
+
+ static constexpr volatile StorageType ABS_MASK = FPBits::EXP_SIG_MASK;
+
+ return FPBits(static_cast<StorageType>(FPBits(x).uintval() & ABS_MASK))
+ .get_val();
#else
return fputil::abs(x);
#endif
|
Minimum latencies: Before:
After:
If we enabled the same code path for other compilers/targets: Before:
After:
|
|
return __builtin_fabsf16(x); | ||
#elif defined(LIBC_TARGET_ARCH_IS_X86) && defined(LIBC_COMPILER_IS_CLANG) | ||
// Prevent Clang from generating calls to slow soft-float conversion | ||
// functions on x86. See https://godbolt.org/z/hvo6jbnGz. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a codegen bug. it's wrong to introduce a canonicalizing extend when legalizing fabs. Don't workaround it here?
Works around optimizations introduced in LLVM 17 and 18 that slow down
fputil::abs<float16>()
on x86.