wenju-he (Contributor) commented Aug 12, 2025

Enabling -ffp-contract=fast-honor-pragmas globally improves performance.
Disable it in functions that may have problems with the flag.

…unctions

According to the OpenCL spec, native_* functions have implementation-defined
accuracy and typically have better performance, so we can enable
floating-point contraction optimizations for them.
wenju-he requested a review from frasercrmck on August 12, 2025 04:51
llvmbot added the libclc (libclc OpenCL library) label on Aug 12, 2025
wenju-he requested a review from arsenm on August 12, 2025 04:52
arsenm (Contributor) commented Aug 12, 2025

I think fp contract should be enabled globally in the build, and selectively disabled in the handful of places where it is problematic (namely specific blocks in expF, sinbF, and the trig reductions).

arsenm added the floating-point (Floating-point math) label on Aug 12, 2025
@@ -304,7 +304,7 @@ set_source_files_properties(
${CMAKE_CURRENT_SOURCE_DIR}/opencl/lib/generic/math/native_sin.cl
${CMAKE_CURRENT_SOURCE_DIR}/opencl/lib/generic/math/native_sqrt.cl
${CMAKE_CURRENT_SOURCE_DIR}/opencl/lib/generic/math/native_tan.cl
-PROPERTIES COMPILE_OPTIONS -fapprox-func
+PROPERTIES COMPILE_OPTIONS "-fapprox-func;-ffp-contract=fast"
Contributor

Also, maybe this should use -ffp-contract=fast-honor-pragmas instead; I'm not sure if the stupid interpretation of fast (fusing across statements while disregarding FP_CONTRACT pragmas) ever got fixed.

…o exponential/trigonometric/hyperbolic funcs
wenju-he changed the title from "[libclc] Enable -ffp-contract=fast compile option for math native_* functions" to "[libclc] Enable -ffp-contract=fast-honor-pragmas except for exp/trig/hyperbolic funcs" on Aug 12, 2025
@@ -6,6 +6,8 @@
//
//===----------------------------------------------------------------------===//

+#pragma clang fp contract(off)
Contributor

This can be much more targeted. The problematic areas can be specific block scopes inside individual functions. I'd suggest running the conformance tests with it enabled globally, and then finding the specific places that require this.

Contributor

For example, in exp f32:

-            float e = BUILTIN_RINT_F32(ph);
-            float a = ph - e + pl;
+            float a, e;
+            {
+                #pragma OPENCL FP_CONTRACT OFF
+                e = BUILTIN_RINT_F32(ph);
+                a = ph - e + pl;
+            }
+

Contributor Author

> This can be much more targeted. The problematic areas can be specific block scopes inside of individual functions. I'd suggest running the conformance test with it enabled globally, and then finding the specific places that require this

Thanks, I'll run the OpenCL CTS on an Intel GPU to find those places.

wenju-he marked this pull request as draft on August 12, 2025 07:46
Labels: floating-point (Floating-point math), libclc (libclc OpenCL library)
3 participants