Skip to content

[Clang] Add __has_target_builtin macro #126324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

[Clang] Add __has_target_builtin macro #126324

wants to merge 10 commits into from

Conversation

sarnex
Copy link
Member

@sarnex sarnex commented Feb 7, 2025

As a follow-up to #121839, where we wanted to make __has_builtin return false for aux builtins, but that broke existing code.

Instead, introduce a new macro __has_target_builtin (name open to suggestions) that only considers builtins for the current target.

Copy link

github-actions bot commented Feb 7, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 196a1acc7d277d05d4b94ad7745c18bf13ea991f 2668404f9a46155a897030aca29e117f4319f9a5 --extensions cpp,h -- clang/test/Preprocessor/has_target_builtin.cpp clang/include/clang/Lex/Preprocessor.h clang/lib/Lex/PPMacroExpansion.cpp
View the diff from clang-format here.
diff --git a/clang/lib/Lex/PPMacroExpansion.cpp b/clang/lib/Lex/PPMacroExpansion.cpp
index 8be4d06c48..80ac767f16 100644
--- a/clang/lib/Lex/PPMacroExpansion.cpp
+++ b/clang/lib/Lex/PPMacroExpansion.cpp
@@ -1894,8 +1894,7 @@ void Preprocessor::ExpandBuiltinMacro(Token &Tok) {
 
         return false;
       });
-  } else if (II == Ident__has_cpp_attribute ||
-             II == Ident__has_c_attribute) {
+  } else if (II == Ident__has_cpp_attribute || II == Ident__has_c_attribute) {
     bool IsCXX = II == Ident__has_cpp_attribute;
     EvaluateFeatureLikeBuiltinMacro(OS, Tok, II, *this, true,
         [&](Token &Tok, bool &HasLexedNextToken) -> int {
@@ -1925,8 +1924,7 @@ void Preprocessor::ExpandBuiltinMacro(Token &Tok) {
                                    getLangOpts())
                     : 0;
         });
-  } else if (II == Ident__has_include ||
-             II == Ident__has_include_next) {
+  } else if (II == Ident__has_include || II == Ident__has_include_next) {
     // The argument to these two builtins should be a parenthesized
     // file name string literal using angle brackets (<>) or
     // double-quotes ("").

@sarnex sarnex marked this pull request as ready for review February 10, 2025 15:19
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Feb 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 10, 2025

@llvm/pr-subscribers-clang

Author: Nick Sarnie (sarnex)

Changes

As a follow-up to #121839, where we wanted to make __has_builtin return false for aux builtins, but that broke existing code.

Instead, introduce a new macro __has_target_builtin (name open to suggestions) that only considers builtins for the current target.


Full diff: https://github.com/llvm/llvm-project/pull/126324.diff

4 Files Affected:

  • (modified) clang/docs/LanguageExtensions.rst (+33)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+1)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+58-52)
  • (added) clang/test/Preprocessor/has_target_builtin.cpp (+18)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 973cf8f9d091c30..057ad564f970bb4 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -67,6 +67,10 @@ It can be used like this:
   ``__has_builtin`` should not be used to detect support for a builtin macro;
   use ``#ifdef`` instead.
 
+  When using device offloading, a builtin is considered available if it is
+  available on either the host or the device targets.
+  Use ``__has_target_builtin`` to consider only the current target.
+
 ``__has_constexpr_builtin``
 ---------------------------
 
@@ -96,6 +100,35 @@ the ``<cmath>`` header file to conditionally make a function constexpr whenever
 the constant evaluation of the corresponding builtin (for example,
 ``std::fmax`` calls ``__builtin_fmax``) is supported in Clang.
 
+``__has_target_builtin``
+------------------------
+
+This function-like macro takes a single identifier argument that is the name of
+a builtin function, a builtin pseudo-function (taking one or more type
+arguments), or a builtin template.
+It evaluates to 1 if the builtin is supported on the current target or 0 if not.
+The behavior is different than ``__has_builtin`` when there is an auxiliary target,
+such when offloading to a target device.
+It can be used like this:
+
+.. code-block:: c++
+
+  #ifndef __has_target_builtin         // Optional of course.
+    #define __has_target_builtin(x) 0  // Compatibility with non-clang compilers.
+  #endif
+
+  ...
+  #if __has_target_builtin(__builtin_trap)
+    __builtin_trap();
+  #else
+    abort();
+  #endif
+  ...
+
+.. note::
+  ``__has_target_builtin`` should not be used to detect support for a builtin macro;
+  use ``#ifdef`` instead.
+
 .. _langext-__has_feature-__has_extension:
 
 ``__has_feature`` and ``__has_extension``
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index 2bf4d1a16699430..240fe28aba93e33 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -174,6 +174,7 @@ class Preprocessor {
   IdentifierInfo *Ident__has_extension;            // __has_extension
   IdentifierInfo *Ident__has_builtin;              // __has_builtin
   IdentifierInfo *Ident__has_constexpr_builtin;    // __has_constexpr_builtin
+  IdentifierInfo *Ident__has_target_builtin;       // __has_target_builtin
   IdentifierInfo *Ident__has_attribute;            // __has_attribute
   IdentifierInfo *Ident__has_embed;                // __has_embed
   IdentifierInfo *Ident__has_include;              // __has_include
diff --git a/clang/lib/Lex/PPMacroExpansion.cpp b/clang/lib/Lex/PPMacroExpansion.cpp
index 347c13da0ad215a..23a693b105fca3a 100644
--- a/clang/lib/Lex/PPMacroExpansion.cpp
+++ b/clang/lib/Lex/PPMacroExpansion.cpp
@@ -357,6 +357,7 @@ void Preprocessor::RegisterBuiltinMacros() {
   Ident__has_builtin = RegisterBuiltinMacro("__has_builtin");
   Ident__has_constexpr_builtin =
       RegisterBuiltinMacro("__has_constexpr_builtin");
+  Ident__has_target_builtin = RegisterBuiltinMacro("__has_target_builtin");
   Ident__has_attribute = RegisterBuiltinMacro("__has_attribute");
   if (!getLangOpts().CPlusPlus)
     Ident__has_c_attribute = RegisterBuiltinMacro("__has_c_attribute");
@@ -1797,55 +1798,62 @@ void Preprocessor::ExpandBuiltinMacro(Token &Tok) {
                                            diag::err_feature_check_malformed);
         return II && HasExtension(*this, II->getName());
       });
-  } else if (II == Ident__has_builtin) {
-    EvaluateFeatureLikeBuiltinMacro(OS, Tok, II, *this, false,
-      [this](Token &Tok, bool &HasLexedNextToken) -> int {
-        IdentifierInfo *II = ExpectFeatureIdentifierInfo(Tok, *this,
-                                           diag::err_feature_check_malformed);
-        if (!II)
-          return false;
-        else if (II->getBuiltinID() != 0) {
-          switch (II->getBuiltinID()) {
-          case Builtin::BI__builtin_cpu_is:
-            return getTargetInfo().supportsCpuIs();
-          case Builtin::BI__builtin_cpu_init:
-            return getTargetInfo().supportsCpuInit();
-          case Builtin::BI__builtin_cpu_supports:
-            return getTargetInfo().supportsCpuSupports();
-          case Builtin::BI__builtin_operator_new:
-          case Builtin::BI__builtin_operator_delete:
-            // denotes date of behavior change to support calling arbitrary
-            // usual allocation and deallocation functions. Required by libc++
-            return 201802;
-          default:
-            return Builtin::evaluateRequiredTargetFeatures(
-                getBuiltinInfo().getRequiredFeatures(II->getBuiltinID()),
-                getTargetInfo().getTargetOpts().FeatureMap);
+  } else if (II == Ident__has_builtin || II == Ident__has_target_builtin) {
+    bool IsHasTargetBuiltin = II == Ident__has_target_builtin;
+    EvaluateFeatureLikeBuiltinMacro(
+        OS, Tok, II, *this, false,
+        [this, IsHasTargetBuiltin](Token &Tok, bool &HasLexedNextToken) -> int {
+          IdentifierInfo *II = ExpectFeatureIdentifierInfo(
+              Tok, *this, diag::err_feature_check_malformed);
+          if (!II)
+            return false;
+          auto BuiltinID = II->getBuiltinID();
+          if (BuiltinID != 0) {
+            switch (BuiltinID) {
+            case Builtin::BI__builtin_cpu_is:
+              return getTargetInfo().supportsCpuIs();
+            case Builtin::BI__builtin_cpu_init:
+              return getTargetInfo().supportsCpuInit();
+            case Builtin::BI__builtin_cpu_supports:
+              return getTargetInfo().supportsCpuSupports();
+            case Builtin::BI__builtin_operator_new:
+            case Builtin::BI__builtin_operator_delete:
+              // denotes date of behavior change to support calling arbitrary
+              // usual allocation and deallocation functions. Required by libc++
+              return 201802;
+            default:
+              // __has_target_builtin should return false for aux builtins.
+              if (IsHasTargetBuiltin &&
+                  getBuiltinInfo().isAuxBuiltinID(BuiltinID))
+                return false;
+              return Builtin::evaluateRequiredTargetFeatures(
+                  getBuiltinInfo().getRequiredFeatures(BuiltinID),
+                  getTargetInfo().getTargetOpts().FeatureMap);
+            }
+            return true;
+          } else if (IsBuiltinTrait(Tok)) {
+            return true;
+          } else if (II->getTokenID() != tok::identifier &&
+                     II->getName().starts_with("__builtin_")) {
+            return true;
+          } else {
+            return llvm::StringSwitch<bool>(II->getName())
+                // Report builtin templates as being builtins.
+                .Case("__make_integer_seq", getLangOpts().CPlusPlus)
+                .Case("__type_pack_element", getLangOpts().CPlusPlus)
+                .Case("__builtin_common_type", getLangOpts().CPlusPlus)
+                // Likewise for some builtin preprocessor macros.
+                // FIXME: This is inconsistent; we usually suggest detecting
+                // builtin macros via #ifdef. Don't add more cases here.
+                .Case("__is_target_arch", true)
+                .Case("__is_target_vendor", true)
+                .Case("__is_target_os", true)
+                .Case("__is_target_environment", true)
+                .Case("__is_target_variant_os", true)
+                .Case("__is_target_variant_environment", true)
+                .Default(false);
           }
-          return true;
-        } else if (IsBuiltinTrait(Tok)) {
-          return true;
-        } else if (II->getTokenID() != tok::identifier &&
-                   II->getName().starts_with("__builtin_")) {
-          return true;
-        } else {
-          return llvm::StringSwitch<bool>(II->getName())
-              // Report builtin templates as being builtins.
-              .Case("__make_integer_seq", getLangOpts().CPlusPlus)
-              .Case("__type_pack_element", getLangOpts().CPlusPlus)
-              .Case("__builtin_common_type", getLangOpts().CPlusPlus)
-              // Likewise for some builtin preprocessor macros.
-              // FIXME: This is inconsistent; we usually suggest detecting
-              // builtin macros via #ifdef. Don't add more cases here.
-              .Case("__is_target_arch", true)
-              .Case("__is_target_vendor", true)
-              .Case("__is_target_os", true)
-              .Case("__is_target_environment", true)
-              .Case("__is_target_variant_os", true)
-              .Case("__is_target_variant_environment", true)
-              .Default(false);
-        }
-      });
+        });
   } else if (II == Ident__has_constexpr_builtin) {
     EvaluateFeatureLikeBuiltinMacro(
         OS, Tok, II, *this, false,
@@ -1886,8 +1894,7 @@ void Preprocessor::ExpandBuiltinMacro(Token &Tok) {
 
         return false;
       });
-  } else if (II == Ident__has_cpp_attribute ||
-             II == Ident__has_c_attribute) {
+  } else if (II == Ident__has_cpp_attribute || II == Ident__has_c_attribute) {
     bool IsCXX = II == Ident__has_cpp_attribute;
     EvaluateFeatureLikeBuiltinMacro(OS, Tok, II, *this, true,
         [&](Token &Tok, bool &HasLexedNextToken) -> int {
@@ -1917,8 +1924,7 @@ void Preprocessor::ExpandBuiltinMacro(Token &Tok) {
                                    getLangOpts())
                     : 0;
         });
-  } else if (II == Ident__has_include ||
-             II == Ident__has_include_next) {
+  } else if (II == Ident__has_include || II == Ident__has_include_next) {
     // The argument to these two builtins should be a parenthesized
     // file name string literal using angle brackets (<>) or
     // double-quotes ("").
diff --git a/clang/test/Preprocessor/has_target_builtin.cpp b/clang/test/Preprocessor/has_target_builtin.cpp
new file mode 100644
index 000000000000000..64b2d7e1b35d9ef
--- /dev/null
+++ b/clang/test/Preprocessor/has_target_builtin.cpp
@@ -0,0 +1,18 @@
+// RUN: %clang_cc1 -fopenmp -triple=spirv64 -fopenmp-is-target-device \
+// RUN: -aux-triple x86_64-linux-unknown -E %s | FileCheck -implicit-check-not=BAD %s
+
+// RUN: %clang_cc1 -fopenmp -triple=nvptx64 -fopenmp-is-target-device \
+// RUN: -aux-triple x86_64-linux-unknown -E %s | FileCheck -implicit-check-not=BAD %s
+
+// RUN: %clang_cc1 -fopenmp -triple=amdgcn-amd-amdhsa -fopenmp-is-target-device \
+// RUN: -aux-triple x86_64-linux-unknown -E %s | FileCheck -implicit-check-not=BAD %s
+
+// RUN: %clang_cc1 -fopenmp -triple=aarch64 -fopenmp-is-target-device \
+// RUN: -aux-triple x86_64-linux-unknown -E %s | FileCheck -implicit-check-not=BAD %s
+
+// CHECK: GOOD
+#if __has_target_builtin(__builtin_ia32_pause)
+  BAD
+#else
+  GOOD
+#endif

@sarnex sarnex requested review from AlexVlx and Artem-B February 10, 2025 15:20
@@ -357,6 +357,7 @@ void Preprocessor::RegisterBuiltinMacros() {
Ident__has_builtin = RegisterBuiltinMacro("__has_builtin");
Ident__has_constexpr_builtin =
RegisterBuiltinMacro("__has_constexpr_builtin");
Ident__has_target_builtin = RegisterBuiltinMacro("__has_target_builtin");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to define this macro for offloading languages only. The reason is that non-offloading languages do not need this macro but if they start to use this macro then it will break again in offloading languages like __has_builtin did.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the usage would commonly be

#if defined(__has_target_builtin) && __has_target_builtin(foo)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fear is that some C++ library headers start to use this macro __has_target_builtin in place of __has_builtin, and we cannot modify such headers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will do this. I can't find a good way to detect offloading languages in general here, so I'm just going to check for CUDA/HIP/SYCLDevice/OpenMPDevice, let me know if there's some common logic I can rely on that I missed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in latest commit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fear is that some C++ library headers start to use this macro __has_target_builtin in place of __has_builtin, and we cannot modify such headers.

IMO, now that we do document semantics of __has_target_builtin(), its misuse on the library side will be their problem to fix. The problem with __has_builtin() was that it was never intended to handle heterogeneous compilation, and that's what created the issue when CUDA/HIP made builtins from both host and device visible to the compiler, but not all of them codegen-able. __has_target_builtin() clearly states what to expect. Sure, it's possible to misuse it, but having it available unconditionally will make it much less cumbersome to use in the headers shared between CUDA and C++, and that's a fairly common use case.

I'd prefer to have __has_target_builtin() generally available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, __has_target_builtin() is probably identical to __has_builtin on non-offloading related things. It's up to them if they keep it portable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made it unconditional again in the latest commit

i kept the code example in langref because probably we still want to recommend only using this for offloading targets, even though it will work on non-offloading targets. let met know if you disagree.

@yxsamliu
Copy link
Collaborator

need release note

sarnex added a commit that referenced this pull request Feb 10, 2025
Feedback from #126324

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 10, 2025
…26571)

Feedback from llvm/llvm-project#126324

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
``__has_target_builtin`` should not be used to detect support for a builtin macro;
use ``#ifdef`` instead.

``__has_target_built`` is only defined for offloading targets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``__has_target_built`` is only defined for offloading targets.
``__has_target_builtin`` is only defined for offloading targets.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow thats embarrassing, thanks

arguments), or a builtin template.
It evaluates to 1 if the builtin is supported on the current target or 0 if not.
The behavior is different than ``__has_builtin`` when there is an auxiliary target,
such when offloading to a target device.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
such when offloading to a target device.
such as when offloading to a target device.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cant english today, thanks

Comment on lines 117 to 119
#ifndef __has_target_builtin // Optional of course.
#define __has_target_builtin(x) 0 // Compatibility with non-clang compilers.
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be more helpful to do something like ifdef CUDA ... else __has_builtin.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopefully the latest commit has the use case youre looking for

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we make it available to C++, we'd better document the following invalid usage which originally leads to this extension:

 #if !__has_target_builtin(__wfi) 
 static __inline__ void __attribute__((__always_inline__, __nodebug__)) __wfi(void) { 
   __builtin_arm_wfi(); 
 } 
 #endif 

we should emphasize that a C++ header may be used by offloading languages, and in offloading language, the same source is compiled for host and device target separately. A builtin not available for the current target does not justify defining the builtin for both host and device targets. In this case, better to use __has_builtin(__wfi) since it makes sure the condition is true for both hosts and device targets so that the code won't break when used in offloading languages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have somewhat conflicting requirements:

  • On C++ side, writers do not care about offloading (and we can't force them to). They only have __has_builtin() and it does what they need -- if the given builtin exists it will be compileable.
  • On offloading side, we want C++ headers to work out of the box for the host side. Ideally with the host and device compilations seeing the same code after preprocessing, and that's where we get into this problem. We can't tell whether the original C++ code needs __has_builtin() (works well enough for most uses inside of host function bodies) or if it needs __has_target_builtin() (e.g. when it's used inside a lambda or constexpr function which is implicitly HD, and we do need to generate code for it).

I'm not sure we can find a universal solution. That said, __has_target_builtin() gives us some flexibility on the offloading side. C++ side should stick with __has_builtin(). __has_target_builtin() should only be used when offloading comes into the picture, but it includes the possibility that it will be used in the headers shared with C++ and therefore the builtin itself should be available there.

Comment on lines 107 to 112
This function-like macro takes a single identifier argument that is the name of
a builtin function, a builtin pseudo-function (taking one or more type
arguments), or a builtin template.
It evaluates to 1 if the builtin is supported on the current target or 0 if not.
The behavior is different than ``__has_builtin`` when there is an auxiliary target,
such when offloading to a target device.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rephrase it to be more specific in terms of what the difference is rather than when it occurs.

__has_builtin() and __has_target_builtin() behave identically for normal C++ compilations.
For heterogeneous compilations that see source code intended for more than one target

  • __has_builtin() returns true if the builtin is known to the compiler (i.e. it's available via one of the targets), but makes no promises whether it's available on the current target. We can parse it, but not necessarily codegen it.
  • __has_target_builtin() returns true if the builtin can actually be codegen'ed for the current target.

__has_target_builtin() is, effectively, functional superset of CUDA's __CUDA_ARCH__ -- it allows distinguishing both host and target architectures. It has to be treated with similar caution so it does not break consistency of the TU source code seen by the compiler across sub-compilations.

Copy link
Member Author

@sarnex sarnex Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i like the way you worded it so i'll use most of this verbatim

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some tiny nits from me

IdentifierInfo *II = ExpectFeatureIdentifierInfo(
Tok, *this, diag::err_feature_check_malformed);
if (!II)
return false;
else if (II->getBuiltinID() != 0) {
switch (II->getBuiltinID()) {
auto BuiltinID = II->getBuiltinID();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please spell out the type explicitly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in latest commit, thx

// CHECK-NOTOFFLOAD: DOESNT
#ifdef __has_target_builtin
HAS
#if __has_target_builtin(__builtin_ia32_pause)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add test coverage for when the target does have the builtin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in latest commit, thx


// CHECK-NOTOFFLOAD: DOESNT
#ifdef __has_target_builtin
HAS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably check for HAS explicitly.

Copy link
Member Author

@sarnex sarnex Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the macro is unconditionally available in the latest commit so i removed the checking for the macro being defined

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex requested a review from jhuber6 February 11, 2025 17:30
@jhuber6
Copy link
Contributor

jhuber6 commented Feb 13, 2025

CC @jdoerfert (we don't have a CUDA maintainer, so I'm not certain who else to tag)

Closest is @Artem-B, there's also a few NVIDIA people floating around like @gonzalobg.

@sarnex
Copy link
Member Author

sarnex commented Feb 13, 2025

GCC has __has_builtin, so how do they handle offloading targets? Do they have the same odd behavior where __has_builtin returns true for builtins it cannot actually emit code for?

Setting up offloading for GCC was a nightmare, but it looks like the behavior is the same as clang (available builtins are the union of host + offload).

I tried this example:

int main() {
  int x = 0;
  #pragma omp target
  {
   for(int i = 0; i < 100; i++) {
#if __has_builtin(__builtin_ia32_pause)
__builtin_ia32_pause();
#endif
     x+= i;
   }
  } 
  return x;
}

with flags:

g++-14 -fopenmp -foffload=amdgcn-amdhsa="-march=gfx1103" foo.cpp -fcf-protection=none 

and i got

lto1: fatal error: target specific builtin not available
compilation terminated.
gcn mkoffload: fatal error: x86_64-linux-gnu-accel-amdgcn-amdhsa-gcc-14 returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /usr/libexec/gcc/x86_64-linux-gnu/14//accel/amdgcn-amdhsa/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

@AaronBallman
Copy link
Collaborator

GCC has __has_builtin, so how do they handle offloading targets? Do they have the same odd behavior where __has_builtin returns true for builtins it cannot actually emit code for?

Setting up offloading for GCC was a nightmare, but it looks like the behavior is the same as clang (available builtins are the union of host + offload).

I tried this example:

int main() {
  int x = 0;
  #pragma omp target
  {
   for(int i = 0; i < 100; i++) {
#if __has_builtin(__builtin_ia32_pause)
__builtin_ia32_pause();
#endif
     x+= i;
   }
  } 
  return x;
}

with flags:

g++-14 -fopenmp -foffload=amdgcn-amdhsa="-march=gfx1103" foo.cpp -fcf-protection=none 

and i got

lto1: fatal error: target specific builtin not available
compilation terminated.
gcn mkoffload: fatal error: x86_64-linux-gnu-accel-amdgcn-amdhsa-gcc-14 returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /usr/libexec/gcc/x86_64-linux-gnu/14//accel/amdgcn-amdhsa/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

Ouch! Thank you for doing that investigation!

@pinskia -- do you happen to know if this is intentional? GCC's documentation says "recognizes" which could mean this is intended, but I wasn't 100% sure. Do you think GCC has an appetite for changing the behavior or perhaps deprecating the interface and adding __has_target_builtin? Or are there reasons for this behavior that we're not aware of?

@pinskia
Copy link

pinskia commented Feb 13, 2025

What is the definition of current target? Is it the current subtarget? Or is it returning true even when NOT using #pragma GCC target (or the target attribute) even if the builtin is not work for the subtarget?
This is unrelated to offloading but is similar issue.
With respect to offloading, the same question comes into play for the offload target too.

I am not defintely the wrong person to ask about offloading and __has_{*}builtin as I try to stay away from the offloading stuff.

@pinskia
Copy link

pinskia commented Feb 13, 2025

Maybe the problem with the name I am having is the word has. Maybe __can_use_builtin seems like a better name. And that seems like would be a good context sensative clue.

@AaronBallman
Copy link
Collaborator

Maybe the problem with the name I am having is the word has. Maybe __can_use_builtin seems like a better name. And that seems like would be a good context sensative clue.

That's actually a really good idea, thank you! But I think users will still have the very reasonable question of why there's a __has_builtin that's different from __can_use_builtin (or whatever we name it), and I don't know that any of us have an answer for that, which is a bit worrying to me. It'd be nice if we were able to either have one builtin that behaves how users would expect, or if we had a compelling example we could document that helps users to understand when to use one feature test macro vs the other.

@pinskia
Copy link

pinskia commented Feb 14, 2025

I submitted https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118882 to ask the other GCC folks about what they think is a good idea because I have no other thoughts on this; I just think something needs to be done.

@Artem-B
Copy link
Member

Artem-B commented Feb 14, 2025

why there's a __has_builtin that's different from __can_use_builtin (or whatever we name it), and I don't know that any of us have an answer for that

my $.02
IMO it's a side effect of heterogeneous compilation, where compiler has to parse source code for multiple targets (and thus has to see each target's builtins), but can generate code only for the target we're currently compiling for. __can_use_builtin will give us a way to distinguish the two. It's not a perfect tool, because usability of a builtin depends on the context, but it's better than nothing.

For the classic compilation, __has_builtin is unambiguous.

joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
Feedback from llvm#126324

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@jhuber6
Copy link
Contributor

jhuber6 commented Feb 14, 2025

why there's a __has_builtin that's different from __can_use_builtin (or whatever we name it), and I don't know that any of us have an answer for that

my $.02 IMO it's a side effect of heterogeneous compilation, where compiler has to parse source code for multiple targets (and thus has to see each target's builtins), but can generate code only for the target we're currently compiling for. __can_use_builtin will give us a way to distinguish the two. It's not a perfect tool, because usability of a builtin depends on the context, but it's better than nothing.

For the classic compilation, __has_builtin is unambiguous.

Sometimes I wonder if we should have a single way to detect if we're doing any kind of offloading so we can guard stuff like this.

@@ -1819,8 +1822,12 @@ void Preprocessor::ExpandBuiltinMacro(Token &Tok) {
// usual allocation and deallocation functions. Required by libc++
return 201802;
default:
// __has_target_builtin should return false for aux builtins.
if (IsHasTargetBuiltin &&
getBuiltinInfo().isAuxBuiltinID(BuiltinID))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic doesn't seem right to me. What happens if this builtin is supported on BOTH the host and device here? Shouldn't we still return 'true'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The function name is isAuxBuiltinID but the implementation of it makes it so that the behavior is like isOnlyAuxBuiltinID. The implementation of the function is:

 /// Return true if builtin ID belongs to AuxTarget.
 bool isAuxBuiltinID(unsigned ID) const {  
   return ID >= (Builtin::FirstTSBuiltin + NumTargetBuiltins); 
  } 

When we register builtins, we do it like this:

TargetShards = Target.getTargetBuiltins();
  for (const auto &Shard : TargetShards)
    NumTargetBuiltins += Shard.Infos.size();
  if (AuxTarget) {
    AuxTargetShards = AuxTarget->getTargetBuiltins();
    for (const auto &Shard : AuxTargetShards)
      NumAuxTargetBuiltins += Shard.Infos.size();
  }
}

So we register all the target builtins before the aux target builtins, and we only consider something an aux builtin if it was registered specifically as an aux builtin, so I think the logic does what we want.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Neat! Can you document that somewhere on this? Just extending the comment here will be really helpful to the next person along.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, opened PR here and added you as a reviewer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Thanks! I meant here as well though, it was REALLY jarring to me and I suspect next one here will ahve the same problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sure, will update this pr as well

@sarnex
Copy link
Member Author

sarnex commented Feb 20, 2025

@AaronBallman What do you recommend for next steps here? Should we wait until GCC makes a decision in this issue?

@jhuber6
Copy link
Contributor

jhuber6 commented Feb 20, 2025

I'd vote for fixing the CUDA on Arm case that failed in the meantime, then make a decision as to whether or not we should go back to __has_builtin only returning the current compilation target once that's gone.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
sarnex added a commit that referenced this pull request Feb 20, 2025
Clarify behavior of the function when the builtin is also supported on
the main target.
Based on feedback from #126324

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 20, 2025
Clarify behavior of the function when the builtin is also supported on
the main target.
Based on feedback from llvm/llvm-project#126324

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@AaronBallman
Copy link
Collaborator

@AaronBallman What do you recommend for next steps here? Should we wait until GCC makes a decision in this issue?

I don't think we need to wait for GCC to make a decision, but we can wait for a bit for them to consider the issue, and should make sure we coordinate closely with GCC if we make a decision on our own.

I'd vote for fixing the CUDA on Arm case that failed in the meantime, then make a decision as to whether or not we should go back to __has_builtin only returning the current compilation target once that's gone.

+1, if we can get away with it.

As for next steps, I think we need a broader community discussion on this, so I would recommend an RFC proposing an approach. I don't know whether that's changing the behavior of __has_builtin, proposing __can_use_builtin and deprecating __has_builtin, or something else; I don't have enough expertise in offloading to feel like I should set the direction in that way. But I think the most conservative approach would be to introduce __can_use_builtin and deprecate __has_builtin as being a confused interface.

@jhuber6
Copy link
Contributor

jhuber6 commented Feb 21, 2025

As for next steps, I think we need a broader community discussion on this, so I would recommend an RFC proposing an approach. I don't know whether that's changing the behavior of __has_builtin, proposing __can_use_builtin and deprecating __has_builtin, or something else; I don't have enough expertise in offloading to feel like I should set the direction in that way. But I think the most conservative approach would be to introduce __can_use_builtin and deprecate __has_builtin as being a confused interface.

I'm not a fan of deprecating __has_builtin because it's been around for quite awhile and it perfectly fine at doing what it says in the standardized languages. Offloading languages are kind of in this 'whatever works' area, so I don't think we should change behavior too much around their edge cases.

@Artem-B
Copy link
Member

Artem-B commented Feb 21, 2025

I'd vote for fixing the CUDA on Arm case that failed in the meantime, then make a decision as to whether or not we should go back to __has_builtin only returning the current compilation target once that's gone.

+1, if we can get away with it.

+1 to that.

@sarnex
Copy link
Member Author

sarnex commented Feb 21, 2025

I'll try to fix the CUDA Arm case and then draft an RFC. This turned out to be quite the change :P

@sarnex
Copy link
Member Author

sarnex commented Feb 21, 2025

PR for ARM CUDA #128222

sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
Feedback from llvm#126324

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex
Copy link
Member Author

sarnex commented Mar 3, 2025

ARM PR looks kinda stalled, hopefully can write the RFC this week

@sarnex
Copy link
Member Author

sarnex commented Mar 4, 2025

@sarnex
Copy link
Member Author

sarnex commented Mar 7, 2025

So far RFC is going towards changing __has_builtin to respect the current target. If anyone has any comments please add them to the RFC here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants