-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates #134016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-clang Author: Alex Voicu (AlexVlx) ChangesThis change adds two semi-magical builtins for AMDGPU:
Neither of these are The motivation for adding these is two-fold:
I've tried to keep the overall footprint of the change small. The changes to Sema are a bit unpleasant, but there was a strong desire to have Clang validate these, and to constrain their uses, and this was the most compact solution I could come up with (suggestions welcome). In the end, I will note there is nothing that is actually AMDGPU specific here, so it is possible that in the future, assuming interests from other targets / users, we'd just promote them to generic intrinsics. Patch is 59.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134016.diff 17 Files Affected:
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 3b8a9cac6587a..8a7cb75af13e5 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address spaces are fenced.
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+ // When used as the predicate for a control structure
+ bool __builtin_amdgcn_processor_is(const char*);
+ bool __builtin_amdgcn_is_invocable(builtin_name);
+ // Otherwise
+ void __builtin_amdgcn_processor_is(const char*);
+ void __builtin_amdgcn_is_invocable(void);
+
+**Example of use**:
+
+.. code-block:: c++
+
+ if (__builtin_amdgcn_processor_is("gfx1201") ||
+ __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var))
+ __builtin_amdgcn_s_sleep_var(x);
+
+ if (!__builtin_amdgcn_processor_is("gfx906"))
+ __builtin_amdgcn_s_wait_event_export_ready();
+ else if (__builtin_amdgcn_processor_is("gfx1010") ||
+ __builtin_amdgcn_processor_is("gfx1101"))
+ __builtin_amdgcn_s_ttracedata_imm(1);
+
+ while (__builtin_amdgcn_processor_is("gfx1101")) *p += x;
+
+ do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010"));
+
+ for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break;
+
+ if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready))
+ __builtin_amdgcn_s_wait_event_export_ready();
+ else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm))
+ __builtin_amdgcn_s_ttracedata_imm(1);
+
+ do {
+ *p -= x;
+ } while (__builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32));
+
+ for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p) break;
+
+**Description**:
+
+When used as the predicate value of the following control structures:
+
+.. code-block:: c++
+
+ if (...)
+ while (...)
+ do { } while (...)
+ for (...)
+
+be it directly, or as arguments to logical operators such as ``!, ||, &&``, the
+builtins return a boolean value that:
+
+* indicates whether the current target matches the argument; the argument MUST
+ be a string literal and a valid AMDGPU target
+* indicates whether the builtin function passed as the argument can be invoked
+ by the current target; the argument MUST be either a generic or AMDGPU
+ specific builtin name
+
+Outside of these contexts, the builtins have a ``void`` returning signature
+which prevents their misuse.
+
+**Example of invalid use**:
+
+.. code-block:: c++
+
+ void kernel(int* p, int x, bool (*pfn)(bool), const char* str) {
+ if (__builtin_amdgcn_processor_is("not_an_amdgcn_gfx_id")) return;
+ else if (__builtin_amdgcn_processor_is(str)) __builtin_trap();
+
+ bool a = __builtin_amdgcn_processor_is("gfx906");
+ const bool b = !__builtin_amdgcn_processor_is("gfx906");
+ const bool c = !__builtin_amdgcn_processor_is("gfx906");
+ bool d = __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ bool e = !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ const auto f =
+ !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
+ || __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ const auto g =
+ !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
+ || !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ __builtin_amdgcn_processor_is("gfx1201")
+ ? __builtin_amdgcn_s_sleep_var(x) : __builtin_amdgcn_s_sleep(42);
+ if (pfn(__builtin_amdgcn_processor_is("gfx1200")))
+ __builtin_amdgcn_s_sleep_var(x);
+
+ if (__builtin_amdgcn_is_invocable("__builtin_amdgcn_s_sleep_var")) return;
+ else if (__builtin_amdgcn_is_invocable(x)) __builtin_trap();
+ }
+
+When invoked while compiling for a concrete target, the builtins are evaluated
+early by Clang, and never produce any CodeGen effects / have no observable
+side-effects in IR. Conversely, when compiling for AMDGCN flavoured SPIR-v,
+which is an abstract target, a series of predicate values are implicitly
+created. These predicates get resolved when finalizing the compilation process
+for a concrete target, and shall reflect the latter's identity and features.
+Thus, it is possible to author high-level code, in e.g. HIP, that is target
+adaptive in a dynamic fashion, contrary to macro based mechanisms.
ARM/AArch64 Language Extensions
-------------------------------
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 44ef404aee72f..5d01a7e75f7e7 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -346,6 +346,11 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
+// These are special FE only builtins intended for forwarding the requirements
+// to the ME.
+BUILTIN(__builtin_amdgcn_processor_is, "vcC*", "nctu")
+BUILTIN(__builtin_amdgcn_is_invocable, "v", "nctu")
+
//===----------------------------------------------------------------------===//
// R600-NI only builtins.
//===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 5e45482584946..45f0f9eb88e55 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -13054,4 +13054,14 @@ def err_acc_decl_for_routine
// AMDGCN builtins diagnostics
def err_amdgcn_global_load_lds_size_invalid_value : Error<"invalid size value">;
def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">;
+def err_amdgcn_processor_is_arg_not_literal
+ : Error<"the argument to __builtin_amdgcn_processor_is must be a string "
+ "literal">;
+def err_amdgcn_processor_is_arg_invalid_value
+ : Error<"the argument to __builtin_amdgcn_processor_is must be a valid "
+ "AMDGCN processor identifier; '%0' is not valid">;
+def err_amdgcn_is_invocable_arg_invalid_value
+ : Error<"the argument to __builtin_amdgcn_is_invocable must be either a "
+ "target agnostic builtin or an AMDGCN target specific builtin; `%0`"
+ " is not valid">;
} // end of sema component.
diff --git a/clang/lib/Basic/Targets/SPIR.cpp b/clang/lib/Basic/Targets/SPIR.cpp
index 5b5f47f9647a2..eb43d9b0be283 100644
--- a/clang/lib/Basic/Targets/SPIR.cpp
+++ b/clang/lib/Basic/Targets/SPIR.cpp
@@ -152,3 +152,7 @@ void SPIRV64AMDGCNTargetInfo::setAuxTarget(const TargetInfo *Aux) {
Float128Format = DoubleFormat;
}
}
+
+bool SPIRV64AMDGCNTargetInfo::isValidCPUName(StringRef CPU) const {
+ return AMDGPUTI.isValidCPUName(CPU);
+}
diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h
index 78505d66d6f2f..7aa13cbeb89fd 100644
--- a/clang/lib/Basic/Targets/SPIR.h
+++ b/clang/lib/Basic/Targets/SPIR.h
@@ -432,6 +432,10 @@ class LLVM_LIBRARY_VISIBILITY SPIRV64AMDGCNTargetInfo final
}
bool hasInt128Type() const override { return TargetInfo::hasInt128Type(); }
+
+ // This is only needed for validating arguments passed to
+ // __builtin_amdgcn_processor_is
+ bool isValidCPUName(StringRef Name) const override;
};
} // namespace targets
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index b56b739094ff3..7b1a3815144b4 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -284,6 +284,18 @@ void CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
}
+static Value *GetOrInsertAMDGPUPredicate(CodeGenFunction &CGF, Twine Name) {
+ auto PTy = IntegerType::getInt1Ty(CGF.getLLVMContext());
+
+ auto P = cast<GlobalVariable>(
+ CGF.CGM.getModule().getOrInsertGlobal(Name.str(), PTy));
+ P->setConstant(true);
+ P->setExternallyInitialized(true);
+
+ return CGF.Builder.CreateLoad(RawAddress(P, PTy, CharUnits::One(),
+ KnownNonNull));
+}
+
Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
const CallExpr *E) {
llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
@@ -585,6 +597,23 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
llvm::Value *Env = EmitScalarExpr(E->getArg(0));
return Builder.CreateCall(F, {Env});
}
+ case AMDGPU::BI__builtin_amdgcn_processor_is: {
+ assert(CGM.getTriple().isSPIRV() &&
+ "__builtin_amdgcn_processor_is should never reach CodeGen for "
+ "concrete targets!");
+ StringRef Proc = cast<clang::StringLiteral>(E->getArg(0))->getString();
+ return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.is." + Proc);
+ }
+ case AMDGPU::BI__builtin_amdgcn_is_invocable: {
+ assert(CGM.getTriple().isSPIRV() &&
+ "__builtin_amdgcn_is_invocable should never reach CodeGen for "
+ "concrete targets!");
+ auto FD = cast<FunctionDecl>(
+ cast<DeclRefExpr>(E->getArg(0))->getReferencedDeclOfCallee());
+ StringRef RF =
+ getContext().BuiltinInfo.getRequiredFeatures(FD->getBuiltinID());
+ return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.has." + RF);
+ }
case AMDGPU::BI__builtin_amdgcn_read_exec:
return EmitAMDGCNBallotForExec(*this, E, Int64Ty, Int64Ty, false);
case AMDGPU::BI__builtin_amdgcn_read_exec_lo:
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 7cc8374e69d73..24f5262ab3cf4 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6541,6 +6541,22 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
if (Result.isInvalid()) return ExprError();
Fn = Result.get();
+ // The __builtin_amdgcn_is_invocable builtin is special, and will be resolved
+ // later, when we check boolean conditions, for now we merely forward it
+ // without any additional checking.
+ if (Fn->getType() == Context.BuiltinFnTy && ArgExprs.size() == 1 &&
+ ArgExprs[0]->getType() == Context.BuiltinFnTy) {
+ auto FD = cast<FunctionDecl>(Fn->getReferencedDeclOfCallee());
+
+ if (FD->getName() == "__builtin_amdgcn_is_invocable") {
+ auto FnPtrTy = Context.getPointerType(FD->getType());
+ auto R = ImpCastExprToType(Fn, FnPtrTy, CK_BuiltinFnToFnPtr).get();
+ return CallExpr::Create(Context, R, ArgExprs, Context.VoidTy,
+ ExprValueKind::VK_PRValue, RParenLoc,
+ FPOptionsOverride());
+ }
+ }
+
if (CheckArgsForPlaceholders(ArgExprs))
return ExprError();
@@ -13234,6 +13250,20 @@ inline QualType Sema::CheckBitwiseOperands(ExprResult &LHS, ExprResult &RHS,
return InvalidOperands(Loc, LHS, RHS);
}
+static inline bool IsAMDGPUPredicateBI(Expr *E) {
+ if (!E->getType()->isVoidType())
+ return false;
+
+ if (auto CE = dyn_cast<CallExpr>(E)) {
+ if (auto BI = CE->getDirectCallee())
+ if (BI->getName() == "__builtin_amdgcn_processor_is" ||
+ BI->getName() == "__builtin_amdgcn_is_invocable")
+ return true;
+ }
+
+ return false;
+}
+
// C99 6.5.[13,14]
inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS,
SourceLocation Loc,
@@ -13329,6 +13359,9 @@ inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS,
// The following is safe because we only use this method for
// non-overloadable operands.
+ if (IsAMDGPUPredicateBI(LHS.get()) && IsAMDGPUPredicateBI(RHS.get()))
+ return Context.VoidTy;
+
// C++ [expr.log.and]p1
// C++ [expr.log.or]p1
// The operands are both contextually converted to type bool.
@@ -15576,6 +15609,38 @@ static bool isOverflowingIntegerType(ASTContext &Ctx, QualType T) {
return Ctx.getIntWidth(T) >= Ctx.getIntWidth(Ctx.IntTy);
}
+static Expr *ExpandAMDGPUPredicateBI(ASTContext &Ctx, CallExpr *CE) {
+ if (!CE->getBuiltinCallee())
+ return CXXBoolLiteralExpr::Create(Ctx, false, Ctx.BoolTy, CE->getExprLoc());
+
+ if (Ctx.getTargetInfo().getTriple().isSPIRV()) {
+ CE->setType(Ctx.getLogicalOperationType());
+ return CE;
+ }
+
+ bool P = false;
+ auto &TI = Ctx.getTargetInfo();
+
+ if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") {
+ auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts());
+ auto TID = TI.getTargetID();
+ if (GFX && TID) {
+ auto N = GFX->getString();
+ P = TI.isValidCPUName(GFX->getString()) && TID->find(N) == 0;
+ }
+ } else {
+ auto FD = cast<FunctionDecl>(CE->getArg(0)->getReferencedDeclOfCallee());
+
+ StringRef RF = Ctx.BuiltinInfo.getRequiredFeatures(FD->getBuiltinID());
+ llvm::StringMap<bool> CF;
+ Ctx.getFunctionFeatureMap(CF, FD);
+
+ P = Builtin::evaluateRequiredTargetFeatures(RF, CF);
+ }
+
+ return CXXBoolLiteralExpr::Create(Ctx, P, Ctx.BoolTy, CE->getExprLoc());
+}
+
ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
UnaryOperatorKind Opc, Expr *InputExpr,
bool IsAfterAmp) {
@@ -15753,6 +15818,8 @@ ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
// Vector logical not returns the signed variant of the operand type.
resultType = GetSignedVectorType(resultType);
break;
+ } else if (IsAMDGPUPredicateBI(InputExpr)) {
+ break;
} else {
return ExprError(Diag(OpLoc, diag::err_typecheck_unary_expr)
<< resultType << Input.get()->getSourceRange());
@@ -20469,6 +20536,88 @@ void Sema::DiagnoseEqualityWithExtraParens(ParenExpr *ParenE) {
}
}
+static bool ValidateAMDGPUPredicateBI(Sema &Sema, CallExpr *CE) {
+ if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") {
+ auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts());
+ if (!GFX) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_processor_is_arg_not_literal);
+ return false;
+ }
+ auto N = GFX->getString();
+ if (!Sema.getASTContext().getTargetInfo().isValidCPUName(N) &&
+ (!Sema.getASTContext().getAuxTargetInfo() ||
+ !Sema.getASTContext().getAuxTargetInfo()->isValidCPUName(N))) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_processor_is_arg_invalid_value) << N;
+ return false;
+ }
+ } else {
+ auto Arg = CE->getArg(0);
+ if (!Arg || Arg->getType() != Sema.getASTContext().BuiltinFnTy) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_is_invocable_arg_invalid_value) << Arg;
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static Expr *MaybeHandleAMDGPUPredicateBI(Sema &Sema, Expr *E, bool &Invalid) {
+ if (auto UO = dyn_cast<UnaryOperator>(E)) {
+ auto SE = dyn_cast<CallExpr>(UO->getSubExpr());
+ if (IsAMDGPUPredicateBI(SE)) {
+ assert(
+ UO->getOpcode() == UnaryOperator::Opcode::UO_LNot &&
+ "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable "
+ "can only be used as operands of logical ops!");
+
+ if (!ValidateAMDGPUPredicateBI(Sema, SE)) {
+ Invalid = true;
+ return nullptr;
+ }
+
+ UO->setSubExpr(ExpandAMDGPUPredicateBI(Sema.getASTContext(), SE));
+ UO->setType(Sema.getASTContext().getLogicalOperationType());
+
+ return UO;
+ }
+ }
+ if (auto BO = dyn_cast<BinaryOperator>(E)) {
+ auto LHS = dyn_cast<CallExpr>(BO->getLHS());
+ auto RHS = dyn_cast<CallExpr>(BO->getRHS());
+ if (IsAMDGPUPredicateBI(LHS) && IsAMDGPUPredicateBI(RHS)) {
+ assert(
+ BO->isLogicalOp() &&
+ "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable "
+ "can only be used as operands of logical ops!");
+
+ if (!ValidateAMDGPUPredicateBI(Sema, LHS) ||
+ !ValidateAMDGPUPredicateBI(Sema, RHS)) {
+ Invalid = true;
+ return nullptr;
+ }
+
+ BO->setLHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), LHS));
+ BO->setRHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), RHS));
+ BO->setType(Sema.getASTContext().getLogicalOperationType());
+
+ return BO;
+ }
+ }
+ if (auto CE = dyn_cast<CallExpr>(E))
+ if (IsAMDGPUPredicateBI(CE)) {
+ if (!ValidateAMDGPUPredicateBI(Sema, CE)) {
+ Invalid = true;
+ return nullptr;
+ }
+ return ExpandAMDGPUPredicateBI(Sema.getASTContext(), CE);
+ }
+
+ return nullptr;
+}
+
ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E,
bool IsConstexpr) {
DiagnoseAssignmentAsCondition(E);
@@ -20480,6 +20629,14 @@ ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E,
E = result.get();
if (!E->isTypeDependent()) {
+ if (E->getType()->isVoidType()) {
+ bool IsInvalidPredicate = false;
+ if (auto BIC = MaybeHandleAMDGPUPredicateBI(*this, E, IsInvalidPredicate))
+ return BIC;
+ else if (IsInvalidPredicate)
+ return ExprError();
+ }
+
if (getLangOpts().CPlusPlus)
return CheckCXXBooleanCondition(E, IsConstexpr); // C++ 6.4p4
diff --git a/clang/test/CodeGen/amdgpu-builtin-cpu-is.c b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c
new file mode 100644
index 0000000000000..6e261d9f5d239
--- /dev/null
+++ b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c
@@ -0,0 +1,65 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX900 %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1010 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX1010 %s
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCNSPIRV %s
+
+// Test that, depending on triple and, if applicable, target-cpu, one of three
+// things happens:
+// 1) for gfx900 we emit a call to trap (concrete target, matches)
+// 2) for gfx1010 we emit an empty kernel (concrete target, does not match)
+// 3) for AMDGCNSPIRV we emit llvm.amdgcn.is.gfx900 as a bool global, and
+// load from it to provide the condition a br (abstract target)
+//.
+// AMDGCN-GFX900: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+//.
+// AMDGCN-GFX1010: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+//.
+// AMDGCNSPIRV: @llvm.amdgcn.is.gfx900 = external addrspace(1) externally_initialized constant i1
+//.
+// AMDGCN-GFX900-LABEL: define dso_local void @foo(
+// AMDGCN-GFX900-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX900-NEXT: [[ENTRY:.*:]]
+// AMDGCN-GFX900-NEXT: call void @llvm.trap()
+// AMDGCN-GFX900-NEXT: ret void
+//
+// AMDGCN-GFX1010-LABEL: define dso_local void @foo(
+// AMDGCN-GFX1010-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX1010-NEXT: [[ENTRY:.*:]]
+// AMDGCN-GFX1010-NEXT: ret void
+//
+// AMDGCNSPIRV-LABEL: define spir_func void @foo(
+// AMDGCNSPIRV-SAME: ) addrspace(4) #[[ATTR0:[0-9]+]] {
+// AMDGCNSPIRV-NEXT: [[ENTRY:.*:]]
+// AMDGCNSPIRV-NEXT: [[TMP0:%.*]] = load i1, ptr addrspace(1) @llvm.amdgcn.is.gfx900, align 1
+// AMDGCNSPIRV-NEXT: br i1 [[TMP0]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// AMDGCNSPIRV: [[IF_THEN]]:
+// AMDGCNSPIRV-NEXT: call addrspace(4) void @llvm.trap()
+// AMDGCNSPIRV-NEXT: br label %[[IF_END]]
+// AMDGCNSPIRV: [[IF_END]]:
+// AMDGCNSPIRV-NEXT: ret void
+//
+void foo() {
+ if (__builtin_cpu_is("gfx90...
[truncated]
|
@llvm/pr-subscribers-clang-codegen Author: Alex Voicu (AlexVlx) ChangesThis change adds two semi-magical builtins for AMDGPU:
Neither of these are The motivation for adding these is two-fold:
I've tried to keep the overall footprint of the change small. The changes to Sema are a bit unpleasant, but there was a strong desire to have Clang validate these, and to constrain their uses, and this was the most compact solution I could come up with (suggestions welcome). In the end, I will note there is nothing that is actually AMDGPU specific here, so it is possible that in the future, assuming interests from other targets / users, we'd just promote them to generic intrinsics. Patch is 59.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134016.diff 17 Files Affected:
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 3b8a9cac6587a..8a7cb75af13e5 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address spaces are fenced.
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+ // When used as the predicate for a control structure
+ bool __builtin_amdgcn_processor_is(const char*);
+ bool __builtin_amdgcn_is_invocable(builtin_name);
+ // Otherwise
+ void __builtin_amdgcn_processor_is(const char*);
+ void __builtin_amdgcn_is_invocable(void);
+
+**Example of use**:
+
+.. code-block:: c++
+
+ if (__builtin_amdgcn_processor_is("gfx1201") ||
+ __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var))
+ __builtin_amdgcn_s_sleep_var(x);
+
+ if (!__builtin_amdgcn_processor_is("gfx906"))
+ __builtin_amdgcn_s_wait_event_export_ready();
+ else if (__builtin_amdgcn_processor_is("gfx1010") ||
+ __builtin_amdgcn_processor_is("gfx1101"))
+ __builtin_amdgcn_s_ttracedata_imm(1);
+
+ while (__builtin_amdgcn_processor_is("gfx1101")) *p += x;
+
+ do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010"));
+
+ for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break;
+
+ if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready))
+ __builtin_amdgcn_s_wait_event_export_ready();
+ else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm))
+ __builtin_amdgcn_s_ttracedata_imm(1);
+
+ do {
+ *p -= x;
+ } while (__builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32));
+
+ for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p) break;
+
+**Description**:
+
+When used as the predicate value of the following control structures:
+
+.. code-block:: c++
+
+ if (...)
+ while (...)
+ do { } while (...)
+ for (...)
+
+be it directly, or as arguments to logical operators such as ``!, ||, &&``, the
+builtins return a boolean value that:
+
+* indicates whether the current target matches the argument; the argument MUST
+ be a string literal and a valid AMDGPU target
+* indicates whether the builtin function passed as the argument can be invoked
+ by the current target; the argument MUST be either a generic or AMDGPU
+ specific builtin name
+
+Outside of these contexts, the builtins have a ``void`` returning signature
+which prevents their misuse.
+
+**Example of invalid use**:
+
+.. code-block:: c++
+
+ void kernel(int* p, int x, bool (*pfn)(bool), const char* str) {
+ if (__builtin_amdgcn_processor_is("not_an_amdgcn_gfx_id")) return;
+ else if (__builtin_amdgcn_processor_is(str)) __builtin_trap();
+
+ bool a = __builtin_amdgcn_processor_is("gfx906");
+ const bool b = !__builtin_amdgcn_processor_is("gfx906");
+ const bool c = !__builtin_amdgcn_processor_is("gfx906");
+ bool d = __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ bool e = !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ const auto f =
+ !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
+ || __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ const auto g =
+ !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
+ || !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
+ __builtin_amdgcn_processor_is("gfx1201")
+ ? __builtin_amdgcn_s_sleep_var(x) : __builtin_amdgcn_s_sleep(42);
+ if (pfn(__builtin_amdgcn_processor_is("gfx1200")))
+ __builtin_amdgcn_s_sleep_var(x);
+
+ if (__builtin_amdgcn_is_invocable("__builtin_amdgcn_s_sleep_var")) return;
+ else if (__builtin_amdgcn_is_invocable(x)) __builtin_trap();
+ }
+
+When invoked while compiling for a concrete target, the builtins are evaluated
+early by Clang, and never produce any CodeGen effects / have no observable
+side-effects in IR. Conversely, when compiling for AMDGCN flavoured SPIR-v,
+which is an abstract target, a series of predicate values are implicitly
+created. These predicates get resolved when finalizing the compilation process
+for a concrete target, and shall reflect the latter's identity and features.
+Thus, it is possible to author high-level code, in e.g. HIP, that is target
+adaptive in a dynamic fashion, contrary to macro based mechanisms.
ARM/AArch64 Language Extensions
-------------------------------
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 44ef404aee72f..5d01a7e75f7e7 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -346,6 +346,11 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
+// These are special FE only builtins intended for forwarding the requirements
+// to the ME.
+BUILTIN(__builtin_amdgcn_processor_is, "vcC*", "nctu")
+BUILTIN(__builtin_amdgcn_is_invocable, "v", "nctu")
+
//===----------------------------------------------------------------------===//
// R600-NI only builtins.
//===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 5e45482584946..45f0f9eb88e55 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -13054,4 +13054,14 @@ def err_acc_decl_for_routine
// AMDGCN builtins diagnostics
def err_amdgcn_global_load_lds_size_invalid_value : Error<"invalid size value">;
def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">;
+def err_amdgcn_processor_is_arg_not_literal
+ : Error<"the argument to __builtin_amdgcn_processor_is must be a string "
+ "literal">;
+def err_amdgcn_processor_is_arg_invalid_value
+ : Error<"the argument to __builtin_amdgcn_processor_is must be a valid "
+ "AMDGCN processor identifier; '%0' is not valid">;
+def err_amdgcn_is_invocable_arg_invalid_value
+ : Error<"the argument to __builtin_amdgcn_is_invocable must be either a "
+ "target agnostic builtin or an AMDGCN target specific builtin; `%0`"
+ " is not valid">;
} // end of sema component.
diff --git a/clang/lib/Basic/Targets/SPIR.cpp b/clang/lib/Basic/Targets/SPIR.cpp
index 5b5f47f9647a2..eb43d9b0be283 100644
--- a/clang/lib/Basic/Targets/SPIR.cpp
+++ b/clang/lib/Basic/Targets/SPIR.cpp
@@ -152,3 +152,7 @@ void SPIRV64AMDGCNTargetInfo::setAuxTarget(const TargetInfo *Aux) {
Float128Format = DoubleFormat;
}
}
+
+bool SPIRV64AMDGCNTargetInfo::isValidCPUName(StringRef CPU) const {
+ return AMDGPUTI.isValidCPUName(CPU);
+}
diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h
index 78505d66d6f2f..7aa13cbeb89fd 100644
--- a/clang/lib/Basic/Targets/SPIR.h
+++ b/clang/lib/Basic/Targets/SPIR.h
@@ -432,6 +432,10 @@ class LLVM_LIBRARY_VISIBILITY SPIRV64AMDGCNTargetInfo final
}
bool hasInt128Type() const override { return TargetInfo::hasInt128Type(); }
+
+ // This is only needed for validating arguments passed to
+ // __builtin_amdgcn_processor_is
+ bool isValidCPUName(StringRef Name) const override;
};
} // namespace targets
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index b56b739094ff3..7b1a3815144b4 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -284,6 +284,18 @@ void CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
}
+static Value *GetOrInsertAMDGPUPredicate(CodeGenFunction &CGF, Twine Name) {
+ auto PTy = IntegerType::getInt1Ty(CGF.getLLVMContext());
+
+ auto P = cast<GlobalVariable>(
+ CGF.CGM.getModule().getOrInsertGlobal(Name.str(), PTy));
+ P->setConstant(true);
+ P->setExternallyInitialized(true);
+
+ return CGF.Builder.CreateLoad(RawAddress(P, PTy, CharUnits::One(),
+ KnownNonNull));
+}
+
Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
const CallExpr *E) {
llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
@@ -585,6 +597,23 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
llvm::Value *Env = EmitScalarExpr(E->getArg(0));
return Builder.CreateCall(F, {Env});
}
+ case AMDGPU::BI__builtin_amdgcn_processor_is: {
+ assert(CGM.getTriple().isSPIRV() &&
+ "__builtin_amdgcn_processor_is should never reach CodeGen for "
+ "concrete targets!");
+ StringRef Proc = cast<clang::StringLiteral>(E->getArg(0))->getString();
+ return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.is." + Proc);
+ }
+ case AMDGPU::BI__builtin_amdgcn_is_invocable: {
+ assert(CGM.getTriple().isSPIRV() &&
+ "__builtin_amdgcn_is_invocable should never reach CodeGen for "
+ "concrete targets!");
+ auto FD = cast<FunctionDecl>(
+ cast<DeclRefExpr>(E->getArg(0))->getReferencedDeclOfCallee());
+ StringRef RF =
+ getContext().BuiltinInfo.getRequiredFeatures(FD->getBuiltinID());
+ return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.has." + RF);
+ }
case AMDGPU::BI__builtin_amdgcn_read_exec:
return EmitAMDGCNBallotForExec(*this, E, Int64Ty, Int64Ty, false);
case AMDGPU::BI__builtin_amdgcn_read_exec_lo:
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 7cc8374e69d73..24f5262ab3cf4 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6541,6 +6541,22 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
if (Result.isInvalid()) return ExprError();
Fn = Result.get();
+ // The __builtin_amdgcn_is_invocable builtin is special, and will be resolved
+ // later, when we check boolean conditions, for now we merely forward it
+ // without any additional checking.
+ if (Fn->getType() == Context.BuiltinFnTy && ArgExprs.size() == 1 &&
+ ArgExprs[0]->getType() == Context.BuiltinFnTy) {
+ auto FD = cast<FunctionDecl>(Fn->getReferencedDeclOfCallee());
+
+ if (FD->getName() == "__builtin_amdgcn_is_invocable") {
+ auto FnPtrTy = Context.getPointerType(FD->getType());
+ auto R = ImpCastExprToType(Fn, FnPtrTy, CK_BuiltinFnToFnPtr).get();
+ return CallExpr::Create(Context, R, ArgExprs, Context.VoidTy,
+ ExprValueKind::VK_PRValue, RParenLoc,
+ FPOptionsOverride());
+ }
+ }
+
if (CheckArgsForPlaceholders(ArgExprs))
return ExprError();
@@ -13234,6 +13250,20 @@ inline QualType Sema::CheckBitwiseOperands(ExprResult &LHS, ExprResult &RHS,
return InvalidOperands(Loc, LHS, RHS);
}
+static inline bool IsAMDGPUPredicateBI(Expr *E) {
+ if (!E->getType()->isVoidType())
+ return false;
+
+ if (auto CE = dyn_cast<CallExpr>(E)) {
+ if (auto BI = CE->getDirectCallee())
+ if (BI->getName() == "__builtin_amdgcn_processor_is" ||
+ BI->getName() == "__builtin_amdgcn_is_invocable")
+ return true;
+ }
+
+ return false;
+}
+
// C99 6.5.[13,14]
inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS,
SourceLocation Loc,
@@ -13329,6 +13359,9 @@ inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS,
// The following is safe because we only use this method for
// non-overloadable operands.
+ if (IsAMDGPUPredicateBI(LHS.get()) && IsAMDGPUPredicateBI(RHS.get()))
+ return Context.VoidTy;
+
// C++ [expr.log.and]p1
// C++ [expr.log.or]p1
// The operands are both contextually converted to type bool.
@@ -15576,6 +15609,38 @@ static bool isOverflowingIntegerType(ASTContext &Ctx, QualType T) {
return Ctx.getIntWidth(T) >= Ctx.getIntWidth(Ctx.IntTy);
}
+static Expr *ExpandAMDGPUPredicateBI(ASTContext &Ctx, CallExpr *CE) {
+ if (!CE->getBuiltinCallee())
+ return CXXBoolLiteralExpr::Create(Ctx, false, Ctx.BoolTy, CE->getExprLoc());
+
+ if (Ctx.getTargetInfo().getTriple().isSPIRV()) {
+ CE->setType(Ctx.getLogicalOperationType());
+ return CE;
+ }
+
+ bool P = false;
+ auto &TI = Ctx.getTargetInfo();
+
+ if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") {
+ auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts());
+ auto TID = TI.getTargetID();
+ if (GFX && TID) {
+ auto N = GFX->getString();
+ P = TI.isValidCPUName(GFX->getString()) && TID->find(N) == 0;
+ }
+ } else {
+ auto FD = cast<FunctionDecl>(CE->getArg(0)->getReferencedDeclOfCallee());
+
+ StringRef RF = Ctx.BuiltinInfo.getRequiredFeatures(FD->getBuiltinID());
+ llvm::StringMap<bool> CF;
+ Ctx.getFunctionFeatureMap(CF, FD);
+
+ P = Builtin::evaluateRequiredTargetFeatures(RF, CF);
+ }
+
+ return CXXBoolLiteralExpr::Create(Ctx, P, Ctx.BoolTy, CE->getExprLoc());
+}
+
ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
UnaryOperatorKind Opc, Expr *InputExpr,
bool IsAfterAmp) {
@@ -15753,6 +15818,8 @@ ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc,
// Vector logical not returns the signed variant of the operand type.
resultType = GetSignedVectorType(resultType);
break;
+ } else if (IsAMDGPUPredicateBI(InputExpr)) {
+ break;
} else {
return ExprError(Diag(OpLoc, diag::err_typecheck_unary_expr)
<< resultType << Input.get()->getSourceRange());
@@ -20469,6 +20536,88 @@ void Sema::DiagnoseEqualityWithExtraParens(ParenExpr *ParenE) {
}
}
+static bool ValidateAMDGPUPredicateBI(Sema &Sema, CallExpr *CE) {
+ if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") {
+ auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts());
+ if (!GFX) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_processor_is_arg_not_literal);
+ return false;
+ }
+ auto N = GFX->getString();
+ if (!Sema.getASTContext().getTargetInfo().isValidCPUName(N) &&
+ (!Sema.getASTContext().getAuxTargetInfo() ||
+ !Sema.getASTContext().getAuxTargetInfo()->isValidCPUName(N))) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_processor_is_arg_invalid_value) << N;
+ return false;
+ }
+ } else {
+ auto Arg = CE->getArg(0);
+ if (!Arg || Arg->getType() != Sema.getASTContext().BuiltinFnTy) {
+ Sema.Diag(CE->getExprLoc(),
+ diag::err_amdgcn_is_invocable_arg_invalid_value) << Arg;
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static Expr *MaybeHandleAMDGPUPredicateBI(Sema &Sema, Expr *E, bool &Invalid) {
+ if (auto UO = dyn_cast<UnaryOperator>(E)) {
+ auto SE = dyn_cast<CallExpr>(UO->getSubExpr());
+ if (IsAMDGPUPredicateBI(SE)) {
+ assert(
+ UO->getOpcode() == UnaryOperator::Opcode::UO_LNot &&
+ "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable "
+ "can only be used as operands of logical ops!");
+
+ if (!ValidateAMDGPUPredicateBI(Sema, SE)) {
+ Invalid = true;
+ return nullptr;
+ }
+
+ UO->setSubExpr(ExpandAMDGPUPredicateBI(Sema.getASTContext(), SE));
+ UO->setType(Sema.getASTContext().getLogicalOperationType());
+
+ return UO;
+ }
+ }
+ if (auto BO = dyn_cast<BinaryOperator>(E)) {
+ auto LHS = dyn_cast<CallExpr>(BO->getLHS());
+ auto RHS = dyn_cast<CallExpr>(BO->getRHS());
+ if (IsAMDGPUPredicateBI(LHS) && IsAMDGPUPredicateBI(RHS)) {
+ assert(
+ BO->isLogicalOp() &&
+ "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable "
+ "can only be used as operands of logical ops!");
+
+ if (!ValidateAMDGPUPredicateBI(Sema, LHS) ||
+ !ValidateAMDGPUPredicateBI(Sema, RHS)) {
+ Invalid = true;
+ return nullptr;
+ }
+
+ BO->setLHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), LHS));
+ BO->setRHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), RHS));
+ BO->setType(Sema.getASTContext().getLogicalOperationType());
+
+ return BO;
+ }
+ }
+ if (auto CE = dyn_cast<CallExpr>(E))
+ if (IsAMDGPUPredicateBI(CE)) {
+ if (!ValidateAMDGPUPredicateBI(Sema, CE)) {
+ Invalid = true;
+ return nullptr;
+ }
+ return ExpandAMDGPUPredicateBI(Sema.getASTContext(), CE);
+ }
+
+ return nullptr;
+}
+
ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E,
bool IsConstexpr) {
DiagnoseAssignmentAsCondition(E);
@@ -20480,6 +20629,14 @@ ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E,
E = result.get();
if (!E->isTypeDependent()) {
+ if (E->getType()->isVoidType()) {
+ bool IsInvalidPredicate = false;
+ if (auto BIC = MaybeHandleAMDGPUPredicateBI(*this, E, IsInvalidPredicate))
+ return BIC;
+ else if (IsInvalidPredicate)
+ return ExprError();
+ }
+
if (getLangOpts().CPlusPlus)
return CheckCXXBooleanCondition(E, IsConstexpr); // C++ 6.4p4
diff --git a/clang/test/CodeGen/amdgpu-builtin-cpu-is.c b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c
new file mode 100644
index 0000000000000..6e261d9f5d239
--- /dev/null
+++ b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c
@@ -0,0 +1,65 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX900 %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1010 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX1010 %s
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCNSPIRV %s
+
+// Test that, depending on triple and, if applicable, target-cpu, one of three
+// things happens:
+// 1) for gfx900 we emit a call to trap (concrete target, matches)
+// 2) for gfx1010 we emit an empty kernel (concrete target, does not match)
+// 3) for AMDGCNSPIRV we emit llvm.amdgcn.is.gfx900 as a bool global, and
+// load from it to provide the condition a br (abstract target)
+//.
+// AMDGCN-GFX900: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+//.
+// AMDGCN-GFX1010: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
+//.
+// AMDGCNSPIRV: @llvm.amdgcn.is.gfx900 = external addrspace(1) externally_initialized constant i1
+//.
+// AMDGCN-GFX900-LABEL: define dso_local void @foo(
+// AMDGCN-GFX900-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX900-NEXT: [[ENTRY:.*:]]
+// AMDGCN-GFX900-NEXT: call void @llvm.trap()
+// AMDGCN-GFX900-NEXT: ret void
+//
+// AMDGCN-GFX1010-LABEL: define dso_local void @foo(
+// AMDGCN-GFX1010-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX1010-NEXT: [[ENTRY:.*:]]
+// AMDGCN-GFX1010-NEXT: ret void
+//
+// AMDGCNSPIRV-LABEL: define spir_func void @foo(
+// AMDGCNSPIRV-SAME: ) addrspace(4) #[[ATTR0:[0-9]+]] {
+// AMDGCNSPIRV-NEXT: [[ENTRY:.*:]]
+// AMDGCNSPIRV-NEXT: [[TMP0:%.*]] = load i1, ptr addrspace(1) @llvm.amdgcn.is.gfx900, align 1
+// AMDGCNSPIRV-NEXT: br i1 [[TMP0]], label %[[IF_THEN:.*]], label %[[IF_END:.*]]
+// AMDGCNSPIRV: [[IF_THEN]]:
+// AMDGCNSPIRV-NEXT: call addrspace(4) void @llvm.trap()
+// AMDGCNSPIRV-NEXT: br label %[[IF_END]]
+// AMDGCNSPIRV: [[IF_END]]:
+// AMDGCNSPIRV-NEXT: ret void
+//
+void foo() {
+ if (__builtin_cpu_is("gfx90...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool, in general I'm a fan of being able to use LLVM-IR as a more general target. We already hack around these things in practice, so I think it's only beneficial to formalize is in a more correct way, even if LLVM-IR wasn't 'strictly' intended to be this kind of serialization format.
// AMDGCNSPIRV-NEXT: ret void | ||
// | ||
void foo() { | ||
if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_permlanex16)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intended to handle builtins that require certain target features to be set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a test? Something simple like +dpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a test? Something simple like
+dpp
?
Sure, but if possible, could you clarify what you would like to be tested / what you expect to see, so that we avoid churning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with how the ROCm device libs does it, is that certain builtins require target features to be used. It hacks around this with the __attribute__((target))
. I just want to know that you can call a builtin that requires +ddp
features without that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worth a release note item.
Indeed! I botched moving the changes from my internal scratchpad, and the rel notes got lost; fixing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side question, is it legal to use the builtin in unstructured control flow, like here: https://godbolt.org/z/no7Kzv19r ? Note, if the answer is "no", then enforcing the builtin to initialize something would (probably) automatically prevent this case, as clang would error out with:
error: cannot jump from this goto statement to its label note: jump bypasses variable initialization
Many thanks for asking this, I had completely ignored this scenario! I've added handling for this which is symmetric with what we already do for if constexpr
or if available
; indeed, the answer to your question was "no":)
@erichkeane @AaronBallman if, when you have time, you could please indicate if the new direction is at least generally aligned with what you had in mind, it'd be appreciated! |
Gentle ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proposed approach is a reasonable direction. WDYT @erichkeane ?
The builtins return a value of type ``__amdgpu_feature_predicate_t``, which is a | ||
target specific type that behaves as if its C++ definition was the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this builtin work in C? If so, the docs should be updated to make it clear that this behavior applies to C as well as C++ and explain what it means in a bit more detail (presume that C users have no idea how C++ idioms work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it does work in C, although I'm not convinced it will see significant use. That being said, there's no good reason to make it C++ only. I have tried to add an explanation for that context / provide a fleshed out example, please do let me know if it's more or less aligned with what you had in mind. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, please take a look at the LLVM coding standard re the use of 'auto'.
Second: The use of a special type for these builtins is a little novel (though I see the predicate type already exists?), but I guess I'm ok with it. I have some concerns with how the conversions for it work, particularly being represented always as an i1
, but the tests you have look about right.
I would like to see a test that is effectively:
bool f() {
return __builtin_amdgcn_processor_is(...);
}
(and maybe one returning 'auto' to make sure it is deduced properly).
clang/docs/ReleaseNotes.rst
Outdated
@@ -758,6 +758,10 @@ AMDGPU Support | |||
^^^^^^^^^^^^^^ | |||
|
|||
- Bump the default code object version to 6. ROCm 6.3 is required to run any program compiled with COV6. | |||
- Introduced a new target specific builtin ``__builtin_amdgcn_processor_is``, | |||
a late / deferred query for the current target processor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a late / deferred query for the current target processor | |
a late / deferred query for the current target processor. |
clang/docs/ReleaseNotes.rst
Outdated
- Introduced a new target specific builtin ``__builtin_amdgcn_processor_is``, | ||
a late / deferred query for the current target processor | ||
- Introduced a new target specific builtin ``__builtin_amdgcn_is_invocable``, | ||
which enables fine-grained, per-builtin, feature availability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which enables fine-grained, per-builtin, feature availability | |
which enables fine-grained, per-builtin, feature availability. |
|
||
/// Expand a valid use of the feature identification builtins into its | ||
/// corresponding sequence of instructions. | ||
Expr *ExpandAMDGPUPredicateBI(CallExpr *CE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expr *ExpandAMDGPUPredicateBI(CallExpr *CE); | |
Expr *ExpandAMDGPUPredicateBuiltIn(CallExpr *CE); |
clang/lib/Sema/SemaAMDGPU.cpp
Outdated
@@ -366,4 +367,72 @@ void SemaAMDGPU::handleAMDGPUMaxNumWorkGroupsAttr(Decl *D, | |||
addAMDGPUMaxNumWorkGroupsAttr(D, AL, AL.getArgAsExpr(0), YExpr, ZExpr); | |||
} | |||
|
|||
Expr *SemaAMDGPU::ExpandAMDGPUPredicateBI(CallExpr *CE) { | |||
auto &Ctx = getASTContext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these are supposed to be 'auto' unless the type itself is on the RHS. Goes for most of this function.
clang/lib/Sema/SemaExpr.cpp
Outdated
auto *FD = cast<FunctionDecl>(Fn->getReferencedDeclOfCallee()); | ||
|
||
if (FD->getName() == "__builtin_amdgcn_is_invocable") { | ||
auto FnPtrTy = Context.getPointerType(FD->getType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More 'auto' use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, please take a look at the LLVM coding standard re the use of 'auto'.
Second: The use of a special type for these builtins is a little novel (though I see the predicate type already exists?), but I guess I'm ok with it. I have some concerns with how the conversions for it work, particularly being represented always as an
i1
, but the tests you have look about right.I would like to see a test that is effectively:
bool f() { return __builtin_amdgcn_processor_is(...); }
(and maybe one returning 'auto' to make sure it is deduced properly).
Apologies, I missed this earlier, and only got around to adding them now. Please do note that for cases where the function return type is bool
one cannot directly return a predicate value, as it does not get implicitly casted to bool
. This matches the behaviour of a type with an explicit
conversion operator, which is what we're modelling.
This change adds two semi-magical builtins for AMDGPU:
__builtin_amdgcn_processor_is
, which is similar in observable behaviour with__builtin_cpu_is
, except that it is never "evaluated" at run time;__builtin_amdgcn_is_invocable
, which is behaviourally similar with__has_builtin
, except that it is not a macro (i.e. not evaluated at preprocessing time).Neither of these are
constexpr
, even though when compiling for concrete (i.e.gfxXXX
/gfxXXX-generic
) targets they get evaluated in Clang, so they shouldn't tear the AST too badly / at all for multi-pass compilation cases like HIP. They can only be used in specific contexts (as args to control structures).The motivation for adding these is two-fold:
I've tried to keep the overall footprint of the change small. The changes to Sema are a bit unpleasant, but there was a strong desire to have Clang validate these, and to constrain their uses, and this was the most compact solution I could come up with (suggestions welcome).
In the end, I will note there is nothing that is actually AMDGPU specific here, so it is possible that in the future, assuming interests from other targets / users, we'd just promote them to generic intrinsics.