-
Notifications
You must be signed in to change notification settings - Fork 14k
[FMV][AArch64] Do not emit ifunc resolver on use. #97761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It was raised in llvm#81494 that we are not generating correct code when there is no TU-local caller. The suggestion was to emit a resolver: * Whenever there is a use in the TU. * When the TU has a definition of the default version. See the comment for more details: llvm#81494 (comment) This got addressed with llvm#84405. Generating a resolver on use means that we may end up with multiple resolvers across different translation units. Those resolvers may not be the same because each translation unit may contain different version declarations (user's fault). Therefore the order of linking the final image determines which of these weak symbols gets selected, resulting in non consisted behavior. I am proposing to stop emitting a resolver on use and only do so in the translation unit which contains the default defition. This way we guarantee the existance of a single resolver. Now, when a versioned function is used we want to emit a declaration of the function symbol omitting the multiversion mangling. I have added a requirement to ACLE mandating that all the function versions are declared in the translation unit which contains the default definition: ARM-software/acle#328
@llvm/pr-subscribers-clang-codegen Author: Alexandros Lamprineas (labrinea) ChangesIt was raised in #81494 that we are not generating correct code when there is no TU-local caller. The suggestion was to emit a resolver:
See the comment for more details: This got addressed with #84405. Generating a resolver on use means that we may end up with multiple resolvers across different translation units. Those resolvers may not be the same because each translation unit may contain different version declarations (user's fault). Therefore the order of linking the final image determines which of these weak symbols gets selected, resulting in non consisted behavior. I am proposing to stop emitting a resolver on use and only do so in the translation unit which contains the default defition. This way we guarantee the existance of a single resolver. Now, when a versioned function is used we want to emit a declaration of the function symbol omitting the multiversion mangling. I have added a requirement to ACLE mandating that all the function versions are declared in the translation unit which contains the default definition: ARM-software/acle#328 Patch is 103.86 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/97761.diff 7 Files Affected:
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 99e986d371cac..ede1b641584f3 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -3772,8 +3772,7 @@ void CodeGenModule::EmitGlobal(GlobalDecl GD) {
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
if (!FD->doesDeclarationForceExternallyVisibleDefinition() &&
- (!FD->isMultiVersion() ||
- !FD->getASTContext().getTargetInfo().getTriple().isAArch64()))
+ (!FD->isMultiVersion() || !getTarget().getTriple().isAArch64()))
return;
StringRef MangledName = getMangledName(GD);
@@ -4167,23 +4166,6 @@ llvm::GlobalValue::LinkageTypes getMultiversionLinkage(CodeGenModule &CGM,
return llvm::GlobalValue::WeakODRLinkage;
}
-static FunctionDecl *createDefaultTargetVersionFrom(const FunctionDecl *FD) {
- auto *DeclCtx = const_cast<DeclContext *>(FD->getDeclContext());
- TypeSourceInfo *TInfo = FD->getTypeSourceInfo();
- StorageClass SC = FD->getStorageClass();
- DeclarationName Name = FD->getNameInfo().getName();
-
- FunctionDecl *NewDecl =
- FunctionDecl::Create(FD->getASTContext(), DeclCtx, FD->getBeginLoc(),
- FD->getEndLoc(), Name, TInfo->getType(), TInfo, SC);
-
- NewDecl->setIsMultiVersion();
- NewDecl->addAttr(TargetVersionAttr::CreateImplicit(
- NewDecl->getASTContext(), "default", NewDecl->getSourceRange()));
-
- return NewDecl;
-}
-
void CodeGenModule::emitMultiVersionFunctions() {
std::vector<GlobalDecl> MVFuncsToEmit;
MultiVersionFuncs.swap(MVFuncsToEmit);
@@ -4210,9 +4192,7 @@ void CodeGenModule::emitMultiVersionFunctions() {
return cast<llvm::Function>(Func);
};
- bool HasDefaultDecl = !FD->isTargetVersionMultiVersion();
- bool ShouldEmitResolver =
- !getContext().getTargetInfo().getTriple().isAArch64();
+ bool ShouldEmitResolver = !getTarget().getTriple().isAArch64();
SmallVector<CodeGenFunction::MultiVersionResolverOption, 10> Options;
getContext().forEachMultiversionedFunctionVersion(
@@ -4224,10 +4204,8 @@ void CodeGenModule::emitMultiVersionFunctions() {
llvm::Function *Func = createFunction(CurFD);
Options.emplace_back(Func, TA->getArchitecture(), Feats);
} else if (const auto *TVA = CurFD->getAttr<TargetVersionAttr>()) {
- bool HasDefaultDef = TVA->isDefaultVersion() &&
- CurFD->doesThisDeclarationHaveABody();
- HasDefaultDecl |= TVA->isDefaultVersion();
- ShouldEmitResolver |= (CurFD->isUsed() || HasDefaultDef);
+ ShouldEmitResolver |= (TVA->isDefaultVersion() &&
+ CurFD->doesThisDeclarationHaveABody());
TVA->getFeatures(Feats);
llvm::Function *Func = createFunction(CurFD);
Options.emplace_back(Func, /*Architecture*/ "", Feats);
@@ -4258,13 +4236,6 @@ void CodeGenModule::emitMultiVersionFunctions() {
if (!ShouldEmitResolver)
continue;
- if (!HasDefaultDecl) {
- FunctionDecl *NewFD = createDefaultTargetVersionFrom(FD);
- llvm::Function *Func = createFunction(NewFD);
- llvm::SmallVector<StringRef, 1> Feats;
- Options.emplace_back(Func, /*Architecture*/ "", Feats);
- }
-
llvm::Constant *ResolverConstant = GetOrCreateMultiVersionResolver(GD);
if (auto *IFunc = dyn_cast<llvm::GlobalIFunc>(ResolverConstant)) {
ResolverConstant = IFunc->getResolver();
@@ -4315,6 +4286,14 @@ void CodeGenModule::emitMultiVersionFunctions() {
emitMultiVersionFunctions();
}
+static void replaceDeclarationWith(llvm::GlobalValue *Old,
+ llvm::Constant *New) {
+ assert(cast<llvm::Function>(Old)->isDeclaration() && "Not a declaration");
+ New->takeName(Old);
+ Old->replaceAllUsesWith(New);
+ Old->eraseFromParent();
+}
+
void CodeGenModule::emitCPUDispatchDefinition(GlobalDecl GD) {
const auto *FD = cast<FunctionDecl>(GD.getDecl());
assert(FD && "Not a FunctionDecl?");
@@ -4419,12 +4398,9 @@ void CodeGenModule::emitCPUDispatchDefinition(GlobalDecl GD) {
// Fix up function declarations that were created for cpu_specific before
// cpu_dispatch was known
if (!isa<llvm::GlobalIFunc>(IFunc)) {
- assert(cast<llvm::Function>(IFunc)->isDeclaration());
auto *GI = llvm::GlobalIFunc::create(DeclTy, 0, Linkage, "", ResolverFunc,
&getModule());
- GI->takeName(IFunc);
- IFunc->replaceAllUsesWith(GI);
- IFunc->eraseFromParent();
+ replaceDeclarationWith(IFunc, GI);
IFunc = GI;
}
@@ -4454,7 +4430,8 @@ void CodeGenModule::AddDeferredMultiVersionResolverToEmit(GlobalDecl GD) {
}
/// If a dispatcher for the specified mangled name is not in the module, create
-/// and return an llvm Function with the specified type.
+/// and return it. The dispatcher is either an llvm Function with the specified
+/// type, or a global ifunc.
llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
const auto *FD = cast<FunctionDecl>(GD.getDecl());
assert(FD && "Not a FunctionDecl?");
@@ -4482,8 +4459,15 @@ llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
ResolverName += ".resolver";
}
- // If the resolver has already been created, just return it.
- if (llvm::GlobalValue *ResolverGV = GetGlobalValue(ResolverName))
+ // If the resolver has already been created, just return it. This lookup may
+ // yield a function declaration instead of a resolver on AArch64. That is
+ // because we didn't know whether a resolver will be generated when we first
+ // encountered a use of the symbol named after this resolver. Therefore,
+ // targets which support ifuncs should not return here unless we actually
+ // found an ifunc.
+ llvm::GlobalValue *ResolverGV = GetGlobalValue(ResolverName);
+ if (ResolverGV &&
+ (isa<llvm::GlobalIFunc>(ResolverGV) || !getTarget().supportsIFunc()))
return ResolverGV;
const CGFunctionInfo &FI = getTypes().arrangeGlobalDeclaration(GD);
@@ -4509,7 +4493,8 @@ llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
"", Resolver, &getModule());
GIF->setName(ResolverName);
SetCommonAttributes(FD, GIF);
-
+ if (ResolverGV)
+ replaceDeclarationWith(ResolverGV, GIF);
return GIF;
}
@@ -4518,6 +4503,8 @@ llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
assert(isa<llvm::GlobalValue>(Resolver) &&
"Resolver should be created for the first time");
SetCommonAttributes(FD, cast<llvm::GlobalValue>(Resolver));
+ if (ResolverGV)
+ replaceDeclarationWith(ResolverGV, Resolver);
return Resolver;
}
@@ -4547,6 +4534,7 @@ llvm::Constant *CodeGenModule::GetOrCreateLLVMFunction(
ForDefinition_t IsForDefinition) {
const Decl *D = GD.getDecl();
+ std::string NameWithoutMultiVersionMangling;
// Any attempts to use a MultiVersion function should result in retrieving
// the iFunc instead. Name Mangling will handle the rest of the changes.
if (const FunctionDecl *FD = cast_or_null<FunctionDecl>(D)) {
@@ -4568,14 +4556,24 @@ llvm::Constant *CodeGenModule::GetOrCreateLLVMFunction(
if (FD->isMultiVersion()) {
UpdateMultiVersionNames(GD, FD, MangledName);
- if (FD->getASTContext().getTargetInfo().getTriple().isAArch64() &&
- !FD->isUsed())
- AddDeferredMultiVersionResolverToEmit(GD);
- else if (!IsForDefinition)
- return GetOrCreateMultiVersionResolver(GD);
+ if (!IsForDefinition) {
+ // On AArch64 we do not immediatelly emit an ifunc resolver when a
+ // function is used. Instead we defer the emission until we see a
+ // default definition. In the meantime we just reference the symbol
+ // without FMV mangling (it may or may not be replaced later).
+ if (getTarget().getTriple().isAArch64()) {
+ AddDeferredMultiVersionResolverToEmit(GD);
+ NameWithoutMultiVersionMangling = getMangledNameImpl(
+ *this, GD, FD, /*OmitMultiVersionMangling=*/true);
+ } else
+ return GetOrCreateMultiVersionResolver(GD);
+ }
}
}
+ if (!NameWithoutMultiVersionMangling.empty())
+ MangledName = NameWithoutMultiVersionMangling;
+
// Lookup the entry, lazily creating it if necessary.
llvm::GlobalValue *Entry = GetGlobalValue(MangledName);
if (Entry) {
diff --git a/clang/test/CodeGen/aarch64-mixed-target-attributes.c b/clang/test/CodeGen/aarch64-mixed-target-attributes.c
index 3c047fec6ceed..d779abd395b5f 100644
--- a/clang/test/CodeGen/aarch64-mixed-target-attributes.c
+++ b/clang/test/CodeGen/aarch64-mixed-target-attributes.c
@@ -261,9 +261,9 @@ __attribute__((target_version("jscvt"))) int default_def_with_version_decls(void
// CHECK: attributes #[[ATTR3]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+lse,-v9.5a" }
// CHECK: attributes #[[ATTR4]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+neon,+rdm,-v9.5a" }
// CHECK: attributes #[[ATTR5:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+dotprod,+fp-armv8,+neon,-v9.5a" }
-// CHECK: attributes #[[ATTR6:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+jsconv,+neon,-v9.5a" }
-// CHECK: attributes #[[ATTR7:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-v9.5a" }
-// CHECK: attributes #[[ATTR8:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+lse,-v9.5a" }
+// CHECK: attributes #[[ATTR6:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-v9.5a" }
+// CHECK: attributes #[[ATTR7:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+lse,-v9.5a" }
+// CHECK: attributes #[[ATTR8:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+jsconv,+neon,-v9.5a" }
//.
// CHECK-NOFMV: attributes #[[ATTR0]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-fmv" }
//.
diff --git a/clang/test/CodeGen/attr-target-clones-aarch64.c b/clang/test/CodeGen/attr-target-clones-aarch64.c
index 60f9c7f1fc24e..846b08298cc72 100644
--- a/clang/test/CodeGen/attr-target-clones-aarch64.c
+++ b/clang/test/CodeGen/attr-target-clones-aarch64.c
@@ -32,8 +32,8 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
// CHECK: @ftc_dup1 = weak_odr ifunc i32 (), ptr @ftc_dup1.resolver
// CHECK: @ftc_dup2 = weak_odr ifunc i32 (), ptr @ftc_dup2.resolver
// CHECK: @ftc_dup3 = weak_odr ifunc i32 (), ptr @ftc_dup3.resolver
-// CHECK: @ftc_inline1 = weak_odr ifunc i32 (), ptr @ftc_inline1.resolver
// CHECK: @ftc_inline2 = weak_odr ifunc i32 (), ptr @ftc_inline2.resolver
+// CHECK: @ftc_inline1 = weak_odr ifunc i32 (), ptr @ftc_inline1.resolver
// CHECK: @ftc_inline3 = weak_odr ifunc i32 (), ptr @ftc_inline3.resolver
//.
// CHECK-MTE-BTI: @__aarch64_cpu_features = external dso_local global { i64 }
@@ -42,8 +42,8 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
// CHECK-MTE-BTI: @ftc_dup1 = weak_odr ifunc i32 (), ptr @ftc_dup1.resolver
// CHECK-MTE-BTI: @ftc_dup2 = weak_odr ifunc i32 (), ptr @ftc_dup2.resolver
// CHECK-MTE-BTI: @ftc_dup3 = weak_odr ifunc i32 (), ptr @ftc_dup3.resolver
-// CHECK-MTE-BTI: @ftc_inline1 = weak_odr ifunc i32 (), ptr @ftc_inline1.resolver
// CHECK-MTE-BTI: @ftc_inline2 = weak_odr ifunc i32 (), ptr @ftc_inline2.resolver
+// CHECK-MTE-BTI: @ftc_inline1 = weak_odr ifunc i32 (), ptr @ftc_inline1.resolver
// CHECK-MTE-BTI: @ftc_inline3 = weak_odr ifunc i32 (), ptr @ftc_inline3.resolver
//.
// CHECK: Function Attrs: noinline nounwind optnone
@@ -210,12 +210,6 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
//
//
// CHECK: Function Attrs: noinline nounwind optnone
-// CHECK-LABEL: @ftc_inline2._Mfp16(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: ret i32 2
-//
-//
-// CHECK: Function Attrs: noinline nounwind optnone
// CHECK-LABEL: @ftc_direct(
// CHECK-NEXT: entry:
// CHECK-NEXT: ret i32 4
@@ -236,86 +230,6 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
// CHECK-NEXT: ret i32 [[ADD5]]
//
//
-// CHECK-LABEL: @ftc_inline1.resolver(
-// CHECK-NEXT: resolver_entry:
-// CHECK-NEXT: call void @__init_cpu_features_resolver()
-// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 18014535948435456
-// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 18014535948435456
-// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
-// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
-// CHECK: resolver_return:
-// CHECK-NEXT: ret ptr @ftc_inline1._Msve2-aesMwfxt
-// CHECK: resolver_else:
-// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 140737492549632
-// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 140737492549632
-// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
-// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
-// CHECK: resolver_return1:
-// CHECK-NEXT: ret ptr @ftc_inline1._MpredresMrcpc
-// CHECK: resolver_else2:
-// CHECK-NEXT: [[TMP8:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP9:%.*]] = and i64 [[TMP8]], 513
-// CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[TMP9]], 513
-// CHECK-NEXT: [[TMP11:%.*]] = and i1 true, [[TMP10]]
-// CHECK-NEXT: br i1 [[TMP11]], label [[RESOLVER_RETURN3:%.*]], label [[RESOLVER_ELSE4:%.*]]
-// CHECK: resolver_return3:
-// CHECK-NEXT: ret ptr @ftc_inline1._MrngMsimd
-// CHECK: resolver_else4:
-// CHECK-NEXT: ret ptr @ftc_inline1.default
-//
-//
-// CHECK-LABEL: @ftc_inline2.resolver(
-// CHECK-NEXT: resolver_entry:
-// CHECK-NEXT: call void @__init_cpu_features_resolver()
-// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 549757911040
-// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 549757911040
-// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
-// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
-// CHECK: resolver_return:
-// CHECK-NEXT: ret ptr @ftc_inline2._MfcmaMsve2-bitperm
-// CHECK: resolver_else:
-// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 65536
-// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 65536
-// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
-// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
-// CHECK: resolver_return1:
-// CHECK-NEXT: ret ptr @ftc_inline2._Mfp16
-// CHECK: resolver_else2:
-// CHECK-NEXT: ret ptr @ftc_inline2.default
-//
-//
-// CHECK-LABEL: @ftc_inline3.resolver(
-// CHECK-NEXT: resolver_entry:
-// CHECK-NEXT: call void @__init_cpu_features_resolver()
-// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 70369817919488
-// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 70369817919488
-// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
-// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
-// CHECK: resolver_return:
-// CHECK-NEXT: ret ptr @ftc_inline3._MsbMsve
-// CHECK: resolver_else:
-// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 1125899906842624
-// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 1125899906842624
-// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
-// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
-// CHECK: resolver_return1:
-// CHECK-NEXT: ret ptr @ftc_inline3._Mbti
-// CHECK: resolver_else2:
-// CHECK-NEXT: ret ptr @ftc_inline3.default
-//
-//
-// CHECK: Function Attrs: noinline nounwind optnone
-// CHECK-LABEL: @ftc_inline2._MfcmaMsve2-bitperm(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: ret i32 2
-//
-//
// CHECK: Function Attrs: noinline nounwind optnone
// CHECK-LABEL: @ftc.default(
// CHECK-NEXT: entry:
@@ -347,11 +261,45 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
//
//
// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: @ftc_inline2._Mfp16(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: ret i32 2
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: @ftc_inline2._MfcmaMsve2-bitperm(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: ret i32 2
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
// CHECK-LABEL: @ftc_inline2.default(
// CHECK-NEXT: entry:
// CHECK-NEXT: ret i32 2
//
//
+// CHECK-LABEL: @ftc_inline2.resolver(
+// CHECK-NEXT: resolver_entry:
+// CHECK-NEXT: call void @__init_cpu_features_resolver()
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 549757911040
+// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 549757911040
+// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK: resolver_return:
+// CHECK-NEXT: ret ptr @ftc_inline2._MfcmaMsve2-bitperm
+// CHECK: resolver_else:
+// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 65536
+// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 65536
+// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK: resolver_return1:
+// CHECK-NEXT: ret ptr @ftc_inline2._Mfp16
+// CHECK: resolver_else2:
+// CHECK-NEXT: ret ptr @ftc_inline2.default
+//
+//
// CHECK: Function Attrs: noinline nounwind optnone
// CHECK-LABEL: @ftc_inline1._MrngMsimd(
// CHECK-NEXT: entry:
@@ -376,6 +324,36 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
// CHECK-NEXT: ret i32 1
//
//
+// CHECK-LABEL: @ftc_inline1.resolver(
+// CHECK-NEXT: resolver_entry:
+// CHECK-NEXT: call void @__init_cpu_features_resolver()
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 18014535948435456
+// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 18014535948435456
+// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK: resolver_return:
+// CHECK-NEXT: ret ptr @ftc_inline1._Msve2-aesMwfxt
+// CHECK: resolver_else:
+// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 140737492549632
+// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 140737492549632
+// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK: resolver_return1:
+// CHECK-NEXT: ret ptr @ftc_inline1._MpredresMrcpc
+// CHECK: resolver_else2:
+// CHECK-NEXT: [[TMP8:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP9:%.*]] = and i64 [[TMP8]], 513
+// CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[TMP9]], 513
+// CHECK-NEXT: [[TMP11:%.*]] = and i1 true, [[TMP10]]
+// CHECK-NEXT: br i1 [[TMP11]], label [[RESOLVER_RETURN3:%.*]], label [[RESOLVER_ELSE4:%.*]]
+// CHECK: ...
[truncated]
|
@llvm/pr-subscribers-clang Author: Alexandros Lamprineas (labrinea) ChangesIt was raised in #81494 that we are not generating correct code when there is no TU-local caller. The suggestion was to emit a resolver:
See the comment for more details: This got addressed with #84405. Generating a resolver on use means that we may end up with multiple resolvers across different translation units. Those resolvers may not be the same because each translation unit may contain different version declarations (user's fault). Therefore the order of linking the final image determines which of these weak symbols gets selected, resulting in non consisted behavior. I am proposing to stop emitting a resolver on use and only do so in the translation unit which contains the default defition. This way we guarantee the existance of a single resolver. Now, when a versioned function is used we want to emit a declaration of the function symbol omitting the multiversion mangling. I have added a requirement to ACLE mandating that all the function versions are declared in the translation unit which contains the default definition: ARM-software/acle#328 Patch is 103.86 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/97761.diff 7 Files Affected:
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 99e986d371cac..ede1b641584f3 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -3772,8 +3772,7 @@ void CodeGenModule::EmitGlobal(GlobalDecl GD) {
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
if (!FD->doesDeclarationForceExternallyVisibleDefinition() &&
- (!FD->isMultiVersion() ||
- !FD->getASTContext().getTargetInfo().getTriple().isAArch64()))
+ (!FD->isMultiVersion() || !getTarget().getTriple().isAArch64()))
return;
StringRef MangledName = getMangledName(GD);
@@ -4167,23 +4166,6 @@ llvm::GlobalValue::LinkageTypes getMultiversionLinkage(CodeGenModule &CGM,
return llvm::GlobalValue::WeakODRLinkage;
}
-static FunctionDecl *createDefaultTargetVersionFrom(const FunctionDecl *FD) {
- auto *DeclCtx = const_cast<DeclContext *>(FD->getDeclContext());
- TypeSourceInfo *TInfo = FD->getTypeSourceInfo();
- StorageClass SC = FD->getStorageClass();
- DeclarationName Name = FD->getNameInfo().getName();
-
- FunctionDecl *NewDecl =
- FunctionDecl::Create(FD->getASTContext(), DeclCtx, FD->getBeginLoc(),
- FD->getEndLoc(), Name, TInfo->getType(), TInfo, SC);
-
- NewDecl->setIsMultiVersion();
- NewDecl->addAttr(TargetVersionAttr::CreateImplicit(
- NewDecl->getASTContext(), "default", NewDecl->getSourceRange()));
-
- return NewDecl;
-}
-
void CodeGenModule::emitMultiVersionFunctions() {
std::vector<GlobalDecl> MVFuncsToEmit;
MultiVersionFuncs.swap(MVFuncsToEmit);
@@ -4210,9 +4192,7 @@ void CodeGenModule::emitMultiVersionFunctions() {
return cast<llvm::Function>(Func);
};
- bool HasDefaultDecl = !FD->isTargetVersionMultiVersion();
- bool ShouldEmitResolver =
- !getContext().getTargetInfo().getTriple().isAArch64();
+ bool ShouldEmitResolver = !getTarget().getTriple().isAArch64();
SmallVector<CodeGenFunction::MultiVersionResolverOption, 10> Options;
getContext().forEachMultiversionedFunctionVersion(
@@ -4224,10 +4204,8 @@ void CodeGenModule::emitMultiVersionFunctions() {
llvm::Function *Func = createFunction(CurFD);
Options.emplace_back(Func, TA->getArchitecture(), Feats);
} else if (const auto *TVA = CurFD->getAttr<TargetVersionAttr>()) {
- bool HasDefaultDef = TVA->isDefaultVersion() &&
- CurFD->doesThisDeclarationHaveABody();
- HasDefaultDecl |= TVA->isDefaultVersion();
- ShouldEmitResolver |= (CurFD->isUsed() || HasDefaultDef);
+ ShouldEmitResolver |= (TVA->isDefaultVersion() &&
+ CurFD->doesThisDeclarationHaveABody());
TVA->getFeatures(Feats);
llvm::Function *Func = createFunction(CurFD);
Options.emplace_back(Func, /*Architecture*/ "", Feats);
@@ -4258,13 +4236,6 @@ void CodeGenModule::emitMultiVersionFunctions() {
if (!ShouldEmitResolver)
continue;
- if (!HasDefaultDecl) {
- FunctionDecl *NewFD = createDefaultTargetVersionFrom(FD);
- llvm::Function *Func = createFunction(NewFD);
- llvm::SmallVector<StringRef, 1> Feats;
- Options.emplace_back(Func, /*Architecture*/ "", Feats);
- }
-
llvm::Constant *ResolverConstant = GetOrCreateMultiVersionResolver(GD);
if (auto *IFunc = dyn_cast<llvm::GlobalIFunc>(ResolverConstant)) {
ResolverConstant = IFunc->getResolver();
@@ -4315,6 +4286,14 @@ void CodeGenModule::emitMultiVersionFunctions() {
emitMultiVersionFunctions();
}
+static void replaceDeclarationWith(llvm::GlobalValue *Old,
+ llvm::Constant *New) {
+ assert(cast<llvm::Function>(Old)->isDeclaration() && "Not a declaration");
+ New->takeName(Old);
+ Old->replaceAllUsesWith(New);
+ Old->eraseFromParent();
+}
+
void CodeGenModule::emitCPUDispatchDefinition(GlobalDecl GD) {
const auto *FD = cast<FunctionDecl>(GD.getDecl());
assert(FD && "Not a FunctionDecl?");
@@ -4419,12 +4398,9 @@ void CodeGenModule::emitCPUDispatchDefinition(GlobalDecl GD) {
// Fix up function declarations that were created for cpu_specific before
// cpu_dispatch was known
if (!isa<llvm::GlobalIFunc>(IFunc)) {
- assert(cast<llvm::Function>(IFunc)->isDeclaration());
auto *GI = llvm::GlobalIFunc::create(DeclTy, 0, Linkage, "", ResolverFunc,
&getModule());
- GI->takeName(IFunc);
- IFunc->replaceAllUsesWith(GI);
- IFunc->eraseFromParent();
+ replaceDeclarationWith(IFunc, GI);
IFunc = GI;
}
@@ -4454,7 +4430,8 @@ void CodeGenModule::AddDeferredMultiVersionResolverToEmit(GlobalDecl GD) {
}
/// If a dispatcher for the specified mangled name is not in the module, create
-/// and return an llvm Function with the specified type.
+/// and return it. The dispatcher is either an llvm Function with the specified
+/// type, or a global ifunc.
llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
const auto *FD = cast<FunctionDecl>(GD.getDecl());
assert(FD && "Not a FunctionDecl?");
@@ -4482,8 +4459,15 @@ llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
ResolverName += ".resolver";
}
- // If the resolver has already been created, just return it.
- if (llvm::GlobalValue *ResolverGV = GetGlobalValue(ResolverName))
+ // If the resolver has already been created, just return it. This lookup may
+ // yield a function declaration instead of a resolver on AArch64. That is
+ // because we didn't know whether a resolver will be generated when we first
+ // encountered a use of the symbol named after this resolver. Therefore,
+ // targets which support ifuncs should not return here unless we actually
+ // found an ifunc.
+ llvm::GlobalValue *ResolverGV = GetGlobalValue(ResolverName);
+ if (ResolverGV &&
+ (isa<llvm::GlobalIFunc>(ResolverGV) || !getTarget().supportsIFunc()))
return ResolverGV;
const CGFunctionInfo &FI = getTypes().arrangeGlobalDeclaration(GD);
@@ -4509,7 +4493,8 @@ llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
"", Resolver, &getModule());
GIF->setName(ResolverName);
SetCommonAttributes(FD, GIF);
-
+ if (ResolverGV)
+ replaceDeclarationWith(ResolverGV, GIF);
return GIF;
}
@@ -4518,6 +4503,8 @@ llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
assert(isa<llvm::GlobalValue>(Resolver) &&
"Resolver should be created for the first time");
SetCommonAttributes(FD, cast<llvm::GlobalValue>(Resolver));
+ if (ResolverGV)
+ replaceDeclarationWith(ResolverGV, Resolver);
return Resolver;
}
@@ -4547,6 +4534,7 @@ llvm::Constant *CodeGenModule::GetOrCreateLLVMFunction(
ForDefinition_t IsForDefinition) {
const Decl *D = GD.getDecl();
+ std::string NameWithoutMultiVersionMangling;
// Any attempts to use a MultiVersion function should result in retrieving
// the iFunc instead. Name Mangling will handle the rest of the changes.
if (const FunctionDecl *FD = cast_or_null<FunctionDecl>(D)) {
@@ -4568,14 +4556,24 @@ llvm::Constant *CodeGenModule::GetOrCreateLLVMFunction(
if (FD->isMultiVersion()) {
UpdateMultiVersionNames(GD, FD, MangledName);
- if (FD->getASTContext().getTargetInfo().getTriple().isAArch64() &&
- !FD->isUsed())
- AddDeferredMultiVersionResolverToEmit(GD);
- else if (!IsForDefinition)
- return GetOrCreateMultiVersionResolver(GD);
+ if (!IsForDefinition) {
+ // On AArch64 we do not immediatelly emit an ifunc resolver when a
+ // function is used. Instead we defer the emission until we see a
+ // default definition. In the meantime we just reference the symbol
+ // without FMV mangling (it may or may not be replaced later).
+ if (getTarget().getTriple().isAArch64()) {
+ AddDeferredMultiVersionResolverToEmit(GD);
+ NameWithoutMultiVersionMangling = getMangledNameImpl(
+ *this, GD, FD, /*OmitMultiVersionMangling=*/true);
+ } else
+ return GetOrCreateMultiVersionResolver(GD);
+ }
}
}
+ if (!NameWithoutMultiVersionMangling.empty())
+ MangledName = NameWithoutMultiVersionMangling;
+
// Lookup the entry, lazily creating it if necessary.
llvm::GlobalValue *Entry = GetGlobalValue(MangledName);
if (Entry) {
diff --git a/clang/test/CodeGen/aarch64-mixed-target-attributes.c b/clang/test/CodeGen/aarch64-mixed-target-attributes.c
index 3c047fec6ceed..d779abd395b5f 100644
--- a/clang/test/CodeGen/aarch64-mixed-target-attributes.c
+++ b/clang/test/CodeGen/aarch64-mixed-target-attributes.c
@@ -261,9 +261,9 @@ __attribute__((target_version("jscvt"))) int default_def_with_version_decls(void
// CHECK: attributes #[[ATTR3]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+lse,-v9.5a" }
// CHECK: attributes #[[ATTR4]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+neon,+rdm,-v9.5a" }
// CHECK: attributes #[[ATTR5:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+dotprod,+fp-armv8,+neon,-v9.5a" }
-// CHECK: attributes #[[ATTR6:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+jsconv,+neon,-v9.5a" }
-// CHECK: attributes #[[ATTR7:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-v9.5a" }
-// CHECK: attributes #[[ATTR8:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+lse,-v9.5a" }
+// CHECK: attributes #[[ATTR6:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-v9.5a" }
+// CHECK: attributes #[[ATTR7:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+lse,-v9.5a" }
+// CHECK: attributes #[[ATTR8:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+jsconv,+neon,-v9.5a" }
//.
// CHECK-NOFMV: attributes #[[ATTR0]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-fmv" }
//.
diff --git a/clang/test/CodeGen/attr-target-clones-aarch64.c b/clang/test/CodeGen/attr-target-clones-aarch64.c
index 60f9c7f1fc24e..846b08298cc72 100644
--- a/clang/test/CodeGen/attr-target-clones-aarch64.c
+++ b/clang/test/CodeGen/attr-target-clones-aarch64.c
@@ -32,8 +32,8 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
// CHECK: @ftc_dup1 = weak_odr ifunc i32 (), ptr @ftc_dup1.resolver
// CHECK: @ftc_dup2 = weak_odr ifunc i32 (), ptr @ftc_dup2.resolver
// CHECK: @ftc_dup3 = weak_odr ifunc i32 (), ptr @ftc_dup3.resolver
-// CHECK: @ftc_inline1 = weak_odr ifunc i32 (), ptr @ftc_inline1.resolver
// CHECK: @ftc_inline2 = weak_odr ifunc i32 (), ptr @ftc_inline2.resolver
+// CHECK: @ftc_inline1 = weak_odr ifunc i32 (), ptr @ftc_inline1.resolver
// CHECK: @ftc_inline3 = weak_odr ifunc i32 (), ptr @ftc_inline3.resolver
//.
// CHECK-MTE-BTI: @__aarch64_cpu_features = external dso_local global { i64 }
@@ -42,8 +42,8 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
// CHECK-MTE-BTI: @ftc_dup1 = weak_odr ifunc i32 (), ptr @ftc_dup1.resolver
// CHECK-MTE-BTI: @ftc_dup2 = weak_odr ifunc i32 (), ptr @ftc_dup2.resolver
// CHECK-MTE-BTI: @ftc_dup3 = weak_odr ifunc i32 (), ptr @ftc_dup3.resolver
-// CHECK-MTE-BTI: @ftc_inline1 = weak_odr ifunc i32 (), ptr @ftc_inline1.resolver
// CHECK-MTE-BTI: @ftc_inline2 = weak_odr ifunc i32 (), ptr @ftc_inline2.resolver
+// CHECK-MTE-BTI: @ftc_inline1 = weak_odr ifunc i32 (), ptr @ftc_inline1.resolver
// CHECK-MTE-BTI: @ftc_inline3 = weak_odr ifunc i32 (), ptr @ftc_inline3.resolver
//.
// CHECK: Function Attrs: noinline nounwind optnone
@@ -210,12 +210,6 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
//
//
// CHECK: Function Attrs: noinline nounwind optnone
-// CHECK-LABEL: @ftc_inline2._Mfp16(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: ret i32 2
-//
-//
-// CHECK: Function Attrs: noinline nounwind optnone
// CHECK-LABEL: @ftc_direct(
// CHECK-NEXT: entry:
// CHECK-NEXT: ret i32 4
@@ -236,86 +230,6 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
// CHECK-NEXT: ret i32 [[ADD5]]
//
//
-// CHECK-LABEL: @ftc_inline1.resolver(
-// CHECK-NEXT: resolver_entry:
-// CHECK-NEXT: call void @__init_cpu_features_resolver()
-// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 18014535948435456
-// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 18014535948435456
-// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
-// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
-// CHECK: resolver_return:
-// CHECK-NEXT: ret ptr @ftc_inline1._Msve2-aesMwfxt
-// CHECK: resolver_else:
-// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 140737492549632
-// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 140737492549632
-// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
-// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
-// CHECK: resolver_return1:
-// CHECK-NEXT: ret ptr @ftc_inline1._MpredresMrcpc
-// CHECK: resolver_else2:
-// CHECK-NEXT: [[TMP8:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP9:%.*]] = and i64 [[TMP8]], 513
-// CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[TMP9]], 513
-// CHECK-NEXT: [[TMP11:%.*]] = and i1 true, [[TMP10]]
-// CHECK-NEXT: br i1 [[TMP11]], label [[RESOLVER_RETURN3:%.*]], label [[RESOLVER_ELSE4:%.*]]
-// CHECK: resolver_return3:
-// CHECK-NEXT: ret ptr @ftc_inline1._MrngMsimd
-// CHECK: resolver_else4:
-// CHECK-NEXT: ret ptr @ftc_inline1.default
-//
-//
-// CHECK-LABEL: @ftc_inline2.resolver(
-// CHECK-NEXT: resolver_entry:
-// CHECK-NEXT: call void @__init_cpu_features_resolver()
-// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 549757911040
-// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 549757911040
-// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
-// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
-// CHECK: resolver_return:
-// CHECK-NEXT: ret ptr @ftc_inline2._MfcmaMsve2-bitperm
-// CHECK: resolver_else:
-// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 65536
-// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 65536
-// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
-// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
-// CHECK: resolver_return1:
-// CHECK-NEXT: ret ptr @ftc_inline2._Mfp16
-// CHECK: resolver_else2:
-// CHECK-NEXT: ret ptr @ftc_inline2.default
-//
-//
-// CHECK-LABEL: @ftc_inline3.resolver(
-// CHECK-NEXT: resolver_entry:
-// CHECK-NEXT: call void @__init_cpu_features_resolver()
-// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 70369817919488
-// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 70369817919488
-// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
-// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
-// CHECK: resolver_return:
-// CHECK-NEXT: ret ptr @ftc_inline3._MsbMsve
-// CHECK: resolver_else:
-// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 1125899906842624
-// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 1125899906842624
-// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
-// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
-// CHECK: resolver_return1:
-// CHECK-NEXT: ret ptr @ftc_inline3._Mbti
-// CHECK: resolver_else2:
-// CHECK-NEXT: ret ptr @ftc_inline3.default
-//
-//
-// CHECK: Function Attrs: noinline nounwind optnone
-// CHECK-LABEL: @ftc_inline2._MfcmaMsve2-bitperm(
-// CHECK-NEXT: entry:
-// CHECK-NEXT: ret i32 2
-//
-//
// CHECK: Function Attrs: noinline nounwind optnone
// CHECK-LABEL: @ftc.default(
// CHECK-NEXT: entry:
@@ -347,11 +261,45 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
//
//
// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: @ftc_inline2._Mfp16(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: ret i32 2
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: @ftc_inline2._MfcmaMsve2-bitperm(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: ret i32 2
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
// CHECK-LABEL: @ftc_inline2.default(
// CHECK-NEXT: entry:
// CHECK-NEXT: ret i32 2
//
//
+// CHECK-LABEL: @ftc_inline2.resolver(
+// CHECK-NEXT: resolver_entry:
+// CHECK-NEXT: call void @__init_cpu_features_resolver()
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 549757911040
+// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 549757911040
+// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK: resolver_return:
+// CHECK-NEXT: ret ptr @ftc_inline2._MfcmaMsve2-bitperm
+// CHECK: resolver_else:
+// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 65536
+// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 65536
+// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK: resolver_return1:
+// CHECK-NEXT: ret ptr @ftc_inline2._Mfp16
+// CHECK: resolver_else2:
+// CHECK-NEXT: ret ptr @ftc_inline2.default
+//
+//
// CHECK: Function Attrs: noinline nounwind optnone
// CHECK-LABEL: @ftc_inline1._MrngMsimd(
// CHECK-NEXT: entry:
@@ -376,6 +324,36 @@ inline int __attribute__((target_clones("fp16", "sve2-bitperm+fcma", "default"))
// CHECK-NEXT: ret i32 1
//
//
+// CHECK-LABEL: @ftc_inline1.resolver(
+// CHECK-NEXT: resolver_entry:
+// CHECK-NEXT: call void @__init_cpu_features_resolver()
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], 18014535948435456
+// CHECK-NEXT: [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 18014535948435456
+// CHECK-NEXT: [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT: br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK: resolver_return:
+// CHECK-NEXT: ret ptr @ftc_inline1._Msve2-aesMwfxt
+// CHECK: resolver_else:
+// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], 140737492549632
+// CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 140737492549632
+// CHECK-NEXT: [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-NEXT: br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK: resolver_return1:
+// CHECK-NEXT: ret ptr @ftc_inline1._MpredresMrcpc
+// CHECK: resolver_else2:
+// CHECK-NEXT: [[TMP8:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT: [[TMP9:%.*]] = and i64 [[TMP8]], 513
+// CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[TMP9]], 513
+// CHECK-NEXT: [[TMP11:%.*]] = and i1 true, [[TMP10]]
+// CHECK-NEXT: br i1 [[TMP11]], label [[RESOLVER_RETURN3:%.*]], label [[RESOLVER_ELSE4:%.*]]
+// CHECK: ...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really hard to tell what is changing here because the existing tests are so non-specific.
// CHECK: attributes #[[ATTR6:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-v9.5a" } | ||
// CHECK: attributes #[[ATTR7:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+lse,-v9.5a" } | ||
// CHECK: attributes #[[ATTR8:[0-9]+]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+jsconv,+neon,-v9.5a" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These attributes are not actually matched against anything, is there any point testing them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intention is to check that we are correctly enabling the FMV implied features. If those are fully covered in clang/test/CodeGen/aarch64-fmv-dependencies.c
, which was added later, then I guess we can remove them from every other codegen test file. But this should be done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way do you know how to disable the attribute checks?
* added comments for resolver emission * rewrote the ShouldEmitResolver expression as suggested * added a definition to fmv_one and fmv_two in clang/test/CodeGen/attr-target-version.c * moved new test from clang/test/CodeGenCXX/attr-target-version.cpp to a separate file * regenerated the affected tests
I appreciate that. The tests would benefit from some tidying up, we should prioritize this at some point. In case it helps most of the test changes are due to symbols being generated in different order. |
* reworded a comment and added a negative check in the new tests
Summary: It was raised in #81494 that we are not generating correct code when there is no TU-local caller. The suggestion was to emit a resolver: * Whenever there is a use in the TU. * When the TU has a definition of the default version. See the comment for more details: #81494 (comment) This got addressed with #84405. Generating a resolver on use means that we may end up with multiple resolvers across different translation units. Those resolvers may not be the same because each translation unit may contain different version declarations (user's fault). Therefore the order of linking the final image determines which of these weak symbols gets selected, resulting in non consisted behavior. I am proposing to stop emitting a resolver on use and only do so in the translation unit which contains the default definition. This way we guarantee the existence of a single resolver. Now, when a versioned function is used we want to emit a declaration of the function symbol omitting the multiversion mangling. I have added a requirement to ACLE mandating that all the function versions are declared in the translation unit which contains the default definition: ARM-software/acle#328 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250878
It was raised in #81494 that we are not generating correct code when there is no TU-local caller.
The suggestion was to emit a resolver:
See the comment for more details:
#81494 (comment)
This got addressed with #84405.
Generating a resolver on use means that we may end up with multiple resolvers across different translation units. Those resolvers may not be the same because each translation unit may contain different version declarations (user's fault). Therefore the order of linking the final image determines which of these weak symbols gets selected, resulting in non consisted behavior. I am proposing to stop emitting a resolver on use and only do so in the translation unit which contains the default definition. This way we guarantee the existence of a single resolver. Now, when a versioned function is used we want to emit a declaration of the function symbol omitting the multiversion mangling.
I have added a requirement to ACLE mandating that all the function versions are declared in the translation unit which contains the default definition: ARM-software/acle#328