Skip to content

[FMV] Allow multi versioning without default declaration. #85454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 25, 2024

Conversation

labrinea
Copy link
Collaborator

@labrinea labrinea commented Mar 15, 2024

This was a limitation which has now been lifted. Please read the
thread below for more details:

#84405 (comment)

Basically it allows to separate versioned implementations across
different TUs without having to share private header files which
contain the default declaration.

The ACLE spec has been updated accordingly to make this explicit:
"Each version declaration should be visible at the translation
unit in which the corresponding function version resides."

ARM-software/acle#310

If a resolver is required (because there is a caller in the TU),
then a default declaration is implicitly generated.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Mar 15, 2024
@llvmbot
Copy link
Member

llvmbot commented Mar 15, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Alexandros Lamprineas (labrinea)

Changes

This was a limitation which has now been lifted upon request.
Please read the thread below for more details:

#84405 (comment)

Basically it allows to separate versioned implementations across
different TUs without having to share private header files which
contain the default declaration.


Patch is 76.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/85454.diff

8 Files Affected:

  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+42-13)
  • (modified) clang/lib/CodeGen/CodeGenModule.h (+5)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+3-3)
  • (modified) clang/test/CodeGen/attr-target-version.c (+372-186)
  • (modified) clang/test/CodeGenCXX/attr-target-version.cpp (+225-54)
  • (modified) clang/test/Sema/aarch64-sme-func-attrs.c (+5-3)
  • (modified) clang/test/Sema/attr-target-version.c (+4-3)
  • (modified) clang/test/SemaCXX/attr-target-version.cpp (+8-2)
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 8ceecff28cbc63..a5eb46277d5f63 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -3447,6 +3447,9 @@ bool CodeGenModule::MayBeEmittedEagerly(const ValueDecl *Global) {
       // Implicit template instantiations may change linkage if they are later
       // explicitly instantiated, so they should not be emitted eagerly.
       return false;
+    // Defer until all versions have been semantically checked.
+    if (FD->hasAttr<TargetVersionAttr>() && !FD->isMultiVersion())
+      return false;
   }
   if (const auto *VD = dyn_cast<VarDecl>(Global)) {
     if (Context.getInlineVariableDefinitionKind(VD) ==
@@ -3995,10 +3998,13 @@ void CodeGenModule::EmitMultiVersionFunctionDefinition(GlobalDecl GD,
         EmitGlobalFunctionDefinition(GD.getWithMultiVersionIndex(I), nullptr);
     // Ensure that the resolver function is also emitted.
     GetOrCreateMultiVersionResolver(GD);
-  } else if (FD->hasAttr<TargetVersionAttr>()) {
-    GetOrCreateMultiVersionResolver(GD);
   } else
     EmitGlobalFunctionDefinition(GD, GV);
+
+  // Defer the resolver emission until we can reason whether the TU
+  // contains a default target version implementation.
+  if (FD->isTargetVersionMultiVersion())
+    AddDeferredMultiVersionResolverToEmit(GD);
 }
 
 void CodeGenModule::EmitGlobalDefinition(GlobalDecl GD, llvm::GlobalValue *GV) {
@@ -4091,10 +4097,11 @@ void CodeGenModule::emitMultiVersionFunctions() {
     const auto *FD = cast<FunctionDecl>(GD.getDecl());
     assert(FD && "Expected a FunctionDecl");
 
+    bool EmitResolver = !FD->isTargetVersionMultiVersion();
     SmallVector<CodeGenFunction::MultiVersionResolverOption, 10> Options;
     if (FD->isTargetMultiVersion()) {
       getContext().forEachMultiversionedFunctionVersion(
-          FD, [this, &GD, &Options](const FunctionDecl *CurFD) {
+          FD, [this, &GD, &Options, &EmitResolver](const FunctionDecl *CurFD) {
             GlobalDecl CurGD{
                 (CurFD->isDefined() ? CurFD->getDefinition() : CurFD)};
             StringRef MangledName = getMangledName(CurGD);
@@ -4120,6 +4127,9 @@ void CodeGenModule::emitMultiVersionFunctions() {
                                    TA->getArchitecture(), Feats);
             } else {
               const auto *TVA = CurFD->getAttr<TargetVersionAttr>();
+              if (CurFD->isUsed() || (TVA->isDefaultVersion() &&
+                                      CurFD->doesThisDeclarationHaveABody()))
+                EmitResolver = true;
               llvm::SmallVector<StringRef, 8> Feats;
               TVA->getFeatures(Feats);
               Options.emplace_back(cast<llvm::Function>(Func),
@@ -4175,22 +4185,27 @@ void CodeGenModule::emitMultiVersionFunctions() {
       continue;
     }
 
+    if (!EmitResolver)
+      continue;
+
     llvm::Constant *ResolverConstant = GetOrCreateMultiVersionResolver(GD);
     if (auto *IFunc = dyn_cast<llvm::GlobalIFunc>(ResolverConstant)) {
       ResolverConstant = IFunc->getResolver();
       if (FD->isTargetClonesMultiVersion() ||
           FD->isTargetVersionMultiVersion()) {
-        const CGFunctionInfo &FI = getTypes().arrangeGlobalDeclaration(GD);
-        llvm::FunctionType *DeclTy = getTypes().GetFunctionType(FI);
         std::string MangledName = getMangledNameImpl(
             *this, GD, FD, /*OmitMultiVersionMangling=*/true);
-        // In prior versions of Clang, the mangling for ifuncs incorrectly
-        // included an .ifunc suffix. This alias is generated for backward
-        // compatibility. It is deprecated, and may be removed in the future.
-        auto *Alias = llvm::GlobalAlias::create(
-            DeclTy, 0, getMultiversionLinkage(*this, GD),
-            MangledName + ".ifunc", IFunc, &getModule());
-        SetCommonAttributes(FD, Alias);
+        if (!GetGlobalValue(MangledName + ".ifunc")) {
+          const CGFunctionInfo &FI = getTypes().arrangeGlobalDeclaration(GD);
+          llvm::FunctionType *DeclTy = getTypes().GetFunctionType(FI);
+          // In prior versions of Clang, the mangling for ifuncs incorrectly
+          // included an .ifunc suffix. This alias is generated for backward
+          // compatibility. It is deprecated, and may be removed in the future.
+          auto *Alias = llvm::GlobalAlias::create(
+              DeclTy, 0, getMultiversionLinkage(*this, GD),
+              MangledName + ".ifunc", IFunc, &getModule());
+          SetCommonAttributes(FD, Alias);
+        }
       }
     }
     llvm::Function *ResolverFunc = cast<llvm::Function>(ResolverConstant);
@@ -4347,6 +4362,20 @@ void CodeGenModule::emitCPUDispatchDefinition(GlobalDecl GD) {
   }
 }
 
+/// Adds a declaration to the list of multi version functions if not present.
+void CodeGenModule::AddDeferredMultiVersionResolverToEmit(GlobalDecl GD) {
+  const auto *FD = cast<FunctionDecl>(GD.getDecl());
+  assert(FD && "Not a FunctionDecl?");
+
+  if (FD->isTargetVersionMultiVersion()) {
+    std::string MangledName =
+        getMangledNameImpl(*this, GD, FD, /*OmitMultiVersionMangling=*/true);
+    if (!DeferredResolversToEmit.insert(MangledName).second)
+      return;
+  }
+  MultiVersionFuncs.push_back(GD);
+}
+
 /// If a dispatcher for the specified mangled name is not in the module, create
 /// and return an llvm Function with the specified type.
 llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
@@ -4386,7 +4415,7 @@ llvm::Constant *CodeGenModule::GetOrCreateMultiVersionResolver(GlobalDecl GD) {
   // The resolver needs to be created. For target and target_clones, defer
   // creation until the end of the TU.
   if (FD->isTargetMultiVersion() || FD->isTargetClonesMultiVersion())
-    MultiVersionFuncs.push_back(GD);
+    AddDeferredMultiVersionResolverToEmit(GD);
 
   // For cpu_specific, don't create an ifunc yet because we don't know if the
   // cpu_dispatch will be emitted in this translation unit.
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index ec34680fd3f7e6..1cc447765e2c97 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -348,6 +348,8 @@ class CodeGenModule : public CodeGenTypeCache {
   /// yet.
   llvm::DenseMap<StringRef, GlobalDecl> DeferredDecls;
 
+  llvm::StringSet<llvm::BumpPtrAllocator> DeferredResolversToEmit;
+
   /// This is a list of deferred decls which we have seen that *are* actually
   /// referenced. These get code generated when the module is done.
   std::vector<GlobalDecl> DeferredDeclsToEmit;
@@ -1588,6 +1590,9 @@ class CodeGenModule : public CodeGenTypeCache {
       llvm::AttributeList ExtraAttrs = llvm::AttributeList(),
       ForDefinition_t IsForDefinition = NotForDefinition);
 
+  // Adds a declaration to the list of multi version functions if not present.
+  void AddDeferredMultiVersionResolverToEmit(GlobalDecl GD);
+
   // References to multiversion functions are resolved through an implicitly
   // defined resolver function. This function is responsible for creating
   // the resolver symbol for the provided declaration. The value returned
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 7acce77458a372..3aedaf36195be3 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -11447,9 +11447,9 @@ static bool CheckMultiVersionFirstFunction(Sema &S, FunctionDecl *FD) {
          "Function lacks multiversion attribute");
   const auto *TA = FD->getAttr<TargetAttr>();
   const auto *TVA = FD->getAttr<TargetVersionAttr>();
-  // Target and target_version only causes MV if it is default, otherwise this
-  // is a normal function.
-  if ((TA && !TA->isDefaultVersion()) || (TVA && !TVA->isDefaultVersion()))
+  // The target attribute only causes MV if this declaration is the default,
+  // otherwise it is treated as a normal function.
+  if (TA && !TA->isDefaultVersion())
     return false;
 
   if ((TA || TVA) && CheckMultiVersionValue(S, FD)) {
diff --git a/clang/test/CodeGen/attr-target-version.c b/clang/test/CodeGen/attr-target-version.c
index b7112c783da913..5393127ed77c69 100644
--- a/clang/test/CodeGen/attr-target-version.c
+++ b/clang/test/CodeGen/attr-target-version.c
@@ -1,5 +1,5 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-attributes --check-globals --include-generated-funcs
-// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +ls64 -target-feature +fullfp16 -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature -v9.5a -S -emit-llvm -o - %s | FileCheck %s
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature -fmv -S -emit-llvm -o - %s | FileCheck %s -check-prefix=CHECK-NOFMV
 
 int __attribute__((target_version("rng+flagm+fp16fml"))) fmv(void) { return 1; }
@@ -11,15 +11,15 @@ int __attribute__((target_version("fp+aes"))) fmv(void) { return 6; }
 int __attribute__((target_version("crc+ls64_v"))) fmv(void) { return 7; }
 int __attribute__((target_version("bti"))) fmv(void) { return 8; }
 int __attribute__((target_version("sme2"))) fmv(void) { return 9; }
-int __attribute__((target_version("default"))) fmv(void) { return 0; }
+int __attribute__((target_version("default"))) fmv(void);
 int __attribute__((target_version("ls64+simd"))) fmv_one(void) { return 1; }
 int __attribute__((target_version("dpb"))) fmv_one(void) { return 2; }
-int __attribute__((target_version("default"))) fmv_one(void) { return 0; }
+int __attribute__((target_version("default"))) fmv_one(void);
 int __attribute__((target_version("fp"))) fmv_two(void) { return 1; }
 int __attribute__((target_version("simd"))) fmv_two(void) { return 2; }
 int __attribute__((target_version("dgh"))) fmv_two(void) { return 3; }
 int __attribute__((target_version("fp16+simd"))) fmv_two(void) { return 4; }
-int __attribute__((target_version("default"))) fmv_two(void) { return 0; }
+int __attribute__((target_version("default"))) fmv_two(void);
 int foo() {
   return fmv()+fmv_one()+fmv_two();
 }
@@ -84,9 +84,33 @@ int hoo(void) {
   return fp1() + fp2();
 }
 
+// This should generate one target version but no resolver.
+__attribute__((target_version("default"))) int unused_with_forward_default_decl(void);
+__attribute__((target_version("mops"))) int unused_with_forward_default_decl(void) { return 0; }
 
+// This should also generate one target version but no resolver.
+extern int unused_with_implicit_extern_forward_default_decl(void);
+__attribute__((target_version("dotprod")))
+int unused_with_implicit_extern_forward_default_decl(void) { return 0; }
 
+// This should also generate one target version but no resolver.
+__attribute__((target_version("aes"))) int unused_with_default_decl(void) { return 0; }
+__attribute__((target_version("default"))) int unused_with_default_decl(void);
 
+// This should generate two target versions and the resolver.
+__attribute__((target_version("sve"))) int unused_with_default_def(void) { return 0; }
+__attribute__((target_version("default"))) int unused_with_default_def(void) { return 1; }
+
+// This should also generate two target versions and the resolver.
+__attribute__((target_version("fp16"))) int unused_with_implicit_default_def(void) { return 0; }
+int unused_with_implicit_default_def(void) { return 1; }
+
+// This should also generate two target versions and the resolver.
+int unused_with_implicit_forward_default_def(void) { return 0; }
+__attribute__((target_version("lse"))) int unused_with_implicit_forward_default_def(void) { return 1; }
+
+// This should generate a target version despite the default not being declared.
+__attribute__((target_version("rdm"))) int unused_without_default(void) { return 0; }
 
 //.
 // CHECK: @__aarch64_cpu_features = external dso_local global { i64 }
@@ -97,13 +121,19 @@ int hoo(void) {
 // CHECK: @fmv_c.ifunc = weak_odr alias void (), ptr @fmv_c
 // CHECK: @fmv_inline.ifunc = weak_odr alias i32 (), ptr @fmv_inline
 // CHECK: @fmv_d.ifunc = internal alias i32 (), ptr @fmv_d
+// CHECK: @unused_with_default_def.ifunc = weak_odr alias i32 (), ptr @unused_with_default_def
+// CHECK: @unused_with_implicit_default_def.ifunc = weak_odr alias i32 (), ptr @unused_with_implicit_default_def
+// CHECK: @unused_with_implicit_forward_default_def.ifunc = weak_odr alias i32 (), ptr @unused_with_implicit_forward_default_def
 // CHECK: @fmv = weak_odr ifunc i32 (), ptr @fmv.resolver
 // CHECK: @fmv_one = weak_odr ifunc i32 (), ptr @fmv_one.resolver
 // CHECK: @fmv_two = weak_odr ifunc i32 (), ptr @fmv_two.resolver
-// CHECK: @fmv_e = weak_odr ifunc i32 (), ptr @fmv_e.resolver
-// CHECK: @fmv_c = weak_odr ifunc void (), ptr @fmv_c.resolver
 // CHECK: @fmv_inline = weak_odr ifunc i32 (), ptr @fmv_inline.resolver
+// CHECK: @fmv_e = weak_odr ifunc i32 (), ptr @fmv_e.resolver
 // CHECK: @fmv_d = internal ifunc i32 (), ptr @fmv_d.resolver
+// CHECK: @fmv_c = weak_odr ifunc void (), ptr @fmv_c.resolver
+// CHECK: @unused_with_default_def = weak_odr ifunc i32 (), ptr @unused_with_default_def.resolver
+// CHECK: @unused_with_implicit_default_def = weak_odr ifunc i32 (), ptr @unused_with_implicit_default_def.resolver
+// CHECK: @unused_with_implicit_forward_default_def = weak_odr ifunc i32 (), ptr @unused_with_implicit_forward_default_def.resolver
 //.
 // CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@fmv._MflagmMfp16fmlMrng
@@ -113,22 +143,106 @@ int hoo(void) {
 //
 //
 // CHECK: Function Attrs: noinline nounwind optnone
-// CHECK-LABEL: define {{[^@]+}}@fmv_one._Mls64Msimd
+// CHECK-LABEL: define {{[^@]+}}@fmv._Mflagm2Msme-i16i64
 // CHECK-SAME: () #[[ATTR1:[0-9]+]] {
 // CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 2
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv._MlseMsha2
+// CHECK-SAME: () #[[ATTR2:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 3
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv._MdotprodMls64_accdata
+// CHECK-SAME: () #[[ATTR3:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 4
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv._Mfp16fmlMmemtag
+// CHECK-SAME: () #[[ATTR4:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 5
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv._MaesMfp
+// CHECK-SAME: () #[[ATTR5:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 6
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv._McrcMls64_v
+// CHECK-SAME: () #[[ATTR6:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 7
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv._Mbti
+// CHECK-SAME: () #[[ATTR7:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 8
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv._Msme2
+// CHECK-SAME: () #[[ATTR8:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 9
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv_one._Mls64Msimd
+// CHECK-SAME: () #[[ATTR5]] {
+// CHECK-NEXT:  entry:
 // CHECK-NEXT:    ret i32 1
 //
 //
 // CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv_one._Mdpb
+// CHECK-SAME: () #[[ATTR9:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 2
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@fmv_two._Mfp
-// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-SAME: () #[[ATTR5]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    ret i32 1
 //
 //
 // CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv_two._Msimd
+// CHECK-SAME: () #[[ATTR5]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 2
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv_two._Mdgh
+// CHECK-SAME: () #[[ATTR10:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 3
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@fmv_two._Mfp16Msimd
+// CHECK-SAME: () #[[ATTR11:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret i32 4
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
 // CHECK-LABEL: define {{[^@]+}}@foo
-// CHECK-SAME: () #[[ATTR2:[0-9]+]] {
+// CHECK-SAME: () #[[ATTR10]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[CALL:%.*]] = call i32 @fmv()
 // CHECK-NEXT:    [[CALL1:%.*]] = call i32 @fmv_one()
@@ -158,16 +272,16 @@ int hoo(void) {
 // CHECK-NEXT:    ret ptr @fmv._Mflagm2Msme-i16i64
 // CHECK:       resolver_else2:
 // CHECK-NEXT:    [[TMP8:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT:    [[TMP9:%.*]] = and i64 [[TMP8]], 16
-// CHECK-NEXT:    [[TMP10:%.*]] = icmp eq i64 [[TMP9]], 16
+// CHECK-NEXT:    [[TMP9:%.*]] = and i64 [[TMP8]], 9007199254741008
+// CHECK-NEXT:    [[TMP10:%.*]] = icmp eq i64 [[TMP9]], 9007199254741008
 // CHECK-NEXT:    [[TMP11:%.*]] = and i1 true, [[TMP10]]
 // CHECK-NEXT:    br i1 [[TMP11]], label [[RESOLVER_RETURN3:%.*]], label [[RESOLVER_ELSE4:%.*]]
 // CHECK:       resolver_return3:
 // CHECK-NEXT:    ret ptr @fmv._MdotprodMls64_accdata
 // CHECK:       resolver_else4:
 // CHECK-NEXT:    [[TMP12:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT:    [[TMP13:%.*]] = and i64 [[TMP12]], 1024
-// CHECK-NEXT:    [[TMP14:%.*]] = icmp eq i64 [[TMP13]], 1024
+// CHECK-NEXT:    [[TMP13:%.*]] = and i64 [[TMP12]], 4503599627371520
+// CHECK-NEXT:    [[TMP14:%.*]] = icmp eq i64 [[TMP13]], 4503599627371520
 // CHECK-NEXT:    [[TMP15:%.*]] = and i1 true, [[TMP14]]
 // CHECK-NEXT:    br i1 [[TMP15]], label [[RESOLVER_RETURN5:%.*]], label [[RESOLVER_ELSE6:%.*]]
 // CHECK:       resolver_return5:
@@ -182,8 +296,8 @@ int hoo(void) {
 // CHECK-NEXT:    ret ptr @fmv._Mfp16fmlMmemtag
 // CHECK:       resolver_else8:
 // CHECK-NEXT:    [[TMP20:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
-// CHECK-NEXT:    [[TMP21:%.*]] = and i64 [[TMP20]], 16384
-// CHECK-NEXT:    [[TMP22:%.*]] = icmp eq i64 [[TMP21]], 16384
+// CHECK-NEXT:    [[TMP21:%.*]] = and i64 [[TMP20]], 16640
+// CHECK-NEXT:    [[TMP22:%.*]] = icmp eq i64 [[TMP21]], 16640
 // CHECK-NEXT:    [[TMP23:%.*]] = and i1 true, [[TMP22]]
 // CHECK-NEXT:    br i1 [[TMP23]], label [[RESOLVER_RETURN9:%.*]], label [[RESOLVER_ELSE10:%.*]]
 // CHECK:       resolver_return9:
@@ -218,43 +332,95 @@ int hoo(void) {
 //
 // CHECK-LABEL: define {{[^@]+}}@fmv_one.resolver() comdat {
 // CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 2251799813685760
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 2251799813685760
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
 // CHECK-NEXT:    ret ptr @fmv_one._Mls64Msimd
+// CHECK:       resolver_else:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP5:%.*]] = and i64 [[TMP4]], 262144
+// CHECK-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 262144
+// CHECK-NEXT:    [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-NEXT:    br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK:       resolver_return1:
+// CHECK-NEXT:    ret ptr @fmv_one._Mdpb
+// CHECK:       resolver_else2:
+// CHECK-NEXT:    ret ptr @fmv_one.default
 //
 //
 // CHECK-LABEL: define {{[^@]+}}@fmv_two.resolver() comdat {
 // CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 66048
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 66048
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
 // CHECK-NEX...
[truncated]

@labrinea
Copy link
Collaborator Author

I think I messed up the review, this is based on #84405 so it shows unrelated changes. Ignore for now. I'll rebase

@labrinea
Copy link
Collaborator Author

Hmm, I am now having second thoughts about this change. All is well without a default declaration as long as we don't have a caller in the TU. In that case (if there is a caller) a resolver is needed, but without a default declaration we can't create a complete resolver (the resolver contains calls to the target versions it can see in the TU) :/

Copy link

github-actions bot commented Mar 19, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@labrinea labrinea force-pushed the allow-fmv-without-default-declaration branch 2 times, most recently from 63585c5 to 60a6fcd Compare March 22, 2024 11:20
@labrinea
Copy link
Collaborator Author

labrinea commented Mar 22, 2024

Update: The ACLE spec is now explicit about this (see ARM-software/acle#310), so we can safely implement the change in the compiler. Implementation-wise I have moved the static cast to llvm::Function inside the createFunction() lambda. @jroelofs I know you've already approved this on labrinea#1 but since the PR wasn't properly stacked do you mind having another look please? Thanks!

@labrinea labrinea changed the title [FMV] Allow fmv without default declaration. [FMV] Allow multi versioning without default declaration. Mar 22, 2024
@jroelofs
Copy link
Contributor

re: stacking PRs: SPR is pretty handy: https://getcord.github.io/spr/

This was a limitation which has now been lifted. Please read the
thread below for more details:

llvm#84405 (comment)

Basically it allows to separate versioned implementations across
different TUs without having to share private header files which
contain the default declaration.

The ACLE spec has been updated accordingly to make this explicit:
"Each version declaration should be visible at the translation
 unit in which the corresponding function version resides."

ARM-software/acle#310

If a resolver is required (because there is a caller in the TU),
then a default declaration is implicitly generated.
@labrinea labrinea force-pushed the allow-fmv-without-default-declaration branch from 60a6fcd to 289d36f Compare March 23, 2024 18:04
Copy link

✅ With the latest revision this PR passed the Python code formatter.

@labrinea labrinea merged commit 772e316 into llvm:main Mar 25, 2024
@labrinea labrinea deleted the allow-fmv-without-default-declaration branch March 25, 2024 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants