Skip to content

[C++20][Modules] Load function body from the module that gives canonical decl #111992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

dmpolukhin
Copy link
Contributor

@dmpolukhin dmpolukhin commented Oct 11, 2024

Summary:
Fix crash from reproducer provided in
#109167 (comment)
Also fix issues with merged inline friend functions merged during deserialization.

Test Plan: check-clang

if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent()))
if (RD->isDependentContext() && !RD->isThisDeclarationADefinition())
continue;
if (FD->getFriendObjectKind())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean we can still potentially have this problem for friend functions?
Or are there some invariants in Clang/C++ itself that stop that from happening?

If we have a small example, it'd be interesting to check that adding friend does not cause the problem to resurface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure that there is no sequence of deserialization that might also lead to similar issue. The whole logic about which lazy desrialization and which decl will be canonical and finally definition is very fragile I think :(

Initially I was thinking about removing the whole if because it can transform canonical decl with body may not be a definition after deserialization. But removing this logic completely breaks this test case. But it seems that it was added only for friend decls and nothing else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original comments confuses me. I feel it is not consistent (or at least not straight forward) to the codes. I agree the original mechanism is fragile.

Could you try to describe the problem in more detail? Let's try to fix the fundamental problem if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChuanqiXu9 and @ilya-biryukov I'm not sure that I fully understand the example test case. Do you understand what does the example really do?

Things unclear to me marked with // >>>.

// RUN: %clang_cc1 -std=c++14 -fmodules %s -verify
// RUN: %clang_cc1 -std=c++14 -fmodules %s -verify -triple i686-windows
// expected-no-diagnostics

// >>> it looks like undocumented pragma?
#pragma clang module build A 

// >>> Is it modulemap file syntax?
module A {}
#pragma clang module contents
#pragma clang module begin A

// >>> What is the context of this code, is it GMF or part of module A, is it implicitly exported?
template<typename T> struct ct { friend auto operator-(ct, ct) { struct X {}; return X(); } void x(); };
#pragma clang module end
#pragma clang module endbuild

#pragma clang module build B
module B {}
#pragma clang module contents
#pragma clang module begin B

// >>> Seems like it was part of GMF otherwise it is duplicating declaration?
template<typename T> struct ct { friend auto operator-(ct, ct) { struct X{}; return X(); } void x(); };
inline auto f() { return ct<float>() - ct<float>(); }
#pragma clang module end
#pragma clang module endbuild

// >>> Seems like here we are in different translation unit

// Force the definition of ct in module A to be the primary definition.
#pragma clang module import A
// >>> Why does it define something from module A !?
// >>> Should we see here error like: `declaration of 'x' in the global module follows declaration in module A`
template<typename T> void ct<T>::x() {}

// Attempt to cause the definition of operator- in the ct primary template in
// module B to be the primary definition of that function. If that happens,
// we'll be left with a class template ct that appears to not contain a
// definition of the inline friend function.
#pragma clang module import B
auto v = f();

ct<int> make();
void h() {
  make() - make();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's try to understand the fundamental issue and fix it fundamentally.

// >>> it looks like undocumented pragma?

This is the pragma for clang modules. I am not familiar with it too @ilya-biryukov is it possible to get some google forks familiar with this involved?

// >>> What is the context of this code, is it GMF or part of module A, is it implicitly exported?

It is not GMF, nor a named module in C++20 sense nor exported in C++20 sense. It is a clang module. So the concept of GMF, named module and export ness (and the decl level) is not meaningful here.

// >>> Seems like it was part of GMF

It won't be a GMF for sure.

// >>> Why does it define something from module A !?

So given it is not the named module in C++20, so the ODR rule in C++20 doesn't apply here. Correct me if I am wrong. I feel clang modules implement the header's semantics, and the name of their modules are simply an annotation and doesn't have semantical effect. So I am not surprise that it is a valid code since it basically a concentration of two headers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// >>> it looks like undocumented pragma?

The pragma is coming from -frewrite-imports, which is a way to generate multi-module reproducers based in a single file. It does the "obvious" thing: reads the module map until #pragma clang module contents, then builds a unique PCM from the code up until #pragma clang module end, which can later be imported by #pragma clang module import <Name>, where <Name> should match with what's written in #pragma clang module build <Name>.

// >>> Is it modulemap file syntax?

Yes!

This testcase should probably be written to use split-file, it's clearly more common these days, so having these pragmas in tests unrelated to those pragmas seems like a bad idea.

re the Clang modules: @ChuanqiXu9's about Clang modules are correct here. It's okay to have multiple declarations in different Clang modules as long as they follow ODR (consist of the same tokens, etc). They will be "merged" together (I believe this is similar to what's happening when we import two C++20 modules and both have the same declaration in the GMF; but I'm on the contrary, not an expert in C++20 modules, so please correct me if I'm wrong).

The Clang modules are not a concatenation of two headers, rather implement "merging" of ASTs from those two headers, as if each class was defined with a unique header guard and consecutive #include would only skip over the second one (in fact, we even have optimizations that skip over class tokens if the same class was already imported in some Clang module).

Please let me know if that helps and if there are still unclear things in that test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've sent #112380 to rewrite this test with split-file, hope that helps to get a better understanding of what's happening there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is similar to what's happening when we import two C++20 modules and both have the same declaration in the GMF; but I'm on the contrary, not an expert in C++20 modules, so please correct me if I'm wrong

Correct relatively. But we forbid the checks for GMF (https://clang.llvm.org/docs/StandardCPlusPlusModules.html#object-definition-consistency) see reasons in the document.

@dmpolukhin dmpolukhin requested a review from ChuanqiXu9 October 11, 2024 16:59
@dmpolukhin dmpolukhin marked this pull request as ready for review October 11, 2024 17:06
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Oct 11, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 11, 2024

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Dmitry Polukhin (dmpolukhin)

Changes

Summary:
Fix crash from reproducer provided in #109167 (comment)

Test Plan: TBD (need to further reduce the reproducer)


Full diff: https://github.com/llvm/llvm-project/pull/111992.diff

2 Files Affected:

  • (modified) clang/lib/Serialization/ASTReader.cpp (+7-4)
  • (added) clang/test/Headers/crash-instantiated-in-scope-cxx-modules4.cpp (+110)
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index e638129897692f..f2853dcf77ae94 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -10057,15 +10057,18 @@ void ASTReader::finishPendingActions() {
       // For a function defined inline within a class template, force the
       // canonical definition to be the one inside the canonical definition of
       // the template. This ensures that we instantiate from a correct view
-      // of the template.
+      // of the template. This behaviour seems to be important only for inline
+      // friend functions. For normal member functions, it might results in
+      // selecting canonical decl from module A but body from module B.
       //
       // Sadly we can't do this more generally: we can't be sure that all
       // copies of an arbitrary class definition will have the same members
       // defined (eg, some member functions may not be instantiated, and some
       // special members may or may not have been implicitly defined).
-      if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent()))
-        if (RD->isDependentContext() && !RD->isThisDeclarationADefinition())
-          continue;
+      if (FD->getFriendObjectKind())
+        if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent()))
+          if (RD->isDependentContext() && !RD->isThisDeclarationADefinition())
+            continue;
 
       // FIXME: Check for =delete/=default?
       const FunctionDecl *Defn = nullptr;
diff --git a/clang/test/Headers/crash-instantiated-in-scope-cxx-modules4.cpp b/clang/test/Headers/crash-instantiated-in-scope-cxx-modules4.cpp
new file mode 100644
index 00000000000000..087eb135dc5f53
--- /dev/null
+++ b/clang/test/Headers/crash-instantiated-in-scope-cxx-modules4.cpp
@@ -0,0 +1,110 @@
+// RUN: rm -fR %t
+// RUN: split-file %s %t
+// RUN: cd %t
+// RUN: %clang_cc1 -verify -std=c++20 -x c++ -fmodule-map-file=modules.map -fmodule-name=foo1 -emit-module modules.map -o foo1.pcm
+// RUN: %clang_cc1 -verify -std=c++20 -x c++ -fmodule-map-file=modules.map -fmodule-name=foo2 -emit-module modules.map -o foo2.pcm
+// RUN: %clang_cc1 -verify -std=c++20 -x c++ -fmodule-map-file=modules.map -fmodule-file=foo1.pcm -fmodule-file=foo2.pcm server.cc
+
+//--- functional
+#pragma once
+
+namespace std {
+template <class> class function {};
+} // namespace std
+
+//--- foo.h
+#pragma once
+
+class MethodHandler {
+ public:
+  virtual ~MethodHandler() {}
+  struct HandlerParameter {
+    HandlerParameter();
+  };
+  virtual void RunHandler(const HandlerParameter &param);
+};
+
+template <class RequestType, class ResponseType>
+class CallbackUnaryHandler : public MethodHandler {
+ public:
+  explicit CallbackUnaryHandler();
+
+  void RunHandler(const HandlerParameter &param) final {
+    void *call = nullptr;
+    (void)[call](bool){};
+  }
+};
+
+//--- foo1.h
+// expected-no-diagnostics
+#pragma once
+
+#include "functional"
+
+#include "foo.h"
+
+class A;
+
+class ClientAsyncResponseReaderHelper {
+   public:
+      using t = std::function<void(A)>;
+        static void SetupRequest(t finish);
+};
+
+//--- foo2.h
+// expected-no-diagnostics
+#pragma once
+
+#include "foo.h"
+
+template <class BaseClass>
+class a : public BaseClass {
+ public:
+  a() { [[maybe_unused]] CallbackUnaryHandler<int, int> a; }
+};
+
+//--- modules.map
+module "foo" {
+  export *
+  module "foo.h" {
+    export *
+    textual header "foo.h"
+  }
+}
+
+module "foo1" {
+  export *
+  module "foo1.h" {
+    export *
+    header "foo1.h"
+  }
+
+  use "foo"
+}
+
+module "foo2" {
+  export *
+  module "foo2.h" {
+    export *
+    header "foo2.h"
+  }
+
+  use "foo"
+}
+
+//--- server.cc
+// expected-no-diagnostics
+#include "functional"
+
+#include "foo1.h"
+#include "foo2.h"
+
+std::function<void()> on_emit;
+
+template <class RequestType, class ResponseType>
+class CallbackUnaryHandler;
+
+class s {};
+class hs final : public a<s> {
+  explicit hs() {}
+};

@ChuanqiXu9
Copy link
Member

@dmpolukhin I am still confusing about the problem. I mean, why your previous patch will break the reproducer and why this patch can "fix" it? I feel the current patch is somewhat workaround. It's not your fault. The original codes are somewhat tricky already. But let's try to find methods to fix it more fundamentally.

ilya-biryukov added a commit that referenced this pull request Oct 16, 2024
Instead of the pragmas, which are less familiar to people.
This is a follow-up of a discussion from #111992.
@dmpolukhin
Copy link
Contributor Author

dmpolukhin commented Oct 16, 2024

@dmpolukhin I am still confusing about the problem. I mean, why your previous patch will break the reproducer and why this patch can "fix" it? I feel the current patch is somewhat workaround. It's not your fault. The original codes are somewhat tricky already. But let's try to find methods to fix it more fundamentally.

I understand why my previous patch broke Google's reproducer I attached to this PR. It happens because it deserializes eagerly lambda classes from the module that should be chosen as a pattern for template instantiation (function CallbackUnaryHandler::RunHandler). However code:

        if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent()))
          if (RD->isDependentContext() && !RD->isThisDeclarationADefinition())
            continue;

skips setting body for the function so it cannot be used as the pattern and function body from different module is selected that now doesn't match the lambda. As I wrote in #109167 for captured variables to work correctly we need to make sure that enclosing function and lambda are coming from the same module. This code in some sense is doing similar thing, it forces canonical definition for the function defined inline within a class template to be from the same module as class template.

What I don't understand why clang cannot handle inline friend function coming from one module but class template coming from another module. It works in cases of member functions but friend inline function inside class template are special in this sense and template instantiation cannot deduce return type. I agree that my code is just reduces scope of previous hack to work only in cases when it is really required (we have test case). For the full solution I think we have two options: (1) figure out why inline friend function and template have to be from the same module, relax this requirement and remove this check completely or (2) make sure that canonical decls for class template, inline functions and potential lambdas are always chosen from the same module (my code from #109167 allows deserializing related decls from the same module if needed). I'm exploring option (1) at the moment but template instantiation code is not very familiar for me so it might take some time.

@ChuanqiXu9
Copy link
Member

@dmpolukhin I am still confusing about the problem. I mean, why your previous patch will break the reproducer and why this patch can "fix" it? I feel the current patch is somewhat workaround. It's not your fault. The original codes are somewhat tricky already. But let's try to find methods to fix it more fundamentally.

I understand why my previous patch broke Google's reproducer I attached to this PR. It happens because it deserializes eagerly lambda classes from the module that should be chosen as a pattern for template instantiation (function CallbackUnaryHandler::RunHandler). However code:

        if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent()))
          if (RD->isDependentContext() && !RD->isThisDeclarationADefinition())
            continue;

skips setting body for the function so it cannot be used as the pattern and function body from different module is selected that now doesn't match the lambda. As I wrote in #109167 for captured variables to work correctly we need to make sure that enclosing function and lambda are coming from the same module. This code in some sense is doing similar thing, it forces canonical definition for the function defined inline within a class template to be from the same module as class template.

What I don't understand why clang cannot handle inline friend function coming from one module but class template coming from another module. It works in cases of member functions but friend inline function inside class template are special in this sense and template instantiation cannot deduce return type. I agree that my code is just reduces scope of previous hack to work only in cases when it is really required (we have test case). For the full solution I think we have two options: (1) figure out why inline friend function and template have to be from the same module, relax this requirement and remove this check completely or (2) make sure that canonical decls for class template, inline functions and potential lambdas are always chosen from the same module (my code from #109167 allows deserializing related decls from the same module if needed). I'm exploring option (1) at the moment but template instantiation code is not very familiar for me so it might take some time.

Thanks for the analysis. I feel (2) is easier to me IIUC. I feel it is a natural extension of your previous work. WDYT?

@dmpolukhin dmpolukhin force-pushed the assert-not-instantiated-in-scope-cxx-modules4 branch from 7c2fbfe to 1270392 Compare October 17, 2024 13:57
@dmpolukhin
Copy link
Contributor Author

Thanks for the analysis. I feel (2) is easier to me IIUC. I feel it is a natural extension of your previous work. WDYT?

I implemented this approach and it works. I tried to limit eager deserialization to the minimal set of functions (at the moment it is friend inline friend functions). PTAL.

@ChuanqiXu9
Copy link
Member

Thanks for the analysis. I feel (2) is easier to me IIUC. I feel it is a natural extension of your previous work. WDYT?

I implemented this approach and it works. I tried to limit eager deserialization to the minimal set of functions (at the moment it is friend inline friend functions). PTAL.

LGTM. And a minor point is, the term "eager deserialization" makes me to think you're going to deserialize the decls eagerly, which made me shocked initially : )

let's wait for @ilya-biryukov and see if he can give a test for this on google's workloads before landing

@dmpolukhin
Copy link
Contributor Author

@ilya-biryukov could you please take a look?

@ilya-biryukov
Copy link
Contributor

@ilya-biryukov could you please take a look?

Sorry for the late reply, was on vacation. I'll ask @alexfh to run this through our release testing and report if there are any bugs left.

@dmpolukhin
Copy link
Contributor Author

FYI, in our internal testing we also found a case when this PR fixes issue. Also we don't see any examples when it may break something so I think we should merge it ASAP because it fixes known issues and don't have any known regressions.

@alexfh what is the ETA for release testing?

@alexfh
Copy link
Contributor

alexfh commented Nov 6, 2024

Sorry, we've been in a bit of a firefighting. I'll try to arrange some time for this till the EOW.

@ilya-biryukov
Copy link
Contributor

Update: we've found a failure with "function X has different bodies" but my suspicion is that it's surfacing an existing issue rather than introducing a new one. I've managed to workaround it in our codebase by modularizing a bit more of the codebase.

Sadly, because this stopped further release testing, we don't know if it's more widespread. @alexfh will rerun the full testing with the fix and we'll report back on the progress again.

PS sorry for the delays, as Alex has mentioned there's been a lot of other background issues we were firefighting that slowed down testing for this particular issue.

@dmpolukhin
Copy link
Contributor Author

Update: we've found a failure with "function X has different bodies" but my suspicion is that it's surfacing an existing issue rather than introducing a new one. I've managed to workaround it in our codebase by modularizing a bit more of the codebase.

Thank you for the update. I do think this issue is not directly related to this change and most probably just pre-existing issue detected. FYI, I recently disable ODR checks for header units like it was for named modules, see #111160 I don't know what happens for clang modules perhaps they also should be excluded from ODR check. In long run I would like to enable ODR check but with exclusion for common issue like set of visible overloaded functions visible that is hard to control and benign ODR violations.

@dmpolukhin
Copy link
Contributor Author

@alexfh and @ilya-biryukov do you have any updates?

@ilya-biryukov
Copy link
Contributor

Sorry, still trying to reproduce this and refraining from responding every day because I feel very close to getting a repro I can share. But that has been taking more time than I expected, please bear with me, I'm making progress.

@dmpolukhin
Copy link
Contributor Author

I do believe that the issue about ODR violation cannot be caused by this change itself. I suggest merging this PR as is and I'll take a look at the reproducer as soon as you create it. This PR fixes known issues with reproducers so I'm in doubt what makes this change so special that we cannot land it as normal PRs.

Said that I think we should invest into a tool to reduce issues in case of modules as we discussed. I also have a reproducer for C++ modules related issues that I cannot reduce for almost 2 weeks already. It slows us down a lot :(

@ilya-biryukov
Copy link
Contributor

I fully agree, both getting the reproducers and reducing them later is such a pain that we cannot be efficient without that tooling.
My team is going to start working on it soonish, we really feel the need. We'll make sure to loop you in to look for potential collaborations.

In the meantime, still making progress on this particular issues and still not done :(

@ilya-biryukov
Copy link
Contributor

I have managed to get a small example, but it only crashes for me if I run it directly through our internal build system (not even if I use the exact same Clang binary).

There are a few things that might affect that:

  • the names of the modules used internally are different; I have simplified them to save some typing,
  • the flags we pass are different, but most of them should not matter (various macros only used internally, etc).

I suspect there are some hash tables somewhere and the order of something gets affected by those two factors above or something else and we end up loosing the repro.
I will need a little more time to produce the exact same command lines and module map inputs and reduce them further, but I wanted to share the actual code that produces the "inline function not defined" error; just in case folks would have ideas on what might go wrong in those examples before I get a full repro.

friends.tgz

@dmpolukhin
Copy link
Contributor Author

@ilya-biryukov unfortunately it seems that the reproducer is very sensitive to the exact clang revision. I tried it with current main b8eef18 and this PR on top of the revision. In both cases I don't see any problems at all (no errors, no warnings). So you might have to share exact revision and any patches on top of them if any.

The warning/error about "inline function not defined" is very familiar to me and I'm reducing it now in our code. But I see it with and without my changes on many revisions. But depending on revision the warning moves from one function to another. My change alters deserialization order but the error itself is in something else that I was not able to identify yet. Our workaround is to modularize more and avoid merging definitions from several modules. In your example strong_int.h is not a part of its own module so clang merges it and it causes issue for friend inside template class.

if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent()))
if (RD->isDependentContext() && !RD->isThisDeclarationADefinition())
continue;
if (FD->getFriendObjectKind())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is similar to what's happening when we import two C++20 modules and both have the same declaration in the GMF; but I'm on the contrary, not an expert in C++20 modules, so please correct me if I'm wrong

Correct relatively. But we forbid the checks for GMF (https://clang.llvm.org/docs/StandardCPlusPlusModules.html#object-definition-consistency) see reasons in the document.

@@ -10057,15 +10057,18 @@ void ASTReader::finishPendingActions() {
// For a function defined inline within a class template, force the
// canonical definition to be the one inside the canonical definition of
// the template. This ensures that we instantiate from a correct view
// of the template.
// of the template. This behaviour seems to be important only for inline
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems seems to not be a good term in comments.

@@ -1976,14 +1976,16 @@ TemplateDeclInstantiator::VisitFunctionTemplateDecl(FunctionTemplateDecl *D) {
if (!InstParams)
return nullptr;

// Use canonical templated decl because only canonical decl has body
// if declarations were merged during loading from modules.
FunctionDecl *TemplatedDecl = D->getTemplatedDecl()->getCanonicalDecl();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such problems are pretty common and I thought if we can make it generally. And also I am wondering the definition for canonical decl. But this is not the problem of this patch. This won't make a blocker issue.

I roughly remember that the definition of CanonicalDecl are the first declaration not the declaration has the definition. So may be it is better to write this as:

FunctionDecl *TemplatedDecl = D->getTemplatedDecl();
if (TemplatedDecl doesn't have a body)
     iterate redecls of TemplatedDecl and assign the decl that has a body to `TemplatedDecl`.

(maybe we should have an interface like something like getRedeclWithDef() if we don't have one)

And also I admit there some existing codes use getCanonicalDecl to try to do the similar things, but they are their problems.

@ilya-biryukov
Copy link
Contributor

We have some testing results in and it's causing a large-scale breakage. I think it's all coming from 2 root causes, but it's hard to know for sure because there are many thousands of broken targets.

The two problems I see are:

  • invalid no matching constructor for initialization of errors for some particular type,
  • a crash during template instantiations for some other particular type.

And a few things that seem minor at first glance, but would also need some checking.

  • some errors due to too long recursive template instantiations that we didn't see before (probably correct and coming from instantiating those bodies that were previously skipped?)
  • some compilation timeouts (probably just a flake or a slight increase that we should mitigate on our end).

And there may be some long tail that's not visible because these issues take all of my attention.
I will try to get the reproducers for each of those. Or would it make sense to wait until you finish the review and do that for the new version?

Copy link
Contributor Author

@dmpolukhin dmpolukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two problems I see are:

  • invalid no matching constructor for initialization of errors for some particular type,

I think it is due to applying my changes too wide (i.e. to normal member functions and use first decl for them). I hope I fixed it in the latest version.

  • a crash during template instantiations for some other particular type.

In this case it is harder to make a reasonable guess but perhaps also the same root.

And a few things that seem minor at first glance, but would also need some checking.

  • some errors due to too long recursive template instantiations that we didn't see before (probably correct and coming from instantiating those bodies that were previously skipped?)
  • some compilation timeouts (probably just a flake or a slight increase that we should mitigate on our end).

And there may be some long tail that's not visible because these issues take all of my attention. I will try to get the reproducers for each of those. Or would it make sense to wait until you finish the review and do that for the new version?

Could you please try the latest version if it doesn't fix all issues, please create a reproducer to continue my investigation.

@@ -1976,14 +1976,16 @@ TemplateDeclInstantiator::VisitFunctionTemplateDecl(FunctionTemplateDecl *D) {
if (!InstParams)
return nullptr;

// Use canonical templated decl because only canonical decl has body
// if declarations were merged during loading from modules.
FunctionDecl *TemplatedDecl = D->getTemplatedDecl()->getCanonicalDecl();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FunctionDecl::hasBody is doing what you described for getRedeclWithDef, see https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/Decl.cpp#L3163 and if I used it here or implement similar code here, it fails several clang tests. It is very unfortunate that Redeclarable is used for two things: real redeclaration and AST merges. Moreover it look like there is no way to distinguish these two cases. I need to use different decl only if the decl was doesThisDeclarationHaveABody() == true but it is no longer the case due to AST merge.

Copy link

github-actions bot commented Dec 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ilya-biryukov
Copy link
Contributor

We'll try the latest version and report back.

@dmpolukhin
Copy link
Contributor Author

FYI, I tested the latest version on our reproducers and it fixed all known issues due to friend inline functions.

@ilya-biryukov
Copy link
Contributor

The latest version works like a charm for us, there aren't any bugs. I'll let @ChuanqiXu9 stamp the final approval for the code.
Let me know if you'll need more testing.

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the long term waiting.

@ChuanqiXu9
Copy link
Member

@ilya-biryukov I remember last time, the codes you shared to me actually based an open source project. I am wondering if you can open source some use for such projects (even for testing purpose). I think it will be pretty helpful to speedup the overall development process.

@dmpolukhin dmpolukhin force-pushed the assert-not-instantiated-in-scope-cxx-modules4 branch from e2df412 to 5df865a Compare December 16, 2024 11:09
@dmpolukhin dmpolukhin merged commit 38b3d87 into llvm:main Dec 16, 2024
8 checks passed
@dmpolukhin
Copy link
Contributor Author

@ilya-biryukov and @ChuanqiXu9 thank you for your help with the review, testing and reproducers. Very appreciated!

@ilya-biryukov
Copy link
Contributor

@ilya-biryukov I remember last time, the codes you shared to me actually based an open source project. I am wondering if you can open source some use for such projects (even for testing purpose). I think it will be pretty helpful to speedup the overall development process.

Yes, we should, the biggest problem right now is that our use of header modules is quite unique and is all internal.
There are technically features in Bazel that would allow to do something similar, but the BUILD files and the way to prepare it is too difficult for anyone to master. On top of that, we also have a ton of targeted opt-ins and opt-outs for specific libraries to avoid compiler bugs. Worse yet, even though the code I shared only had Absl bits, the actual usage patterns that cause the bugs are always in the internal code and it takes ages to distill it to those essential small examples.

That being said, we're going to spend significant effort next year to make sure this improves and we can get small reproducers fast and help to surface and fix any problems that show up. We're in a fragile state right now and want to get to more solid ground with our use of modules.

@ilya-biryukov and @ChuanqiXu9 thank you for your help with the review, testing and reproducers. Very appreciated!

Thanks a lot for fixing this. We really appreciate that too. And I hope we can get to a point where we can give you the reproducers much faster some time next year.

@ilya-biryukov
Copy link
Contributor

ilya-biryukov commented Jan 10, 2025

Some code got rearranged after our initial testing, some errors slipped through and we started seeing some new errors after this landed.
I have filed #122493 to track it, let's continue the discussion there.

/// nodes that don't exist in the function.
llvm::DenseMap<GlobalDeclID, SmallVector<GlobalDeclID, 4>>
FunctionToLambdasMap;
/// Mapping from main decl ID to the related decls IDs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the main decl? I can image some definitions, but it would be nice to have some clarifications included in the comment.

@@ -798,6 +798,17 @@ void ASTDeclWriter::VisitFunctionDecl(FunctionDecl *D) {
}
}

if (D->getFriendObjectKind()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated in #122493 (comment), this is intended. It would be helpful to include comments explaining why this logic is applied exclusively to friend objects.

dmpolukhin added a commit that referenced this pull request Apr 3, 2025
…y a declaration during AST deserialization (#132214)

Fix for regression #130917, changes in #111992 were too broad. This change reduces scope of previous fix. Added `ExternalASTSource::wasThisDeclarationADefinition` to detect cases when FunctionDecl lost body due to declaration merges.
dmpolukhin added a commit to dmpolukhin/llvm-project that referenced this pull request Apr 3, 2025
…y a declaration during AST deserialization (llvm#132214)

Fix for regression llvm#130917, changes in llvm#111992 were too broad. This change reduces scope of previous fix. Added `ExternalASTSource::wasThisDeclarationADefinition` to detect cases when FunctionDecl lost body due to declaration merges.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants