-
Notifications
You must be signed in to change notification settings - Fork 14k
[C++20][Modules] Load function body from the module that gives canonical decl #111992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++20][Modules] Load function body from the module that gives canonical decl #111992
Conversation
if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent())) | ||
if (RD->isDependentContext() && !RD->isThisDeclarationADefinition()) | ||
continue; | ||
if (FD->getFriendObjectKind()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean we can still potentially have this problem for friend
functions?
Or are there some invariants in Clang/C++ itself that stop that from happening?
If we have a small example, it'd be interesting to check that adding friend
does not cause the problem to resurface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure that there is no sequence of deserialization that might also lead to similar issue. The whole logic about which lazy desrialization and which decl will be canonical and finally definition is very fragile I think :(
Initially I was thinking about removing the whole if because it can transform canonical decl with body may not be a definition after deserialization. But removing this logic completely breaks this test case. But it seems that it was added only for friend decls and nothing else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original comments confuses me. I feel it is not consistent (or at least not straight forward) to the codes. I agree the original mechanism is fragile.
Could you try to describe the problem in more detail? Let's try to fix the fundamental problem if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChuanqiXu9 and @ilya-biryukov I'm not sure that I fully understand the example test case. Do you understand what does the example really do?
Things unclear to me marked with // >>>
.
// RUN: %clang_cc1 -std=c++14 -fmodules %s -verify
// RUN: %clang_cc1 -std=c++14 -fmodules %s -verify -triple i686-windows
// expected-no-diagnostics
// >>> it looks like undocumented pragma?
#pragma clang module build A
// >>> Is it modulemap file syntax?
module A {}
#pragma clang module contents
#pragma clang module begin A
// >>> What is the context of this code, is it GMF or part of module A, is it implicitly exported?
template<typename T> struct ct { friend auto operator-(ct, ct) { struct X {}; return X(); } void x(); };
#pragma clang module end
#pragma clang module endbuild
#pragma clang module build B
module B {}
#pragma clang module contents
#pragma clang module begin B
// >>> Seems like it was part of GMF otherwise it is duplicating declaration?
template<typename T> struct ct { friend auto operator-(ct, ct) { struct X{}; return X(); } void x(); };
inline auto f() { return ct<float>() - ct<float>(); }
#pragma clang module end
#pragma clang module endbuild
// >>> Seems like here we are in different translation unit
// Force the definition of ct in module A to be the primary definition.
#pragma clang module import A
// >>> Why does it define something from module A !?
// >>> Should we see here error like: `declaration of 'x' in the global module follows declaration in module A`
template<typename T> void ct<T>::x() {}
// Attempt to cause the definition of operator- in the ct primary template in
// module B to be the primary definition of that function. If that happens,
// we'll be left with a class template ct that appears to not contain a
// definition of the inline friend function.
#pragma clang module import B
auto v = f();
ct<int> make();
void h() {
make() - make();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's try to understand the fundamental issue and fix it fundamentally.
// >>> it looks like undocumented pragma?
This is the pragma for clang modules. I am not familiar with it too @ilya-biryukov is it possible to get some google forks familiar with this involved?
// >>> What is the context of this code, is it GMF or part of module A, is it implicitly exported?
It is not GMF, nor a named module in C++20 sense nor exported in C++20 sense. It is a clang module. So the concept of GMF
, named module and export ness (and the decl level) is not meaningful here.
// >>> Seems like it was part of GMF
It won't be a GMF for sure.
// >>> Why does it define something from module A !?
So given it is not the named module in C++20, so the ODR rule in C++20 doesn't apply here. Correct me if I am wrong. I feel clang modules implement the header's semantics, and the name of their modules are simply an annotation and doesn't have semantical effect. So I am not surprise that it is a valid code since it basically a concentration of two headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// >>> it looks like undocumented pragma?
The pragma is coming from -frewrite-imports
, which is a way to generate multi-module reproducers based in a single file. It does the "obvious" thing: reads the module map until #pragma clang module contents
, then builds a unique PCM from the code up until #pragma clang module end
, which can later be imported by #pragma clang module import <Name>
, where <Name>
should match with what's written in #pragma clang module build <Name>
.
// >>> Is it modulemap file syntax?
Yes!
This testcase should probably be written to use split-file, it's clearly more common these days, so having these pragmas in tests unrelated to those pragmas seems like a bad idea.
re the Clang modules: @ChuanqiXu9's about Clang modules are correct here. It's okay to have multiple declarations in different Clang modules as long as they follow ODR (consist of the same tokens, etc). They will be "merged" together (I believe this is similar to what's happening when we import two C++20 modules and both have the same declaration in the GMF; but I'm on the contrary, not an expert in C++20 modules, so please correct me if I'm wrong).
The Clang modules are not a concatenation of two headers, rather implement "merging" of ASTs from those two headers, as if each class was defined with a unique header guard and consecutive #include
would only skip over the second one (in fact, we even have optimizations that skip over class tokens if the same class was already imported in some Clang module).
Please let me know if that helps and if there are still unclear things in that test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've sent #112380 to rewrite this test with split-file, hope that helps to get a better understanding of what's happening there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is similar to what's happening when we import two C++20 modules and both have the same declaration in the GMF; but I'm on the contrary, not an expert in C++20 modules, so please correct me if I'm wrong
Correct relatively. But we forbid the checks for GMF (https://clang.llvm.org/docs/StandardCPlusPlusModules.html#object-definition-consistency) see reasons in the document.
@llvm/pr-subscribers-clang-modules @llvm/pr-subscribers-clang Author: Dmitry Polukhin (dmpolukhin) ChangesSummary: Test Plan: TBD (need to further reduce the reproducer) Full diff: https://github.com/llvm/llvm-project/pull/111992.diff 2 Files Affected:
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index e638129897692f..f2853dcf77ae94 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -10057,15 +10057,18 @@ void ASTReader::finishPendingActions() {
// For a function defined inline within a class template, force the
// canonical definition to be the one inside the canonical definition of
// the template. This ensures that we instantiate from a correct view
- // of the template.
+ // of the template. This behaviour seems to be important only for inline
+ // friend functions. For normal member functions, it might results in
+ // selecting canonical decl from module A but body from module B.
//
// Sadly we can't do this more generally: we can't be sure that all
// copies of an arbitrary class definition will have the same members
// defined (eg, some member functions may not be instantiated, and some
// special members may or may not have been implicitly defined).
- if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent()))
- if (RD->isDependentContext() && !RD->isThisDeclarationADefinition())
- continue;
+ if (FD->getFriendObjectKind())
+ if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent()))
+ if (RD->isDependentContext() && !RD->isThisDeclarationADefinition())
+ continue;
// FIXME: Check for =delete/=default?
const FunctionDecl *Defn = nullptr;
diff --git a/clang/test/Headers/crash-instantiated-in-scope-cxx-modules4.cpp b/clang/test/Headers/crash-instantiated-in-scope-cxx-modules4.cpp
new file mode 100644
index 00000000000000..087eb135dc5f53
--- /dev/null
+++ b/clang/test/Headers/crash-instantiated-in-scope-cxx-modules4.cpp
@@ -0,0 +1,110 @@
+// RUN: rm -fR %t
+// RUN: split-file %s %t
+// RUN: cd %t
+// RUN: %clang_cc1 -verify -std=c++20 -x c++ -fmodule-map-file=modules.map -fmodule-name=foo1 -emit-module modules.map -o foo1.pcm
+// RUN: %clang_cc1 -verify -std=c++20 -x c++ -fmodule-map-file=modules.map -fmodule-name=foo2 -emit-module modules.map -o foo2.pcm
+// RUN: %clang_cc1 -verify -std=c++20 -x c++ -fmodule-map-file=modules.map -fmodule-file=foo1.pcm -fmodule-file=foo2.pcm server.cc
+
+//--- functional
+#pragma once
+
+namespace std {
+template <class> class function {};
+} // namespace std
+
+//--- foo.h
+#pragma once
+
+class MethodHandler {
+ public:
+ virtual ~MethodHandler() {}
+ struct HandlerParameter {
+ HandlerParameter();
+ };
+ virtual void RunHandler(const HandlerParameter ¶m);
+};
+
+template <class RequestType, class ResponseType>
+class CallbackUnaryHandler : public MethodHandler {
+ public:
+ explicit CallbackUnaryHandler();
+
+ void RunHandler(const HandlerParameter ¶m) final {
+ void *call = nullptr;
+ (void)[call](bool){};
+ }
+};
+
+//--- foo1.h
+// expected-no-diagnostics
+#pragma once
+
+#include "functional"
+
+#include "foo.h"
+
+class A;
+
+class ClientAsyncResponseReaderHelper {
+ public:
+ using t = std::function<void(A)>;
+ static void SetupRequest(t finish);
+};
+
+//--- foo2.h
+// expected-no-diagnostics
+#pragma once
+
+#include "foo.h"
+
+template <class BaseClass>
+class a : public BaseClass {
+ public:
+ a() { [[maybe_unused]] CallbackUnaryHandler<int, int> a; }
+};
+
+//--- modules.map
+module "foo" {
+ export *
+ module "foo.h" {
+ export *
+ textual header "foo.h"
+ }
+}
+
+module "foo1" {
+ export *
+ module "foo1.h" {
+ export *
+ header "foo1.h"
+ }
+
+ use "foo"
+}
+
+module "foo2" {
+ export *
+ module "foo2.h" {
+ export *
+ header "foo2.h"
+ }
+
+ use "foo"
+}
+
+//--- server.cc
+// expected-no-diagnostics
+#include "functional"
+
+#include "foo1.h"
+#include "foo2.h"
+
+std::function<void()> on_emit;
+
+template <class RequestType, class ResponseType>
+class CallbackUnaryHandler;
+
+class s {};
+class hs final : public a<s> {
+ explicit hs() {}
+};
|
@dmpolukhin I am still confusing about the problem. I mean, why your previous patch will break the reproducer and why this patch can "fix" it? I feel the current patch is somewhat workaround. It's not your fault. The original codes are somewhat tricky already. But let's try to find methods to fix it more fundamentally. |
Instead of the pragmas, which are less familiar to people. This is a follow-up of a discussion from #111992.
I understand why my previous patch broke Google's reproducer I attached to this PR. It happens because it deserializes eagerly lambda classes from the module that should be chosen as a pattern for template instantiation (function
skips setting body for the function so it cannot be used as the pattern and function body from different module is selected that now doesn't match the lambda. As I wrote in #109167 for captured variables to work correctly we need to make sure that enclosing function and lambda are coming from the same module. This code in some sense is doing similar thing, it forces canonical definition for the function defined inline within a class template to be from the same module as class template. What I don't understand why clang cannot handle inline friend function coming from one module but class template coming from another module. It works in cases of member functions but friend inline function inside class template are special in this sense and template instantiation cannot deduce return type. I agree that my code is just reduces scope of previous hack to work only in cases when it is really required (we have test case). For the full solution I think we have two options: (1) figure out why inline friend function and template have to be from the same module, relax this requirement and remove this check completely or (2) make sure that canonical decls for class template, inline functions and potential lambdas are always chosen from the same module (my code from #109167 allows deserializing related decls from the same module if needed). I'm exploring option (1) at the moment but template instantiation code is not very familiar for me so it might take some time. |
Thanks for the analysis. I feel (2) is easier to me IIUC. I feel it is a natural extension of your previous work. WDYT? |
7c2fbfe
to
1270392
Compare
I implemented this approach and it works. I tried to limit eager deserialization to the minimal set of functions (at the moment it is friend inline friend functions). PTAL. |
LGTM. And a minor point is, the term "eager deserialization" makes me to think you're going to deserialize the decls eagerly, which made me shocked initially : ) let's wait for @ilya-biryukov and see if he can give a test for this on google's workloads before landing |
@ilya-biryukov could you please take a look? |
Sorry for the late reply, was on vacation. I'll ask @alexfh to run this through our release testing and report if there are any bugs left. |
FYI, in our internal testing we also found a case when this PR fixes issue. Also we don't see any examples when it may break something so I think we should merge it ASAP because it fixes known issues and don't have any known regressions. @alexfh what is the ETA for release testing? |
Sorry, we've been in a bit of a firefighting. I'll try to arrange some time for this till the EOW. |
Update: we've found a failure with Sadly, because this stopped further release testing, we don't know if it's more widespread. @alexfh will rerun the full testing with the fix and we'll report back on the progress again. PS sorry for the delays, as Alex has mentioned there's been a lot of other background issues we were firefighting that slowed down testing for this particular issue. |
Thank you for the update. I do think this issue is not directly related to this change and most probably just pre-existing issue detected. FYI, I recently disable ODR checks for header units like it was for named modules, see #111160 I don't know what happens for clang modules perhaps they also should be excluded from ODR check. In long run I would like to enable ODR check but with exclusion for common issue like set of visible overloaded functions visible that is hard to control and benign ODR violations. |
@alexfh and @ilya-biryukov do you have any updates? |
Sorry, still trying to reproduce this and refraining from responding every day because I feel very close to getting a repro I can share. But that has been taking more time than I expected, please bear with me, I'm making progress. |
I do believe that the issue about ODR violation cannot be caused by this change itself. I suggest merging this PR as is and I'll take a look at the reproducer as soon as you create it. This PR fixes known issues with reproducers so I'm in doubt what makes this change so special that we cannot land it as normal PRs. Said that I think we should invest into a tool to reduce issues in case of modules as we discussed. I also have a reproducer for C++ modules related issues that I cannot reduce for almost 2 weeks already. It slows us down a lot :( |
I fully agree, both getting the reproducers and reducing them later is such a pain that we cannot be efficient without that tooling. In the meantime, still making progress on this particular issues and still not done :( |
I have managed to get a small example, but it only crashes for me if I run it directly through our internal build system (not even if I use the exact same Clang binary). There are a few things that might affect that:
I suspect there are some hash tables somewhere and the order of something gets affected by those two factors above or something else and we end up loosing the repro. |
@ilya-biryukov unfortunately it seems that the reproducer is very sensitive to the exact clang revision. I tried it with current main b8eef18 and this PR on top of the revision. In both cases I don't see any problems at all (no errors, no warnings). So you might have to share exact revision and any patches on top of them if any. The warning/error about "inline function not defined" is very familiar to me and I'm reducing it now in our code. But I see it with and without my changes on many revisions. But depending on revision the warning moves from one function to another. My change alters deserialization order but the error itself is in something else that I was not able to identify yet. Our workaround is to modularize more and avoid merging definitions from several modules. In your example |
if (auto *RD = dyn_cast<CXXRecordDecl>(FD->getLexicalParent())) | ||
if (RD->isDependentContext() && !RD->isThisDeclarationADefinition()) | ||
continue; | ||
if (FD->getFriendObjectKind()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is similar to what's happening when we import two C++20 modules and both have the same declaration in the GMF; but I'm on the contrary, not an expert in C++20 modules, so please correct me if I'm wrong
Correct relatively. But we forbid the checks for GMF (https://clang.llvm.org/docs/StandardCPlusPlusModules.html#object-definition-consistency) see reasons in the document.
@@ -10057,15 +10057,18 @@ void ASTReader::finishPendingActions() { | |||
// For a function defined inline within a class template, force the | |||
// canonical definition to be the one inside the canonical definition of | |||
// the template. This ensures that we instantiate from a correct view | |||
// of the template. | |||
// of the template. This behaviour seems to be important only for inline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems
seems to not be a good term in comments.
@@ -1976,14 +1976,16 @@ TemplateDeclInstantiator::VisitFunctionTemplateDecl(FunctionTemplateDecl *D) { | |||
if (!InstParams) | |||
return nullptr; | |||
|
|||
// Use canonical templated decl because only canonical decl has body | |||
// if declarations were merged during loading from modules. | |||
FunctionDecl *TemplatedDecl = D->getTemplatedDecl()->getCanonicalDecl(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such problems are pretty common and I thought if we can make it generally. And also I am wondering the definition for canonical decl. But this is not the problem of this patch. This won't make a blocker issue.
I roughly remember that the definition of CanonicalDecl
are the first declaration not the declaration has the definition. So may be it is better to write this as:
FunctionDecl *TemplatedDecl = D->getTemplatedDecl();
if (TemplatedDecl doesn't have a body)
iterate redecls of TemplatedDecl and assign the decl that has a body to `TemplatedDecl`.
(maybe we should have an interface like something like getRedeclWithDef()
if we don't have one)
And also I admit there some existing codes use getCanonicalDecl
to try to do the similar things, but they are their problems.
We have some testing results in and it's causing a large-scale breakage. I think it's all coming from 2 root causes, but it's hard to know for sure because there are many thousands of broken targets. The two problems I see are:
And a few things that seem minor at first glance, but would also need some checking.
And there may be some long tail that's not visible because these issues take all of my attention. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two problems I see are:
- invalid
no matching constructor for initialization of
errors for some particular type,
I think it is due to applying my changes too wide (i.e. to normal member functions and use first decl for them). I hope I fixed it in the latest version.
- a crash during template instantiations for some other particular type.
In this case it is harder to make a reasonable guess but perhaps also the same root.
And a few things that seem minor at first glance, but would also need some checking.
- some errors due to too long recursive template instantiations that we didn't see before (probably correct and coming from instantiating those bodies that were previously skipped?)
- some compilation timeouts (probably just a flake or a slight increase that we should mitigate on our end).
And there may be some long tail that's not visible because these issues take all of my attention. I will try to get the reproducers for each of those. Or would it make sense to wait until you finish the review and do that for the new version?
Could you please try the latest version if it doesn't fix all issues, please create a reproducer to continue my investigation.
@@ -1976,14 +1976,16 @@ TemplateDeclInstantiator::VisitFunctionTemplateDecl(FunctionTemplateDecl *D) { | |||
if (!InstParams) | |||
return nullptr; | |||
|
|||
// Use canonical templated decl because only canonical decl has body | |||
// if declarations were merged during loading from modules. | |||
FunctionDecl *TemplatedDecl = D->getTemplatedDecl()->getCanonicalDecl(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FunctionDecl::hasBody
is doing what you described for getRedeclWithDef
, see https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/Decl.cpp#L3163 and if I used it here or implement similar code here, it fails several clang tests. It is very unfortunate that Redeclarable
is used for two things: real redeclaration and AST merges. Moreover it look like there is no way to distinguish these two cases. I need to use different decl only if the decl was doesThisDeclarationHaveABody() == true
but it is no longer the case due to AST merge.
✅ With the latest revision this PR passed the C/C++ code formatter. |
We'll try the latest version and report back. |
FYI, I tested the latest version on our reproducers and it fixed all known issues due to friend inline functions. |
The latest version works like a charm for us, there aren't any bugs. I'll let @ChuanqiXu9 stamp the final approval for the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the long term waiting.
@ilya-biryukov I remember last time, the codes you shared to me actually based an open source project. I am wondering if you can open source some use for such projects (even for testing purpose). I think it will be pretty helpful to speedup the overall development process. |
…ical decl Summary: Fix crash from reproducer provided in llvm#109167 (comment) Test Plan: TBD
e2df412
to
5df865a
Compare
@ilya-biryukov and @ChuanqiXu9 thank you for your help with the review, testing and reproducers. Very appreciated! |
Yes, we should, the biggest problem right now is that our use of header modules is quite unique and is all internal. That being said, we're going to spend significant effort next year to make sure this improves and we can get small reproducers fast and help to surface and fix any problems that show up. We're in a fragile state right now and want to get to more solid ground with our use of modules.
Thanks a lot for fixing this. We really appreciate that too. And I hope we can get to a point where we can give you the reproducers much faster some time next year. |
Some code got rearranged after our initial testing, some errors slipped through and we started seeing some new errors after this landed. |
/// nodes that don't exist in the function. | ||
llvm::DenseMap<GlobalDeclID, SmallVector<GlobalDeclID, 4>> | ||
FunctionToLambdasMap; | ||
/// Mapping from main decl ID to the related decls IDs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the main decl? I can image some definitions, but it would be nice to have some clarifications included in the comment.
@@ -798,6 +798,17 @@ void ASTDeclWriter::VisitFunctionDecl(FunctionDecl *D) { | |||
} | |||
} | |||
|
|||
if (D->getFriendObjectKind()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated in #122493 (comment), this is intended. It would be helpful to include comments explaining why this logic is applied exclusively to friend objects.
…y a declaration during AST deserialization (llvm#132214) Fix for regression llvm#130917, changes in llvm#111992 were too broad. This change reduces scope of previous fix. Added `ExternalASTSource::wasThisDeclarationADefinition` to detect cases when FunctionDecl lost body due to declaration merges.
Summary:
Fix crash from reproducer provided in
#109167 (comment)
Also fix issues with merged inline friend functions merged during deserialization.
Test Plan: check-clang