Skip to content

[clang][modules] HeaderSearch::MarkFileModuleHeader sets textual headers' HeaderFileInfo non-external when it shouldn't #89005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

ian-twilightcoder
Copy link
Contributor

@ian-twilightcoder ian-twilightcoder commented Apr 17, 2024

HeaderSearch::MarkFileModuleHeader is no longer properly checking for no-changes, and so sets the HeaderFileInfo for every textual header to non-external.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 17, 2024

@llvm/pr-subscribers-clang

Author: Ian Anderson (ian-twilightcoder)

Changes

HeaderSearch::MarkFileModuleHeader is no longer properly checking for no-changes, and so creates a new HeaderFileInfo for every textual header, causes PCM use to go ballistic.


Full diff: https://github.com/llvm/llvm-project/pull/89005.diff

1 Files Affected:

  • (modified) clang/lib/Lex/HeaderSearch.cpp (+9-1)
diff --git a/clang/lib/Lex/HeaderSearch.cpp b/clang/lib/Lex/HeaderSearch.cpp
index 0632882b296146..9409b97ba0e64a 100644
--- a/clang/lib/Lex/HeaderSearch.cpp
+++ b/clang/lib/Lex/HeaderSearch.cpp
@@ -1313,6 +1313,14 @@ OptionalFileEntryRef HeaderSearch::LookupSubframeworkHeader(
 // File Info Management.
 //===----------------------------------------------------------------------===//
 
+static bool
+headerFileInfoModuleBitsMatchRole(const HeaderFileInfo *HFI,
+                                  ModuleMap::ModuleHeaderRole Role) {
+  return (HFI->isModuleHeader == ModuleMap::isModular(Role)) &&
+         (HFI->isTextualModuleHeader ==
+          ((Role & ModuleMap::TextualHeader) != 0));
+}
+
 static void mergeHeaderFileInfoModuleBits(HeaderFileInfo &HFI,
                                           bool isModuleHeader,
                                           bool isTextualModuleHeader) {
@@ -1432,7 +1440,7 @@ void HeaderSearch::MarkFileModuleHeader(FileEntryRef FE,
     if ((Role & ModuleMap::ExcludedHeader))
       return;
     auto *HFI = getExistingFileInfo(FE);
-    if (HFI && HFI->isModuleHeader)
+    if (HFI && headerFileInfoModuleBitsMatchRole(HFI, Role))
       return;
   }
 

@ian-twilightcoder
Copy link
Contributor Author

I don't really know a great way to write a test for this. If someone could point me at a similar existing test or has an idea how to write a new test that would be much appreciated.

@ian-twilightcoder
Copy link
Contributor Author

@ilya-biryukov can you check that this fixes your running out of source location space problem please?

@jansvoboda11
Copy link
Contributor

As a test, maybe you could probe the resulting PCM with -module-file-info.

@ilya-biryukov
Copy link
Contributor

@ilya-biryukov can you check that this fixes your running out of source location space problem please?

Just tried it. The patch as is did not help.
I've also tried changing the previous line to getExistingFileInfo(, /*WantExternal=*/false) and it didn't help either.

Changing from if ((Role & ModuleMap::ExcludedHeader)) back to if (!ModuleMap::isModular(Role)) does help, though, but that clearly leads to an incorrect behavior as far as the code is concerned.

@jansvoboda11
Copy link
Contributor

getExistingFileInfo(, /WantExternal=/false)

Until recently that function still consulted ExternalSource for HeaderFileInfo that IsValid && !External && !Resolved. Maybe you want to try the new getExistingLocalFileInfo()?

@ian-twilightcoder
Copy link
Contributor Author

@ilya-biryukov can you check that this fixes your running out of source location space problem please?

Just tried it. The patch as is did not help. I've also tried changing the previous line to getExistingFileInfo(, /*WantExternal=*/false) and it didn't help either.

Changing from if ((Role & ModuleMap::ExcludedHeader)) back to if (!ModuleMap::isModular(Role)) does help, though, but that clearly leads to an incorrect behavior as far as the code is concerned.

Does it not help because headerFileInfoModuleBitsMatchRole is returning false? The previous code was doing WantExternal=true so I don't think we want getExistingLocalFileInfo() here? I think if we used getExistingLocalFileInfo(), we'd get nullptr back more often and fall down into the getFileInfo() case more often wouldn't we?

@ian-twilightcoder
Copy link
Contributor Author

Ok this one should fix the logic I think. @ilya-biryukov can you give this a try please?

@ian-twilightcoder
Copy link
Contributor Author

As a test, maybe you could probe the resulting PCM with -module-file-info.

What would I be looking for? Presumably the problem is we import the same module twice but fail to find the built pcm in the module cache and so we build it again? I don't know what else would cause runaway growth of pcm files... I actually don't really understand the problem Ilya is hitting... But Richard found (2 now!) good bugs in my code and I'm hoping fixing that will fix Ilya's problem.

@jansvoboda11
Copy link
Contributor

Also note that ASTWriter uses this logic in couple of places to avoid serializing unrelated headers:

if (!HFI || (HFI->isModuleHeader && !HFI->isCompilingModuleHeader))
  continue;

Since textual headers from other modules have isModuleHeader=false and isCompilingModuleHeader=false after #83660 we always serialize them, even if we just implicitly found their module map and never entered them. I didn't check how this patch interacts with that logic, just wanted to surface this.

@ian-twilightcoder
Copy link
Contributor Author

Also note that ASTWriter uses this logic in couple of places to avoid serializing unrelated headers:

if (!HFI || (HFI->isModuleHeader && !HFI->isCompilingModuleHeader))
  continue;

Since textual headers from other modules have isModuleHeader=false and isCompilingModuleHeader=false after #83660 we always serialize them, even if we just implicitly found their module map and never entered them. I didn't check how this patch interacts with that logic, just wanted to surface this.

#83660 shouldn't affect that logic. isModuleHeader and isCompilingModuleHeader should always have the same values after that change. It's supposed to just add an extra isTextualModuleHeader without changing any of the other bits.

@ian-twilightcoder
Copy link
Contributor Author

Ok this one should fix the logic I think. @ilya-biryukov can you give this a try please?

@ilya-biryukov sorry to ping you again, just nobody else knows how to test this.

@sam-mccall sam-mccall self-assigned this Apr 19, 2024
@sam-mccall
Copy link
Collaborator

Ilya is out on vacation, I'm able to test this and will get started on that now (apologies for delay & thanks for digging into this)

@sam-mccall
Copy link
Collaborator

Unfortunately with this patch I'm still seeing the same source-location-exhausted error.
I'm going to try to understand why...

@ian-twilightcoder
Copy link
Contributor Author

Unfortunately with this patch I'm still seeing the same source-location-exhausted error. I'm going to try to understand why...

Damn I really thought that was going to be the one. If you have any way I could reproduce or better write a test that would be great.

@zygoloid
Copy link
Collaborator

Unfortunately with this patch I'm still seeing the same source-location-exhausted error.

Can you try applying #89428 as well? I think we would need both in order to prune the module map from the serialized SLocEntries.

@sam-mccall
Copy link
Collaborator

Hmm, I locally reverted #87849 and still see the same issue.
I'll patch #89428 instead, but I don't see how it could do better - it's just putting the old and new behavior of #87849 behind a flag, right?

In any case I'll try it, and then work on constructing a reproducer (so far I've been mostly treating this as a black box)

@jansvoboda11
Copy link
Contributor

#89441 might be of interest too.

@sam-mccall
Copy link
Collaborator

#89441 fixes our build problems, with or without this patch applied.

It's possible this patch makes things better - I haven't checked for actual sloc usage yet, just whether the build fails.

So I'll focus on understanding that one and then return here.

(I think I'm close to a sharable reproducer too)

@jansvoboda11
Copy link
Contributor

Just so you're aware, it's possible that #87848 introduced some unexpected behavior change. It would be good to check if your issues are caused just by one patch in isolation or by a combination of more patches that landed recently.

@sam-mccall
Copy link
Collaborator

Thanks for the pointer to 87848 - reverting that one locally doesn't help though, even in combination with applying #89005 and #89428. So this change isn't on the critical path to fixing our builds, but still much appreciated and will take a look now.


Unsurprisingly, the builds that hit this limit are big and slow :-)

Here's a reproducer that seems to capture our case well: modulegen.zip
gen.sh in there creates a bunch of modules under modules_repro, modules_repro/build.sh builds them with $CLANG, and print.sh shows the sloc usage.

Good
<stdin>:2:23: remark: source manager location address space usage: [-Rsloc-usage]
    2 | #pragma clang __debug sloc_usage
      |                       ^
note: 14498B in local locations, 1447248B in locations loaded from AST files, for a total of 1461746B (0% of available space)
<built-in>:1:1: note: file entered 202 times using 1451554B of space
    1 | # 1 "<built-in>" 3
      | ^
<stdin>:1:1: note: file entered 1 time using 112B of space
    1 | #include "/usr/local/google/home/sammccall/modulegen/modules_repro/m1/header"
      | ^
Bad
<stdin>:2:23: remark: source manager location address space usage: [-Rsloc-usage]
    2 | #pragma clang __debug sloc_usage
      |                       ^
note: 14537B in local locations, 3956449B in locations loaded from AST files, for a total of 3970986B (0% of available space)
/usr/local/google/home/sammccall/modulegen/modules_repro/constant.modulemap:1:1: note: file entered 100 times using 2505300B of space
    1 | module "constant" {
      | ^
<built-in>:1:1: note: file entered 202 times using 1455493B of space
    1 | # 1 "<built-in>" 3
      | ^
<stdin>:1:1: note: file entered 1 time using 112B of space
    1 | #include "/usr/local/google/home/sammccall/modulegen/modules_repro/m1/header"

@ian-twilightcoder ian-twilightcoder changed the title [clang][modules] HeaderSearch::MarkFileModuleHeader creates extra HeaderFileInfo, breaks PCM reuse [clang][modules] HeaderSearch::MarkFileModuleHeader sets textual headers' HeaderFileInfo non-external when it shouldn't Apr 22, 2024
@ian-twilightcoder
Copy link
Contributor Author

I guess #89441 is the real issue that #83660 caused, but we should still fix this for correctness, even though I'm not sure what the visible consequence is of this bug.

@ian-twilightcoder
Copy link
Contributor Author

From Jan:

Maybe we can add the number of external HeaderFileInfos to HeaderSearch::PrintStats() and have a test that checks for it.
The flag to enable that output is -show-stats I think.

@ian-twilightcoder
Copy link
Contributor Author

From Jan:

Maybe we can add the number of external HeaderFileInfos to HeaderSearch::PrintStats() and have a test that checks for it.
The flag to enable that output is -show-stats I think.

We tried this, but it seems that by the time the modules compile, everything ends up being local instead of external anyway. I think a unit test is probably good enough for this one since we aren't even sure what the practical consequence is of this bug.

Copy link

github-actions bot commented Jun 14, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ian-twilightcoder ian-twilightcoder force-pushed the header-file-info-reuse branch 3 times, most recently from bdb3519 to eb296a4 Compare June 14, 2024 04:52
…ers' HeaderFileInfo non-external when it shouldn't

HeaderSearch::MarkFileModuleHeader is no longer properly checking for no-changes, and so sets the HeaderFileInfo for every `textual header` to non-external.
Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ian-twilightcoder
Copy link
Contributor Author

ian-twilightcoder commented Jun 15, 2024

The only test failure is in CodeGen/PowerPC/subreg-lanemasks.mir and doesn't look related to this change.

@ian-twilightcoder ian-twilightcoder merged commit 29b0d57 into llvm:main Jun 15, 2024
5 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants