Skip to content

[clang][deps] Make dependency directives getter thread-safe #136178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 23, 2025

Conversation

jansvoboda11
Copy link
Contributor

@jansvoboda11 jansvoboda11 commented Apr 17, 2025

This PR fixes two issues in one go:

  1. The dependency directives getter (a std::function) was being stored in PreprocessorOptions. This goes against the principle where the options classes are supposed to be value-objects representing the -cc1 command line arguments. This is fixed by moving the getter directly to CompilerInstance and propagating it explicitly.
  2. The getter was capturing the ScanInstance VFS. That's fine in synchronous implicit module builds where the same VFS instance is used throughout, but breaks down once you try to build modules asynchronously (which forces the use of separate VFS instances). This is fixed by explicitly passing a FileManager into the getter and extracting the right instance of the scanning VFS out of it.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 17, 2025

@llvm/pr-subscribers-clang

Author: Jan Svoboda (jansvoboda11)

Changes

This PR fixes two issues in one go:

  1. The dependency directives getter (a std::function) was being stored in PreprocessorOptions. The goes against the principle where the options classes are supposed to be value-objects representing the -cc1 command line arguments. This is fixed by moving the getter directly to CompilerInstance and propagating it explicitly.
  2. The getter was capturing the ScanInstance VFS. That's fine in synchronous implicit module builds where the same VFS instance is used throughout, but breaks down once you try to build modules asynchronously (which forces the use of separate VFS instances). This is fixed by explicitly passing a FileManager into the getter and extracting the right instance of the scanning VFS out of it.

Full diff: https://github.com/llvm/llvm-project/pull/136178.diff

9 Files Affected:

  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+8)
  • (modified) clang/include/clang/Lex/DependencyDirectivesScanner.h (+6)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+10)
  • (modified) clang/include/clang/Lex/PreprocessorOptions.h (-13)
  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h (+10)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+5)
  • (modified) clang/lib/Lex/PPLexerChange.cpp (+4-10)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp (+52-9)
  • (modified) clang/unittests/Lex/PPDependencyDirectivesTest.cpp (+5-7)
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index 41dc7f1fef461..ecd1f5cabc79e 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -16,6 +16,7 @@
 #include "clang/Frontend/CompilerInvocation.h"
 #include "clang/Frontend/PCHContainerOperations.h"
 #include "clang/Frontend/Utils.h"
+#include "clang/Lex/DependencyDirectivesScanner.h"
 #include "clang/Lex/HeaderSearchOptions.h"
 #include "clang/Lex/ModuleLoader.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -99,6 +100,9 @@ class CompilerInstance : public ModuleLoader {
   /// The cache of PCM files.
   IntrusiveRefCntPtr<ModuleCache> ModCache;
 
+  /// Function for getting the dependency preprocessor directives of a file.
+  GetDependencyDirectivesFn GetDependencyDirectives;
+
   /// The preprocessor.
   std::shared_ptr<Preprocessor> PP;
 
@@ -697,6 +701,10 @@ class CompilerInstance : public ModuleLoader {
   /// and replace any existing one with it.
   void createPreprocessor(TranslationUnitKind TUKind);
 
+  void setDependencyDirectivesGetter(GetDependencyDirectivesFn Fn) {
+    GetDependencyDirectives = Fn;
+  }
+
   std::string getSpecificModuleCachePath(StringRef ModuleHash);
   std::string getSpecificModuleCachePath() {
     return getSpecificModuleCachePath(getInvocation().getModuleHash());
diff --git a/clang/include/clang/Lex/DependencyDirectivesScanner.h b/clang/include/clang/Lex/DependencyDirectivesScanner.h
index 0e115906fbfe5..c975311f8bf33 100644
--- a/clang/include/clang/Lex/DependencyDirectivesScanner.h
+++ b/clang/include/clang/Lex/DependencyDirectivesScanner.h
@@ -21,6 +21,7 @@
 #include "llvm/ADT/ArrayRef.h"
 
 namespace clang {
+class FileManager;
 
 namespace tok {
 enum TokenKind : unsigned short;
@@ -135,6 +136,11 @@ void printDependencyDirectivesAsSource(
     ArrayRef<dependency_directives_scan::Directive> Directives,
     llvm::raw_ostream &OS);
 
+// FIXME: Allow returning an error.
+using GetDependencyDirectivesFn = std::function<
+    std::optional<ArrayRef<dependency_directives_scan::Directive>>(
+        FileManager &, FileEntryRef)>;
+
 } // end namespace clang
 
 #endif // LLVM_CLANG_LEX_DEPENDENCYDIRECTIVESSCANNER_H
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index 24bb524783e93..8554068b19607 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -140,6 +140,12 @@ class Preprocessor {
   friend class VariadicMacroScopeGuard;
 
   llvm::unique_function<void(const clang::Token &)> OnToken;
+  /// Function for getting the dependency preprocessor directives of a file.
+  ///
+  /// These are directives derived from a special form of lexing where the
+  /// source input is scanned for the preprocessor directives that might have an
+  /// effect on the dependencies for a compilation unit.
+  GetDependencyDirectivesFn GetDependencyDirectives;
   const PreprocessorOptions &PPOpts;
   DiagnosticsEngine        *Diags;
   const LangOptions &LangOpts;
@@ -1326,6 +1332,10 @@ class Preprocessor {
     OnToken = std::move(F);
   }
 
+  void setDependencyDirectivesGetter(GetDependencyDirectivesFn Fn) {
+    GetDependencyDirectives = Fn;
+  }
+
   void setPreprocessToken(bool Preprocess) { PreprocessToken = Preprocess; }
 
   bool isMacroDefined(StringRef Id) {
diff --git a/clang/include/clang/Lex/PreprocessorOptions.h b/clang/include/clang/Lex/PreprocessorOptions.h
index c2e3d68333024..d4c4e1ccbf2c4 100644
--- a/clang/include/clang/Lex/PreprocessorOptions.h
+++ b/clang/include/clang/Lex/PreprocessorOptions.h
@@ -189,19 +189,6 @@ class PreprocessorOptions {
   /// with support for lifetime-qualified pointers.
   ObjCXXARCStandardLibraryKind ObjCXXARCStandardLibrary = ARCXX_nolib;
 
-  /// Function for getting the dependency preprocessor directives of a file.
-  ///
-  /// These are directives derived from a special form of lexing where the
-  /// source input is scanned for the preprocessor directives that might have an
-  /// effect on the dependencies for a compilation unit.
-  ///
-  /// Enables a client to cache the directives for a file and provide them
-  /// across multiple compiler invocations.
-  /// FIXME: Allow returning an error.
-  std::function<std::optional<ArrayRef<dependency_directives_scan::Directive>>(
-      FileEntryRef)>
-      DependencyDirectivesForFile;
-
   /// Set up preprocessor for RunAnalysis action.
   bool SetUpStaticAnalyzer = false;
 
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
index d12814e7c9253..f07bd5e820dab 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
@@ -371,6 +371,16 @@ class DependencyScanningWorkerFilesystem
   /// false if not (i.e. this entry is not a file or its scan fails).
   bool ensureDirectiveTokensArePopulated(EntryRef Entry);
 
+  /// \returns The scanned preprocessor directive tokens of the file that are
+  /// used to speed up preprocessing, if available.
+  std::optional<ArrayRef<dependency_directives_scan::Directive>>
+  getDirectiveTokens(const Twine &Path) {
+    if (llvm::ErrorOr<EntryRef> Entry = getOrCreateFileSystemEntry(Path.str()))
+      if (ensureDirectiveTokensArePopulated(*Entry))
+        return Entry->getDirectiveTokens();
+    return std::nullopt;
+  }
+
   /// Check whether \p Path exists. By default checks cached result of \c
   /// status(), and falls back on FS if unable to do so.
   bool exists(const Twine &Path) override;
diff --git a/clang/lib/Frontend/CompilerInstance.cpp b/clang/lib/Frontend/CompilerInstance.cpp
index 243e0a3c15b05..0efe0ada4873f 100644
--- a/clang/lib/Frontend/CompilerInstance.cpp
+++ b/clang/lib/Frontend/CompilerInstance.cpp
@@ -535,6 +535,9 @@ void CompilerInstance::createPreprocessor(TranslationUnitKind TUKind) {
                            /*ShowAllHeaders=*/true, /*OutputPath=*/"",
                            /*ShowDepth=*/true, /*MSStyle=*/true);
   }
+
+  if (GetDependencyDirectives)
+    PP->setDependencyDirectivesGetter(GetDependencyDirectives);
 }
 
 std::string CompilerInstance::getSpecificModuleCachePath(StringRef ModuleHash) {
@@ -1237,6 +1240,8 @@ std::unique_ptr<CompilerInstance> CompilerInstance::cloneForModuleCompileImpl(
   // Make a copy for the new instance.
   Instance.FailedModules = FailedModules;
 
+  Instance.GetDependencyDirectives = GetDependencyDirectives;
+
   // If we're collecting module dependencies, we need to share a collector
   // between all of the module CompilerInstances. Other than that, we don't
   // want to produce any dependency output from the module build.
diff --git a/clang/lib/Lex/PPLexerChange.cpp b/clang/lib/Lex/PPLexerChange.cpp
index a373a52506a24..09ef921f67202 100644
--- a/clang/lib/Lex/PPLexerChange.cpp
+++ b/clang/lib/Lex/PPLexerChange.cpp
@@ -92,16 +92,10 @@ bool Preprocessor::EnterSourceFile(FileID FID, ConstSearchDirIterator CurDir,
   }
 
   Lexer *TheLexer = new Lexer(FID, *InputFile, *this, IsFirstIncludeOfFile);
-  if (getPreprocessorOpts().DependencyDirectivesForFile &&
-      FID != PredefinesFileID) {
-    if (OptionalFileEntryRef File = SourceMgr.getFileEntryRefForID(FID)) {
-      if (std::optional<ArrayRef<dependency_directives_scan::Directive>>
-              DepDirectives =
-                  getPreprocessorOpts().DependencyDirectivesForFile(*File)) {
-        TheLexer->DepDirectives = *DepDirectives;
-      }
-    }
-  }
+  if (GetDependencyDirectives && FID != PredefinesFileID)
+    if (OptionalFileEntryRef File = SourceMgr.getFileEntryRefForID(FID))
+      if (auto MaybeDepDirectives = GetDependencyDirectives(FileMgr, *File))
+        TheLexer->DepDirectives = *MaybeDepDirectives;
 
   EnterSourceFileWithLexer(TheLexer, CurDir);
   return false;
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
index 6595f8ff5dc55..a9e35e0484ac9 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
@@ -357,6 +357,57 @@ static void canonicalizeDefines(PreprocessorOptions &PPOpts) {
   std::swap(PPOpts.Macros, NewMacros);
 }
 
+static GetDependencyDirectivesFn makeDepDirectivesGetter() {
+  /// This is a functor that conforms to \c GetDependencyDirectivesFn.
+  /// It ensures it's always invoked with the same \c FileManager and caches the
+  /// extraction of the scanning VFS for better performance.
+  struct DepDirectivesGetter {
+    DepDirectivesGetter() : DepFS(nullptr), FM(nullptr) {}
+
+    /// It's important copies do not carry over the cached members. The copies
+    /// are likely to be used from distinct \c CompilerInstance objects with
+    /// distinct \c FileManager \c llvm::vfs::FileSystem.
+    DepDirectivesGetter(const DepDirectivesGetter &)
+        : DepFS(nullptr), FM(nullptr) {}
+    DepDirectivesGetter &operator=(const DepDirectivesGetter &) {
+      DepFS = nullptr;
+      FM = nullptr;
+      return *this;
+    }
+
+    auto operator()(FileManager &FileMgr, FileEntryRef File) {
+      ensureConsistentFileManager(FileMgr);
+      ensurePopulatedFileSystem(FileMgr);
+      return DepFS->getDirectiveTokens(File.getName());
+    }
+
+  private:
+    DependencyScanningWorkerFilesystem *DepFS;
+    FileManager *FM;
+
+    void ensureConsistentFileManager(FileManager &FileMgr) {
+      if (!FM)
+        FM = &FileMgr;
+      assert(&FileMgr == FM);
+    }
+
+    void ensurePopulatedFileSystem(FileManager &FM) {
+      if (DepFS)
+        return;
+      FM.getVirtualFileSystem().visit([&](llvm::vfs::FileSystem &FS) {
+        auto *DFS = llvm::dyn_cast<DependencyScanningWorkerFilesystem>(&FS);
+        if (DFS) {
+          assert(!DepFS && "Found multiple scanning VFSs");
+          DepFS = DFS;
+        }
+      });
+      assert(DepFS && "Did not find scanning VFS");
+    }
+  };
+
+  return DepDirectivesGetter{};
+}
+
 /// A clang tool that runs the preprocessor in a mode that's optimized for
 /// dependency scanning for the given compiler invocation.
 class DependencyScanningAction : public tooling::ToolAction {
@@ -433,15 +484,7 @@ class DependencyScanningAction : public tooling::ToolAction {
       if (!ModulesCachePath.empty())
         DepFS->setBypassedPathPrefix(ModulesCachePath);
 
-      ScanInstance.getPreprocessorOpts().DependencyDirectivesForFile =
-          [LocalDepFS = DepFS](FileEntryRef File)
-          -> std::optional<ArrayRef<dependency_directives_scan::Directive>> {
-        if (llvm::ErrorOr<EntryRef> Entry =
-                LocalDepFS->getOrCreateFileSystemEntry(File.getName()))
-          if (LocalDepFS->ensureDirectiveTokensArePopulated(*Entry))
-            return Entry->getDirectiveTokens();
-        return std::nullopt;
-      };
+      ScanInstance.setDependencyDirectivesGetter(makeDepDirectivesGetter());
     }
 
     // Create a new FileManager to match the invocation's FileSystemOptions.
diff --git a/clang/unittests/Lex/PPDependencyDirectivesTest.cpp b/clang/unittests/Lex/PPDependencyDirectivesTest.cpp
index 03f1432d990cb..c52d5091eabfb 100644
--- a/clang/unittests/Lex/PPDependencyDirectivesTest.cpp
+++ b/clang/unittests/Lex/PPDependencyDirectivesTest.cpp
@@ -105,8 +105,7 @@ TEST_F(PPDependencyDirectivesTest, MacroGuard) {
   };
   SmallVector<std::unique_ptr<DepDirectives>> DepDirectivesObjects;
 
-  auto getDependencyDirectives = [&](FileEntryRef File)
-      -> std::optional<ArrayRef<dependency_directives_scan::Directive>> {
+  auto GetDependencyDirectives = [&](FileManager &FileMgr, FileEntryRef File) {
     DepDirectivesObjects.push_back(std::make_unique<DepDirectives>());
     StringRef Input = (*FileMgr.getBufferForFile(File))->getBuffer();
     bool Err = scanSourceForDependencyDirectives(
@@ -117,11 +116,6 @@ TEST_F(PPDependencyDirectivesTest, MacroGuard) {
   };
 
   PreprocessorOptions PPOpts;
-  PPOpts.DependencyDirectivesForFile = [&](FileEntryRef File)
-      -> std::optional<ArrayRef<dependency_directives_scan::Directive>> {
-    return getDependencyDirectives(File);
-  };
-
   HeaderSearchOptions HSOpts;
   TrivialModuleLoader ModLoader;
   HeaderSearch HeaderInfo(HSOpts, SourceMgr, Diags, LangOpts, Target.get());
@@ -130,6 +124,10 @@ TEST_F(PPDependencyDirectivesTest, MacroGuard) {
                   /*OwnsHeaderSearch =*/false);
   PP.Initialize(*Target);
 
+  PP.setDependencyDirectivesGetter([&](FileManager &FM, FileEntryRef File) {
+    return GetDependencyDirectives(FM, File);
+  });
+
   SmallVector<StringRef> IncludedFiles;
   PP.addPPCallbacks(std::make_unique<IncludeCollector>(PP, IncludedFiles));
   PP.EnterMainSourceFile();

Copy link
Member

@cyndyishida cyndyishida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though you may want to wait a bit longer for someone more familiar to also take a look.

@jansvoboda11 jansvoboda11 merged commit 060f3f0 into llvm:main Apr 23, 2025
6 of 10 checks passed
@jansvoboda11 jansvoboda11 deleted the get-dep-directive-tokens branch April 23, 2025 17:33
jansvoboda11 added a commit to jansvoboda11/llvm-project that referenced this pull request Apr 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants