@qiongsiwu qiongsiwu commented Sep 22, 2025

This PR implements a new class, CompilerInstanceWithContext, to speed up dependency scanning by module name.

We observed situations (e.g. during Swift dependency scanning) where we query the dependencies of a module by name while holding the current working directory and the command line input constant. The current scanning algorithm creates a new CompilerInstance every time it scans for a new name. Not only is the repeated creation of compiler instances wasteful, we also lose information already obtained in earlier queries (for example, information cached in HeaderSearch).

This PR creates a CompilerInstanceWithContext class. The class holds all the context a compiler instance may need to perform dependency scanning, supports multiple by-name queries, and keeps one CompilerInstance alive and in a valid state. This way, we avoid creating redundant CompilerInstances. More importantly, we can reuse information accumulated in this CompilerInstance to reduce the cost of subsequent by-name scans.

A key property of the implementation is that scanning no longer runs through a full CompilerInstance::ExecuteAction (which calls EndSourceFile, which does not allow subsequent queries). Rather, only the necessary steps of dependency scanning are kept. This leaves the CompilerInstance in a state in which it can perform the next query.

The implementation consists of two logical parts:

  1. The implementation of the class CompilerInstanceWithContext. Since this class shares code with DependencyScanningWorker, some DependencyScanningWorker code is moved around so the code can be shared.
  2. The adoption of CompilerInstanceWithContext in clang-scan-deps. New interfaces are implemented for DependencyScanningTool and DependencyScanningWorker to make use of CompilerInstanceWithContext to perform by-name queries.
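
To illustrate the initialize / query / finalize lifecycle described above, here is a minimal, self-contained sketch. It does not use clang at all: `MockCompilerInstance`, `MockScanContext`, and `scanAll` are hypothetical stand-ins invented for this example, and the set of resolved names only loosely mimics the kind of information (such as HeaderSearch caches) that the real class is meant to preserve between queries.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for clang's CompilerInstance. It only remembers which
// module names were already resolved, loosely mimicking a warm cache.
struct MockCompilerInstance {
  std::set<std::string> Resolved;
  unsigned ExpensiveLookups = 0;
};

// Illustrative analogue of CompilerInstanceWithContext (not the real class).
class MockScanContext {
  std::string CWD;
  std::vector<std::string> CommandLine;
  std::unique_ptr<MockCompilerInstance> CI;
  unsigned InstancesCreated = 0;

public:
  MockScanContext(std::string CWD, std::vector<std::string> CommandLine)
      : CWD(std::move(CWD)), CommandLine(std::move(CommandLine)) {}

  // Builds the single long-lived instance. Returns false if already built.
  bool initialize() {
    if (CI)
      return false;
    CI = std::make_unique<MockCompilerInstance>();
    ++InstancesCreated;
    return true;
  }

  // Answers one by-name query. Returns false if initialize() was not called.
  bool computeDependencies(const std::string &ModuleName) {
    if (!CI)
      return false;
    if (CI->Resolved.insert(ModuleName).second)
      ++CI->ExpensiveLookups; // only the first query for a name pays the cost
    return true;
  }

  // Tears the instance down. Queries fail until initialize() is called again.
  bool finalize() {
    if (!CI)
      return false;
    CI.reset();
    return true;
  }

  const std::string &cwd() const { return CWD; }
  const std::vector<std::string> &commandLine() const { return CommandLine; }
  unsigned instancesCreated() const { return InstancesCreated; }
  unsigned expensiveLookups() const { return CI ? CI->ExpensiveLookups : 0; }
};

// Scans every name with one context and returns the number of expensive
// lookups performed so far.
unsigned scanAll(MockScanContext &Ctx, const std::vector<std::string> &Names) {
  for (const std::string &Name : Names) {
    bool Ok = Ctx.computeDependencies(Name);
    assert(Ok && "context must be initialized before scanning");
    (void)Ok;
  }
  return Ctx.expensiveLookups();
}
```

In this toy model, scanning a list that repeats a name pays for that name only once, and the instance is created once per context instead of once per query, which is the saving this PR is after.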

Part of work for rdar://136303612.

@llvmbot added the clang and clang:frontend labels Sep 22, 2025
llvmbot commented Sep 22, 2025

@llvm/pr-subscribers-clang

Author: Qiongsi Wu (qiongsiwu)


Patch is 44.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160207.diff

13 Files Affected:

  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+6)
  • (modified) clang/include/clang/Frontend/Utils.h (+4)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+1)
  • (added) clang/include/clang/Tooling/DependencyScanning/CompilerInstanceWithContext.h (+90)
  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h (+37)
  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningWorker.h (+74)
  • (modified) clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h (+7-2)
  • (modified) clang/lib/Tooling/DependencyScanning/CMakeLists.txt (+1)
  • (added) clang/lib/Tooling/DependencyScanning/CompilerInstanceWithContext.cpp (+250)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningTool.cpp (+23)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp (+134-140)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+5-4)
  • (modified) clang/tools/clang-scan-deps/ClangScanDeps.cpp (+17-3)
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index a6b6993b708d0..2fdfbe01fbe78 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -948,6 +948,12 @@ class CompilerInstance : public ModuleLoader {
     DependencyCollectors.push_back(std::move(Listener));
   }
 
+  void clearDependencyCollectors() { DependencyCollectors.clear(); }
+
+  std::vector<std::shared_ptr<DependencyCollector>> &getDependencyCollectors() {
+    return DependencyCollectors;
+  }
+
   void setExternalSemaSource(IntrusiveRefCntPtr<ExternalSemaSource> ESS);
 
   ModuleCache &getModuleCache() const { return *ModCache; }
diff --git a/clang/include/clang/Frontend/Utils.h b/clang/include/clang/Frontend/Utils.h
index f86c2f5074de0..1b52d970ff1a3 100644
--- a/clang/include/clang/Frontend/Utils.h
+++ b/clang/include/clang/Frontend/Utils.h
@@ -40,6 +40,7 @@ class DiagnosticsEngine;
 class ExternalSemaSource;
 class FrontendOptions;
 class PCHContainerReader;
+class PPCallbacks;
 class Preprocessor;
 class PreprocessorOptions;
 class PreprocessorOutputOptions;
@@ -87,6 +88,9 @@ class DependencyCollector {
                                   bool IsSystem, bool IsModuleFile,
                                   bool IsMissing);
 
+  /// @return the PPCallback this collector added to the Preprocessor.
+  virtual PPCallbacks *getPPCallback() { return nullptr; };
+
 protected:
   /// Return true if the filename was added to the list of dependencies, false
   /// otherwise.
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index 39754847a93e4..953902b13783f 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -1327,6 +1327,7 @@ class Preprocessor {
                                                 std::move(Callbacks));
     Callbacks = std::move(C);
   }
+  void removePPCallbacks() { Callbacks.reset(); }
   /// \}
 
   /// Get the number of tokens processed so far.
diff --git a/clang/include/clang/Tooling/DependencyScanning/CompilerInstanceWithContext.h b/clang/include/clang/Tooling/DependencyScanning/CompilerInstanceWithContext.h
new file mode 100644
index 0000000000000..5a2cb25d9d972
--- /dev/null
+++ b/clang/include/clang/Tooling/DependencyScanning/CompilerInstanceWithContext.h
@@ -0,0 +1,90 @@
+//===- CompilerInstanceWithContext.h - clang scanning compiler instance ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_TOOLING_DEPENDENCYSCANNING_COMPILERINSTANCEWITHCONTEXT_H
+#define LLVM_CLANG_TOOLING_DEPENDENCYSCANNING_COMPILERINSTANCEWITHCONTEXT_H
+
+#include "clang/Basic/FileManager.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/Driver/Compilation.h"
+#include "clang/Driver/Driver.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/CompilerInvocation.h"
+#include "clang/Frontend/TextDiagnosticPrinter.h"
+#include "clang/Serialization/ModuleCache.h"
+#include "clang/Tooling/DependencyScanning/ModuleDepCollector.h"
+#include <string>
+#include <vector>
+
+namespace clang {
+namespace tooling {
+namespace dependencies {
+
+// Forward declarations.
+class DependencyScanningWorker;
+class DependencyConsumer;
+class DependencyActionController;
+
+class CompilerInstanceWithContext {
+  // Context
+  DependencyScanningWorker &Worker;
+  llvm::StringRef CWD;
+  std::vector<std::string> CommandLine;
+  static const uint64_t MAX_NUM_NAMES = (1 << 12);
+  static const std::string FakeFileBuffer;
+
+  // Context - file systems
+  llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem> OverlayFS;
+  llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem> InMemoryFS;
+  llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> InMemoryOverlay;
+
+  // Context - Diagnostics engine, file manager and source manager.
+  std::string DiagnosticOutput;
+  llvm::raw_string_ostream DiagnosticsOS;
+  std::unique_ptr<TextDiagnosticPrinter> DiagPrinter;
+  IntrusiveRefCntPtr<DiagnosticsEngine> Diags;
+  std::unique_ptr<FileManager> FileMgr;
+  std::unique_ptr<SourceManager> SrcMgr;
+
+  // Context - compiler invocation
+  std::unique_ptr<clang::driver::Driver> Driver;
+  std::unique_ptr<clang::driver::Compilation> Compilation;
+  std::unique_ptr<CompilerInvocation> Invocation;
+  llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFSFromCompilerInvocation;
+
+  // Context - output options
+  std::unique_ptr<DependencyOutputOptions> OutputOpts;
+
+  // Context - stable directory handling
+  llvm::SmallVector<StringRef> StableDirs;
+  PrebuiltModulesAttrsMap PrebuiltModuleVFSMap;
+
+  // Compiler Instance
+  IntrusiveRefCntPtr<ModuleCache> ModCache;
+  std::unique_ptr<CompilerInstance> CIPtr;
+
+  // Source location offset.
+  int32_t SrcLocOffset = 0;
+
+public:
+  CompilerInstanceWithContext(DependencyScanningWorker &Worker, StringRef CWD,
+                              const std::vector<std::string> &CMD)
+      : Worker(Worker), CWD(CWD), CommandLine(CMD),
+        DiagnosticsOS(DiagnosticOutput) {};
+
+  llvm::Error initialize();
+  llvm::Error computeDependencies(StringRef ModuleName,
+                                  DependencyConsumer &Consumer,
+                                  DependencyActionController &Controller);
+  llvm::Error finalize();
+};
+} // namespace dependencies
+} // namespace tooling
+} // namespace clang
+
+#endif
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h
index c3601a4e73e1f..109330aa8f20c 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningTool.h
@@ -161,6 +161,43 @@ class DependencyScanningTool {
 
   llvm::vfs::FileSystem &getWorkerVFS() const { return Worker.getVFS(); }
 
+  /// The following three methods provide a new interface to perform
+  /// by-name dependency scans. The new interface is intended to improve
+  /// dependency scanning performance when a sequence of names is looked up
+  /// with the same current working directory and command line.
+
+  /// @brief Initializes the context and the compiler instance used to perform
+  ///        by-name scanning. This method must be called before scanning.
+  /// @param CWD The current working directory used during the scan.
+  /// @param CommandLine The command line used for the scan.
+  /// @return Error if the initialization fails.
+  llvm::Error initializeCompilerInstacneWithContext(
+      StringRef CWD, const std::vector<std::string> &CommandLine);
+
+  /// @brief Computes the dependencies for the module named ModuleName.
+  /// @param ModuleName The name of the module for which this method computes
+  ///                   dependencies.
+  /// @param AlreadySeen This stores modules which have previously been
+  ///                    reported. Use the same instance for all calls to this
+  ///                    function for a single \c DependencyScanningTool in a
+  ///                    single build. Note that this parameter is not part of
+  ///                    the context because it can be shared across different
+  ///                    worker threads and each worker thread may update it.
+  /// @param LookupModuleOutput This function is called to fill in
+  ///                           "-fmodule-file=", "-o" and other output
+  ///                           arguments for dependencies.
+  /// @return An instance of \c TranslationUnitDeps if the scan is successful.
+  ///         Otherwise it returns an error.
+  llvm::Expected<TranslationUnitDeps> computeDependenciesByNameWithContext(
+      StringRef ModuleName, const llvm::DenseSet<ModuleID> &AlreadySeen,
+      LookupModuleOutputCallback LookupModuleOutput);
+
+  /// @brief Finalizes the compiler instance. It finalizes the
+  ///        diagnostics and deletes the compiler instance. Call this method
+  ///        once all names for the same command line have been scanned.
+  /// @return Error if an error occurred during finalization.
+  llvm::Error finalizeCompilerInstanceWithContext();
+
 private:
   DependencyScanningWorker Worker;
 };
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningWorker.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningWorker.h
index 6060e4b43312e..d34489c568393 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningWorker.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningWorker.h
@@ -13,6 +13,7 @@
 #include "clang/Basic/FileManager.h"
 #include "clang/Basic/LLVM.h"
 #include "clang/Frontend/PCHContainerOperations.h"
+#include "clang/Tooling/DependencyScanning/CompilerInstanceWithContext.h"
 #include "clang/Tooling/DependencyScanning/DependencyScanningService.h"
 #include "clang/Tooling/DependencyScanning/ModuleDepCollector.h"
 #include "llvm/Support/Error.h"
@@ -74,6 +75,22 @@ class DependencyActionController {
                                          ModuleOutputKind Kind) = 0;
 };
 
+/// Some helper functions for the dependency scanning worker.
+std::string
+deduceDepTarget(const std::string &OutputFile,
+                const SmallVectorImpl<FrontendInputFile> &InputFiles);
+void canonicalizeDefines(PreprocessorOptions &PPOpts);
+void sanitizeDiagOpts(DiagnosticOptions &DiagOpts);
+std::unique_ptr<DiagnosticOptions>
+createDiagOptions(const std::vector<std::string> &CommandLine);
+
+using PrebuiltModuleFilesT = decltype(HeaderSearchOptions::PrebuiltModuleFiles);
+bool visitPrebuiltModule(StringRef PrebuiltModuleFilename, CompilerInstance &CI,
+                         PrebuiltModuleFilesT &ModuleFiles,
+                         PrebuiltModulesAttrsMap &PrebuiltModulesASTMap,
+                         DiagnosticsEngine &Diags,
+                         const ArrayRef<StringRef> StableDirs);
+
 /// An individual dependency scanning worker that is able to run on its own
 /// thread.
 ///
@@ -136,6 +153,34 @@ class DependencyScanningWorker {
                                   DependencyActionController &Controller,
                                   StringRef ModuleName);
 
+  /// The three methods below implement a new interface for by-name
+  /// dependency scanning. Together they enable the dependency scanning worker
+  /// to scan a sequence of modules by name more efficiently
+  /// when the CWD and CommandLine are held constant.
+
+  /// @brief Initializes the context and the compiler instance for scanning.
+  /// @param CWD The current working directory used during the scan.
+  /// @param CommandLine The command line used for the scan.
+  /// @return Error if the initialization fails.
+  llvm::Error initializeCompierInstanceWithContext(
+      StringRef CWD, const std::vector<std::string> &CommandLine);
+
+  /// @brief Performs dependency scanning for the module whose name is
+  ///        specified.
+  /// @param ModuleName  The name of the module whose dependencies will be
+  ///                    scanned.
+  /// @param Consumer The dependency consumer that stores the results.
+  /// @param Controller The controller for the dependency scanning action.
+  /// @return Error if the scanner incurs errors.
+  llvm::Error
+  computeDependenciesByNameWithContext(StringRef ModuleName,
+                                       DependencyConsumer &Consumer,
+                                       DependencyActionController &Controller);
+
+  /// @brief Finalizes the diagnostics engine and deletes the compiler instance.
+  /// @return Error if errors occur during finalization.
+  llvm::Error finalizeCompilerInstanceWithContext();
+
   llvm::vfs::FileSystem &getVFS() const { return *BaseFS; }
 
 private:
@@ -151,6 +196,9 @@ class DependencyScanningWorker {
   /// (passed in the constructor).
   llvm::IntrusiveRefCntPtr<DependencyScanningWorkerFilesystem> DepFS;
 
+  friend class CompilerInstanceWithContext;
+  std::unique_ptr<CompilerInstanceWithContext> CIWithContext;
+
   /// Private helper functions.
   bool scanDependencies(StringRef WorkingDirectory,
                         const std::vector<std::string> &CommandLine,
@@ -161,6 +209,32 @@ class DependencyScanningWorker {
                         std::optional<StringRef> ModuleName);
 };
 
+class ScanningDependencyDirectivesGetter : public DependencyDirectivesGetter {
+  DependencyScanningWorkerFilesystem *DepFS;
+
+public:
+  ScanningDependencyDirectivesGetter(FileManager &FileMgr) : DepFS(nullptr) {
+    FileMgr.getVirtualFileSystem().visit([&](llvm::vfs::FileSystem &FS) {
+      auto *DFS = llvm::dyn_cast<DependencyScanningWorkerFilesystem>(&FS);
+      if (DFS) {
+        assert(!DepFS && "Found multiple scanning VFSs");
+        DepFS = DFS;
+      }
+    });
+    assert(DepFS && "Did not find scanning VFS");
+  }
+
+  std::unique_ptr<DependencyDirectivesGetter>
+  cloneFor(FileManager &FileMgr) override {
+    return std::make_unique<ScanningDependencyDirectivesGetter>(FileMgr);
+  }
+
+  std::optional<ArrayRef<dependency_directives_scan::Directive>>
+  operator()(FileEntryRef File) override {
+    return DepFS->getDirectiveTokens(File.getName());
+  }
+};
+
 } // end namespace dependencies
 } // end namespace tooling
 } // end namespace clang
diff --git a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
index 4136cb73f7043..c79dbffa5c263 100644
--- a/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
+++ b/clang/include/clang/Tooling/DependencyScanning/ModuleDepCollector.h
@@ -282,11 +282,12 @@ class ModuleDepCollector final : public DependencyCollector {
                      CompilerInstance &ScanInstance, DependencyConsumer &C,
                      DependencyActionController &Controller,
                      CompilerInvocation OriginalCI,
-                     const PrebuiltModulesAttrsMap PrebuiltModulesASTMap,
+                     const PrebuiltModulesAttrsMap &PrebuiltModulesASTMap,
                      const ArrayRef<StringRef> StableDirs);
 
   void attachToPreprocessor(Preprocessor &PP) override;
   void attachToASTReader(ASTReader &R) override;
+  PPCallbacks *getPPCallback() override { return CollectorPPPtr; }
 
   /// Apply any changes implied by the discovered dependencies to the given
   /// invocation, (e.g. disable implicit modules, add explicit module paths).
@@ -305,7 +306,7 @@ class ModuleDepCollector final : public DependencyCollector {
   DependencyActionController &Controller;
   /// Mapping from prebuilt AST filepaths to their attributes referenced during
   /// dependency collecting.
-  const PrebuiltModulesAttrsMap PrebuiltModulesASTMap;
+  const PrebuiltModulesAttrsMap &PrebuiltModulesASTMap;
   /// Directory paths known to be stable through an active development and build
   /// cycle.
   const ArrayRef<StringRef> StableDirs;
@@ -339,6 +340,10 @@ class ModuleDepCollector final : public DependencyCollector {
   std::optional<P1689ModuleInfo> ProvidedStdCXXModule;
   std::vector<P1689ModuleInfo> RequiredStdCXXModules;
 
+  /// A pointer to the preprocessor callback so we can invoke it directly
+  /// if needed.
+  ModuleDepCollectorPP *CollectorPPPtr = nullptr;
+
   /// Checks whether the module is known as being prebuilt.
   bool isPrebuiltModule(const Module *M);
 
diff --git a/clang/lib/Tooling/DependencyScanning/CMakeLists.txt b/clang/lib/Tooling/DependencyScanning/CMakeLists.txt
index 42a63faa26d3e..9cb73109902e2 100644
--- a/clang/lib/Tooling/DependencyScanning/CMakeLists.txt
+++ b/clang/lib/Tooling/DependencyScanning/CMakeLists.txt
@@ -6,6 +6,7 @@ set(LLVM_LINK_COMPONENTS
   )
 
 add_clang_library(clangDependencyScanning
+  CompilerInstanceWithContext.cpp
   DependencyScanningFilesystem.cpp
   DependencyScanningService.cpp
   DependencyScanningWorker.cpp
diff --git a/clang/lib/Tooling/DependencyScanning/CompilerInstanceWithContext.cpp b/clang/lib/Tooling/DependencyScanning/CompilerInstanceWithContext.cpp
new file mode 100644
index 0000000000000..172a81003f7ba
--- /dev/null
+++ b/clang/lib/Tooling/DependencyScanning/CompilerInstanceWithContext.cpp
@@ -0,0 +1,250 @@
+//===- CompilerInstanceWithContext.cpp - clang scanning compiler instance -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Tooling/DependencyScanning/CompilerInstanceWithContext.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/Frontend/FrontendActions.h"
+#include "clang/Tooling/DependencyScanning/DependencyScanningWorker.h"
+#include "clang/Tooling/DependencyScanning/ModuleDepCollector.h"
+#include "llvm/TargetParser/Host.h"
+
+using namespace clang;
+using namespace tooling;
+using namespace dependencies;
+
+const std::string CompilerInstanceWithContext::FakeFileBuffer =
+    std::string(MAX_NUM_NAMES, ' ');
+
+llvm::Error CompilerInstanceWithContext::initialize() {
+  // Virtual file system setup
+  // - Set the current working directory.
+  Worker.BaseFS->setCurrentWorkingDirectory(CWD);
+  OverlayFS =
+      llvm::makeIntrusiveRefCnt<llvm::vfs::OverlayFileSystem>(Worker.BaseFS);
+  InMemoryFS = llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();
+  InMemoryFS->setCurrentWorkingDirectory(CWD);
+
+  // - Create the fake file as scanning input source file and setup overlay
+  //   FS.
+  SmallString<128> FakeInputPath;
+  llvm::sys::fs::createUniquePath("ScanningCI-%%%%%%%%.input", FakeInputPath,
+                                  /*MakeAbsolute=*/false);
+  InMemoryFS->addFile(FakeInputPath, 0,
+                      llvm::MemoryBuffer::getMemBuffer(FakeFileBuffer));
+  InMemoryOverlay = InMemoryFS;
+  // TODO: we need to handle CAS/CASFS here.
+  //    if (Worker.CAS && !Worker.DepCASFS)
+  //     InMemoryOverlay = llvm::cas::createCASProvidingFileSystem(
+  //         Worker.CAS, std::move(InMemoryFS));
+  OverlayFS->pushOverlay(InMemoryOverlay);
+
+  // Augment the command line.
+  CommandLine.emplace_back(FakeInputPath);
+
+  // Create the file manager, the diagnostics engine, and the source manager.
+  FileMgr = std::make_unique<FileManager>(FileSystemOptions{}, OverlayFS);
+  DiagnosticOutput.clear();
+  auto DiagOpts = createDiagOptions(CommandLine);
+  DiagPrinter = std::make_unique<TextDiagnosticPrinter>(DiagnosticsOS,
+                                                        *(DiagOpts.release()));
+  std::vector<const char *> CCommandLine(CommandLine.size(), nullptr);
+  llvm::transform(CommandLine, CCommandLine.begin(),
+                  [](const std::string &Str) { return Str.c_str(); });
+  DiagOpts = CreateAndPopulateDiagOpts(CCommandLine);
+  sanitizeDiagOpts(*DiagOpts);
+  Diags = CompilerInstance::createDiagnostics(*OverlayFS, *(DiagOpts.release()),
+                                              DiagPrinter.get(),
+                                              /*ShouldOwnClient=*/false);
+  SrcMgr = std::make_unique<SourceManager>(*Diags, *FileMgr);
+  Diags->setSourceManager(SrcMgr.get());
+
+  // Create the compiler invocation.
+  Driver = std::make_unique<driver::Driver>(
+      CCommandLine[0], llvm::sys::getDefaultTargetTriple(), *Diags,
+      "clang LLVM compiler", OverlayFS);
+  Driver->setTitle("clang_based_tool");
+  Compilation.reset(Driver->BuildCompilation(llvm::ArrayRef(CCommandLine)));
+
+  if (Compilation->containsError()) {
+    return llvm::make_error<llvm::StringError>("Failed to build compilation",
+                                               llvm::inconvertibleErrorCode());
+  }
+
+  const driver::Command &Command = *(Compilation->getJobs().begin());
+  const auto &CommandArgs = Command.getArguments();
+  size_t ArgSize = CommandArgs.size();
+  assert(ArgSize >= 1 && "Cannot have a command with 0 args");
+  const char *FirstArg = CommandArgs[0];
+  if (strcmp(FirstArg, "-cc1"))
+    return llvm::make_error<llvm::StringError>(
+        "Incorrect compilation c...
[truncated]

@qiongsiwu
Contributor Author

Note to reviewers: I am working concurrently on

  1. A PR against https://github.com/swiftlang/llvm-project so we can enable CAS.
  2. A PR against https://github.com/swiftlang/swift to teach Swift to use the new interface for by name scans.

Contributor

@jansvoboda11 jansvoboda11 left a comment

I'd prefer if we didn't duplicate all the existing logic. Would it be possible to fit CompilerInstanceWithContext into the current tool/worker APIs? I'm thinking that maybe getModuleDependencies() and other top-level APIs could accept a flag that would control whether CompilerInstanceWithContext should be carried between API invocations, or whether it should be created anew each time.

Comment on lines 88 to 89
CompilerInvocation::CreateFromArgs(*Invocation, Command.getArguments(),
*Diags, Command.getExecutable());
Contributor

IIUC, after you create the invocation here, this class never needs Command, Compilation, Driver, Diags, SrcMgr, etc. ever again. Can you make these local instead of member variables?

Contributor Author

@qiongsiwu qiongsiwu Sep 23, 2025

I'd like to chat offline about this in a bit more detail to understand what exactly I can make local. It seems that a lot of these are wired into the diagnostics engine, which is in turn used in the compiler invocation and the compiler instance.

Contributor

I think this is the DiagnosticsEngine that's only used in the driver and cc1 command line parsing. You call createDiagnostics() on the CompilerInstance, which will instantiate a new DiagnosticsEngine that will be used in the scan itself. These complexities are one of the main reasons I'd like for the new class to share the same implementation as the existing logic.

Comment on lines +214 to +216
CB->LexedFileChanged(MainFileID,
PPChainedCallbacks::LexedFileChangeReason::EnterFile,
FileType, PrevFID, IDLocation);
Contributor

We don't need to call PP.EnterSourceFileHere()?

Contributor Author

Good point! The intention is to try not to call EnterSourceFile, because when SrcLocOffset is not zero, we are in the middle of the fake file, and we want to keep all the preprocessor state. This is to simulate the effect of consecutive scanning of the same file.

Contributor

@jansvoboda11 jansvoboda11 Sep 24, 2025

Ah, makes sense. It would be great to put that in a comment beside the condition.
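
The idea in this thread (enter the fake file once, then keep advancing inside it for later queries) can be modelled in isolation. The sketch below is only an assumption about the intent: the part of the patch that uses SrcLocOffset is truncated above, so `FakeInputCursor` and its members are hypothetical names, and the only details taken from the patch are that the fake buffer is a run of blanks and that an offset of zero marks the first query.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical model of the fake input file: one blank byte per by-name query.
class FakeInputCursor {
  std::string Buffer;     // all blanks, like the patch's FakeFileBuffer
  std::size_t Offset = 0; // plays the role of SrcLocOffset in this sketch
  unsigned EnterCount = 0;

public:
  explicit FakeInputCursor(std::size_t MaxNames) : Buffer(MaxNames, ' ') {
    assert(MaxNames > 0 && "need room for at least one query");
  }

  // Reserves a distinct position for the next query. The file is "entered"
  // only when the offset is still zero; later queries continue mid-file so
  // that accumulated state is kept. Returns nullopt once the buffer is used up.
  std::optional<std::size_t> nextQueryOffset() {
    if (Offset >= Buffer.size())
      return std::nullopt;
    if (Offset == 0)
      ++EnterCount; // stands in for entering the main source file
    return Offset++;
  }

  unsigned enterCount() const { return EnterCount; }
};
```

Each call hands out a fresh offset, so every query gets its own location in the same file, and the enter step runs exactly once no matter how many names are scanned.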

qiongsiwu commented Sep 23, 2025

> I'd prefer if we didn't duplicate all the existing logic. Would it be possible to fit CompilerInstanceWithContext into the current tool/worker APIs? I'm thinking that maybe getModuleDependencies() and other top-level APIs could accept a flag that would control whether CompilerInstanceWithContext should be carried between API invocations, or whether it should be created anew each time.

Ah, thanks for the suggestion! The new APIs (initialization, by-name query, and finalization) are intentional for two reasons:

  1. How this works is significantly different from the existing APIs. I intend to make it as obvious as possible that state is cached between the calls. CompilerInstanceWithContext::computeDependencies only takes a name as its input, which further implies that the CWD and command line are inherited and stored. I feel that tweaking the existing APIs may be confusing to the user.
  2. I find it complex to add a flag to the existing API to control whether a CompilerInstanceWithContext should be used. What if the user calls the querying function multiple times with different CWD/command-line combinations but sets the flag to use CompilerInstanceWithContext? I think in those cases we may need to keep a list of CompilerInstanceWithContext objects to make sure we always pick the correct instance. That is manageable, but it would add more complexity to this patch.
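
To make the second point concrete, here is a rough sketch of the extra bookkeeping a flag-based design might need: a pool of contexts keyed by the (CWD, command line) pair. This is not code from the patch. `MockContext` and `ContextPool` are hypothetical names, and a real version would also have to decide when each context gets finalized.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CompilerInstanceWithContext.
struct MockContext {
  unsigned Queries = 0;
};

// One context per distinct (CWD, command line) pair.
class ContextPool {
  using Key = std::pair<std::string, std::vector<std::string>>;
  std::map<Key, std::unique_ptr<MockContext>> Contexts;

public:
  // Returns the context matching the pair, creating it on first use.
  MockContext &getOrCreate(const std::string &CWD,
                           const std::vector<std::string> &CommandLine) {
    std::unique_ptr<MockContext> &Slot = Contexts[Key(CWD, CommandLine)];
    if (!Slot)
      Slot = std::make_unique<MockContext>();
    assert(Slot && "slot must hold a context at this point");
    return *Slot;
  }

  std::size_t size() const { return Contexts.size(); }
};
```

Even this small version needs a lookup on every call and keeps one live context per distinct pair, which is the added complexity the explicit three-call API avoids by binding one CWD and command line at initialization.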

One thing I plan to do later is to replace DependencyScanningAction with CompilerInstanceWithContext to perform all scanning, so we can consolidate the implementation from the back.

@qiongsiwu qiongsiwu self-assigned this Sep 23, 2025
qiongsiwu added a commit that referenced this pull request Sep 26, 2025
…its own header and source files (#160795)

This is the first of three PRs to land
#160207 in smaller pieces.

This PR is an NFC. It moves `DependencyScanningAction` to its own source
file, so we can later implement a `CompilerInstanceWithContext` in the
new file.

Part of work for rdar://136303612.
qiongsiwu added a commit that referenced this pull request Oct 3, 2025
…ialization (#161300)

This PR follows #160795, and it
is the second of a series of planned PRs to land
#160207 in smaller pieces.

The initialization steps before and within
`DependencyScanningAction::runInvocation` are broken up into several
helper functions. The intention is to reuse the helper functions in a
follow-up PR to initialize the `CompilerInstanceWithContext`.

Part of work for rdar://136303612.