Skip to content

[clang][Dependency Scanning] Adding an API to Diagnose Invalid Negative Stat Cache Entries #135703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 18, 2025

Conversation

qiongsiwu
Copy link
Contributor

@qiongsiwu qiongsiwu commented Apr 15, 2025

We have had numerous situations where the negatively stat cached paths are invalidated during the build, and such invalidations lead to build errors.

This PR adds an API to diagnose such cases. DependencyScanningFilesystemSharedCache::diagnoseNegativeStatCachedPaths allows users of the cache to ask the cache to examine all negatively stat cached paths, and re-stat the paths using the passed-in file system. If any re-stat succeeds, the API emits diagnostics.

rdar://149147920

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 15, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-clang

Author: Qiongsi Wu (qiongsiwu)

Changes

We have had numerous situations where the negatively stat cached paths are invalidated during the build, and such invalidations lead to build errors.

This PR adds an API to diagnose such cases. DependencyScanningFilesystemSharedCache::diagnoseNegativeStatCachedPaths allows users of the cache to ask the cache to examine all negatively stat cached paths, and re-stat the paths using the passed-in file system. If any re-stat succeeds, the API emits diagnostics.


Full diff: https://github.com/llvm/llvm-project/pull/135703.diff

3 Files Affected:

  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h (+8)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp (+26)
  • (modified) clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp (+28)
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
index d12814e7c9253..648158c5287c6 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
@@ -220,6 +220,14 @@ class DependencyScanningFilesystemSharedCache {
   CacheShard &getShardForFilename(StringRef Filename) const;
   CacheShard &getShardForUID(llvm::sys::fs::UniqueID UID) const;
 
+  /// Visits all cached entries and re-stat an entry using the UnderlyingFS if
+  /// it is negatively stat cached. If the re-stat succeeds, print diagnostics
+  /// information to OS indicating that negative stat caching has been
+  /// invalidated.
+  void
+  diagnoseNegativeStatCachedPaths(llvm::raw_ostream &OS,
+                                  llvm::vfs::FileSystem &UnderlyingFS) const;
+
 private:
   std::unique_ptr<CacheShard[]> CacheShards;
   unsigned NumShards;
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
index 4d738e4bea41a..fe72a1a91909e 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
@@ -108,6 +108,32 @@ DependencyScanningFilesystemSharedCache::getShardForUID(
   return CacheShards[Hash % NumShards];
 }
 
+void DependencyScanningFilesystemSharedCache::diagnoseNegativeStatCachedPaths(
+    llvm::raw_ostream &OS, llvm::vfs::FileSystem &UnderlyingFS) const {
+  // Iterate through all shards and look for cached stat errors.
+  for (unsigned i = 0; i < NumShards; i++) {
+    const CacheShard &Shard = CacheShards[i];
+    for (const auto &Path : Shard.CacheByFilename.keys()) {
+      auto CachedPairIt = Shard.CacheByFilename.find(Path);
+      auto CachedPair = CachedPairIt->getValue();
+      const CachedFileSystemEntry *Entry = CachedPair.first;
+
+      if (Entry->getError()) {
+        // Only examine cached errors.
+        llvm::ErrorOr<llvm::vfs::Status> Stat = UnderlyingFS.status(Path);
+        if (Stat) {
+          // This is the case where we have negatively cached the non-existence
+          // of the file at Path, but the cache is later invalidated. Report
+          // diagnostics.
+          OS << Path << " did not exist when it was first searched "
+             << "but was created later. This may have led to a build failure "
+                "due to missing files.\n";
+        }
+      }
+    }
+  }
+}
+
 const CachedFileSystemEntry *
 DependencyScanningFilesystemSharedCache::CacheShard::findEntryByFilename(
     StringRef Filename) const {
diff --git a/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp b/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
index 111351fb90cee..02b192f1ba587 100644
--- a/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
+++ b/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
@@ -178,3 +178,31 @@ TEST(DependencyScanningFilesystem, CacheStatFailures) {
   DepFS.exists("/cache/a.pcm");
   EXPECT_EQ(InstrumentingFS->NumStatusCalls, 5u);
 }
+
+TEST(DependencyScanningFilesystem, DiagnoseStaleStatFailures) {
+  auto InMemoryFS = llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();
+  InMemoryFS->setCurrentWorkingDirectory("/");
+  InMemoryFS->addFile("/dir", 0, llvm::MemoryBuffer::getMemBuffer(""));
+
+  DependencyScanningFilesystemSharedCache SharedCache;
+  DependencyScanningWorkerFilesystem DepFS(SharedCache, InMemoryFS);
+
+  bool Path1Exists = DepFS.exists("/path1");
+  EXPECT_EQ(Path1Exists, false);
+
+  // Adding a file that has been stat-ed,
+  InMemoryFS->addFile("/path1", 0, llvm::MemoryBuffer::getMemBuffer(""));
+  Path1Exists = DepFS.exists("/path1");
+  // Due to caching in SharedCache, path1 should not exist in
+  // DepFS's eyes.
+  EXPECT_EQ(Path1Exists, false);
+
+  std::string Diags;
+  llvm::raw_string_ostream DiagsStream(Diags);
+  SharedCache.diagnoseNegativeStatCachedPaths(DiagsStream, *InMemoryFS.get());
+
+  ASSERT_STREQ(
+      "/path1 did not exist when it was first searched but was created later. "
+      "This may have led to a build failure due to missing files.\n",
+      Diags.c_str());
+}

@qiongsiwu
Copy link
Contributor Author

qiongsiwu commented Apr 15, 2025

Note to reviewers:
I am not quite sure if passing in the underlying VFS is a good design. If I understand correctly, the underlying VFS is a dependency scanning worker property, so on the surface it seems that we will need this diagnostic per worker, instead of per scanning service, and we do need a VFS from somewhere to query.

I am all ears for a different design.

Additionally, I can store (pointers to) the cached failed results in a list, so that we do not need to scan the whole cache for the diagnostic.

@qiongsiwu qiongsiwu requested a review from jansvoboda11 April 15, 2025 17:42
@qiongsiwu qiongsiwu requested a review from jansvoboda11 April 15, 2025 21:35
Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second thought, what's the current intent for build system integration? I'm curious why we're emitting a diagnostic here rather than just passing the list of paths back to the build system and letting it decide how to report these.

@qiongsiwu
Copy link
Contributor Author

On a second thought, what's the current intent for build system integration? I'm curious why we're emitting a diagnostic here rather than just passing the list of paths back to the build system and letting it decide how to report these.

This is a good point. Additionally, the current API design requires a worker calling it, since it needs the worker's VFS (or whoever owns the "current " VFS) to do the stat again. So this will report a diagnostic per worker, which may not be ideal.

I can change the API such that we have two new methods instead of one. The first method collects the erroneously negatively cached paths, and the second reports it (in a form that the build system likes). The collection has to be done per worker, but the reporting can now be done once when the scanning service is destroyed. Does this sound reasonable?

@jansvoboda11
Copy link
Contributor

I don't think workers need to be involved at all - we just need to pass the same VFS to this function that we use to instantiate all workers. For example, downstream clang_experimental_DependencyScannerWorker_create_v0() always uses llvm::vfs::createPhysicalFileSystem(), so I'm thinking whatever C API we invent for this functionality, it can operate on the service only, it just needs to be consistent in using the physical VFS. That C API can return CXCStringArray and let the build system invent its own message format.

@jansvoboda11
Copy link
Contributor

(To be frank, I think it would be better to enforce the consistency of VFSs by having the service own one and the workers just applying overlays as needed, but that's definitely out of scope for this patch.)

@qiongsiwu
Copy link
Contributor Author

Thanks @jansvoboda11 ! The API is updated so that the end result is a vector of paths (strings), so the caller can create the diagnostics.

I will add the C-API to https://github.com/swiftlang/llvm-project after this PR lands.

@qiongsiwu qiongsiwu requested a review from jansvoboda11 April 17, 2025 20:22
@qiongsiwu qiongsiwu requested a review from jansvoboda11 April 17, 2025 23:02
Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the extra std::string constructor gets removed & the test passes.

@qiongsiwu qiongsiwu merged commit 9ef9167 into llvm:main Apr 18, 2025
11 checks passed
qiongsiwu added a commit to swiftlang/llvm-project that referenced this pull request May 1, 2025
…10524)

llvm#135703 added a C++ API to the shared cached to diagnose invalid negatively stat cached paths. This PR adds a C API so an external system can take advantage of the diagnostics.

rdar://149147920
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…ve Stat Cache Entries (llvm#135703)

We have had numerous situations where the negatively stat cached paths
are invalidated during the build, and such invalidations lead to build
errors.

This PR adds an API to diagnose such cases.
`DependencyScanningFilesystemSharedCache::diagnoseNegativeStatCachedPaths`
allows users of the cache to ask the cache to examine all negatively
stat cached paths, and re-stat the paths using the passed-in file
system. If any re-stat succeeds, the API emits diagnostics.

rdar://149147920
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…ve Stat Cache Entries (llvm#135703)

We have had numerous situations where the negatively stat cached paths
are invalidated during the build, and such invalidations lead to build
errors.

This PR adds an API to diagnose such cases.
`DependencyScanningFilesystemSharedCache::diagnoseNegativeStatCachedPaths`
allows users of the cache to ask the cache to examine all negatively
stat cached paths, and re-stat the paths using the passed-in file
system. If any re-stat succeeds, the API emits diagnostics.

rdar://149147920
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…ve Stat Cache Entries (llvm#135703)

We have had numerous situations where the negatively stat cached paths
are invalidated during the build, and such invalidations lead to build
errors.

This PR adds an API to diagnose such cases.
`DependencyScanningFilesystemSharedCache::diagnoseNegativeStatCachedPaths`
allows users of the cache to ask the cache to examine all negatively
stat cached paths, and re-stat the paths using the passed-in file
system. If any re-stat succeeds, the API emits diagnostics.

rdar://149147920
qiongsiwu added a commit to qiongsiwu/llvm-project that referenced this pull request May 14, 2025
…ve Stat Cache Entries (llvm#135703)

We have had numerous situations where the negatively stat cached paths
are invalidated during the build, and such invalidations lead to build
errors.

This PR adds an API to diagnose such cases.
`DependencyScanningFilesystemSharedCache::diagnoseNegativeStatCachedPaths`
allows users of the cache to ask the cache to examine all negatively
stat cached paths, and re-stat the paths using the passed-in file
system. If any re-stat succeeds, the API emits diagnostics.

rdar://149147920
(cherry picked from commit 9ef9167)

 Conflicts:
	clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
qiongsiwu added a commit to qiongsiwu/llvm-project that referenced this pull request May 14, 2025
…lvm#10524)

llvm#135703 added a C++ API to the shared cached to diagnose invalid negatively stat cached paths. This PR adds a C API so an external system can take advantage of the diagnostics.

rdar://149147920
(cherry picked from commit b7aa45c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants