-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[clangd] Drop the optimization where only shards for modified files are updated in the background index #140651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-clangd @llvm/pr-subscribers-clang-tools-extra Author: Nathan Ridge (HighCommander4) ChangesThe optimization is not valid, given that the target of a symbol reference can change without the spelling of the reference (or anything else in the file containing the reference) changing, as illustrated in the added test case. Partially fixes clangd/clangd#1104 Full diff: https://github.com/llvm/llvm-project/pull/140651.diff 3 Files Affected:
diff --git a/clang-tools-extra/clangd/index/Background.cpp b/clang-tools-extra/clangd/index/Background.cpp
index 496d1455def4b..c4bab3c4d4b20 100644
--- a/clang-tools-extra/clangd/index/Background.cpp
+++ b/clang-tools-extra/clangd/index/Background.cpp
@@ -180,13 +180,10 @@ void BackgroundIndex::boostRelated(llvm::StringRef Path) {
Queue.boost(filenameWithoutExtension(Path), IndexBoostedFile);
}
-/// Given index results from a TU, only update symbols coming from files that
-/// are different or missing from than \p ShardVersionsSnapshot. Also stores new
-/// index information on IndexStorage.
-void BackgroundIndex::update(
- llvm::StringRef MainFile, IndexFileIn Index,
- const llvm::StringMap<ShardVersion> &ShardVersionsSnapshot,
- bool HadErrors) {
+/// Given index results from a TU, update symbols coming from all files. Also
+/// stores new index information on IndexStorage.
+void BackgroundIndex::update(llvm::StringRef MainFile, IndexFileIn Index,
+ bool HadErrors) {
// Keys are URIs.
llvm::StringMap<std::pair<Path, FileDigest>> FilesToUpdate;
// Note that sources do not contain any information regarding missing headers,
@@ -198,12 +195,7 @@ void BackgroundIndex::update(
elog("Failed to resolve URI: {0}", AbsPath.takeError());
continue;
}
- const auto DigestIt = ShardVersionsSnapshot.find(*AbsPath);
- // File has different contents, or indexing was successful this time.
- if (DigestIt == ShardVersionsSnapshot.end() ||
- DigestIt->getValue().Digest != IGN.Digest ||
- (DigestIt->getValue().HadErrors && !HadErrors))
- FilesToUpdate[IGN.URI] = {std::move(*AbsPath), IGN.Digest};
+ FilesToUpdate[IGN.URI] = {std::move(*AbsPath), IGN.Digest};
}
// Shard slabs into files.
@@ -263,13 +255,6 @@ llvm::Error BackgroundIndex::index(tooling::CompileCommand Cmd) {
return llvm::errorCodeToError(Buf.getError());
auto Hash = digest(Buf->get()->getBuffer());
- // Take a snapshot of the versions to avoid locking for each file in the TU.
- llvm::StringMap<ShardVersion> ShardVersionsSnapshot;
- {
- std::lock_guard<std::mutex> Lock(ShardVersionsMu);
- ShardVersionsSnapshot = ShardVersions;
- }
-
vlog("Indexing {0} (digest:={1})", Cmd.Filename, llvm::toHex(Hash));
ParseInputs Inputs;
Inputs.TFS = &TFS;
@@ -286,25 +271,6 @@ llvm::Error BackgroundIndex::index(tooling::CompileCommand Cmd) {
return error("Couldn't build compiler instance");
SymbolCollector::Options IndexOpts;
- // Creates a filter to not collect index results from files with unchanged
- // digests.
- IndexOpts.FileFilter = [&ShardVersionsSnapshot](const SourceManager &SM,
- FileID FID) {
- const auto F = SM.getFileEntryRefForID(FID);
- if (!F)
- return false; // Skip invalid files.
- auto AbsPath = getCanonicalPath(*F, SM.getFileManager());
- if (!AbsPath)
- return false; // Skip files without absolute path.
- auto Digest = digestFile(SM, FID);
- if (!Digest)
- return false;
- auto D = ShardVersionsSnapshot.find(*AbsPath);
- if (D != ShardVersionsSnapshot.end() && D->second.Digest == Digest &&
- !D->second.HadErrors)
- return false; // Skip files that haven't changed, without errors.
- return true;
- };
IndexOpts.CollectMainFileRefs = true;
IndexFileIn Index;
@@ -345,7 +311,7 @@ llvm::Error BackgroundIndex::index(tooling::CompileCommand Cmd) {
for (auto &It : *Index.Sources)
It.second.Flags |= IncludeGraphNode::SourceFlag::HadErrors;
}
- update(AbsolutePath, std::move(Index), ShardVersionsSnapshot, HadErrors);
+ update(AbsolutePath, std::move(Index), HadErrors);
Rebuilder.indexedTU();
return llvm::Error::success();
diff --git a/clang-tools-extra/clangd/index/Background.h b/clang-tools-extra/clangd/index/Background.h
index 448e911201575..87a81666e07b7 100644
--- a/clang-tools-extra/clangd/index/Background.h
+++ b/clang-tools-extra/clangd/index/Background.h
@@ -190,12 +190,9 @@ class BackgroundIndex : public SwapIndex {
bool HadErrors = false;
};
- /// Given index results from a TU, only update symbols coming from files with
- /// different digests than \p ShardVersionsSnapshot. Also stores new index
- /// information on IndexStorage.
- void update(llvm::StringRef MainFile, IndexFileIn Index,
- const llvm::StringMap<ShardVersion> &ShardVersionsSnapshot,
- bool HadErrors);
+ /// Given index results from a TU, update symbols coming from all files. Also
+ /// stores new index information on IndexStorage.
+ void update(llvm::StringRef MainFile, IndexFileIn Index, bool HadErrors);
// configuration
const ThreadsafeFS &TFS;
diff --git a/clang-tools-extra/clangd/unittests/BackgroundIndexTests.cpp b/clang-tools-extra/clangd/unittests/BackgroundIndexTests.cpp
index ada14c9939318..ac7ed654683a2 100644
--- a/clang-tools-extra/clangd/unittests/BackgroundIndexTests.cpp
+++ b/clang-tools-extra/clangd/unittests/BackgroundIndexTests.cpp
@@ -14,6 +14,7 @@
#include "gmock/gmock.h"
#include "gtest/gtest.h"
#include <deque>
+#include <optional>
#include <thread>
using ::testing::_;
@@ -213,10 +214,11 @@ TEST_F(BackgroundIndexTest, IndexTwoFiles) {
CDB.setCompileCommand(testPath("root/B.cc"), Cmd);
ASSERT_TRUE(Idx.blockUntilIdleForTest());
- // B_CC is dropped as we don't collect symbols from A.h in this compilation.
+ // A_CC is dropped as A.h was now most recently indexed in the context of
+ // B.cc.
EXPECT_THAT(runFuzzyFind(Idx, ""),
UnorderedElementsAre(AllOf(named("common"), numReferences(5U)),
- AllOf(named("A_CC"), numReferences(0U)),
+ AllOf(named("B_CC"), numReferences(0U)),
AllOf(named("g"), numReferences(1U)),
AllOf(named("f_b"), declared(), defined(),
numReferences(1U))));
@@ -682,6 +684,78 @@ TEST_F(BackgroundIndexTest, Reindex) {
EXPECT_EQ(OldShard, Storage.lookup(testPath("A.cc")));
}
+// Test that restarting clangd properly updates the index with changes
+// to files since the last time the index was built.
+TEST_F(BackgroundIndexTest, UpdateAfterRestart) {
+ MockFS FS;
+ llvm::StringMap<std::string> Storage;
+ size_t CacheHits = 0;
+ MemoryShardStorage MSS(Storage, CacheHits);
+ auto CDB = std::make_unique<OverlayCDB>(/*Base=*/nullptr);
+ auto Idx = std::make_unique<BackgroundIndex>(
+ FS, *CDB, [&](llvm::StringRef) { return &MSS; },
+ BackgroundIndex::Options{});
+
+ // Create a header file containing a function declaration, and a source file
+ // containing a call to the function.
+ FS.Files[testPath("test.h")] = R"cpp(
+ #ifndef TEST_H
+ #define TEST_H
+ void waldo(int);
+ #endif
+ )cpp";
+ FS.Files[testPath("test.cc")] = R"cpp(
+ #include "test.h"
+ int main() {
+ waldo(42);
+ }
+ )cpp";
+
+ // Index the files in this state.
+ tooling::CompileCommand Cmd;
+ Cmd.Filename = "../test.cc";
+ Cmd.Directory = testPath("build");
+ Cmd.CommandLine = {"clang++", "../test.cc", "-fsyntax-only"};
+ CDB->setCompileCommand(testPath("test.cc"), Cmd);
+ ASSERT_TRUE(Idx->blockUntilIdleForTest());
+
+ // Verify that the function 'waldo' has two references in the index
+ // (the declaration, and the call site).
+ auto CheckRefCount = [&](std::string SymbolName) {
+ auto Syms = runFuzzyFind(*Idx, "waldo");
+ EXPECT_THAT(Syms, UnorderedElementsAre(named("waldo")));
+ auto Waldo = *Syms.begin();
+ llvm::errs() << "Querying refs for symbol with ID " << Waldo.ID.str()
+ << "\n";
+ return getRefs(*Idx, Waldo.ID).numRefs();
+ };
+ EXPECT_EQ(CheckRefCount("waldo"), 2u);
+
+ // Modify the declaration of 'waldo' in a way that changes its SymbolID
+ // without changing how existing call sites are written. Here, we add
+ // a new parameter with a default argument.
+ FS.Files[testPath("test.h")] = R"cpp(
+ #ifndef TEST_H
+ #define TEST_H
+ void waldo(int, int = 0);
+ #endif
+ )cpp";
+
+ // Simulate clangd shutting down and restarting, and the background index
+ // being rebuilt after restart.
+ Idx = nullptr;
+ CDB = std::make_unique<OverlayCDB>(/*Base=*/nullptr);
+ Idx = std::make_unique<BackgroundIndex>(
+ FS, *CDB, [&](llvm::StringRef) { return &MSS; },
+ BackgroundIndex::Options{});
+ CDB->setCompileCommand(testPath("test.cc"), Cmd);
+ ASSERT_TRUE(Idx->blockUntilIdleForTest());
+
+ // The rebuild should have updated things so that 'waldo' now again has
+ // two references in the index.
+ EXPECT_EQ(CheckRefCount("waldo"), 2u);
+}
+
class BackgroundIndexRebuilderTest : public testing::Test {
protected:
BackgroundIndexRebuilderTest()
|
…re updated in the background index The optimization is not valid, given that the target of a symbol reference can change without the spelling of the reference (or anything else in the file containing the reference) changing, as illustrated in the added test case. Partially fixes clangd/clangd#1104
2559324
to
7f48d90
Compare
I am afraid this doesn't address clangd/clangd#1104 to a useful extent :/. Whenever a header This was a deliberate trade-off back in the day, for performance wins, there are other cases where files contents can stay the same but USRs for symbols inside the file, and the ones being referenced might change (command line flags, different PP states, ...) and we assumed they're "rare" enough and will be compensated by dynamic index until user modifies these other files. I guess this assumption didn't age well, I think your idea around having a way to flush/rebuild background-index is probably most straight forward approach. We didn't have one explicitly, as there's always the workaround of deleting |
Out of multiple source files including a header, only one is re-indexed when the header is modified.
Good catch, thanks; I had overlooked that. (I added an additional testcase demonstrating this, mostly for my own future reference / as a reference for anyone else who might pick this up.)
FWIW, my expectation as a user is that when a header is modified, that does trigger re-indexing of all source files that include the header. I don't think the performance impact of this is that catastrophic, as the number of headers in a project tends to vary inversely with how widely they're included (i.e. most headers will have a few includers, and a few will have many). In my mind, the scenario is comparable to modifying a header and then rebuilding: if the header is widely included, you will rebuild many TUs, and programmers expect that (and adopt practices like forward declarations, pimpl, etc. to avoid unnecessary header dependencies). Anyways, I agree that the patch in its current form is not useful. I'll mark it as a draft, and (when time permits) will re-work it to implement a "proper" (all dependent TUs rebuilt) update procedure. We can then discuss what the trigger for that procedure should be (explicit invocation, clangd restart, file-save of header, etc.). |
The optimization is not valid, given that the target of a symbol reference can change without the spelling of the reference (or anything else in the file containing the reference) changing, as illustrated in the added test case.
Partially fixes clangd/clangd#1104