Skip to content

[clang] Improve getFileIDLocal binary search. #146510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 1, 2025

Conversation

hokein
Copy link
Collaborator

@hokein hokein commented Jul 1, 2025

@hokein hokein requested review from AaronBallman and erichkeane July 1, 2025 11:39
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jul 1, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 1, 2025

@llvm/pr-subscribers-clang

Author: Haojian Wu (hokein)

Changes

Avoid reading the LocalSLocEntryTable twice per loop iteration. NFC.

https://llvm-compile-time-tracker.com/compare.php?from=0b6ddb02efdcbdac9426e8d857499ea0580303cd&to=1aa335ccfb07ba96177b89b1933aa6b980fa14f6&stat=instructions:u


Full diff: https://github.com/llvm/llvm-project/pull/146510.diff

1 Files Affected:

  • (modified) clang/lib/Basic/SourceManager.cpp (+16-25)
diff --git a/clang/lib/Basic/SourceManager.cpp b/clang/lib/Basic/SourceManager.cpp
index a05d6c16caa32..0080d180a3776 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -812,6 +812,8 @@ FileID SourceManager::getFileIDSlow(SourceLocation::UIntTy SLocOffset) const {
 /// loaded one.
 FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
   assert(SLocOffset < NextLocalOffset && "Bad function choice");
+  assert(SLocOffset >= LocalSLocEntryTable[0].getOffset() &&
+         "Invalid SLocOffset");
 
   // After the first and second level caches, I see two common sorts of
   // behavior: 1) a lot of searched FileID's are "near" the cached file
@@ -855,35 +857,24 @@ FileID SourceManager::getFileIDLocal(SourceLocation::UIntTy SLocOffset) const {
       break;
   }
 
-  NumProbes = 0;
-  while (true) {
-    unsigned MiddleIndex = (GreaterIndex-LessIndex)/2+LessIndex;
-    SourceLocation::UIntTy MidOffset =
-        getLocalSLocEntry(MiddleIndex).getOffset();
-
-    ++NumProbes;
-
-    // If the offset of the midpoint is too large, chop the high side of the
-    // range to the midpoint.
-    if (MidOffset > SLocOffset) {
-      GreaterIndex = MiddleIndex;
-      continue;
-    }
+  while (LessIndex < GreaterIndex) {
+    ++NumBinaryProbes;
 
-    // If the middle index contains the value, succeed and return.
-    if (MiddleIndex + 1 == LocalSLocEntryTable.size() ||
-        SLocOffset < getLocalSLocEntry(MiddleIndex + 1).getOffset()) {
-      FileID Res = FileID::get(MiddleIndex);
+    unsigned MiddleIndex = LessIndex + (GreaterIndex - LessIndex) / 2;
 
-      // Remember it.  We have good locality across FileID lookups.
-      LastFileIDLookup = Res;
-      NumBinaryProbes += NumProbes;
-      return Res;
-    }
+    SourceLocation::UIntTy MidOffset =
+        LocalSLocEntryTable[MiddleIndex].getOffset();
 
-    // Otherwise, move the low-side up to the middle index.
-    LessIndex = MiddleIndex;
+    if (MidOffset <= SLocOffset)
+      LessIndex = MiddleIndex + 1;
+    else
+      GreaterIndex = MiddleIndex;
   }
+
+  // At this point, LessIndex is the index of the *first element greater than*
+  // SLocOffset. The element we are actually looking for is the one immediately
+  // before it.
+  return LastFileIDLookup = FileID::get(LessIndex - 1);
 }
 
 /// Return the FileID for a SourceLocation with a high offset.


// Otherwise, move the low-side up to the middle index.
LessIndex = MiddleIndex;
if (MidOffset <= SLocOffset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why equal? Shouldn't "we found this!" mean we exit the loop? And does the '+1' here cause us to miss it if the offset is exactly 'midoffset'?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an upper_bound binary search, it finds the first element which is greater than SLocOffset. We will -1 for the final result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hrmph.... I guess I see here. I guess that the 'we just managed to be searching for the very beginning of this file' is rare enough to not be worth a branch.

Though, is this safe (the +1/-1 dance)with 'GreaterIndex'? Despite its name, GreaterIndex is initialized to the 'size' (so not really an index... sigh). Though I guess the linear search ensures we've already checked the end....

I think I've convinced myself this is right enough. Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, initializing GreaterIndex to size can be a bit confusing, but that initial value is never actually used.

GreaterIndex always points to an entry whose offset is strictly greater than SLocOffset. The search space is defined as [LessIndex, GreaterIndex), so we know by design that GreaterIndex is never a valid result.


// Otherwise, move the low-side up to the middle index.
LessIndex = MiddleIndex;
if (MidOffset <= SLocOffset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, hrmph.... I guess I see here. I guess that the 'we just managed to be searching for the very beginning of this file' is rare enough to not be worth a branch.

Though, is this safe (the +1/-1 dance)with 'GreaterIndex'? Despite its name, GreaterIndex is initialized to the 'size' (so not really an index... sigh). Though I guess the linear search ensures we've already checked the end....

I think I've convinced myself this is right enough. Thanks.

@hokein hokein merged commit 650d015 into llvm:main Jul 1, 2025
10 checks passed
@hokein hokein deleted the perf/binary-get-file-id-local branch July 1, 2025 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants