Contributor

@janko janko commented Oct 25, 2025

Motivation

Closes #2660

I want workspace symbol searches to be faster so that I use them more often in my code editor.

Implementation

Currently the whole index is fuzzy matched, and then dependency/private entries are filtered out. I flipped this around by first filtering out dependency/private entries and then fuzzy matching the remainder. In theory, this should be faster, assuming that filtering is cheaper than fuzzy matching.
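A minimal sketch of the reordering described above, not the code from this PR. The `Entry` struct, the 0.7 threshold, the `/app` workspace prefix, and the helper names are all assumptions made for illustration; only `DidYouMean::JaroWinkler.distance` is a real API. A call counter is threaded through so the two orderings can be compared.

```ruby
require "did_you_mean"

# Hypothetical stand-in for an index entry; not the real ruby-lsp class.
Entry = Struct.new(:name, :file_path, :private_entry, keyword_init: true)

# Assumed similarity cut-off, chosen only for this illustration.
MATCH_THRESHOLD = 0.7

# Counts matcher calls so the two orderings can be compared.
def similar?(query, name, stats)
  stats[:matcher_calls] += 1
  DidYouMean::JaroWinkler.distance(query.downcase, name.downcase) >= MATCH_THRESHOLD
end

# Cheap check: the entry lives in the workspace and is not private.
def in_workspace_and_public?(entry, workspace_path)
  entry.file_path.start_with?(workspace_path) && !entry.private_entry
end

# Old order: fuzzy match every entry, then discard dependency/private ones.
def match_then_filter(entries, query, workspace_path, stats)
  entries
    .select { |entry| similar?(query, entry.name, stats) }
    .select { |entry| in_workspace_and_public?(entry, workspace_path) }
end

# New order: discard dependency/private entries first, then fuzzy match the rest.
def filter_then_match(entries, query, workspace_path, stats)
  entries
    .select { |entry| in_workspace_and_public?(entry, workspace_path) }
    .select { |entry| similar?(query, entry.name, stats) }
end

ENTRIES = [
  Entry.new(name: "UsersController", file_path: "/app/controllers/users_controller.rb", private_entry: false),
  Entry.new(name: "UsersController", file_path: "/gems/devise/users_controller.rb", private_entry: false),
  Entry.new(name: "UsersController", file_path: "/app/models/internal.rb", private_entry: true),
  Entry.new(name: "Zzz", file_path: "/app/models/zzz.rb", private_entry: false),
].freeze
```

Both orderings return the same entries. The second one only runs the matcher on entries that survive the cheap filter, which pays off only if the filter itself is cheap.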

However, this wasn't faster initially, and I quickly found that URI::Generic#full_path was the bottleneck (where previously it was DidYouMean::JaroWinkler). The path unescaping and Windows handling were adding overhead, so I stored the raw path in the URI.

Automated Tests

I didn't update automated tests.

Manual Tests

For my Rails application, this optimization made workspace/symbol requests 4x faster.

janko added 2 commits October 25, 2025 13:28
We're doing extra work fuzzy matching across the whole index, even though only a subset of entries are considered valid results. We can filter out dependency/private entries *before* fuzzy matching, which should speed things up, assuming that fuzzy matching is more expensive than filtering.
Doing filtering before fuzzy matching didn't improve performance,
because retrieving full paths of entries is slow.

Since the file paths are escaped in URIs, they're currently being
unescaped with Windows handling in order to retrieve the raw path. This
is extra work, because we know the raw path at the time we're building
the URI, we just didn't store that information.

Storing the raw path in the URI and retrieving it speeds up workspace
symbol search ~4x in my application.
@janko janko requested a review from a team as a code owner October 25, 2025 12:11

def fuzzy_search
@index.fuzzy_search(@query) do |entry|
file_path = entry.uri.raw_path
Contributor

I am not sure this is the right approach. We could instead cache this at the entry level rather than skipping the unescaping parser.

Contributor Author

Why do the unescaping at all? We had the full path when building the URI, why reconstruct it?

Member

It won't work properly because the URI path and file_path aren't the same thing on Windows and our comparison for not_in_dependencies will fail.

uri = URI("file:///C:/ruby/something.rb")

# The URI's path is not a valid file path
uri.path # => "/C:/ruby/something.rb"

# It's the handling of `to_standardized_path` that turns it into the correct one
uri.to_standardized_path # => "C:/ruby/something.rb"
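The conversion in the example above can be approximated with a short sketch. `standardized_path` is a hypothetical helper written for illustration, not ruby-lsp's actual `to_standardized_path`. It assumes the conversion amounts to unescaping the URI path and dropping the leading slash in front of a drive letter.

```ruby
require "uri"

# Pick a parser that offers `unescape` across uri gem versions.
PATH_PARSER = URI.const_defined?(:RFC2396_PARSER) ? URI::RFC2396_PARSER : URI::DEFAULT_PARSER

# Hypothetical helper approximating the conversion described above; this is
# not ruby-lsp's actual `to_standardized_path` implementation.
def standardized_path(uri)
  path = uri.path
  return nil if path.nil?

  unescaped = PATH_PARSER.unescape(path)

  # A file URI for "C:/ruby/x.rb" has the path "/C:/ruby/x.rb", so a leading
  # slash in front of a drive letter is dropped to get a usable file path.
  if %r{\A/[A-Za-z]:}.match?(unescaped)
    unescaped.delete_prefix("/")
  else
    unescaped
  end
end

windows_uri = URI("file:///C:/ruby/something.rb")
unix_uri = URI("file:///Users/janko/Library/Application%20Support/app.rb")

puts standardized_path(windows_uri)
puts standardized_path(unix_uri)
```

Under these assumptions the Unix-style path only needs unescaping, while the Windows-style path also loses its leading slash, which is why `uri.path` alone is not a drop-in replacement.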

Contributor Author

@janko janko Oct 28, 2025

Correct, but my understanding is that it's because of the conversion done in URI::Generic.from_path. The path that's passed to that method should be correct even on Windows, shouldn't it? That's what's stored in raw_path.

end

uri = build(scheme: scheme, path: escaped_path, fragment: fragment)
uri.raw_path = path
Contributor

I don't think we should do that. We should rather split the whole work into platform-specific path unescaping/escaping and then use the right code path on each platform. The result can also be cached at the entry level.
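A rough sketch of the entry-level caching half of this suggestion, under stated assumptions: `CachedPathEntry`, its `converter` argument, and the `conversions` counter are hypothetical names, and the lambda is a toy stand-in for whatever platform-specific unescaping would be chosen.

```ruby
# Hypothetical entry class; ruby-lsp's real entries are more involved.
class CachedPathEntry
  attr_reader :conversions

  # `uri_path` is the escaped URI path; `converter` is any callable turning it
  # into a file path (both names are assumptions for this sketch).
  def initialize(uri_path, converter)
    @uri_path = uri_path
    @converter = converter
    @conversions = 0
  end

  # Converts once and reuses the result on later calls.
  def file_path
    @file_path ||= begin
      @conversions += 1
      @converter.call(@uri_path)
    end
  end
end

# Toy converter that only handles escaped spaces.
converter = ->(uri_path) { uri_path.gsub("%20", " ") }
entry = CachedPathEntry.new("/app/my%20models/user.rb", converter)
3.times { entry.file_path }
```

With memoization the conversion cost is paid on first access per entry, so repeated searches benefit while the first search over a fresh index still does the full conversion work.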

Contributor Author

@janko janko Oct 26, 2025

The unescaping was done regardless of the platform, and that was the main bottleneck in my profiling. The secondary bottleneck was the Regexp#match? call in the Windows code path, which could arguably be skipped on non-Windows platforms. But since I skipped the reconstruction of the file path entirely, there was no need.

Member

Is there a difference between raw_path and simply path (already present in the URI object)?

Contributor Author

@janko janko Oct 28, 2025

Yes, path is the URI-escaped version of file path, while raw_path is the unchanged file path. For example, path will keep any spaces in the path percent-encoded:

uri = URI::Generic.from_path(path: "/Users/janko/Library/Application Support")
uri.path     # => "/Users/janko/Library/Application%20Support"
uri.raw_path # => "/Users/janko/Library/Application Support"

The problem is that with the URI-escaping we lose the information of the valid file path that was passed to URI::Generic.from_path, so URI::Generic#to_standardized_path needs to reconstruct it.

I proposed storing the original valid file path to avoid having to do that work.
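To make the trade-off concrete, here is a small sketch of keeping the original path next to the escaped URI. `FileReference` and `reconstructed_path` are hypothetical names used only for illustration. The PR stores the value on the URI object itself, and this sketch does not address the Windows concern raised in the review.

```ruby
require "uri"

# Pick a parser that offers `escape`/`unescape` across uri gem versions.
PATH_PARSER = URI.const_defined?(:RFC2396_PARSER) ? URI::RFC2396_PARSER : URI::DEFAULT_PARSER

# Hypothetical wrapper used only for illustration.
class FileReference
  attr_reader :uri, :raw_path

  # Builds the escaped URI and keeps the original path alongside it.
  def self.from_path(path)
    escaped_path = PATH_PARSER.escape(path)
    new(URI::Generic.build(scheme: "file", path: escaped_path), path)
  end

  def initialize(uri, raw_path)
    @uri = uri
    @raw_path = raw_path
  end

  # Recovers the file path from the escaped URI path on every call.
  def reconstructed_path
    PATH_PARSER.unescape(uri.path)
  end
end

reference = FileReference.from_path("/Users/janko/Library/Application Support")
puts reference.uri.path
puts reference.raw_path
```

Reading `raw_path` is a plain attribute lookup, whereas `reconstructed_path` repeats the unescaping each time it is called.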

Member

@vinistock vinistock left a comment

The insight that we can filter entries before comparing definitely makes sense and we should try to do it.

That said, I'm not sure we need raw_path when path already exists and skipping the to_standardized_path conversions might produce incorrect results on Windows.

Successfully merging this pull request may close these issues.

Improve workspace/symbol request's performance
