Speed up workspace symbol search #3792

janko · 2025-10-25T12:11:18Z

Motivation

I want workspace symbol searches to be faster so that I use them more often in my code editor.

Implementation

Currently the whole index is fuzzy matched, and then dependency/private entries are filtered out. I flipped this around: dependency/private entries are filtered out first, and fuzzy matching runs only on the remainder. In theory, this should be faster, assuming that filtering is cheaper than fuzzy matching.
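A minimal sketch of the reordering described above, not the actual ruby-lsp implementation. `SketchEntry`, `sketch_candidate?`, `sketch_match?`, the `/gems/` dependency check, and the 0.7 threshold are all hypothetical stand-ins; only `DidYouMean::JaroWinkler.distance` is a real library call.

```ruby
require "did_you_mean" # provides DidYouMean::JaroWinkler

# Hypothetical stand-in for an index entry, not the real RubyIndexer entry class.
SketchEntry = Struct.new(:name, :file_path, :private_entry)

# Hypothetical filter: keep public entries that do not live under a gems directory.
def sketch_candidate?(entry)
  !entry.private_entry && !entry.file_path.include?("/gems/")
end

# Fuzzy comparison; the 0.7 threshold is an assumption made for this sketch.
def sketch_match?(query, name, threshold = 0.7)
  DidYouMean::JaroWinkler.distance(query.downcase, name.downcase) >= threshold
end

# Old order: fuzzy match every entry, then drop dependency/private ones.
def match_then_filter(entries, query)
  entries.select { |e| sketch_match?(query, e.name) }.select { |e| sketch_candidate?(e) }
end

# New order: the cheap filter runs first, so the fuzzy matcher only sees candidates.
def filter_then_match(entries, query)
  entries.select { |e| sketch_candidate?(e) && sketch_match?(query, e.name) }
end

SKETCH_ENTRIES = [
  SketchEntry.new("UserController", "/app/controllers/user_controller.rb", false),
  SketchEntry.new("UserController", "/ruby/gems/foo/lib/user_controller.rb", false),
  SketchEntry.new("UserController", "/app/models/secret.rb", true),
  SketchEntry.new("Zzz", "/app/models/zzz.rb", false),
].freeze
```

Both orderings return the same entries; the second only avoids calling the fuzzy matcher on entries that would be discarded anyway.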

However, this wasn't faster initially, and I quickly found that URI::Generic#full_path was the bottleneck (where previously it was DidYouMean::JaroWinkler). The path unescaping and Windows handling were adding overhead, so I stored the raw path in the URI.

Automated Tests

I didn't update automated tests.

Manual Tests

For my Rails application, this optimization made workspace/symbol requests 4x faster.

We're doing extra work fuzzy matching across the whole index, even though only a subset of entries are considered valid results. We can filter out dependency/private entries *before* fuzzy matching, which should speed things up, assuming that fuzzy matching is more expensive than filtering.

Filtering before fuzzy matching didn't improve performance on its own, because retrieving the full paths of entries is slow. Since file paths are escaped in URIs, they're currently unescaped, with Windows handling, in order to retrieve the raw path. This is extra work: we know the raw path at the time we build the URI, we just didn't store it. Storing the raw path in the URI and retrieving it speeds up workspace symbol search ~4x in my application.
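The idea of keeping the raw path next to the escaped one can be sketched as follows. `SketchFileURI` and its method names are hypothetical and only illustrate the trade-off; the real change sets `raw_path` on the URI object built in `URI::Generic.from_path`. `URI::RFC2396_Parser` is used here purely as a convenient escaper for the sketch.

```ruby
require "uri"

# Parser used only for percent-escaping in this sketch.
SKETCH_PARSER = URI::RFC2396_Parser.new

# Hypothetical, simplified stand-in for a file URI object.
class SketchFileURI
  attr_reader :escaped_path, :raw_path

  # Escape the path for use in a URI, but also remember the original file path.
  def self.from_path(path)
    new(SKETCH_PARSER.escape(path), path)
  end

  def initialize(escaped_path, raw_path)
    @escaped_path = escaped_path
    @raw_path = raw_path
  end

  # Slow route: rebuild the file path by unescaping on every call.
  def reconstructed_path
    SKETCH_PARSER.unescape(@escaped_path)
  end
end

uri = SketchFileURI.from_path("/Users/janko/Library/Application Support")
uri.escaped_path       # percent-encoded form, spaces become %20
uri.raw_path           # the string that was passed in, no work needed
uri.reconstructed_path # same value as raw_path, recomputed each time
```

Reading `raw_path` is a plain attribute lookup, whereas `reconstructed_path` scans and rewrites the string on each call, which adds up when it runs once per index entry.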

graphite-app · 2025-10-25T12:11:26Z

How to use the Graphite Merge Queue

Add the label graphite-merge to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

pkondzior · 2025-10-26T07:20:54Z

lib/ruby_lsp/requests/workspace_symbol.rb

+
+      def fuzzy_search
+        @index.fuzzy_search(@query) do |entry|
+          file_path = entry.uri.raw_path


I am not sure this is the right approach. We could instead cache this at the entry level rather than skipping the unescaping parser.

Why do the unescaping at all? We had the full path when building the URI, why reconstruct it?

It won't work properly because the URI path and file_path aren't the same thing on Windows and our comparison for not_in_dependencies will fail.

uri = URI("file:///C:/ruby/something.rb")
# The URI's path is not a valid file path
uri.path # => "/C:/ruby/something.rb"
# It's the handling of `to_standardized_path` that turns it into the correct one
uri.to_standardized_path # => "C:/ruby/something.rb"

Correct, but my understanding is that this is because of the conversion done in URI::Generic.from_path. The path that's passed to that method should be correct even on Windows, shouldn't it? That's what's stored in raw_path.
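To make the Windows concern in this thread concrete, here is a rough sketch of the kind of conversion being discussed. `sketch_standardized_path` is a hypothetical helper written for illustration; it is not the library's `to_standardized_path`, and the drive-letter regular expression is an assumption.

```ruby
require "uri"

# Hypothetical helper: turn a URI path back into a file path.
def sketch_standardized_path(uri_path)
  unescaped = URI::RFC2396_Parser.new.unescape(uri_path)

  # On Windows the URI path carries a slash before the drive letter
  # ("/C:/ruby/something.rb"), which is not a valid file path.
  if %r{\A/[A-Za-z]:}.match?(unescaped)
    unescaped.delete_prefix("/")
  else
    unescaped
  end
end

windows_uri = URI("file:///C:/ruby/something.rb")
windows_uri.path                           # still has the leading slash
sketch_standardized_path(windows_uri.path) # drive-letter path without it
```

If the string originally passed to `from_path` was already `C:/ruby/something.rb`, storing it unchanged would sidestep this conversion, which is the point raised in the reply above.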

pkondzior · 2025-10-26T07:21:57Z

lib/ruby_indexer/lib/ruby_indexer/uri.rb

        end

        uri = build(scheme: scheme, path: escaped_path, fragment: fragment)
+        uri.raw_path = path


I don't think we should do that. We should rather split the whole work into platform-specific path unescaping/escaping and then use the right code paths on the right platform. The result can also be cached at the entry level.

The unescaping was done regardless of the platform, and that was the main bottleneck in my profiling. The secondary bottleneck was the Regexp#match? in the Windows code path, which could arguably be skipped on non-Windows platforms. But since I skipped the reconstruction of the file path entirely, there was no need.
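For comparison, the entry-level caching suggested earlier in this thread could look roughly like the sketch below. `SketchCachedEntry` and its `conversions` counter are hypothetical and exist only to show how many times the unescaping runs.

```ruby
require "uri"

# Hypothetical entry that memoizes its unescaped file path.
class SketchCachedEntry
  PARSER = URI::RFC2396_Parser.new

  attr_reader :conversions

  def initialize(escaped_path)
    @escaped_path = escaped_path
    @conversions = 0 # counts how often the unescaping actually runs
  end

  # The first call pays for the unescaping; later calls reuse the cached string.
  def file_path
    @file_path ||= begin
      @conversions += 1
      PARSER.unescape(@escaped_path)
    end
  end
end

entry = SketchCachedEntry.new("/Users/janko/Library/Application%20Support/a.rb")
3.times { entry.file_path }
entry.conversions # the unescaping ran once for this entry
```

With this approach every entry still pays for one unescape on first access, whereas storing the original path at build time avoids the conversion altogether.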

Is there a difference between raw_path and simply path (already present in the URI object)?

Yes, path is the URI-escaped version of the file path, while raw_path is the unchanged file path. For example, path percent-encodes any spaces in the file path:

uri = URI::Generic.from_path(path: "/Users/janko/Library/Application Support")
uri.path # => "/Users/janko/Library/Application%20Support"
uri.raw_path # => "/Users/janko/Library/Application Support"

The problem is that URI-escaping loses the valid file path that was passed to URI::Generic.from_path, so URI::Generic#to_standardized_path needs to reconstruct it.

I proposed storing the original valid file path to avoid having to do that work.

vinistock

The insight that we can filter entries before comparing definitely makes sense and we should try to do it.

That said, I'm not sure we need raw_path when path already exists and skipping the to_standardized_path conversions might produce incorrect results on Windows.

vinistock · 2025-10-28T14:11:30Z

lib/ruby_indexer/lib/ruby_indexer/uri.rb

        end

        uri = build(scheme: scheme, path: escaped_path, fragment: fragment)
+        uri.raw_path = path


Is there a difference between raw_path and simply path (already present in the URI object)?

vinistock · 2025-10-28T14:15:48Z

lib/ruby_lsp/requests/workspace_symbol.rb

+
+      def fuzzy_search
+        @index.fuzzy_search(@query) do |entry|
+          file_path = entry.uri.raw_path


It won't work properly because the URI path and file_path aren't the same thing on Windows and our comparison for not_in_dependencies will fail.

uri = URI("file:///C:/ruby/something.rb")
# The URI's path is not a valid file path
uri.path # => "/C:/ruby/something.rb"
# It's the handling of `to_standardized_path` that turns it into the correct one
uri.to_standardized_path # => "C:/ruby/something.rb"

janko added 2 commits October 25, 2025 13:28

janko requested a review from a team as a code owner October 25, 2025 12:11

github-actions bot added the cla-needed label Oct 25, 2025

janko mentioned this pull request Oct 25, 2025

Improve workspace/symbol request's performance #2660

Open

1 task

pkondzior reviewed Oct 26, 2025

lib/ruby_lsp/requests/workspace_symbol.rb

pkondzior reviewed Oct 26, 2025

lib/ruby_indexer/lib/ruby_indexer/index.rb

pkondzior reviewed Oct 26, 2025

janko mentioned this pull request Oct 27, 2025

Improve handling of class variables, aliases and constant aliases #3784

Closed

vinistock reviewed Oct 28, 2025

vinistock mentioned this pull request Nov 6, 2025

Add Entry#in_dependencies? and refactor dependency checks #3806

Open
