Speed up workspace symbol search #3792
We're doing extra work fuzzy matching across the whole index, even though only a subset of entries are considered valid results. We can filter out dependency/private entries *before* fuzzy matching, which should speed things up, assuming that fuzzy matching is more expensive than filtering.
Filtering before fuzzy matching didn't improve performance on its own, because retrieving the full paths of entries is slow. Since the file paths are escaped in URIs, they're currently being unescaped with Windows handling in order to retrieve the raw path. This is extra work: we know the raw path at the time we're building the URI, we just didn't store that information. Storing the raw path in the URI and retrieving it speeds up workspace symbol search ~4x in my application.
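The filter-first ordering described above can be sketched roughly as follows. This is a minimal illustration, not the ruby-lsp code: the `Entry` struct, the `filter_then_fuzzy_match` helper, the `visibility` field, and the `0.7` threshold are all assumptions made for the example. Only `DidYouMean::JaroWinkler.distance` is a real library call.

```ruby
require "did_you_mean"

# Hypothetical stand-in for an index entry (not the ruby-lsp Entry class).
Entry = Struct.new(:name, :file_path, :visibility, keyword_init: true)

# Run the cheap checks first, then the costlier fuzzy match on what remains.
# The threshold value is an assumption for this sketch.
def filter_then_fuzzy_match(entries, query, workspace_path, threshold: 0.7)
  normalized_query = query.downcase

  entries
    .select { |entry| entry.file_path.start_with?(workspace_path) }
    .reject { |entry| entry.visibility == :private }
    .select do |entry|
      DidYouMean::JaroWinkler.distance(entry.name.downcase, normalized_query) >= threshold
    end
end
```

The ordering only pays off if reading `file_path` is cheap, which is why the cost of retrieving the path matters so much here.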
def fuzzy_search
  @index.fuzzy_search(@query) do |entry|
    file_path = entry.uri.raw_path
I'm not sure this is the right approach. We could instead cache this at the entry level rather than skipping the unescaping parser.
Why do the unescaping at all? We had the full path when building the URI, why reconstruct it?
It won't work properly because the URI path and file_path aren't the same thing on Windows and our comparison for not_in_dependencies will fail.
uri = URI("file:///C:/ruby/something.rb")
# The URI's path is not a valid file path
uri.path # => "/C:/ruby/something.rb"
# It's the handling of `to_standardized_path` that turns it into the correct one
uri.to_standardized_path # => "C:/ruby/something.rb"
Correct, but my understanding is that it's because of the conversion done in URI::Generic.from_path. The path that's passed to that method should be correct even on Windows, shouldn't it? That's what's stored in raw_path.
end

uri = build(scheme: scheme, path: escaped_path, fragment: fragment)
uri.raw_path = path
I don't think we should do that. We should rather split the work into platform-specific path escaping/unescaping and use the right code path on each platform. The result can also be cached at the entry level.
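As a rough illustration of the entry-level caching suggested here, the unescaped path could be memoized on first access. The `CachedPathEntry` class and its `UNESCAPER` constant are hypothetical names for this sketch, and the Windows handling is left out; this is not how ruby-lsp entries are actually defined.

```ruby
require "uri"

# Hypothetical entry that caches its reconstructed file path.
class CachedPathEntry
  UNESCAPER = URI::RFC2396_Parser.new

  attr_reader :uri

  def initialize(uri)
    @uri = uri
  end

  # Unescape once, then reuse the stored string on later calls.
  def file_path
    @file_path ||= UNESCAPER.unescape(@uri.path)
  end
end
```

Caching still pays the unescaping cost once per entry, whereas storing the original path avoids it entirely.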
The unescaping was done regardless of the platform, and that was the main bottleneck in my profiling. The secondary bottleneck was the Regexp#match? in the Windows code path, which could arguably be skipped on non-Windows platforms. But since I skipped the reconstruction of the file path entirely, there was no need.
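For context, the two steps mentioned above (unescaping, then the Windows drive-letter check) can be approximated like this. It is a sketch under assumptions, not the actual to_standardized_path source; `reconstruct_file_path` and `UNESCAPE_PARSER` are names invented for the example.

```ruby
require "uri"

UNESCAPE_PARSER = URI::RFC2396_Parser.new

# Approximation of rebuilding a file path from a file URI.
def reconstruct_file_path(uri)
  # Step 1: percent-decoding, which runs on every platform.
  unescaped = UNESCAPE_PARSER.unescape(uri.path)

  # Step 2: a drive-letter URI path has a leading slash that is not
  # part of the actual Windows file path.
  if %r{\A/[A-Z]:}i.match?(unescaped)
    unescaped.delete_prefix("/")
  else
    unescaped
  end
end
```

Both steps run on every call, which adds up when the path is read for each entry during a search.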
Is there a difference between raw_path and simply path (already present in the URI object)?
Yes, path is the URI-escaped version of file path, while raw_path is the unchanged file path. For example, path will keep any spaces in the path percent-encoded:
uri = URI::Generic.from_path(path: "/Users/janko/Library/Application Support")
uri.path # => "/Users/janko/Library/Application%20Support"
uri.raw_path # => "/Users/janko/Library/Application Support"
The problem is that with the URI-escaping we lose the information of the valid file path that was passed to URI::Generic.from_path, so URI::Generic#to_standardized_path needs to reconstruct it.
I proposed storing the original valid file path to avoid having to do that work.
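A minimal sketch of that idea, assuming a standalone wrapper rather than the change to URI::Generic made in this PR: `PathBackedURI` and `ESCAPER` are hypothetical names, and the leading-slash handling for Windows drive paths is my assumption about what from_path does.

```ruby
require "uri"

# Hypothetical wrapper that keeps the original path next to the URI.
class PathBackedURI
  ESCAPER = URI::RFC2396_Parser.new

  attr_reader :uri, :raw_path

  def self.from_path(path)
    # Assumed conversion: drive-letter paths get a leading slash in the URI.
    uri_path = path.match?(/\A[A-Z]:/i) ? "/#{path}" : path
    uri = URI::Generic.build(scheme: "file", path: ESCAPER.escape(uri_path))
    new(uri, path)
  end

  def initialize(uri, raw_path)
    @uri = uri
    @raw_path = raw_path # the untouched path, so no reconstruction is needed
  end
end
```

Because `raw_path` is the string that was passed in, reading it needs no unescaping and no platform check.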
vinistock left a comment
The insight that we can filter entries before comparing definitely makes sense and we should try to do it.
That said, I'm not sure we need raw_path when path already exists and skipping the to_standardized_path conversions might produce incorrect results on Windows.
Motivation
Closes #2660
I want workspace symbol searches to be faster so that I use them more often in my code editor.
Implementation
Currently the whole index is fuzzy matched, and then dependency/private entries are filtered out. I flipped this around by first filtering out dependency/private entries, and then doing the fuzzy matching on the remainder. In theory, this should be faster, assuming that filtering is faster than fuzzy matching.
However, this wasn't faster initially, and I quickly found that URI::Generic#full_path was the bottleneck (where previously it was DidYouMean::JaroWinkler). The path unescaping and Windows handling were adding overhead, so I stored the raw path in the URI.
Automated Tests
I didn't update automated tests.
Manual Tests
For my Rails application, this optimization made workspace/symbol requests 4x faster.