-
-
Notifications
You must be signed in to change notification settings - Fork 484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add fuzzy matching algorithm for tab completions #1855
base: develop
Are you sure you want to change the base?
Conversation
Signed-off-by: Snehil Shah <130062020+Snehil-Shah@users.noreply.github.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@kgryte Regarding visually highlighting the matching characters in the completions (ys -> yes). I am a bit confused about how we can achieve this as we are relying on built-in readline module's completer for tab completions. I don't think we can alter on how the completer displays the completions. One of the (rather hacky) ways I tried doing this is by including the bold outputs in the resulting array itself that is returned by the completer callback. Is there any other way I can go around implementing this?? And regarding the algorithm, afaik this isn't based on any prior implementation/art. It's a simple algorithm that I wrote from scratch that matches based on how much the characters in the completion deviate from the input string (distance). |
Hmm...yeah, I hadn't thought about the completer controlling the terminal output. My initial thought was escape sequences, but then you are right, I can see how that might be an issue with auto-completion. Well...one possibility is to just to write our own completer. We don't have to use the built-in. Could write it from scratch. May be worth exploring the Node.js source implementation to see how it handles the completer logic. |
For IPython, looks like they use jedi for auto-completion. In Node.js, fuzzy-matching is not supported. |
Fuzzy matching is not supported in IPython either from what I know. Am I mistaken? |
For IPython, looks like you can enable jedi: https://ipython.readthedocs.io/en/stable/api/generated/IPython.core.completer.html#IPython.core.completer.Completer.use_jedi. In which case, I believe you can get fuzzy auto-complete in IPython: https://jedi.readthedocs.io/en/latest/docs/api.html?highlight=fuzzy#jedi.Script.complete. |
I haven't personally used jedi in IPython, so YMMV. |
@kgryte understood! I'll explore writing our own completer, and studying jedi's fuzzy matching algorithm👍 |
Don't stray too far down the rabbit hole! It may be vast. 😅 |
This was intentional, as matching is "inlined" due to additional loop logic (e.g., We'll want fuzzy matching to be enabled via a setting. This is another thing which users may have preferences over. And as a first pass, we can merge basic functionality in first and then do character highlighting in a follow-up PR, as that is likely to require yet more refactoring and is not strictly necessary in order to get the benefits of fuzzy matching. |
In the JSDoc comment of |
@kgryte I am almost done with the character highlighting part though, can I push the changes and we can iterate over it after? |
@Snehil-Shah Sure. In general, going forward, it would be good to split tasks which can be split into separate PRs, as any additional complexity will increase the risk of PRs getting bogged down in review and refactoring. |
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@kgryte Done with the implementation. Current behaviours of TAB completions:
undo.mp4
It was better before, and now that we have a setting to disable it, in my opinion, we can normally display them just after the exact completions as it doesn't hurt to have more completions that the user might be potentially going for, given the exact ones are up top already. And if the user doesn't want it, they can disable it via settings. |
Re: line separation. I don't think I agree here. By mixing the exact match and fuzzy results, you are making the user search for the exact matches. They, themselves, have to identify the boundaries between fuzzy and exact match. And that is not ideal UX, especially as the fuzzy matches create visual noise. Take your example above. For the exact matches, all matching text is lower case, but, for the fuzzy matches, you have a mix of lowercase, camelcase, and title case. The fuzzy completions are just inherently noisier, as there is no pattern. And because of that noise, you are imposing additional cognitive load on users to identify the "correct" match and to find how their query matches any one specific fuzzy match. In short, you are making users work. And users, generally, try to avoid work. The line separation makes it clear what is exact and what is fuzzy. And a line separation provides structure to results for which, displayed together, lack obvious structure, until you spend time understanding order and implicit grouping. IMO, grouping together is worse UX. On another note, I am wondering if we should limit the number of results in order to avoid choice paralysis. Did a quick search (ref 1 and ref 2), and seems generally agreed to limit fuzzy results to no more than 10 items. Anything more and users spend more time searching than auto-completing. |
@kgryte I understand. |
Agreed. Restricting number of completion candidates only makes sense for fuzzy completions. And yeah, I still lean toward fuzzy completions only when no exact prefix matches. |
Also, fuzzy completions only when no exact match is the less invasive choice to start. If users start clamoring for fuzzy completions even when there are exact prefix matches, should be straightforward to refactor to match the current behavior. And we have this PR as a reference. |
@kgryte great. Will go ahead with fuzzy completions only when there's no exact matches. |
Maybe make the number of displayed fuzzy completions a setting? |
@kgryte we can do that. although I wonder if it's that user-friendly given it's generally abstracted to just a true or false in most editors🤔 |
Another option would be to provide a setting (or settings) for users to adjust the magic parameters which are used to score results. That would achieve the same end as limiting the number of results. But, then again, we could change the underlying scoring algo in the future. Regardless, I am not sure that users benefit all that much from having more than 10 fuzzy choices. One option to make it more "powerful" or relevant is to weight items in a user's history higher. But this makes the implementation more complex. I am tempted to just say limit to 10. If people want more, we can easily make this adjustable. |
@kgryte I get it. will go ahead and keep the setting a Boolean, and will limit the fuzzy results to 10. Is that right? |
Yes.
Yes, but definitely not in this PR. 😅 |
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@kgryte Done Only fuzzy: Only exact: Currently writing some tests for the completer Btw I think I found a small bug in auto-closing brackets (in develop). when hitting the up arrow from inside the auto-closed pair, it removes the auto-inserted brackets on the right of the cursor automatically. And for some reason, this only happens from the 2nd command onwards. Before: (['<|>']) After hitting UP: (['<|> Tried to debug and no, it's not caused by auto-delete-pairs, and as it is happening from the 2nd command onwards, I presume it's a node bug.. I found a similar bug when using TAB key as noted in |
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@kgryte Done with the tests.. |
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <130062020+Snehil-Shah@users.noreply.github.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
Signed-off-by: Snehil Shah <snehilshah.989@gmail.com>
@kgryte Done with this PR: The completer works as expected with the pager and is also responsive to terminal resizes. FYI: while running the tests, I was getting these warnings about I added a removeListener when closing the REPL and the warnings seem to go away: Should I push this? |
Ah, yes, forgot to add the remove listener call in the other PR. Rather than make that a part of this PR, I suggest submitting a patch PR which we can merge sooner. |
Resolves #1845
Description
This pull request:
The algorithm revolves around scoring each completion based on the number of matching characters, and the distance between the matching characters in the completion & input.
It easily catches minor spelling mistakes and matches 'similar looking' prefixes (at least from how much I have tested).
Complexity (assuming
startsWith
takes linear time):Before: O(n) per completion, where n = length of input string.
After: O(m+n) per completion, where n = length of input string & m = length of completion
Update: The algorithm is now updated.
Related Issues
This pull request:
Questions
right now fuzzy match will only work for completing expressions and not for incomplete workspaces, file system, tutorial and require statements, as for some reason they are not using the abstracted
filterByPrefix
method. Instead they are reimplementing it (see image).Would do the refactoring once the algorithm is finalized, to fix this!
Update: fixed!
Other
I tried other popular fuzzy auto-completion algorithms too (not many) but completions didn't feel as relevant as I expected from a code completer.
The Levenshtein algorithm, for example, favors suggestions based on how 'similar looking' two strings are, meaning two strings with a higher difference in length would be scored worse (as it requires more insertions), which is not ideal for code completion.
What I haven't done much yet, is explore existing 'code completions' algorithms in IDEs/Editors and see how they are implementing them. Please tell me if I should do that.
The current implementation does work really well though based on the limited cases I've tested it through.
Update: Updated the algorithm!
And AFAIK, node.js and ipython don't have fuzzy auto-completion yet. (correct me if I'm wrong)
Checklist
@stdlib-js/reviewers