-
Notifications
You must be signed in to change notification settings - Fork 28
Rewrite scoring algorithm to support run of consecutive character, fix acronyms and add optimal selection of character. #22
Conversation
Don't forget that Travis is running on Linux and AppVeyor on Windows -> Different path separators. |
…rence between window/linux)
it "returns the result in order", -> | ||
candidates = [ | ||
'Find And Replace: Selet All', | ||
'Settings View: Uninstall Packages', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Settings View: View Installed Themes
is missing from this list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because current ranking was not optimal so I did not wanted to set current ranking as spec.
On the other side if I place "optimal" ranking as per @plrenaudin the build will fail...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand there's no "nice to have" spec, it's pass all or the build fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pity.
But even without that this version of the patch is still an improvement over the current situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :) Added support for your usecase in both spec and scoring (implemented as an extra bonus if exact match happens rigth after token separator)
Maybe include the test case from #18 in the spec? |
And the test case from atom/atom#7783 perhaps? |
Brilliant! I hope this will get through! |
One thing I miss by a long shot is this spec: It only pass because that bypass Which with the new scoring scheme would be considered a bug. Reason is that the various bonus, make it very hard to compute a meaninfull upperbound, especially things that act in the middle of the string like CamelCase, position in string bonus etc. Also I believe in this project usage a accurate relative score ordering matter more than saying |
I'd also ask permission to remove this line So if one search for "git push", i get token information and be able to do exact match for "git" then for "push". Or even simply for "git push" (because "gitpush" will never exact match "git push") |
/cc @kevinsawicki |
@jeancroy, regarding score-spec.coffee, I think the only thing that makes sense is to compare relative scores, and possibly 0 scores (if that's the minimum). What we want tests for is sort order, and absolute comparisons like the one at https://github.com/atom/fuzzaldrin/blob/master/spec/score-spec.coffee#L6 doesn't really do that, and IMO you should just remove that test together with the bypass at https://github.com/atom/fuzzaldrin/blob/master/src/fuzzaldrin.coffee#L18. And I'd apply similar reasoning for https://github.com/atom/fuzzaldrin/blob/master/src/fuzzaldrin.coffee#L12; if you can remove it and honestly say that any specs that start failing aren't relevant any more, go for it! Those lines are there to solve a problem, and if that problem isn't there any more neither should those lines. |
+1 on what @walles said |
…tched by either a another space or a slash. Non optional characters are called core. Strict isMatch is applied only on core characters of query. Soft score have access to the full query for scoring. This replace the regex that strip space from query and increase precision for multi word query eg "git push". To be able to do that without recomputing core for each item, it is done at the filter() level.
#21 Added suffix bonus to mirror prefix bonus, exact token match will get both bonus. Now define optional characters as a whitelist instead of a blacklist. This will better support text with no latin character.
…ild fails) Added spec to test for that preference Added spec to confirm we handle issue18 #18 Cleaned up comments
@walles I really appreciate what you do. Despite contributing a considerable amount of code, I'm still learning etiquette of open source. |
@jeancroy :) OTOH, all these things are things I'd have said at work as well. And I do appreciate you both coding and listening. Thanks @jeancroy! The thing is also that I thought about diving into this myself, but since you're already doing such a great job with it I think my time is better spent reviewing. |
Great job guys, I'm currently testing the fuzzy-finder using this PR. As expected, it's quite slow but OK IMHO. |
This patch (avoids many calls to diff --git a/src/scorer.coffee b/src/scorer.coffee
index bbbc1f2..bf54696 100644
--- a/src/scorer.coffee
+++ b/src/scorer.coffee
@@ -49,6 +49,9 @@ opt_char_re = /[\ \-\_\xE3\xE0\xE1\xE4\xE2\xE6\u1EBD\xE8\xE9\xEB\xEA\xEC\xED\xEF
#
exports.score = score = (subject, query, ignore) ->
+ subject_l = subject.toLowerCase()
+ query_l = query.toLowerCase()
+
m = query.length + 1
n = subject.length + 1
@@ -78,7 +81,7 @@ exports.score = score = (subject, query, ignore) ->
# Score the options
gapA = gapArow[j] = Math.max(gapArow[j] + we, vrow[j] + wo)
gapB = Math.max(gapB + we, vrow[j - 1] + wo)
- align = vd + scoreChar(query, subject, i - 1, j - 1)
+ align = vd + scoreChar(query, query_l, subject, subject_l, i - 1, j - 1)
vd = vrow[j]
#Get the best option
@@ -95,7 +98,7 @@ exports.score = score = (subject, query, ignore) ->
lpos = m - n - 1
#sustring bonus, start of string bonus
- if ( p = subject.toLowerCase().indexOf(query.toLowerCase())) > -1
+ if ( p = subject_l.indexOf(query_l)) > -1
vmax += wex * m * (1.0 + 5.0 / (5.0 + p))
#sustring happens right after a separator (prefix)
@@ -108,11 +111,13 @@ exports.score = score = (subject, query, ignore) ->
return vmax
-scoreChar = (query, subject, i, j) ->
+scoreChar = (query, query_l, subject, subject_l, i, j) ->
qi = query[i]
+ qi_l = query_l[i]
sj = subject[j]
+ sj_l = subject_l[j]
- if qi.toLowerCase() == sj.toLowerCase()
+ if qi_l == sj_l
#Proper casing bonus
bonus = if qi == sj then wc else 0
@@ -129,13 +134,14 @@ scoreChar = (query, subject, i, j) ->
#get previous char
prev_s = subject[j - 1]
+ prev_s_l = subject_l[j - 1]
prev_q = query[i - 1]
#match FOLLOW a separator
return wa + bonus if ( prev_s of sep_map) or ( prev_q of sep_map )
#match IS Capital in camelCase (preceded by lowercase)
- return wa + bonus if (sj == sj.toUpperCase() and prev_s == prev_s.toLowerCase())
+ return wa + bonus if (prev_s == prev_s_l and sj == sj.toUpperCase())
#normal Match, add proper case bonus
return wm + bonus |
Optional are now only space and space like
Hi @walles I like your idea, but I moved the lowercase test out of scoreChar because it was getting a bit ugly with all those arguments, and the whole function was basically an large if lowercase match. Now we gate the call to this function with the lowercase test. |
I agree your version is more readable. Nice! |
"We don't have time for this" and "it's not our priority" after 2 months of waiting and so much people asking. That's why I hate pseudo-opensource - it doesn't matter that you can see the code if you can't fix it. |
@kshnurov Hey now... Even open source needs direction. Atom shouldn't blindly merge changes in, especially overhauls like this. If you care about the future of the editor, a core team reviewing PRs is smart. The fact that @jeancroy is doing all this work is an amazing community contribution, and the fact that Atom's core team is honest about their stance is also a good thing. All that said, I hope fuzzy finder doesn't get ignored for too long! It's one of Atom's biggest flaws IMO. |
Honestly 2 weeks of test as an optional feature is not that bad of a compromise. Personally I was going for at least bumping to a major version. |
@jesseleite users feedback should be taken into account when the direction is chosen, not ignored. Core team reviewing PRs is smart, core team saying "we won't even review this" on the ready PR for one of the biggest flaw (and you're 100% right on that) is not smart at all. I've seen that a dozens of times - it starts with "it's not our priority" on a feature/PR everyone asks for and ends up with a dead software no one uses. |
@kshnurov: Absolutely, but I don't think they are saying "we won't even review this".
^ Seems to me that they are open to the PR, but not ready to skim and merge without thorough review. This PR is a behemoth. My 1st beef with Atom is the fuzzy finder. BUT I also want any solution(s) to be well tested and properly implemented, otherwise we're just layering bandaids on top of bandaids. Be a bit more positive and trust the process :) |
Definitely not. I think getting it in as an option is the first step to changing this. |
@benogle https://github.com/atom/fuzzy-finder/blob/master/lib/fuzzy-finder-view.coffee#L135 Should I bypass the super ? |
Yeah, I suppose you will need to override |
As requested there is now a PR in fuzzy-finder. |
This has been added to atom/master via atom/fuzzy-finder#142. If you build atom from source, it is available via option: |
Closing this as we can continue the discussion over at https://github.com/jeancroy/fuzzaldrin-plus |
Thank you <3 |
For those who want to test it, fuzzaldrin-plus will be an option of autocomplete-plus 2.22 |
Awesome, look forward to it! |
The alternate scoring option makes the search really slow and "jittery" 👎 |
Hi @AbdullahAli , can you give an example of query and context ? The benchmak show it'll run anywhere between 2x faster to 2x slower depending on query. Are you using this from a official stable distribution of atom or using some hack needed before ? |
I haven't found any performance issues with alternate scoring option. It's been great for me! @jeancroy how do we find out how many files are indexed? |
No way to tell index count after it's already indexed? Maybe via console command or something? |
Hey @jeancroy I have ~36k files This is what it looks like when the alternate scoring option is enabled: This is what it looks like with it disabled: In the second video you can see it is smoother and not "jittery" - although the video quality might not be the best, so sorry about that |
In both video it show "reindexing project" so that does not helps. I'll open an issue on fuzzy finder about the debounce delay. (Maybe making a search each 2 char will help reduce pressure) Btw what kind of system do you have ? There's a lot of ruby user that use this and the difference should not be that large. |
Ok so a recent build? Are the file-name confidential ? You can save it to a file like so:
I'm ok with you redacting out name of company , software, and whatever it need to be comfortable. |
This will be available for testing as an option in fuzzy-finder
Please address problems, comments, and suggestions here:
https://github.com/jeancroy/fuzzaldrin-plus
We start a tuning phase, so anything not quite right is welcome.