rustdoc: use a trie for name-based search #133005
Merged
bors merged 3 commits into rust-lang:master on Nov 14, 2024
Preview and profiler results
----------------------------

Here's some quick profiling in Firefox done on the rust compiler docs:

- Before: https://share.firefox.dev/3UPm3M8
- After: https://share.firefox.dev/40LXvYb

Here's the results for the node.js profiler:

- https://notriddle.com/rustdoc-html-demo-15/trie-perf/index.html

Here's a copy that you can use to try it out. Compare it with [the nightly]. Try typing `typecheckercontext` one character at a time, slowly.

- https://notriddle.com/rustdoc-html-demo-15/compiler-doc-trie/index.html

[the nightly]: https://doc.rust-lang.org/nightly/nightly-rustc/

The fuzzy match algo is based on [Fast String Correction with Levenshtein-Automata] and the corresponding implementation code in [moman] and [Lucene]; the bit-packing representation comes from Lucene, but the actual matcher is more based on `fsc.py`. As suggested in the paper, a trie is used to represent the FSA dictionary.

The same trie is used for prefix matching. Substring matching is done with a side table of three-character[^1] windows that point into the trie.

[Fast String Correction with Levenshtein-Automata]: https://github.com/tpn/pdfs/blob/master/Fast%20String%20Correction%20with%20Levenshtein-Automata%20(2002)%20(10.1.1.16.652).pdf
[Lucene]: https://fossies.org/linux/lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Lev1TParametricDescription.java
[moman]: https://gitlab.com/notriddle/moman-rustdoc

User-visible changes
--------------------

I don't expect anybody to notice anything, but it does cause two changes:

- Substring matches, in the middle of a name, only apply if there's three or more characters in the search query.
- Levenshtein distance limit now maxes out at two. In the old version, the limit was w/3, so you could get looser matches for queries with 9 or more characters[^1] in them.

[^1]: technically utf-16 code units
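To illustrate how one trie can serve both prefix and substring matching, here is a minimal sketch. It is not the PR's code: the class `NameTrie`, the helpers `prefixSearch` and `substringSearch`, and the `windows` map are hypothetical names, and the sketch assumes the side table maps each three-code-unit window to the trie nodes reached right after that window.

```javascript
// Hypothetical sketch; names and layout are assumptions, not rustdoc's code.
class NameTrie {
    constructor() {
        this.children = new Map(); // UTF-16 code unit -> NameTrie
        this.matches = [];         // ids of names that end at this node
    }

    // Insert `name`, recording each newly created node under the
    // three-code-unit window that ends at that node.
    insert(name, id, windows) {
        let node = this;
        for (let i = 0; i < name.length; i += 1) {
            const unit = name.charCodeAt(i);
            let child = node.children.get(unit);
            if (child === undefined) {
                child = new NameTrie();
                node.children.set(unit, child);
                if (i >= 2) {
                    const key = name.substring(i - 2, i + 1);
                    if (!windows.has(key)) {
                        windows.set(key, []);
                    }
                    windows.get(key).push(child);
                }
            }
            node = child;
        }
        node.matches.push(id);
    }

    // Follow `text` downward from this node; undefined if the path ends early.
    descend(text) {
        let node = this;
        for (let i = 0; i < text.length && node !== undefined; i += 1) {
            node = node.children.get(text.charCodeAt(i));
        }
        return node;
    }

    // Add every id stored at or below this node to the set `out`.
    collect(out) {
        for (const id of this.matches) {
            out.add(id);
        }
        for (const child of this.children.values()) {
            child.collect(out);
        }
        return out;
    }
}

function prefixSearch(root, query) {
    const node = root.descend(query);
    return node === undefined ? new Set() : node.collect(new Set());
}

function substringSearch(windows, query) {
    const out = new Set();
    if (query.length < 3) {
        return out; // shorter queries have no window to look up
    }
    const rest = query.substring(3);
    for (const start of windows.get(query.substring(0, 3)) || []) {
        const node = start.descend(rest);
        if (node !== undefined) {
            node.collect(out);
        }
    }
    return out;
}

const root = new NameTrie();
const windows = new Map();
["typeck", "retype", "vec"].forEach((name, id) => root.insert(name, id, windows));
```

With these three names, `prefixSearch(root, "ty")` finds only `typeck`, while `substringSearch(windows, "typ")` finds both `typeck` and `retype`. A two-character query such as `"ty"` yields no substring matches in this sketch, which mirrors the three-character minimum described above.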
    } else {
        const sb = name.charCodeAt(substart);
        let child;
        if (this.children[sb] !== undefined) {
Member
Wouldn't it be better to check this.children.length < sb?
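As background for this question, here is a small sketch of how a JavaScript array indexed by code unit behaves when it is sparse. The `children` array below is a hypothetical stand-in, not the PR's data structure; the point is only that an index can fall below `length` while the slot is still empty.

```javascript
// Hypothetical stand-in for a trie node's `children`, indexed by UTF-16 code unit.
const children = [];
children["z".charCodeAt(0)] = { label: "z-child" }; // fills index 122 only

const sb = "a".charCodeAt(0);                 // 97
const inBounds = sb < children.length;        // true: length is 123
const present = children[sb] !== undefined;   // false: index 97 is a hole
```

In this sketch a bounds comparison alone does not show that a child exists, whereas the `!== undefined` test does.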
Member
Add this explanation and link on the field definition please. :)
Contributor
Author
Okay, it's added.
Member
Apart from small nits, looks good to me. Performance improvement is really impressive!
Collaborator
Some changes occurred in HTML/CSS/JS. cc @GuillaumeGomez, @jsha
GuillaumeGomez approved these changes on Nov 14, 2024
Member
Thanks! @bors r+
bors added a commit to rust-lang-ci/rust that referenced this pull request on Nov 14, 2024
…llaumeGomez

Rollup of 5 pull requests

Successful merges:

- rust-lang#132172 (borrowck diagnostics: suggest borrowing function inputs in generic positions)
- rust-lang#132649 (add ./x clippy ci)
- rust-lang#133005 (rustdoc: use a trie for name-based search)
- rust-lang#133034 (update download-rustc comments and default)
- rust-lang#133036 (add myself into `users_on_vacation` on triagebot)

r? `@ghost`
`@rustbot` modify labels: rollup
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request on Nov 14, 2024
Rollup merge of rust-lang#133005 - notriddle:notriddle/trie-search, r=GuillaumeGomez

rustdoc: use a trie for name-based search

Potentially rust-lang#131156 — need to try reproducing the problem with `windows`

Preview and profiler results
----------------------------

Here's some quick profiling in Firefox done on the rust compiler docs:

- Before: https://share.firefox.dev/3UPm3M8
- After: https://share.firefox.dev/40LXvYb

Here's the results for the node.js profiler:

- https://notriddle.com/rustdoc-html-demo-15/trie-perf/index.html

Here's a copy that you can use to try it out. Compare it with [the nightly]. Try typing `typecheckercontext` one character at a time, slowly.

- https://notriddle.com/rustdoc-html-demo-15/compiler-doc-trie/index.html

[the nightly]: https://doc.rust-lang.org/nightly/nightly-rustc/

The fuzzy match algo is based on [Fast String Correction with Levenshtein-Automata] and the corresponding implementation code in [moman] and [Lucene]; the bit-packing representation comes from Lucene, but the actual matcher is more based on `fsc.py`. As suggested in the paper, a trie is used to represent the FSA dictionary.

The same trie is used for prefix matching. Substring matching is done with a side table of three-character[^1] windows that point into the trie.

[Fast String Correction with Levenshtein-Automata]: https://github.com/tpn/pdfs/blob/master/Fast%20String%20Correction%20with%20Levenshtein-Automata%20(2002)%20(10.1.1.16.652).pdf
[Lucene]: https://fossies.org/linux/lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Lev1TParametricDescription.java
[moman]: https://gitlab.com/notriddle/moman-rustdoc

User-visible changes
--------------------

I don't expect anybody to notice anything, but it does cause two changes:

- Substring matches, in the middle of a name, only apply if there's three or more characters in the search query.
- Levenshtein distance limit now maxes out at two. In the old version, the limit was w/3, so you could get looser matches for queries with 9 or more characters[^1] in them.
- It uses more RAM.
- It's faster (assuming you don't swap thrash).

[^1]: technically utf-16 code units
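The change to the Levenshtein distance limit can be made concrete with a small sketch. Both functions are hypothetical: `oldLimit` assumes that "w/3" rounds down, and `newLimit` assumes that the new limit is the old one capped at two, since the description only says it "maxes out at two".

```javascript
// Assumption: the old limit is the query length divided by three, rounded down.
function oldLimit(queryLength) {
    return Math.floor(queryLength / 3);
}

// Assumption: the new limit is the old limit capped at two.
function newLimit(queryLength) {
    return Math.min(oldLimit(queryLength), 2);
}

// Compare the two limits for a few query lengths (in UTF-16 code units).
const comparison = [3, 6, 8, 9, "typecheckercontext".length].map((w) => ({
    length: w,
    before: oldLimit(w),
    after: newLimit(w),
}));
```

Under these assumptions the two limits agree up to 8 code units and diverge from 9 onward: the 18-unit query `typecheckercontext` would have allowed 6 edits before and allows 2 now.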