Open
Description
In initial tests, zoekt (https://github.com/google/zoekt) is 2-10x faster than codesearch and degrades much more gracefully on pathological queries (queries with many potential matches).
For 1.4G of source code, zoekt writes a 1.7G index, which is a 1.21x blow-up. Our nodes currently have 22-24G used and 52-54G available, so disk-wise, we could actually switch to zoekt.
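The disk-space reasoning above can be written down as a small calculation. This is only a sketch: the function names `blowUp` and `fits` and the 10G safety reserve are assumptions made for illustration, and the input numbers are the ones quoted above.

```go
package main

import "fmt"

// blowUp returns the ratio of index size to source size.
func blowUp(sourceGB, indexGB float64) float64 {
	return indexGB / sourceGB
}

// fits reports whether an index of indexGB fits into availableGB
// while keeping reserveGB free. The reserve is a hypothetical safety margin.
func fits(indexGB, availableGB, reserveGB float64) bool {
	return indexGB+reserveGB <= availableGB
}

func main() {
	// Numbers from the measurement above: 1.4G of source, 1.7G index.
	fmt.Printf("blow-up: %.2fx\n", blowUp(1.4, 1.7))
	// 52G is the lower bound of the available space per node; 10G reserve is assumed.
	fmt.Println("fits:", fits(1.7, 52, 10))
}
```

The same two functions could be reused to estimate the cost of indexing additional Debian versions once their source sizes are known.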
TODO list:
- How can we keep our incremental indexing? Could we store one zoekt shard per package, and/or merge the per-package shards into a single big shard?
  - zoekt by default writes one index file per repository, so if we treat each Debian package as one repository, we already get cheap updates.
- Which features (query keywords) would we need to drop, and which could we keep with a compatibility layer?
- Do we need to fork zoekt to get all the features our search result page has (context lines, etc.)?
  - zoekt does not sort the results within a file, at least not in its own UI.
  - zoekt does not provide context lines around matches.
- How do we get our own ranking into zoekt?
- Could we use the repo/branch feature of zoekt for multiple Debian versions (e.g. sid, testing, …)?
  - How much extra disk space would adding other Debian versions need?
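One option short of forking might be to sort matches and add context lines in our own layer, after zoekt returns results. The sketch below illustrates that idea using only the Go standard library. It does not use zoekt's API: the `Match` and `Context` types and the `withContext` function are hypothetical, and it assumes we have the file content and the 1-based line numbers of the matches at hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Match is a hypothetical match record: a 1-based line number within a file.
type Match struct {
	Line int
}

// Context is a matching line together with its surrounding lines.
type Context struct {
	Line   int
	Before []string
	Text   string
	After  []string
}

// withContext sorts matches by line number and attaches up to n lines
// of context before and after each match. Out-of-range matches are skipped.
func withContext(content string, matches []Match, n int) []Context {
	if n < 0 {
		n = 0
	}
	lines := strings.Split(content, "\n")
	// Copy before sorting so the caller's slice is not reordered.
	sorted := append([]Match(nil), matches...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Line < sorted[j].Line })

	var out []Context
	for _, m := range sorted {
		idx := m.Line - 1
		if idx < 0 || idx >= len(lines) {
			continue
		}
		start := idx - n
		if start < 0 {
			start = 0
		}
		end := idx + n + 1
		if end > len(lines) {
			end = len(lines)
		}
		out = append(out, Context{
			Line:   m.Line,
			Before: lines[start:idx],
			Text:   lines[idx],
			After:  lines[idx+1 : end],
		})
	}
	return out
}

func main() {
	content := "a\nb\nc\nd\ne"
	for _, c := range withContext(content, []Match{{Line: 4}, {Line: 2}}, 1) {
		fmt.Println(c.Line, c.Before, c.Text, c.After)
	}
}
```

Whether this is fast enough depends on how cheaply the file content can be read at query time, which is not covered here.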