score: introduce query.Boost to scale score #728

Merged: 1 commit, Jan 29, 2024

@keegancsmith (Member) commented Jan 28, 2024

This commit introduces a new primitive, Boost, to our query language. It allows boosting (or dampening) the contribution that a query atom's matches make to the score.
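
For illustration, here is a minimal Go sketch of what a Boost node in a query tree could look like. The types below (`Q`, `Substring`, `Boost` with its `Child`/`Boost` fields) are hypothetical stand-ins written for this example, not zoekt's actual definitions, and the `String` format is made up:

```go
package main

import "fmt"

// Q is a hypothetical stand-in for a query node; real query types
// carry more methods than just String.
type Q interface {
	String() string
}

// Substring is a hypothetical leaf atom that matches a literal pattern.
type Substring struct {
	Pattern string
}

func (s *Substring) String() string {
	return fmt.Sprintf("substr:%q", s.Pattern)
}

// Boost scales the score contribution of every match found under Child:
// values above 1 boost it, values between 0 and 1 dampen it.
type Boost struct {
	Child Q
	Boost float64
}

func (b *Boost) String() string {
	return fmt.Sprintf("(boost %g %s)", b.Boost, b.Child)
}

func main() {
	// Boost exact matches of "foo bar" by a factor of 20.
	q := &Boost{Child: &Substring{Pattern: "foo bar"}, Boost: 20}
	fmt.Println(q) // prints (boost 20 substr:"foo bar")
}
```

Boost only wraps another query, so it changes how much matches count without changing which documents match.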

To achieve this, we introduce boostMatchTree, which records this weight. We then adjust visitMatches to take an initial score weight (1.0); each time we recurse through a boostMatchTree, the score weight is multiplied by the boost weight. Additionally, candidateMatch has a new field, scoreWeight, which records the weight at the time the candidate is collected. Without boosting in the query, this value is always 1.
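
The weight propagation described above can be sketched as follows. This is a simplified, hypothetical model: `matchTree`, `substrMatchTree`, `andMatchTree` and `collectCandidates` are invented for the example, and zoekt's real `visitMatches` and `candidateMatch` carry much more state. The point it shows is that each `boostMatchTree` multiplies the weight handed down to its subtree, so nested boosts compose multiplicatively:

```go
package main

import "fmt"

// matchTree is a hypothetical, heavily simplified match tree node.
type matchTree interface{}

// substrMatchTree is a hypothetical leaf that "matches" at fixed offsets.
type substrMatchTree struct {
	pattern string
	offsets []int
}

// andMatchTree groups children; each child can produce candidates.
type andMatchTree struct {
	children []matchTree
}

// boostMatchTree records the boost weight for its subtree.
type boostMatchTree struct {
	child  matchTree
	weight float64
}

// candidateMatch records the score weight in effect when it was collected.
type candidateMatch struct {
	pattern     string
	offset      int
	scoreWeight float64
}

// visitMatches walks the tree, multiplying scoreWeight by every boost it
// passes through, and calls f for each leaf with the accumulated weight.
func visitMatches(t matchTree, scoreWeight float64, f func(leaf *substrMatchTree, scoreWeight float64)) {
	switch s := t.(type) {
	case *andMatchTree:
		for _, ch := range s.children {
			visitMatches(ch, scoreWeight, f)
		}
	case *boostMatchTree:
		visitMatches(s.child, scoreWeight*s.weight, f)
	case *substrMatchTree:
		f(s, scoreWeight)
	}
}

// collectCandidates starts the walk with an initial weight of 1.0.
func collectCandidates(t matchTree) []candidateMatch {
	var cands []candidateMatch
	visitMatches(t, 1.0, func(leaf *substrMatchTree, w float64) {
		for _, off := range leaf.offsets {
			cands = append(cands, candidateMatch{pattern: leaf.pattern, offset: off, scoreWeight: w})
		}
	})
	return cands
}

func main() {
	tree := &andMatchTree{children: []matchTree{
		&substrMatchTree{pattern: "foo", offsets: []int{3}},
		&boostMatchTree{weight: 20, child: &boostMatchTree{
			weight: 0.5,
			child:  &substrMatchTree{pattern: "foo bar", offsets: []int{10}},
		}},
	}}
	for _, c := range collectCandidates(tree) {
		fmt.Printf("%s@%d weight=%g\n", c.pattern, c.offset, c.scoreWeight)
	}
}
```

Running it prints `foo@3 weight=1` and `foo bar@10 weight=10`: the unboosted leaf keeps the initial weight, while the nested boosts of 20 and 0.5 multiply to 10.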

Finally, when scoring a candidateMatch, we multiply its final score by scoreWeight.
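
A hedged sketch of that final step: `baseScore` below is a hypothetical placeholder for whatever score zoekt computes for a candidate, and ranking by the best weighted candidate is an assumption made for this example, not necessarily how zoekt aggregates scores:

```go
package main

import "fmt"

// candidateMatch is a hypothetical, trimmed-down candidate.
type candidateMatch struct {
	baseScore   float64 // placeholder for the unweighted score
	scoreWeight float64 // weight recorded at collection time
}

// scoreCandidate applies the boost recorded when the candidate was collected.
func scoreCandidate(m candidateMatch) float64 {
	return m.baseScore * m.scoreWeight
}

// bestScore returns the highest weighted score (assumed aggregation;
// returns 0 for no candidates).
func bestScore(ms []candidateMatch) float64 {
	best := 0.0
	for _, m := range ms {
		if s := scoreCandidate(m); s > best {
			best = s
		}
	}
	return best
}

func main() {
	unboosted := []candidateMatch{{baseScore: 500, scoreWeight: 1}}
	boosted := []candidateMatch{{baseScore: 300, scoreWeight: 20}}
	fmt.Println(bestScore(unboosted), bestScore(boosted)) // prints 500 6000
}
```

With a weight of 20, a weaker base match (300) outranks a stronger unboosted one (500).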

Note: there is no way to set this in the query language; it is only available via the query API.

Test Plan: Manual testing against the webserver via the new phrase-boost URL param. Additionally, updated the ranking tests to use the phrase booster.
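
The PR text does not show how the phrase booster builds its query, so the following is only one plausible shape, written against hypothetical stand-in types (`phraseBoost` and all the types here are invented for this example). It keeps requiring every term and adds the exact phrase as an alternative branch under a Boost of 20 (the value reported to work well later in this thread). Because a document containing the phrase also contains every term, the `Or` should not widen the result set; it only raises the score of documents that contain the phrase:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical query node types, for illustration only.
type Q interface{ String() string }

type Substring struct{ Pattern string }
type And struct{ Children []Q }
type Or struct{ Children []Q }
type Boost struct {
	Child Q
	Boost float64
}

func (q *Substring) String() string { return fmt.Sprintf("substr:%q", q.Pattern) }
func (q *And) String() string { return "(and " + join(q.Children) + ")" }
func (q *Or) String() string { return "(or " + join(q.Children) + ")" }
func (q *Boost) String() string { return fmt.Sprintf("(boost %g %s)", q.Boost, q.Child) }

// join renders child queries separated by spaces.
func join(qs []Q) string {
	parts := make([]string, len(qs))
	for i, q := range qs {
		parts[i] = q.String()
	}
	return strings.Join(parts, " ")
}

// phraseBoost (hypothetical) requires all terms and boosts exact phrase matches.
func phraseBoost(terms []string, boost float64) Q {
	atoms := make([]Q, len(terms))
	for i, t := range terms {
		atoms[i] = &Substring{Pattern: t}
	}
	if len(terms) < 2 {
		// A single term has no phrase to boost.
		return &And{Children: atoms}
	}
	phrase := &Substring{Pattern: strings.Join(terms, " ")}
	return &Or{Children: []Q{
		&Boost{Child: phrase, Boost: boost},
		&And{Children: atoms},
	}}
}

func main() {
	fmt.Println(phraseBoost([]string{"foo", "bar"}, 20))
}
```

For `foo bar` with a boost of 20 this prints `(or (boost 20 substr:"foo bar") (and substr:"foo" substr:"bar"))`.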

| stat | before | after |
|---|---|---|
| recall@1 | 7 (50%) | 9 (64%) |
| recall@5 | 9 (64%) | 11 (79%) |
| mrr | 0.579471 | 0.710733 |
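
Assuming the conventional definitions (recall@k: the number of queries whose expected result appears in the top k; MRR: the mean of 1/rank of the expected result, counting a missing result as 0), the metrics can be computed as in this sketch. The input format (one 1-based rank per query, 0 for not found) and the example ranks are made up; they are not the data behind the table. Under these definitions the percentages are consistent with 14 queries (7/14 = 50%, 11/14 ≈ 79%).

```go
package main

import "fmt"

// recallAt counts queries whose expected result ranks within the top k.
// ranks holds one 1-based rank per query, or 0 if it was not returned.
func recallAt(ranks []int, k int) (hits int, frac float64) {
	for _, r := range ranks {
		if r >= 1 && r <= k {
			hits++
		}
	}
	return hits, float64(hits) / float64(len(ranks))
}

// mrr is the mean reciprocal rank, treating missing results as 0.
func mrr(ranks []int) float64 {
	sum := 0.0
	for _, r := range ranks {
		if r >= 1 {
			sum += 1 / float64(r)
		}
	}
	return sum / float64(len(ranks))
}

func main() {
	ranks := []int{1, 2, 0, 1} // made-up example ranks
	h1, f1 := recallAt(ranks, 1)
	fmt.Printf("recall@1 %d (%.0f%%) mrr %.3f\n", h1, f1*100, mrr(ranks))
	// prints: recall@1 2 (50%) mrr 0.625
}
```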

@keegancsmith (Member, Author)

Couldn't help but do a little hacking this morning. It hasn't been tested yet, but I would appreciate feedback on the approach/idea.

@keegancsmith force-pushed the k/query-boost branch 3 times, most recently from 9cb40ac to 818a9c5, on January 29, 2024 10:14
@keegancsmith (Member, Author)

Alright, after some experimental testing, a boost of 20 works quite well! I updated our e2e ranking tests:

| stat | before | after |
|---|---|---|
| recall@1 | 7 (50%) | 9 (64%) |
| recall@5 | 9 (64%) | 11 (79%) |
| mrr | 0.579471 | 0.710733 |

Screenshots below are from the same corpus as our e2e ranking tests, served locally with:

```shell
go run ./cmd/zoekt-webserver -listen 127.0.0.1:6070 -print -index /tmp/zoekt-test-ranking-shards-keegan
```

[Two screenshots]

@keegancsmith force-pushed the k/query-boost branch 3 times, most recently from 0611e63 to efdcd9e, on January 29, 2024 11:19
@keegancsmith merged commit 340c5f8 into main on Jan 29, 2024
8 checks passed
@keegancsmith deleted the k/query-boost branch on January 29, 2024 11:33