score: introduce query.Boost to scale score #728

keegancsmith · 2024-01-28T07:01:05Z

This commit introduces a new primitive, Boost, to our query language. It allows boosting (or dampening) how much the matches of a query atom contribute to the score.
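To make the idea concrete, here is a minimal sketch of what a Boost node could look like in a query tree. It assumes every node implements `String()`. The names `Q` and `Substring`, and the printed form, are illustrative stand-ins and are not zoekt's actual definitions.

```go
package main

import "fmt"

// Q is a minimal stand-in for a query node (hypothetical, not zoekt's query.Q).
type Q interface {
	String() string
}

// Substring is a hypothetical leaf atom matching a literal pattern.
type Substring struct {
	Pattern string
}

func (s *Substring) String() string { return fmt.Sprintf("substr:%q", s.Pattern) }

// Boost wraps a child query and scales the score contribution of its matches:
// Boost > 1 boosts, 0 < Boost < 1 dampens, and 1 is neutral.
type Boost struct {
	Child Q
	Boost float64
}

func (b *Boost) String() string { return fmt.Sprintf("(boost %v %s)", b.Boost, b.Child) }
```

Because Boost only wraps a child, it can be placed around any subtree, which is what lets a caller weight one alternative of an OR more heavily than another.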

To achieve this we introduce boostMatchTree, which records this weight. We then adjust visitMatches to take an initial score weight (1.0); each time we recurse through a boostMatchTree, the score weight is multiplied by the boost weight. Additionally, candidateMatch now has a new field, scoreWeight, which records the weight at the time the candidate is collected. Without boosting in the query this value is always 1.
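The weight propagation described above can be sketched as a recursive walk. This is a simplified model, assuming a tree made only of AND, OR, boost, and leaf nodes; `leafMatchTree` and the callback signature are hypothetical, and only the names `boostMatchTree` and `visitMatches` come from the description.

```go
package main

// matchTree is a simplified stand-in for zoekt's internal match tree.
type matchTree interface{}

// leafMatchTree is a hypothetical atom that either matched or did not.
type leafMatchTree struct {
	name    string
	matched bool
}

type andMatchTree struct{ children []matchTree }
type orMatchTree struct{ children []matchTree }

// boostMatchTree records the boost weight for its subtree.
type boostMatchTree struct {
	child matchTree
	boost float64
}

// visitMatches calls f for every matching leaf, passing the product of all
// boost weights on the path from the root. Callers start with weight 1.0.
func visitMatches(t matchTree, weight float64, f func(leaf *leafMatchTree, weight float64)) {
	switch s := t.(type) {
	case *andMatchTree:
		for _, c := range s.children {
			visitMatches(c, weight, f)
		}
	case *orMatchTree:
		for _, c := range s.children {
			visitMatches(c, weight, f)
		}
	case *boostMatchTree:
		// The only place the weight changes: multiply by this node's boost.
		visitMatches(s.child, weight*s.boost, f)
	case *leafMatchTree:
		if s.matched {
			f(s, weight)
		}
	}
}
```

Nested boosts compose multiplicatively, so a subtree under boosts of 0.5 and 2 ends up with weight 1, while leaves outside any boost keep the initial 1.0.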

Finally, when scoring a candidateMatch, we take its final score and multiply it by scoreWeight.
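A sketch of that final step, assuming a candidate already carries a score from the usual signals. Only `scoreWeight` comes from the description; `baseScore`, `scoreCandidate`, and `sortByScore` are hypothetical names used to illustrate the effect on ordering, not zoekt's scoring code.

```go
package main

import "sort"

// candidateMatch is a simplified stand-in for zoekt's type.
type candidateMatch struct {
	baseScore   float64 // hypothetical: score from the existing signals
	scoreWeight float64 // product of boosts when collected; 1 if unboosted
}

// scoreCandidate applies the boost as one final multiplication, so a weight
// of 1 leaves the existing score unchanged.
func scoreCandidate(c candidateMatch) float64 {
	return c.baseScore * c.scoreWeight
}

// sortByScore orders candidates by weighted score, highest first.
func sortByScore(cs []candidateMatch) {
	sort.SliceStable(cs, func(i, j int) bool {
		return scoreCandidate(cs[i]) > scoreCandidate(cs[j])
	})
}
```

With made-up numbers, a boosted phrase match with base score 5 and weight 20 scores 100, outranking an unboosted term match with base score 7.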

Note: we do not expose a way to set this in the query language, only via the query API.

Test Plan: Manual testing against the webserver via the new phrase-boost URL param. Additionally, updated the ranking tests to use the phrase booster.

stat	before	after
recall@1	7 (50%)	9 (64%)
recall@5	9 (64%)	11 (79%)
mrr	0.579471	0.710733
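The percentages in the table imply 14 ranking queries (7/14 = 50%, 9/14 ≈ 64%, 11/14 ≈ 79%). For reference, here is a sketch of the textbook definitions of recall@k and mean reciprocal rank, assuming each query has one expected result whose 1-based rank is known (0 meaning not found). The ranking-test harness behind these numbers may differ in details such as result cut-offs.

```go
package main

// recallAt returns the fraction of queries whose expected result is in the top k.
// ranks holds the 1-based rank of the expected result per query; 0 = not found.
func recallAt(ranks []int, k int) float64 {
	if len(ranks) == 0 {
		return 0
	}
	hits := 0
	for _, r := range ranks {
		if r >= 1 && r <= k {
			hits++
		}
	}
	return float64(hits) / float64(len(ranks))
}

// mrr returns the mean of 1/rank over all queries, counting misses as 0.
func mrr(ranks []int) float64 {
	if len(ranks) == 0 {
		return 0
	}
	sum := 0.0
	for _, r := range ranks {
		if r >= 1 {
			sum += 1 / float64(r)
		}
	}
	return sum / float64(len(ranks))
}
```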

keegancsmith · 2024-01-28T07:01:39Z

Couldn't help but do a little hacking this morning. It hasn't been tested yet, but I would appreciate feedback on the approach/idea.

keegancsmith · 2024-01-29T10:18:03Z

Alright, after some experimental testing, a boost of 20 works quite well! I updated our e2e ranking tests:

stat	before	after
recall@1	7 (50%)	9 (64%)
recall@5	9 (64%)	11 (79%)
mrr	0.579471	0.710733
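The PR does not spell out how the phrase-boost URL param builds its query, so the following is a hypothetical sketch only. It assumes the phrase booster ORs the whole multi-term input, wrapped in a Boost of 20, with the usual conjunction of individual terms. The types and the `phraseBoost` helper are illustrative, not zoekt's API.

```go
package main

import (
	"fmt"
	"strings"
)

type Q interface{ String() string }

type Substring struct{ Pattern string }
type And struct{ Children []Q }
type Or struct{ Children []Q }
type Boost struct {
	Child Q
	Boost float64
}

func join(op string, cs []Q) string {
	parts := make([]string, len(cs))
	for i, c := range cs {
		parts[i] = c.String()
	}
	return "(" + op + " " + strings.Join(parts, " ") + ")"
}

func (s *Substring) String() string { return fmt.Sprintf("%q", s.Pattern) }
func (a *And) String() string       { return join("and", a.Children) }
func (o *Or) String() string        { return join("or", o.Children) }
func (b *Boost) String() string     { return fmt.Sprintf("(boost %v %s)", b.Boost, b.Child) }

// phraseBoost (hypothetical) keeps the per-term conjunction but ORs in the
// exact phrase wrapped in a Boost, so files containing the phrase score higher.
func phraseBoost(input string, boost float64) Q {
	terms := strings.Fields(input)
	if len(terms) < 2 {
		return &Substring{Pattern: input} // a single term has no phrase to boost
	}
	var and And
	for _, t := range terms {
		and.Children = append(and.Children, &Substring{Pattern: t})
	}
	phrase := &Boost{Child: &Substring{Pattern: strings.Join(terms, " ")}, Boost: boost}
	return &Or{Children: []Q{phrase, &and}}
}
```

For the input `foo bar` with boost 20 this yields `(or (boost 20 "foo bar") (and "foo" "bar"))`: the set of matching files is the same as for the plain conjunction, and only the ranking changes.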

Screenshots below are from the same corpus as our e2e ranking tests:

go run ./cmd/zoekt-webserver -listen 127.0.0.1:6070 -print -index /tmp/zoekt-test-ranking-shards-keegan

keegancsmith requested review from jtibshirani, camdencheek and stefanhengl January 28, 2024 07:01

keegancsmith force-pushed the k/query-boost branch 3 times, most recently from 9cb40ac to 818a9c5 on January 29, 2024 10:14

keegancsmith force-pushed the k/query-boost branch 3 times, most recently from 0611e63 to efdcd9e on January 29, 2024 11:19

keegancsmith force-pushed the k/query-boost branch from efdcd9e to 6ab827f on January 29, 2024 11:29

stefanhengl approved these changes Jan 29, 2024

keegancsmith merged commit 340c5f8 into main Jan 29, 2024 (8 checks passed)

keegancsmith deleted the k/query-boost branch January 29, 2024 11:33

jtibshirani mentioned this pull request Jan 31, 2024

☂️ Search: simpler query language inspired by "keyword search" sourcegraph/sourcegraph-public-snapshot#58815 (Closed, 17 tasks)
