Search with hypothetical documents #711

ggordonhall · 2023-06-30T14:24:52Z

Our semantic search pipeline sometimes fails to return relevant information. In these cases, the queries that the agent generates do not overlap at all with the correct documents. For example, for the user query "which ml library do we use?", bloop might make a semantic search for "ml library" or "machine learning library", neither of which overlaps with "ort" or "onnxruntime".

This PR implements a popular approach to tackling this recall problem: as well as making a semantic search with the query, we use an LLM to generate a set of hypothetical documents that could answer the query and search with respect to those too (HyDE, https://arxiv.org/abs/2212.10496).

We take the user query and ask GPT-3.5 to generate three code snippets in a variety of languages that could answer it. We then batch search Qdrant for all of those queries at once (very open to more elegant implementations of this).
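The flow described above can be sketched roughly as follows. This is a toy illustration, not the code in this PR: `embed` is a bag-of-words stand-in for the real embedding model, `batch_search` is an in-memory stand-in for the batched Qdrant request, `fake_llm` is a hypothetical stub for the GPT-3.5 call, and merging hits by best score per document is an assumption.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Counter  # toy sparse vector: token -> count


def embed(text: str) -> Vector:
    # Stand-in for a dense embedding model: bag of lowercase word tokens.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def batch_search(
    queries: List[str], index: Dict[str, Vector], limit: int
) -> List[List[Tuple[str, float]]]:
    # Stand-in for one batched vector-store request: a ranked hit list per query.
    results = []
    for query in queries:
        query_vec = embed(query)
        hits = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
        hits = [hit for hit in hits if hit[1] > 0.0]
        hits.sort(key=lambda hit: hit[1], reverse=True)
        results.append(hits[:limit])
    return results


def hyde_search(
    query: str,
    generate_snippets: Callable[[str], List[str]],
    index: Dict[str, Vector],
    limit: int = 3,
) -> List[Tuple[str, float]]:
    # Search with the original query plus the hypothetical snippets.
    queries = [query] + generate_snippets(query)
    best: Dict[str, float] = {}
    for hits in batch_search(queries, index, limit):
        for doc_id, score in hits:
            # Assumption: keep each document once, with its best score.
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    return sorted(best.items(), key=lambda hit: hit[1], reverse=True)[:limit]


def fake_llm(query: str) -> List[str]:
    # Hypothetical stub for the LLM: three snippets in different languages.
    return [
        "import onnxruntime as ort",
        "use ort::Session;",
        "const ort = require('onnxruntime-node');",
    ]


# Made-up documents, for illustration only.
INDEX = {
    name: embed(text)
    for name, text in {
        "model.rs": "use ort::{Environment, SessionBuilder}; // onnxruntime session",
        "readme.md": "bloop is a code search engine",
        "config.rs": "pub struct Config { pub port: u16 }",
    }.items()
}

plain_hits = batch_search(["ml library"], INDEX, 3)[0]
hyde_hits = hyde_search("ml library", fake_llm, INDEX)
```

In this toy setup the plain query shares no tokens with any document, while the hypothetical snippets mention `ort` and `onnxruntime` and so surface `model.rs`. A real dense embedding model would not need exact token overlap, but the same idea applies: the snippets sit closer to code than the natural-language query does.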

ggordonhall added 5 commits July 3, 2023 12:18

wip: hyde

e9ed717

wip: batch search hyde snippets

aaa57d0

wip: remove logs

0e0e446

cleanup

1262486

deduplicate w.r.t. mean pooled vector

0a5cf8e

ggordonhall force-pushed the gabriel/blo-1183-diverse-search-queries branch from 262a688 to 0a5cf8e Compare July 3, 2023 11:21

log hyde queries in analytics

bdbb191

ggordonhall requested review from anastasiya1155 and calyptobai July 3, 2023 17:57

ggordonhall marked this pull request as ready for review July 3, 2023 17:57

ggordonhall requested review from rsdy and removed request for anastasiya1155 July 3, 2023 17:58

rsdy approved these changes Jul 4, 2023

ggordonhall merged commit a1221c4 into main Jul 4, 2023

ggordonhall deleted the gabriel/blo-1183-diverse-search-queries branch July 4, 2023 16:52
