This repository was archived by the owner on Jan 2, 2025. It is now read-only.

@ggordonhall ggordonhall commented Jun 30, 2023

Our semantic search pipeline sometimes fails to return relevant information, because the queries that the agent generates do not overlap at all with the correct documents. For example, for the user query "which ml library do we use?", bloop might make a semantic search for "ml library" or "machine learning library", neither of which overlaps with "ort" or "onnxruntime".

This PR implements a popular approach to tackling this recall problem: as well as making a semantic search with the query itself, we use an LLM to generate a set of hypothetical documents that could answer the query and search with those too (HyDE, https://arxiv.org/abs/2212.10496).

We take the user query and ask GPT-3.5 to generate three code snippets, in a variety of languages, that could answer it. We then batch search Qdrant with all of those queries at once (very open to more elegant implementations of this).
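To make the flow concrete, here is a minimal, self-contained sketch of the idea in Python. It is not the code in this PR: `toy_embed` is a bag-of-words stand-in for the real embedding model, `batch_search` is an in-memory stand-in for a batched Qdrant search, `fake_generator` returns canned snippets where the PR calls GPT-3.5, and the corpus is invented for illustration. Merging the per-query hits by keeping each document's best score is also an assumption, since the description does not say how results are combined.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]
Hit = Tuple[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: a normalised bag of words."""
    counts = Counter(re.findall(r"[a-z0-9_]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are already unit length, so the dot product is the cosine.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def batch_search(index: Dict[str, Vector], vectors: List[Vector], limit: int) -> List[List[Hit]]:
    """Hypothetical stand-in for a batched vector search: one ranked list per query vector."""
    results = []
    for vec in vectors:
        scored = [(doc_id, cosine(vec, doc_vec)) for doc_id, doc_vec in index.items()]
        scored.sort(key=lambda hit: hit[1], reverse=True)
        results.append(scored[:limit])
    return results


def hyde_search(
    query: str,
    generate_snippets: Callable[[str], List[str]],
    index: Dict[str, Vector],
    limit: int = 3,
) -> List[Hit]:
    # Search with the original query plus every hypothetical document.
    queries = [query] + generate_snippets(query)
    vectors = [toy_embed(q) for q in queries]
    # Assumption: merge by keeping each document's best non-zero score.
    best: Dict[str, float] = {}
    for hits in batch_search(index, vectors, limit):
        for doc_id, score in hits:
            if score > best.get(doc_id, 0.0):
                best[doc_id] = score
    return sorted(best.items(), key=lambda hit: hit[1], reverse=True)[:limit]


def fake_generator(query: str) -> List[str]:
    """Canned snippets standing in for the three LLM-generated ones."""
    return [
        "import onnxruntime as ort\nsess = ort.InferenceSession(path)",
        "import torch\nmodel = torch.load(path)",
        "use tch::Tensor;",
    ]


# Invented toy corpus: only embedder.py actually answers the question.
corpus = {
    "embedder.py": "import onnxruntime as ort\nsession = ort.InferenceSession('model.onnx')",
    "server.py": "from http.server import HTTPServer\nserver = HTTPServer(('', 8000), handler)",
    "README.md": "a library of helper scripts for the team",
}
index = {doc_id: toy_embed(text) for doc_id, text in corpus.items()}

query = "which ml library do we use?"
plain = hyde_search(query, lambda q: [], index)      # query only
hyde = hyde_search(query, fake_generator, index)     # query + hypothetical documents
print([doc_id for doc_id, _ in plain])
print([doc_id for doc_id, _ in hyde])
```

In this toy setup the plain query only matches the README through the word "library", while the hypothetical snippet that mentions onnxruntime pulls the relevant file to the top. A real embedding model is far less literal than a bag of words, so the gap would be smaller in practice, but the mechanism is the same.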

@ggordonhall ggordonhall force-pushed the gabriel/blo-1183-diverse-search-queries branch from 262a688 to 0a5cf8e Compare July 3, 2023 11:21
@ggordonhall ggordonhall marked this pull request as ready for review July 3, 2023 17:57
@ggordonhall ggordonhall requested review from rsdy and removed request for anastasiya1155 July 3, 2023 17:58
@ggordonhall ggordonhall merged commit a1221c4 into main Jul 4, 2023
@ggordonhall ggordonhall deleted the gabriel/blo-1183-diverse-search-queries branch July 4, 2023 16:52