chore: performing sparse search with query expansion in langchain #443

bruvduroiu · 2025-05-08T13:49:28Z

Problem

Showcase the use of Pinecone sparse embeddings and sparse indexes in situations where dense vector search might miss specific technical terms.

Solution

use Pinecone sparse embeddings
use a query expansion technique
allow the LangChain agent to search for specific technical terms from general queries ("Which tech companies were top performers this week?" transforms into vector searches for "GOOGL performance", "NVIDIA Performance", ...)
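To make the idea concrete, here is a minimal, dependency-free sketch of why query expansion helps a term-matching (sparse) search. It is not the notebook's code: the corpus, the `expand_query` stand-in (a hardcoded rule where the notebook uses an LLM), and the toy term-count scoring (where the notebook uses Pinecone sparse embeddings and a sparse index) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; the dataset used in the notebook is not shown in this thread.
DOCS = {
    "doc1": "GOOGL performance beat expectations after strong ad revenue",
    "doc2": "NVIDIA performance surged on data center demand",
    "doc3": "Oil prices slipped as supply concerns eased",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sparse_vector(text):
    """Toy sparse embedding: a mapping of term -> count."""
    return Counter(tokenize(text))


def sparse_score(query, doc):
    """Dot product of two sparse vectors; only shared terms contribute."""
    q, d = sparse_vector(query), sparse_vector(doc)
    return sum(q[term] * d[term] for term in q)


def expand_query(query):
    """Stand-in for the LLM expansion step (hardcoded, for illustration only)."""
    if "tech companies" in query.lower():
        return ["GOOGL performance", "NVIDIA performance"]
    return [query]


def search(query, top_k=2):
    """Score every document against each expanded sub-query and sum the scores."""
    scores = Counter()
    for sub_query in expand_query(query):
        for doc_id, text in DOCS.items():
            scores[doc_id] += sparse_score(sub_query, text)
    return [doc_id for doc_id, score in scores.most_common(top_k) if score > 0]
```

In this toy setup the general question shares no terms with any document, so a direct sparse match scores zero everywhere; only after expansion into ticker-specific sub-queries do the two relevant documents surface.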

Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [X] None of the above: (updates to documentation)

Test Plan

Describe specific steps for validating this change.

review-notebook-app · 2025-05-08T13:49:34Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

jamescalam · 2025-05-12T08:01:36Z

move imports to the cells where they're used; it makes it easier for the user to see where various objects etc. are coming from as they work through the code
swap to using HF datasets
the cell where you're setting up the index is too complicated; it needs breaking apart and simplifying, see other code examples (such as rag-chatbot.ipynb). Try to avoid doing multiple actions in a single cell, and if we do, explain each new step with a comment above/next to the code
add an explanation in markdown along the lines of "here we are setting up the index, dimension is None because X, metric, etc"; we also need to point people to where to get their API keys and set up, as we do in other notebooks
missing explanation of what `@tool` is doing and why we need it
stray `from langchain.memory import ConversationBufferMemory` and `from langchain.agents import create_tool_calling_agent, AgentExecutor` imports mid-cell
`expansion_llm = ChatOpenAI(temperature=0.7)` used in the query expansion tool can be a smaller LLM imo, either gpt-4.1-nano or gpt-4.1-mini. Using a smaller model also shows another reason why we might use query expansion as a separate tool rather than just building it in to the prompting etc.
we shouldn't use `create_tool_calling_agent`; better to use LCEL, can refer to the aurelio langchain course code
might be a good idea to explicitly define which model we're using for `llm = ChatOpenAI()`
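On the `@tool` point: in LangChain the decorator wraps a plain function as a tool object whose name, description (taken from the docstring), and argument schema are what the model sees when deciding whether and how to call it. The sketch below illustrates that idea only. `SimpleTool` and `simple_tool` are hypothetical names invented here, not LangChain's classes or implementation, and the placeholder body stands in for the LLM call the notebook makes.

```python
import inspect
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Minimal stand-in for a tool object; NOT LangChain's actual class."""
    name: str
    description: str
    args: dict
    func: Callable

    def invoke(self, tool_input: dict):
        """Call the wrapped function with keyword arguments from a dict."""
        return self.func(**tool_input)


def simple_tool(func):
    """Hypothetical decorator mimicking the idea behind `@tool`."""
    signature = inspect.signature(func)
    # Record each parameter's annotated type name as a crude argument schema.
    args = {
        name: getattr(param.annotation, "__name__", str(param.annotation))
        for name, param in signature.parameters.items()
    }
    return SimpleTool(
        name=func.__name__,
        description=inspect.getdoc(func) or "",
        args=args,
        func=func,
    )


@simple_tool
def expand_query(query: str) -> list:
    """Rewrite a general question into specific search terms."""
    # Placeholder logic; the notebook delegates this step to an LLM.
    return [f"{query} performance"]
```

After decoration, `expand_query` is no longer a bare function but an object carrying metadata, which is why a clear docstring and typed arguments matter: they are the only description of the tool the agent gets.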

chore: review and edits

jamescalam · 2025-05-29T11:33:12Z

moving to in-repo PR

…) (#453)

## Problem

Showcase the use of Pinecone sparse embeddings and sparse indexes in situations where dense vector search might miss specific technical terms.

## Solution

- use Pinecone sparse embeddings
- use query expansion technique
- allow Langchain agent to search for specific technical terms from general queries ("Which tech companies were top performers this week?" transforms into vector search about "GOOGL performance", "NVIDIA Performance", ...)

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [X] None of the above: (updates to documentation)

## Test Plan

Describe specific steps for validating this change.

---------

## Problem

Describe the purpose of this change. What problem is being solved and why?

## Solution

Describe the approach you took. Link to any relevant bugs, issues, docs, or other resources.

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [ ] None of the above: (explain here)

## Test Plan

Describe specific steps for validating this change.

---------

Co-authored-by: Bogdan Buduroiu <panini-croaky.0m@icloud.com>

chore: performing sparse search with query expansion in langchain

de9df49

jamescalam added 3 commits May 28, 2025 23:46

chore: review and edits

022e2c4

fix: prompting location incorrect for tool

f20d27c

Merge pull request #1 from aurelio-labs/james/tweaks-sparse-query-exp

92089d9

chore: review and edits

jamescalam changed the base branch from master to james/sparse-example May 29, 2025 11:32

jamescalam merged commit 9c59591 into pinecone-io:james/sparse-example May 29, 2025
4 checks passed
