
Commit 44dc959

Improve pinecone hybrid search retriever adding metadata support (langchain-ai#5098)
# Improve pinecone hybrid search retriever adding metadata support

I simply remove the hardwiring of metadata in the existing implementation, allowing one to pass a `metadatas` attribute to the constructors and in `get_relevant_documents`. I also add one missing pip install to the accompanying notebook (I am not adding dependencies; they were pre-existing).

First contribution, just hoping to help, feel free to critique :) My Twitter username is `@andreliebschner`.

While looking at hybrid search I noticed langchain-ai#3043 and langchain-ai#1743. I think the former can be closed, as following the example right now (even prior to my improvements) works just fine. The latter can, I think, also be closed safely, maybe pointing out the relevant classes and example. Should I reply to those issues mentioning someone? @dev2049, @hwchase17

---------

Co-authored-by: Andreas Liebschner <a.liebschner@shopfully.com>
1 parent 5cd1210 commit 44dc959


2 files changed: +28 -5 lines changed


docs/modules/indexes/retrievers/examples/pinecone_hybrid_search.ipynb

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "#!pip install pinecone-client"
+    "#!pip install pinecone-client pinecone-text"
    ]
   },
   {

langchain/retrievers/pinecone_hybrid_search.py

Lines changed: 27 additions & 4 deletions
@@ -18,6 +18,7 @@ def create_index(
     embeddings: Embeddings,
     sparse_encoder: Any,
     ids: Optional[List[str]] = None,
+    metadatas: Optional[List[dict]] = None,
 ) -> None:
     batch_size = 32
     _iterator = range(0, len(contexts), batch_size)
@@ -38,8 +39,15 @@ def create_index(
         # extract batch
         context_batch = contexts[i:i_end]
         batch_ids = ids[i:i_end]
+        metadata_batch = (
+            metadatas[i:i_end] if metadatas else [{} for _ in context_batch]
+        )
         # add context passages as metadata
-        meta = [{"context": context} for context in context_batch]
+        meta = [
+            {"context": context, **metadata}
+            for context, metadata in zip(context_batch, metadata_batch)
+        ]
+
         # create dense vectors
         dense_embeds = embeddings.embed_documents(context_batch)
         # create sparse vectors
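
The merge introduced in this hunk can be tried in isolation. The sketch below re-implements only the metadata-building step for a single batch as a hypothetical helper, `build_meta` (the name is not part of the commit), assuming plain Python lists and dicts. It also illustrates one consequence of the key order: because `**metadata` is unpacked after `"context"`, a caller-supplied `"context"` key would replace the passage text.

```python
from typing import List, Optional


def build_meta(
    context_batch: List[str],
    metadatas: Optional[List[dict]] = None,
) -> List[dict]:
    """Hypothetical helper mirroring the metadata step of create_index for one batch."""
    # Fall back to one empty dict per passage when no metadata is supplied.
    metadata_batch = metadatas if metadatas else [{} for _ in context_batch]
    # Later keys win in a dict display, so user metadata can override "context".
    return [
        {"context": context, **metadata}
        for context, metadata in zip(context_batch, metadata_batch)
    ]


texts = ["foo", "bar"]
no_meta = build_meta(texts)
with_meta = build_meta(texts, [{"source": "a.txt"}, {"source": "b.txt"}])
clash = build_meta(["foo"], [{"context": "overridden"}])

print(no_meta)
print(with_meta)
print(clash)
```

If the override is unwanted, one option (a design choice, not something the commit does) is to place `"context": context` after `**metadata` in the dict display so the passage text always wins.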
@@ -78,8 +86,20 @@ class Config:
         extra = Extra.forbid
         arbitrary_types_allowed = True

-    def add_texts(self, texts: List[str], ids: Optional[List[str]] = None) -> None:
-        create_index(texts, self.index, self.embeddings, self.sparse_encoder, ids=ids)
+    def add_texts(
+        self,
+        texts: List[str],
+        ids: Optional[List[str]] = None,
+        metadatas: Optional[List[dict]] = None,
+    ) -> None:
+        create_index(
+            texts,
+            self.index,
+            self.embeddings,
+            self.sparse_encoder,
+            ids=ids,
+            metadatas=metadatas,
+        )

     @root_validator()
     def validate_environment(cls, values: Dict) -> Dict:
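
In the hunks shown here, neither `add_texts` nor `create_index` checks that `metadatas` has one entry per text. Since `zip` stops at the shorter input, a `metadatas` list that is too short would produce fewer `meta` entries than passages. A minimal sketch of a guard a caller might run before `add_texts` follows; `check_metadatas` is a hypothetical name and is not part of the commit.

```python
from typing import List, Optional


def check_metadatas(texts: List[str], metadatas: Optional[List[dict]]) -> None:
    """Hypothetical guard (not in the commit): require one metadata dict per text."""
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError(
            f"Got {len(metadatas)} metadata dicts for {len(texts)} texts"
        )


# zip silently truncates to the shorter input, which is what the guard protects against.
pairs = list(zip(["a", "b", "c"], [{"k": 1}]))
print(len(pairs))

check_metadatas(["a", "b"], [{"k": 1}, {"k": 2}])  # lengths match, no error
check_metadatas(["a", "b"], None)  # no metadata at all is also accepted

try:
    check_metadatas(["a", "b", "c"], [{"k": 1}])
    raised = False
except ValueError as err:
    raised = True
    print(err)
```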
@@ -114,7 +134,10 @@ def get_relevant_documents(self, query: str) -> List[Document]:
         )
         final_result = []
         for res in result["matches"]:
-            final_result.append(Document(page_content=res["metadata"]["context"]))
+            context = res["metadata"].pop("context")
+            final_result.append(
+                Document(page_content=context, metadata=res["metadata"])
+            )
         # return search results as json
         return final_result
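
On the retrieval side, the new loop splits each match's stored metadata back into the passage text and the remaining user fields. The sketch below reproduces that loop against a hand-written response, assuming the result is a dict with a `"matches"` list whose items carry a `"metadata"` dict. The `Document` dataclass is a minimal stand-in for langchain's class, and `matches_to_documents` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in for langchain's Document, for illustration only."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def matches_to_documents(result: Dict[str, Any]) -> List[Document]:
    """Hypothetical helper mirroring the loop in get_relevant_documents."""
    documents = []
    for res in result["matches"]:
        # pop() removes "context" from the match's metadata dict in place,
        # leaving only the user-supplied fields behind.
        context = res["metadata"].pop("context")
        documents.append(Document(page_content=context, metadata=res["metadata"]))
    return documents


# Hand-written response in the shape the loop expects (an assumption).
result = {
    "matches": [
        {"id": "0", "metadata": {"context": "foo", "source": "a.txt"}},
        {"id": "1", "metadata": {"context": "bar"}},
    ]
}
docs = matches_to_documents(result)
print(docs[0].page_content, docs[0].metadata)
print(docs[1].page_content, docs[1].metadata)
```

Note that `pop` mutates the response in place, so `"context"` is no longer present in `result` after the loop runs.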
