Status: Closed
1 change: 1 addition & 0 deletions README.md
@@ -78,6 +78,7 @@ print(docs)

> [!TIP]
> All synchronous functions have corresponding asynchronous functions.
> `PGVectorStore` also supports hybrid search, which combines multiple search strategies to improve search results.
Collaborator comment:

I agree that a note in the README is helpful. We should make sure the code snippet is clear and concise. So instead of this, can we add a header for hybrid search and the smallest code snippet to get started, then link to the how-to?


## ChatMessageHistory

102 changes: 102 additions & 0 deletions examples/pg_vectorstore_how_to.ipynb
@@ -686,6 +686,108 @@
"1. For new records added via `VectorStore`, embeddings are automatically generated."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hybrid Search Vector Store\n",
Collaborator comment:
Please provide an easy way to get started first, then a section on how to customize.

"\n",
"A Hybrid Search Vector Store combines multiple lookup strategies to provide more comprehensive and relevant search results. Specifically, it leverages both dense embedding vector search (for semantic similarity) and TSV (text search vector)-based keyword search (for lexical matching). This approach is particularly powerful for applications that need to search efficiently through custom text and metadata, especially when a specialized embedding model isn't feasible or necessary.\n",
"\n",
"By integrating both semantic and lexical capabilities, hybrid search helps overcome the limitations of each individual method:\n",
"\n",
"* **Semantic Search**: Excellent for understanding the meaning of a query, even if the exact keywords aren't present. However, it can sometimes miss highly relevant documents that contain the precise keywords but have a slightly different semantic context.\n",
"\n",
"* **Keyword Search**: Highly effective for finding documents with exact keyword matches, and generally fast. Its weakness is that it cannot handle synonyms, misspellings, or conceptual relationships.\n",
"\n",
"With a `HybridSearchConfig` provided, the `PGVectorStore` class can efficiently manage a hybrid search vector store using PostgreSQL as the backend, automatically handling the creation and population of the necessary TSV columns when possible.\n",
"\n",
"\n",
"Assume a pre-existing PostgreSQL table, `products` (the same table used above), which stores product details for an e-commerce venture.\n",
"\n",
"Here is how this table maps to `PGVectorStore`:\n",
"\n",
"- **`id_column=\"product_id\"`**: The ID column, which uniquely identifies each row in the `products` table.\n",
"\n",
"- **`content_column=\"description\"`**: The `description` column contains text descriptions of each product. The `embedding_service` uses this text to create the vectors stored in `embedding_column`, which represent the semantic meaning of each description.\n",
"\n",
"- **`embedding_column=\"embed\"`**: The `embed` column stores the vectors created from the product descriptions. These vectors are used to find products with similar descriptions.\n",
"\n",
"- **`metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"]`**: These columns are treated as metadata for each product. Metadata provides additional information about a product, such as its name, category, price, quantity available, SKU (Stock Keeping Unit), and an image URL. This information is useful for displaying product details in search results or for filtering and categorization.\n",
"\n",
"- **`metadata_json_column=\"metadata\"`**: The `metadata` column can store any additional information about the products in a flexible JSON format. This allows for storing varied and complex data that doesn't fit into the standard columns.\n"
Collaborator comment on lines +710 to +718:

This information is already provided above. Please outline how to use `HybridSearchConfig` instead.

]
},
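To make the lexical trade-off described above concrete, here is a toy keyword matcher in plain Python. It is only an illustration under simplifying assumptions, not how PostgreSQL full-text search works: `to_tsvector` and `to_tsquery` also apply stemming and stop-word removal, which this sketch omits. The function names, product IDs, and descriptions are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_search(query: str, docs: dict[str, str]) -> list[tuple[str, int]]:
    """Rank documents by how many query terms they contain (exact tokens only)."""
    terms = tokenize(query)
    hits = []
    for doc_id, text in docs.items():
        overlap = len(terms & tokenize(text))
        if overlap:  # documents with no matching term are dropped entirely
            hits.append((doc_id, overlap))
    # Most matching terms first; ties broken by ID for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical product descriptions.
products = {
    "p1": "Leather sofa with three seats",
    "p2": "Fabric couch with matching ottoman",
    "p3": "Stretch sofa cover, SKU 4411",
}

results = keyword_search("sofa", products)
print(results)
print(keyword_search("sku 4411", products))
```

The couch `p2` is never returned for `"sofa"` even though it is semantically a match; that is the gap the dense-vector half of hybrid search is meant to cover. Conversely, an exact identifier such as the SKU is where keyword matching shines.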
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_postgres.v2 import PGVectorStore\n",
"from langchain_postgres.v2.hybrid_search_config import (\n",
" HybridSearchConfig,\n",
" reciprocal_rank_fusion,\n",
")\n",
"\n",
"TABLE_NAME = \"hybrid_search_products\"\n",
"\n",
"hybrid_search_config = HybridSearchConfig(\n",
" tsv_column=\"hybrid_description\",\n",
" tsv_lang=\"pg_catalog.english\",\n",
" fusion_function=reciprocal_rank_fusion,\n",
" fusion_function_parameters={\n",
" \"rrf_k\": 60,\n",
" \"fetch_top_k\": 10,\n",
" },\n",
")\n",
"\n",
"# If a hybrid search config is provided during vector store table creation,\n",
"# the specified TSV column will be automatically created.\n",
"await pg_engine.ainit_vectorstore_table(\n",
" table_name=TABLE_NAME,\n",
" # schema_name=SCHEMA_NAME,\n",
" vector_size=VECTOR_SIZE,\n",
" id_column=\"product_id\",\n",
" content_column=\"description\",\n",
" embedding_column=\"embed\",\n",
" metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n",
" metadata_json_column=\"metadata\",\n",
" hybrid_search_config=hybrid_search_config,\n",
" store_metadata=True,\n",
")\n",
Collaborator comment on lines +745 to +758:

Can we create individual sections for each of these notes? The inline comments are hard to read.

"\n",
"\n",
"# If a hybrid search config is NOT provided to ainit_vectorstore_table (above)\n",
"# but only to PGVectorStore.create, the specified TSV column does not exist,\n",
"# and TSV vectors are computed on the fly during hybrid search.\n",
"vs_hybrid = await PGVectorStore.create(\n",
" pg_engine,\n",
" table_name=TABLE_NAME,\n",
" # schema_name=SCHEMA_NAME,\n",
" embedding_service=embedding,\n",
" # Connect to the existing table by specifying its column names\n",
" id_column=\"product_id\",\n",
" content_column=\"description\",\n",
" embedding_column=\"embed\",\n",
" metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n",
" metadata_json_column=\"metadata\",\n",
" hybrid_search_config=hybrid_search_config,\n",
")\n",
"\n",
"# Optionally, create an index on the hybrid search (TSV) column\n",
"await vs_hybrid.aapply_hybrid_search_index()\n",
"\n",
"# Fetch product documents from the previously created store\n",
"docs = await custom_store.asimilarity_search(\"products\", k=5)\n",
"# Add documents as usual; this also populates the TSV values in tsv_column\n",
"await vs_hybrid.aadd_documents(docs)\n",
"\n",
"# Use hybrid search\n",
"hybrid_docs = await vs_hybrid.asimilarity_search(\"products\", k=5)\n",
"print(hybrid_docs)"
]
},
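The `reciprocal_rank_fusion` function configured above merges the vector-search and keyword-search result lists. The sketch below shows the textbook Reciprocal Rank Fusion formula in plain Python: each document scores `1 / (rrf_k + rank)` for every list it appears in (1-based rank), and the sums are sorted. It borrows the parameter names `rrf_k` and `fetch_top_k` from the config for readability, but it is not the library's implementation; details such as the rank offset and tie-breaking in `langchain_postgres` may differ. The function name `rrf_fuse` and the product IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(
    rankings: list[list[str]], rrf_k: float = 60, fetch_top_k: int = 10
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (rrf_k + rank)) over the lists it appears in,
    where rank is 1-based. A list that lacks the document contributes nothing.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:fetch_top_k]


# Hypothetical results: product IDs ranked by each search strategy.
semantic_ranking = ["p3", "p1", "p7", "p2"]  # dense vector search
keyword_ranking = ["p1", "p9", "p3"]  # full-text (TSV) search

top = rrf_fuse([semantic_ranking, keyword_ranking], rrf_k=60, fetch_top_k=3)
for doc_id, score in top:
    print(doc_id, round(score, 5))
```

Here `p1` edges out `p3` because it sits near the top of both lists. A larger `rrf_k` flattens the difference between adjacent ranks, and `fetch_top_k` simply truncates the fused list.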
{
"cell_type": "markdown",
"metadata": {},