Fix/review (#9)
Towhid1 authored Aug 10, 2023
1 parent 3679ef3 commit 207b051
Showing 5 changed files with 328 additions and 20 deletions.
21 changes: 21 additions & 0 deletions docs/extras/integrations/providers/bageldb.mdx
@@ -0,0 +1,21 @@
# BagelDB

> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`) is like GitHub for AI data.
It is a collaborative platform where users can create,
share, and manage vector datasets. It supports private projects for independent developers,
internal collaborations for enterprises, and public contributions for data DAOs.

## Installation and Setup

```bash
pip install betabageldb
```


## VectorStore

See a [usage example](/docs/integrations/vectorstores/bageldb).

```python
from langchain.vectorstores import Bagel
```
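The notebook added in this commit notes that `similarity_search_with_score` returns `(document, score)` pairs where the score is a distance, so lower means more similar. As a minimal illustration of that convention only (a toy in-memory index over made-up 3-d vectors, not BagelDB's actual implementation or API):

```python
import math

def search_with_score(index, query_vec, k=3):
    """Return the k nearest (text, distance) pairs, smallest distance first."""
    scored = [
        (text, math.dist(vec, query_vec))  # Euclidean distance: lower is closer
        for text, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Hypothetical embeddings; a real store computes these with an embedding model.
index = {
    "hello bagel": [1.0, 0.0, 0.0],
    "my car": [0.0, 1.0, 0.0],
    "I love salad": [0.0, 0.0, 1.0],
}

results = search_with_score(index, [0.9, 0.1, 0.0], k=3)
print(results[0][0])  # the closest text comes first
```

This mirrors the result ordering shown in the notebook below, where `"hello bagel"` is returned with the smallest score.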
300 changes: 300 additions & 0 deletions docs/extras/integrations/vectorstores/bageldb.ipynb
@@ -0,0 +1,300 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BagelDB\n",
"\n",
"> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`) is like GitHub for AI data.\n",
"It is a collaborative platform where users can create,\n",
"share, and manage vector datasets. It supports private projects for independent developers,\n",
"internal collaborations for enterprises, and public contributions for data DAOs.\n",
"\n",
"## Installation and Setup\n",
"\n",
"```bash\n",
"pip install betabageldb\n",
"```\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create VectorStore from texts"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import Bagel\n",
"\n",
"texts = [\"hello bagel\", \"hello langchain\", \"I love salad\", \"my car\", \"a dog\"]\n",
"# create cluster and add texts\n",
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='hello bagel', metadata={}),\n",
" Document(page_content='my car', metadata={}),\n",
" Document(page_content='I love salad', metadata={})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# similarity search\n",
"cluster.similarity_search(\"bagel\", k=3)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(Document(page_content='hello bagel', metadata={}), 0.27392977476119995),\n",
" (Document(page_content='my car', metadata={}), 1.4783176183700562),\n",
" (Document(page_content='I love salad', metadata={}), 1.5342965126037598)]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the score is a distance metric, so lower is better\n",
"cluster.similarity_search_with_score(\"bagel\", k=3)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# delete the cluster\n",
"cluster.delete_cluster()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create VectorStore from docs"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)[:10]"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"# create cluster with docs\n",
"cluster = Bagel.from_documents(cluster_name=\"testing_with_docs\", documents=docs)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the \n"
]
}
],
"source": [
"# similarity search\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = cluster.similarity_search(query)\n",
"print(docs[0].page_content[:102])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get all texts and documents from a cluster"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"texts = [\"hello bagel\", \"this is langchain\"]\n",
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts)\n",
"cluster_data = cluster.get()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['ids', 'embeddings', 'metadatas', 'documents'])"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# all keys\n",
"cluster_data.keys()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ids': ['578c6d24-3763-11ee-a8ab-b7b7b34f99ba',\n",
" '578c6d25-3763-11ee-a8ab-b7b7b34f99ba',\n",
" 'fb2fc7d8-3762-11ee-a8ab-b7b7b34f99ba',\n",
" 'fb2fc7d9-3762-11ee-a8ab-b7b7b34f99ba',\n",
" '6b40881a-3762-11ee-a8ab-b7b7b34f99ba',\n",
" '6b40881b-3762-11ee-a8ab-b7b7b34f99ba',\n",
" '581e691e-3762-11ee-a8ab-b7b7b34f99ba',\n",
" '581e691f-3762-11ee-a8ab-b7b7b34f99ba'],\n",
" 'embeddings': None,\n",
" 'metadatas': [{}, {}, {}, {}, {}, {}, {}, {}],\n",
" 'documents': ['hello bagel',\n",
" 'this is langchain',\n",
" 'hello bagel',\n",
" 'this is langchain',\n",
" 'hello bagel',\n",
" 'this is langchain',\n",
" 'hello bagel',\n",
" 'this is langchain']}"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# all values and keys\n",
"cluster_data"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"cluster.delete_cluster()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a cluster with metadata and filter by metadata"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(Document(page_content='hello bagel', metadata={'source': 'notion'}), 0.0)]"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"texts = [\"hello bagel\", \"this is langchain\"]\n",
"metadatas = [{\"source\": \"notion\"}, {\"source\": \"google\"}]\n",
"\n",
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts, metadatas=metadatas)\n",
"cluster.similarity_search_with_score(\"hello bagel\", where={\"source\": \"notion\"})"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"# delete the cluster\n",
"cluster.delete_cluster()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
4 changes: 2 additions & 2 deletions libs/langchain/poetry.lock

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions libs/langchain/pyproject.toml
@@ -377,6 +377,11 @@ extended_testing = [
"xata",
"xmltodict",
"betabageldb",
"anthropic",
]

scheduled_testing = [
"openai",
]

[tool.ruff]
18 changes: 0 additions & 18 deletions libs/langchain/tests/integration_tests/vectorstores/test_bagel.py
@@ -167,21 +167,3 @@ def test_bagel_update_document() -> None:
docsearch.update_document(document_id=document_id, document=updated_doc)
output = docsearch.similarity_search(updated_content, k=1)
assert output == [Document(page_content=updated_content, metadata={"page": "0"})]


def main() -> None:
"""Bagel intigaration test"""
test_similarity_search()
test_bagel()
test_with_metadatas()
test_with_metadatas_with_scores()
test_with_metadatas_with_scores_using_vector()
test_search_filter()
test_search_filter_with_scores()
test_with_include_parameter()
test_bagel_update_document()
test_with_metadatas_with_scores_using_vector_embe()


if __name__ == "__main__":
main()
