Commit 207b051: Fix/review (#9)
1 parent 3679ef3 commit 207b051
File tree: 5 files changed, +328 −20 lines
Lines changed: 21 additions & 0 deletions

# BagelDB

> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`) is like GitHub for AI data.
It is a collaborative platform where users can create,
share, and manage vector datasets. It supports private projects for independent developers,
internal collaborations for enterprises, and public contributions for data DAOs.

## Installation and Setup

```bash
pip install betabageldb
```

## VectorStore

See a [usage example](/docs/integrations/vectorstores/bageldb).

```python
from langchain.vectorstores import Bagel
```
Lines changed: 300 additions & 0 deletions

New notebook, rendered:

# BagelDB

> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`) is like GitHub for AI data.
It is a collaborative platform where users can create,
share, and manage vector datasets. It supports private projects for independent developers,
internal collaborations for enterprises, and public contributions for data DAOs.

### Installation and Setup

```bash
pip install betabageldb
```

## Create VectorStore from texts

```python
from langchain.vectorstores import Bagel

texts = ["hello bagel", "hello langchain", "I love salad", "my car", "a dog"]
# create cluster and add texts
cluster = Bagel.from_texts(cluster_name="testing", texts=texts)
```
```python
# similarity search
cluster.similarity_search("bagel", k=3)
```

```
[Document(page_content='hello bagel', metadata={}),
 Document(page_content='my car', metadata={}),
 Document(page_content='I love salad', metadata={})]
```

```python
# the score is a distance metric, so lower is better
cluster.similarity_search_with_score("bagel", k=3)
```

```
[(Document(page_content='hello bagel', metadata={}), 0.27392977476119995),
 (Document(page_content='my car', metadata={}), 1.4783176183700562),
 (Document(page_content='I love salad', metadata={}), 1.5342965126037598)]
```
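The scores above come from a distance metric over embedding vectors, which is why the closest match has the smallest number. A minimal sketch of the idea, assuming plain Euclidean distance over toy 3-d vectors (BagelDB's real embeddings and metric may differ):

```python
import math

def euclidean(a, b):
    # straight-line distance between two embedding vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# toy "embeddings": semantically similar texts get nearby vectors
emb = {
    "hello bagel": [1.0, 0.9, 0.1],
    "my car": [0.1, 0.2, 0.8],
}
query = [1.0, 1.0, 0.0]  # hypothetical embedding of the query "bagel"

scores = {text: euclidean(query, vec) for text, vec in emb.items()}
# lower distance means a closer match, so "hello bagel" ranks first
ranked = sorted(scores, key=scores.get)
```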
```python
# delete the cluster
cluster.delete_cluster()
```

## Create VectorStore from docs

```python
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)[:10]
```
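The splitter above cuts the loaded document into roughly 1000-character pieces with no overlap before indexing. A minimal sketch of fixed-size chunking, a deliberate simplification of `CharacterTextSplitter` (which actually prefers separator boundaries):

```python
def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    # naive fixed-size chunking; real splitters cut at separators instead
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "a" * 2500
chunks = chunk_text(sample, chunk_size=1000)
# 2500 chars split into pieces of 1000, 1000, and 500 characters
sizes = [len(c) for c in chunks]
```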
```python
# create cluster with docs
cluster = Bagel.from_documents(cluster_name="testing_with_docs", documents=docs)
```

```python
# similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = cluster.similarity_search(query)
print(docs[0].page_content[:102])
```

```
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the 
```
## Get all text/doc from Cluster

```python
texts = ["hello bagel", "this is langchain"]
cluster = Bagel.from_texts(cluster_name="testing", texts=texts)
cluster_data = cluster.get()
```

```python
# all keys
cluster_data.keys()
```

```
dict_keys(['ids', 'embeddings', 'metadatas', 'documents'])
```
```python
# all values and keys
cluster_data
```

```
{'ids': ['578c6d24-3763-11ee-a8ab-b7b7b34f99ba',
  '578c6d25-3763-11ee-a8ab-b7b7b34f99ba',
  'fb2fc7d8-3762-11ee-a8ab-b7b7b34f99ba',
  'fb2fc7d9-3762-11ee-a8ab-b7b7b34f99ba',
  '6b40881a-3762-11ee-a8ab-b7b7b34f99ba',
  '6b40881b-3762-11ee-a8ab-b7b7b34f99ba',
  '581e691e-3762-11ee-a8ab-b7b7b34f99ba',
  '581e691f-3762-11ee-a8ab-b7b7b34f99ba'],
 'embeddings': None,
 'metadatas': [{}, {}, {}, {}, {}, {}, {}, {}],
 'documents': ['hello bagel',
  'this is langchain',
  'hello bagel',
  'this is langchain',
  'hello bagel',
  'this is langchain',
  'hello bagel',
  'this is langchain']}
```
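Note that the payload holds eight documents even though only two texts were just added, likely because the `testing` cluster persisted entries from earlier runs. A minimal sketch of working with a `get()`-style payload, using toy ids rather than a live cluster:

```python
# toy stand-in for the dict returned by cluster.get()
cluster_data = {
    "ids": ["id-1", "id-2", "id-3", "id-4"],
    "embeddings": None,
    "metadatas": [{}, {}, {}, {}],
    "documents": ["hello bagel", "this is langchain",
                  "hello bagel", "this is langchain"],
}

# pair each stored id with its document text
records = list(zip(cluster_data["ids"], cluster_data["documents"]))

# distinct texts, preserving first-seen order
unique_docs = list(dict.fromkeys(cluster_data["documents"]))
```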
```python
cluster.delete_cluster()
```

## Create cluster with metadata & filter using metadata

```python
texts = ["hello bagel", "this is langchain"]
metadatas = [{"source": "notion"}, {"source": "google"}]

cluster = Bagel.from_texts(cluster_name="testing", texts=texts, metadatas=metadatas)
cluster.similarity_search_with_score("hello bagel", where={"source": "notion"})
```

```
[(Document(page_content='hello bagel', metadata={'source': 'notion'}), 0.0)]
```
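The `where` clause restricts the search to entries whose metadata matches every given key/value pair, which is why only the `notion`-sourced document comes back. A minimal sketch of that filtering step in plain Python (toy records, not BagelDB's implementation):

```python
records = [
    {"text": "hello bagel", "metadata": {"source": "notion"}},
    {"text": "this is langchain", "metadata": {"source": "google"}},
]

def where_filter(records, where):
    # keep only records whose metadata contains every key/value in `where`
    return [r for r in records
            if all(r["metadata"].get(k) == v for k, v in where.items())]

matches = where_filter(records, {"source": "notion"})
```

An empty `where` keeps everything, since there is no constraint to fail.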
```python
# delete the cluster
cluster.delete_cluster()
```

Notebook metadata: Python 3 kernel, Python 3.10.12, nbformat 4.

libs/langchain/poetry.lock

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

libs/langchain/pyproject.toml

Lines changed: 5 additions & 0 deletions
```diff
@@ -377,6 +377,11 @@ extended_testing = [
     "xata",
     "xmltodict",
     "betabageldb",
+    "anthropic",
+]
+
+scheduled_testing = [
+    "openai",
 ]

 [tool.ruff]
```

libs/langchain/tests/integration_tests/vectorstores/test_bagel.py

Lines changed: 0 additions & 18 deletions
```diff
@@ -167,21 +167,3 @@ def test_bagel_update_document() -> None:
     docsearch.update_document(document_id=document_id, document=updated_doc)
     output = docsearch.similarity_search(updated_content, k=1)
     assert output == [Document(page_content=updated_content, metadata={"page": "0"})]
-
-
-def main() -> None:
-    """Bagel intigaration test"""
-    test_similarity_search()
-    test_bagel()
-    test_with_metadatas()
-    test_with_metadatas_with_scores()
-    test_with_metadatas_with_scores_using_vector()
-    test_search_filter()
-    test_search_filter_with_scores()
-    test_with_include_parameter()
-    test_bagel_update_document()
-    test_with_metadatas_with_scores_using_vector_embe()
-
-
-if __name__ == "__main__":
-    main()
```
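The deleted `main()` driver was redundant: pytest collects any top-level function named `test_*` on its own, so no `__main__` entry point is needed. A minimal illustration with a stub test (not the Bagel suite itself):

```python
# pytest discovers any top-level function named test_* automatically,
# so listing each test inside a main() driver duplicates its job.
def test_similarity_stub():
    texts = ["hello bagel", "hello langchain"]
    # stand-in assertion for a real similarity check
    assert any("bagel" in t for t in texts)

# `pytest this_file.py` would collect and run test_similarity_stub
# with no __main__ guard at all.
```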

0 commit comments