Commit 1b23215

Merge pull request #13 from mongodb-developer/rag_local_load
Local dataset loading in RAG lab
2 parents 1f70f48 + 1766eeb commit 1b23215

File tree

2 files changed: +7 additions, -10 deletions


notebooks/ai-rag-lab.ipynb

Lines changed: 7 additions & 9 deletions
@@ -51,7 +51,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Step 2: Download the dataset"
+    "# Step 2: Load the dataset"
    ]
   },
   {
@@ -60,9 +60,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# You may see a warning upon running this cell. You can ignore it.\n",
-    "import pandas as pd\n",
-    "from datasets import load_dataset"
+    "import json"
    ]
   },
   {
@@ -71,10 +69,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Download the `mongodb-docs` dataset from Hugging Face\n",
-    "data = load_dataset(\"mongodb/mongodb-docs\", split=\"train\")\n",
-    "# Convert the dataset into a dataframe first, then into a list of Python objects/dictionaries\n",
-    "docs = pd.DataFrame(data).to_dict(\"records\")"
+    "with open(\"../data/mongodb_docs.json\", \"r\") as data_file:\n",
+    "    json_data = data_file.read()\n",
+    "\n",
+    "docs = json.loads(json_data)"
    ]
   },
   {
@@ -177,7 +175,7 @@
    "    chunked_data = []\n",
    "    for chunk in chunks:\n",
    "        temp = doc.copy()\n",
-    "        temp[text_field]=chunk\n",
+    "        temp[text_field] = chunk\n",
    "        chunked_data.append(temp)\n",
    "\n",
    "    return chunked_data"
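The new loading cell replaces the Hugging Face `datasets` download with a plain file read followed by `json.loads`. A self-contained sketch of that pattern (the sample documents and the temporary file stand in for the lab's `../data/mongodb_docs.json`, whose actual field names are not shown in this diff):

```python
import json
import os
import tempfile

# Hypothetical sample documents; the real file's schema may differ
sample_docs = [
    {"title": "Install MongoDB", "body": "Installation instructions..."},
    {"title": "Aggregation", "body": "Pipeline stages..."},
]

# Write a stand-in file so the sketch runs anywhere
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample_docs, f)
    path = f.name

# Same pattern as the notebook cell: read the whole file, then parse it
with open(path, "r") as data_file:
    json_data = data_file.read()

docs = json.loads(json_data)
os.remove(path)
print(len(docs))  # 2
```

`json.loads` on the full file contents is equivalent here to `json.load(data_file)`; the notebook's two-step form simply makes the raw string available as an intermediate.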

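The final hunk only fixes spacing around `temp[text_field] = chunk`, but the surrounding context shows the chunking pattern: copy the parent document for each chunk and swap in the chunk text. A sketch of that helper (the function name `chunk_doc`, its signature, and the sample inputs are assumptions; the diff shows only the loop body):

```python
def chunk_doc(doc, chunks, text_field="body"):
    # For each chunk, duplicate the parent document's metadata
    # and replace the text field with the chunk's text
    chunked_data = []
    for chunk in chunks:
        temp = doc.copy()
        temp[text_field] = chunk
        chunked_data.append(temp)
    return chunked_data

doc = {"title": "Aggregation", "body": "a long page of text..."}
pieces = chunk_doc(doc, ["part one", "part two"])
print([p["body"] for p in pieces])  # ['part one', 'part two']
```

Keeping the metadata on every chunk lets each chunk be embedded and retrieved independently while still tracing back to its source document.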
requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -1,5 +1,4 @@
 pymongo==4.11.3
-datasets==3.6.0
 langchain==0.3.25
 langchain-aws==0.2.22
 langchain-google-genai==2.1.4

0 commit comments
