Merged
111 changes: 111 additions & 0 deletions docs/source/notebooks/retrieval/intro.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Retrieval Tasks\n",
"\n",
"These tasks are meant to test retrieval-augmented generation (RAG) architectures on various datasets.\n",
"\n",
"### Task resources\n",
"\n",
"Each retrieval task provides a few helper functions you can use to configure your pipeline.\n",
"\n",
"- `get_docs: callable` - fetches the original `Document` objects from the cache. Each task may provide configurable parameters you can use to define how the original documents are fetched.\n",
"- `retriever_factories: Dict[str, callable]` - configurable pipelines that transform the documents, embed them, and add them to a vectorstore (or other retriever object) for downstream use. They use LangChain's caching `index` API so you don't have to re-index for every evaluation. For custom transformations, provide a `transformation_name` to isolate the cache and vectorstore namespace. Currently (2023/11/21) these all use Chroma as the vectorstore, but you can swap it out.\n",
"- `chain_factories: Dict[str, callable]` - defines off-the-shelf architectures you can configure and evaluate.\n",
"\n",
"When evaluating, you don't have to use any of these factory methods; you can instead define your own custom architecture or ETL pipeline. They are meant to facilitate evaluations and comparisons of specific design decisions.\n",
"\n",
"### Dataset schema\n",
"\n",
"Each task corresponds to a LangSmith dataset with the following schema:\n",
"\n",
"Inputs:\n",
"- question: str - the user question\n",
"\n",
"Outputs:\n",
"- answer: str - the expected answer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Supported benchmark tasks\n",
"\n",
"You can check an up-to-date list of retrieval tasks in the registry:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th>Name </th><th>Type </th><th>Dataset ID </th><th>Description </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>LangChain Docs Q&A </td><td>RetrievalTask</td><td><a href=\"https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d\" target=\"_blank\" rel=\"noopener\">452ccafc-18e1-4314-885b-edd735f17b9d</a></td><td>Questions and answers based on a snapshot of the LangChain python docs.\n",
"\n",
"The environment provides the documents and the retriever information.\n",
"\n",
"Each example is composed of a question and reference answer.\n",
"\n",
"Success is measured based on the accuracy of the answer relative to the reference answer.\n",
"We also measure the faithfulness of the model's response relative to the retrieved documents (if any). </td></tr>\n",
"<tr><td>Semi-structured Earnings</td><td>RetrievalTask</td><td><a href=\"https://smith.langchain.com/public/c47d9617-ab99-4d6e-a6e6-92b8daf85a7d/d\" target=\"_blank\" rel=\"noopener\">c47d9617-ab99-4d6e-a6e6-92b8daf85a7d</a></td><td>Questions and answers based on PDFs containing tables and charts.\n",
"\n",
"The task provides the raw documents as well as factory methods to easily index them\n",
"and create a retriever.\n",
"\n",
"Each example is composed of a question and reference answer.\n",
"\n",
"Success is measured based on the accuracy of the answer relative to the reference answer.\n",
"We also measure the faithfulness of the model's response relative to the retrieved documents (if any). </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"Registry(tasks=[RetrievalTask(name='LangChain Docs Q&A', dataset_id='https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d', description=\"Questions and answers based on a snapshot of the LangChain python docs.\\n\\nThe environment provides the documents and the retriever information.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\nWe also measure the faithfulness of the model's response relative to the retrieved documents (if any).\\n\", retriever_factories={'basic': <function _chroma_retriever_factory at 0x138367ec0>, 'parent-doc': <function _chroma_parent_document_retriever_factory at 0x138367f60>, 'hyde': <function _chroma_hyde_retriever_factory at 0x13838c040>}, architecture_factories={'conversational-retrieval-qa': <function default_response_chain at 0x11fa74fe0>}, get_docs=<function load_cached_docs at 0x101bfb240>), RetrievalTask(name='Semi-structured Earnings', dataset_id='https://smith.langchain.com/public/c47d9617-ab99-4d6e-a6e6-92b8daf85a7d/d', description=\"Questions and answers based on PDFs containing tables and charts.\\n\\nThe task provides the raw documents as well as factory methods to easily index them\\nand create a retriever.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\nWe also measure the faithfulness of the model's response relative to the retrieved documents (if any).\\n\", retriever_factories={'basic': <function _chroma_retriever_factory at 0x13838c5e0>, 'parent-doc': <function _chroma_parent_document_retriever_factory at 0x13838c680>, 'hyde': <function _chroma_hyde_retriever_factory at 0x13838c720>}, architecture_factories={}, get_docs=<function load_docs at 0x13838c540>)])"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_benchmarks import registry\n",
"\n",
"registry.filter(Type=\"RetrievalTask\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
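Concretely, each example in these datasets pairs a question with a reference answer, following the dataset schema described in the notebook above. A record shaped like that schema might look like this (the values are made up for illustration):

```python
# Illustrative record matching the documented dataset schema:
# inputs carry a "question" string, outputs carry an "answer" string.
example = {
    "inputs": {"question": "How do I create a retriever over the LangChain docs?"},
    "outputs": {"answer": "Index the docs into a vectorstore and expose it as a retriever."},
}

# Each side of the record has exactly the one documented field.
assert set(example["inputs"]) == {"question"}
assert set(example["outputs"]) == {"answer"}
```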
@@ -16,19 +16,6 @@
"We will be using LangSmith to capture the evaluation traces. You can make a free account at [smith.langchain.com](https://smith.langchain.com/). Once you've done so, you can make an API key and set it below."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8397dde8-cfde-4f98-b331-e6dab0618f97",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -59,8 +46,8 @@
"source": [
"import os\n",
"\n",
"# os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = \"sk-...\" # Your API key\n",
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = \"sk-...\" # Your API key\n",
"\n",
"# # Silence warnings from HuggingFace\n",
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\""
6 changes: 3 additions & 3 deletions docs/source/toc.segment
@@ -28,7 +28,7 @@
:maxdepth: 2
:caption: RAG

./notebooks/rag_langchain_docs
./notebooks/rag_semi_structured
./notebooks/rag_evaluations
./notebooks/retrieval/langchain_docs_qa
./notebooks/retrieval/semi_structured
./notebooks/retrieval/comparing_techniques
```
5 changes: 2 additions & 3 deletions langchain_benchmarks/rag/tasks/langchain_docs/task.py
@@ -9,9 +9,8 @@
)
from langchain_benchmarks.schema import RetrievalTask

DATASET_ID = (
"452ccafc-18e1-4314-885b-edd735f17b9d" # ID of public LangChain Docs dataset
)
# URL of public LangChain Docs dataset
DATASET_ID = "https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d"


def load_cached_docs() -> Iterable[Document]:
@@ -6,7 +6,8 @@
)
from langchain_benchmarks.schema import RetrievalTask

DATASET_ID = "c47d9617-ab99-4d6e-a6e6-92b8daf85a7d" # ID of public Semi-structured Earnings dataset
# ID of public Semi-structured Earnings dataset
DATASET_ID = "https://smith.langchain.com/public/c47d9617-ab99-4d6e-a6e6-92b8daf85a7d/d"

SEMI_STRUCTURED_EARNINGS_TASK = RetrievalTask(
name="Semi-structured Earnings",
26 changes: 21 additions & 5 deletions langchain_benchmarks/schema.py
@@ -2,7 +2,7 @@
from __future__ import annotations

import dataclasses
import inspect
import urllib.parse
from typing import Any, Callable, Dict, Iterable, List, Optional, Type, Union

from langchain.prompts import ChatPromptTemplate
@@ -46,21 +46,37 @@ class BaseTask:
etc.
"""

@property
def _dataset_link(self) -> str:
"""Return a link to the dataset."""
dataset_url = (
self.dataset_id
if self.dataset_id.startswith("http")
else f"https://smith.langchain.com/public/{self.dataset_id}/d"
)
parsed_url = urllib.parse.urlparse(dataset_url)
# Extract the UUID from the path
path_parts = parsed_url.path.split("/")
token_uuid = path_parts[-2] if len(path_parts) >= 2 else "Link"
return (
f'<a href="{dataset_url}" target="_blank" rel="noopener">{token_uuid}</a>'
)

@property
def _table(self) -> List[List[str]]:
"""Return a table representation of the environment."""
return [
["Name", self.name],
["Type", self.__class__.__name__],
["Dataset ID", self.dataset_id],
["Dataset ID", self._dataset_link],
["Description", self.description],
]

def _repr_html_(self) -> str:
"""Return an HTML representation of the environment."""
return tabulate(
self._table,
tablefmt="html",
tablefmt="unsafehtml",
)
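The `_dataset_link` property added above can be exercised in isolation. Here is a standalone sketch of the same logic as a plain function (hypothetical name `dataset_link`; the body mirrors the property in the diff), showing that a bare UUID and a full public share URL resolve to the same anchor tag:

```python
import urllib.parse


def dataset_link(dataset_id: str) -> str:
    """Standalone sketch of BaseTask._dataset_link from the diff above."""
    # Accept either a bare UUID or a full public share URL.
    dataset_url = (
        dataset_id
        if dataset_id.startswith("http")
        else f"https://smith.langchain.com/public/{dataset_id}/d"
    )
    parsed_url = urllib.parse.urlparse(dataset_url)
    # For a path like "/public/<uuid>/d", the UUID is the second-to-last segment.
    path_parts = parsed_url.path.split("/")
    token_uuid = path_parts[-2] if len(path_parts) >= 2 else "Link"
    return f'<a href="{dataset_url}" target="_blank" rel="noopener">{token_uuid}</a>'


uuid = "452ccafc-18e1-4314-885b-edd735f17b9d"
url = f"https://smith.langchain.com/public/{uuid}/d"
# Both input forms produce the same link, labeled with the UUID.
assert dataset_link(uuid) == dataset_link(url)
```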


@@ -146,12 +162,12 @@ def _repr_html_(self) -> str:
[
task.name,
task.__class__.__name__,
task.dataset_id,
task._dataset_link,
task.description,
]
for task in self.tasks
]
return tabulate(table, headers=headers, tablefmt="html")
return tabulate(table, headers=headers, tablefmt="unsafehtml")
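The switch from `tablefmt="html"` to `tablefmt="unsafehtml"` is what lets the anchor tags from `_dataset_link` render as clickable links: tabulate's `html` format escapes cell contents, while `unsafehtml` passes them through verbatim. A stdlib sketch of the escaping that the `html` format would apply to such a cell:

```python
import html

cell = '<a href="https://smith.langchain.com/public/abc/d">abc</a>'

# Under tablefmt="html" the cell text is escaped like this, so the
# markup would show up literally in the rendered table:
escaped = html.escape(cell)
assert escaped.startswith("&lt;a href=")

# Under tablefmt="unsafehtml" the cell is left untouched, so the
# browser renders an actual hyperlink instead.
```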

def filter(
self,