Merged
111 changes: 111 additions & 0 deletions docs/source/notebooks/retrieval/intro.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Retrieval Tasks\n",
"\n",
"These tasks are meant to test retrieval-augmented generation (RAG) architectures on various datasets.\n",
"\n",
"### Task resources\n",
"\n",
"Each retrieval task provides a few helper functions you can use to configure your pipeline.\n",
"\n",
"- `get_docs: callable` - fetches the original `Document` objects from the cache. Each task may provide configurable parameters you can use to define how the original documents are fetched.\n",
"- `retriever_factories: Dict[str, callable]` - configurable pipelines that transform the documents, embed them, and add them to a vectorstore (or other retriever object) for downstream use. They use LangChain's caching `index` API so you don't have to re-index for every evaluation. For custom transformations, provide a `transformation_name` to isolate the cache and vectorstore namespace. Currently (2023/11/21) these all use Chroma as the vectorstore, but you can swap it out.\n",
"- `chain_factories: Dict[str, callable]` - defines off-the-shelf architectures you can configure and evaluate.\n",
"\n",
"When evaluating, you don't have to use any of these factory methods; you can instead define your own custom architecture or ETL pipeline. They are meant to facilitate evaluations and comparisons of specific design decisions.\n",
"\n",
"### Dataset schema\n",
"\n",
"Each task corresponds to a LangSmith dataset with the following schema:\n",
"\n",
"Inputs:\n",
"- question: str - the user question\n",
"\n",
"Outputs:\n",
"- answer: str - the expected answer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Supported benchmark tasks\n",
"\n",
"You can check an up-to-date list of retrieval tasks in the registry:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th>Name </th><th>Type </th><th>Dataset ID </th><th>Description </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>LangChain Docs Q&A </td><td>RetrievalTask</td><td><a href=\"https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d\" target=\"_blank\" rel=\"noopener\">452ccafc-18e1-4314-885b-edd735f17b9d</a></td><td>Questions and answers based on a snapshot of the LangChain python docs.\n",
"\n",
"The environment provides the documents and the retriever information.\n",
"\n",
"Each example is composed of a question and reference answer.\n",
"\n",
"Success is measured based on the accuracy of the answer relative to the reference answer.\n",
"We also measure the faithfulness of the model's response relative to the retrieved documents (if any). </td></tr>\n",
"<tr><td>Semi-structured Earnings</td><td>RetrievalTask</td><td><a href=\"https://smith.langchain.com/public/c47d9617-ab99-4d6e-a6e6-92b8daf85a7d/d\" target=\"_blank\" rel=\"noopener\">c47d9617-ab99-4d6e-a6e6-92b8daf85a7d</a></td><td>Questions and answers based on PDFs containing tables and charts.\n",
"\n",
"The task provides the raw documents as well as factory methods to easily index them\n",
"and create a retriever.\n",
"\n",
"Each example is composed of a question and reference answer.\n",
"\n",
"Success is measured based on the accuracy of the answer relative to the reference answer.\n",
"We also measure the faithfulness of the model's response relative to the retrieved documents (if any). </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"Registry(tasks=[RetrievalTask(name='LangChain Docs Q&A', dataset_id='https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d', description=\"Questions and answers based on a snapshot of the LangChain python docs.\\n\\nThe environment provides the documents and the retriever information.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\nWe also measure the faithfulness of the model's response relative to the retrieved documents (if any).\\n\", retriever_factories={'basic': <function _chroma_retriever_factory at 0x138367ec0>, 'parent-doc': <function _chroma_parent_document_retriever_factory at 0x138367f60>, 'hyde': <function _chroma_hyde_retriever_factory at 0x13838c040>}, architecture_factories={'conversational-retrieval-qa': <function default_response_chain at 0x11fa74fe0>}, get_docs=<function load_cached_docs at 0x101bfb240>), RetrievalTask(name='Semi-structured Earnings', dataset_id='https://smith.langchain.com/public/c47d9617-ab99-4d6e-a6e6-92b8daf85a7d/d', description=\"Questions and answers based on PDFs containing tables and charts.\\n\\nThe task provides the raw documents as well as factory methods to easily index them\\nand create a retriever.\\n\\nEach example is composed of a question and reference answer.\\n\\nSuccess is measured based on the accuracy of the answer relative to the reference answer.\\nWe also measure the faithfulness of the model's response relative to the retrieved documents (if any).\\n\", retriever_factories={'basic': <function _chroma_retriever_factory at 0x13838c5e0>, 'parent-doc': <function _chroma_parent_document_retriever_factory at 0x13838c680>, 'hyde': <function _chroma_hyde_retriever_factory at 0x13838c720>}, architecture_factories={}, get_docs=<function load_docs at 0x13838c540>)])"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_benchmarks import registry\n",
"\n",
"registry.filter(Type=\"RetrievalTask\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
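Concretely, each example in these datasets pairs a question with a reference answer, following the dataset schema described in the notebook above. A record shaped like that schema might look like this (the values are made up for illustration):

```python
# Illustrative record matching the documented dataset schema:
# inputs carry a "question" string, outputs carry an "answer" string.
example = {
    "inputs": {"question": "How do I create a retriever over the LangChain docs?"},
    "outputs": {"answer": "Index the docs into a vectorstore and expose it as a retriever."},
}

# Each side of the record has exactly the one documented field.
assert set(example["inputs"]) == {"question"}
assert set(example["outputs"]) == {"answer"}
```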
@@ -16,19 +16,6 @@
"We will be using LangSmith to capture the evaluation traces. You can make a free account at [smith.langchain.com](https://smith.langchain.com/). Once you've done so, you can make an API key and set it below."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8397dde8-cfde-4f98-b331-e6dab0618f97",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -59,8 +46,8 @@
"source": [
"import os\n",
"\n",
"# os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = \"sk-...\" # Your API key\n",
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = \"sk-...\" # Your API key\n",
"\n",
"# # Silence warnings from HuggingFace\n",
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\""
6 changes: 3 additions & 3 deletions docs/source/toc.segment
@@ -28,7 +28,7 @@
:maxdepth: 2
:caption: RAG

./notebooks/rag_langchain_docs
./notebooks/rag_semi_structured
./notebooks/rag_evaluations
./notebooks/retrieval/langchain_docs_qa
./notebooks/retrieval/semi_structured
./notebooks/retrieval/comparing_techniques
```
5 changes: 2 additions & 3 deletions langchain_benchmarks/rag/tasks/langchain_docs/task.py
@@ -9,9 +9,8 @@
)
from langchain_benchmarks.schema import RetrievalTask

DATASET_ID = (
"452ccafc-18e1-4314-885b-edd735f17b9d" # ID of public LangChain Docs dataset
)
# URL of public LangChain Docs dataset
DATASET_ID = "https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d"


def load_cached_docs() -> Iterable[Document]:
@@ -6,7 +6,8 @@
)
from langchain_benchmarks.schema import RetrievalTask

DATASET_ID = "c47d9617-ab99-4d6e-a6e6-92b8daf85a7d" # ID of public Semi-structured Earnings dataset
# ID of public Semi-structured Earnings dataset
DATASET_ID = "https://smith.langchain.com/public/c47d9617-ab99-4d6e-a6e6-92b8daf85a7d/d"

SEMI_STRUCTURED_EARNINGS_TASK = RetrievalTask(
name="Semi-structured Earnings",
26 changes: 21 additions & 5 deletions langchain_benchmarks/schema.py
@@ -2,7 +2,7 @@
from __future__ import annotations

import dataclasses
import inspect
import urllib.parse
from typing import Any, Callable, Dict, Iterable, List, Optional, Type, Union

from langchain.prompts import ChatPromptTemplate
@@ -46,21 +46,37 @@ class BaseTask:
etc.
"""

@property
def _dataset_link(self) -> str:
"""Return a link to the dataset."""
dataset_url = (
self.dataset_id
if self.dataset_id.startswith("http")
else f"https://smith.langchain.com/public/{self.dataset_id}/d"
)
parsed_url = urllib.parse.urlparse(dataset_url)
# Extract the UUID from the path
path_parts = parsed_url.path.split("/")
token_uuid = path_parts[-2] if len(path_parts) >= 2 else "Link"
return (
f'<a href="{dataset_url}" target="_blank" rel="noopener">{token_uuid}</a>'
)

@property
def _table(self) -> List[List[str]]:
"""Return a table representation of the environment."""
return [
["Name", self.name],
["Type", self.__class__.__name__],
["Dataset ID", self.dataset_id],
["Dataset ID", self._dataset_link],
["Description", self.description],
]

def _repr_html_(self) -> str:
"""Return an HTML representation of the environment."""
return tabulate(
self._table,
tablefmt="html",
tablefmt="unsafehtml",
)
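The `_dataset_link` property added above can be exercised in isolation. Here is a standalone sketch of the same logic as a plain function (hypothetical name `dataset_link`; the body mirrors the property in the diff), showing that a bare UUID and a full public share URL resolve to the same anchor tag:

```python
import urllib.parse


def dataset_link(dataset_id: str) -> str:
    """Standalone sketch of BaseTask._dataset_link from the diff above."""
    # Accept either a bare UUID or a full public share URL.
    dataset_url = (
        dataset_id
        if dataset_id.startswith("http")
        else f"https://smith.langchain.com/public/{dataset_id}/d"
    )
    parsed_url = urllib.parse.urlparse(dataset_url)
    # For a path like "/public/<uuid>/d", the UUID is the second-to-last segment.
    path_parts = parsed_url.path.split("/")
    token_uuid = path_parts[-2] if len(path_parts) >= 2 else "Link"
    return f'<a href="{dataset_url}" target="_blank" rel="noopener">{token_uuid}</a>'


uuid = "452ccafc-18e1-4314-885b-edd735f17b9d"
url = f"https://smith.langchain.com/public/{uuid}/d"
# Both input forms produce the same link, labeled with the UUID.
assert dataset_link(uuid) == dataset_link(url)
```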


@@ -146,12 +162,12 @@ def _repr_html_(self) -> str:
[
task.name,
task.__class__.__name__,
task.dataset_id,
task._dataset_link,
task.description,
]
for task in self.tasks
]
return tabulate(table, headers=headers, tablefmt="html")
return tabulate(table, headers=headers, tablefmt="unsafehtml")
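The switch from `tablefmt="html"` to `tablefmt="unsafehtml"` is what lets the anchor tags from `_dataset_link` render as clickable links: tabulate's `html` format escapes cell contents, while `unsafehtml` passes them through verbatim. A stdlib sketch of the escaping that the `html` format would apply to such a cell:

```python
import html

cell = '<a href="https://smith.langchain.com/public/abc/d">abc</a>'

# Under tablefmt="html" the cell text is escaped like this, so the
# markup would show up literally in the rendered table:
escaped = html.escape(cell)
assert escaped.startswith("&lt;a href=")

# Under tablefmt="unsafehtml" the cell is left untouched, so the
# browser renders an actual hyperlink instead.
```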

def filter(
self,