@@ -20,9 +20,9 @@
"id": "60bb467d-861d-4b07-a48d-8e5aa177c969",
"metadata": {},
"source": [
"# Extraction\n",
"# Email Extraction\n",
"\n",
"Let's see how to evaluate an agent's ability to use tools."
"Let's examine how to evaluate an email extraction task"
]
},
{
@@ -45,75 +45,6 @@
"For this code to work, please configure LangSmith environment variables with your credentials."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3644d211-382e-41aa-b282-21b01d28fc35",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th>Name </th><th>Type </th><th>Dataset ID </th><th>Description </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>Tool Usage - Typewriter (1 func)</td><td>ToolUsageTask </td><td>placeholder </td><td>Environment with a single function that accepts a single letter as input, and &quot;prints&quot; it on a piece of paper.\n",
"\n",
"The objective of this task is to evaluate the ability to use the provided tools to repeat a given input string.\n",
"\n",
"For example, if the string is &#x27;abc&#x27;, the tools &#x27;a&#x27;, &#x27;b&#x27;, and &#x27;c&#x27; must be invoked in that order.\n",
"\n",
"The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string. </td></tr>\n",
"<tr><td>Tool Usage - Typewriter </td><td>ToolUsageTask </td><td>placeholder </td><td>Environment with 26 functions each representing a letter of the alphabet.\n",
"\n",
"In this variation of the typewriter task, there are 26 parameterless functions, where each function represents a letter of the alphabet (instead of a single function that takes a letter as an argument).\n",
"\n",
"The object is to evaluate the ability of use the functions to repeat the given string.\n",
"\n",
"For example, if the string is &#x27;abc&#x27;, the tools &#x27;a&#x27;, &#x27;b&#x27;, and &#x27;c&#x27; must be invoked in that order.\n",
"\n",
"The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string. </td></tr>\n",
"<tr><td>Tool Usage - Relational Data </td><td>ToolUsageTask </td><td>e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5 </td><td>Environment with fake data about users and their locations and favorite foods.\n",
"\n",
"The environment provides a set of tools that can be used to query the data.\n",
"\n",
"The objective of this task is to evaluate the ability to use the provided tools to answer questions about relational data.\n",
"\n",
"The dataset contains 21 examples of varying difficulty. The difficulty is measured by the number of tools that need to be used to answer the question.\n",
"\n",
"Each example is composed of a question, a reference answer, and information about the sequence in which tools should be used to answer the question.\n",
"\n",
"Success is measured by the ability to answer the question correctly, and efficiently. </td></tr>\n",
"<tr><td>Multiverse Math </td><td>ToolUsageTask </td><td>placeholder </td><td>An environment that contains a few basic math operations, but with altered results.\n",
"\n",
"For example, multiplication of 5*3 will be re-interpreted as 5*3*1.1. The basic operations retain some basic properties, such as commutativity, associativity, and distributivity; however, the results are different than expected.\n",
"\n",
"The objective of this task is to evaluate the ability to use the provided tools to solve simple math questions and ignore any innate knowledge about math. </td></tr>\n",
"<tr><td>Email Extraction </td><td>ExtractionTask</td><td>https://smith.langchain.com/public/36bdfe7d-3cd1-4b36-b957-d12d95810a2b/d</td><td>A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail.\n",
"\n",
"Some additional cleanup of the data was done by hand after the initial pass.\n",
"\n",
"See https://github.com/jacoblee93/oss-model-extraction-evals. </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"Registry(tasks=[ToolUsageTask(name='Tool Usage - Typewriter (1 func)', dataset_id='placeholder', description='Environment with a single function that accepts a single letter as input, and \"prints\" it on a piece of paper.\\n\\nThe objective of this task is to evaluate the ability to use the provided tools to repeat a given input string.\\n\\nFor example, if the string is \\'abc\\', the tools \\'a\\', \\'b\\', and \\'c\\' must be invoked in that order.\\n\\nThe dataset includes examples of varying difficulty. The difficulty is measured by the length of the string.\\n', create_environment=<function get_environment at 0x7f34137f4820>, instructions=\"Repeat the given string by using the provided tools. Do not write anything else or provide any explanations. For example, if the string is 'abc', you must invoke the tools 'a', 'b', and 'c' in that order. Please invoke the function with a single letter at a time.\"), ToolUsageTask(name='Tool Usage - Typewriter', dataset_id='placeholder', description=\"Environment with 26 functions each representing a letter of the alphabet.\\n\\nIn this variation of the typewriter task, there are 26 parameterless functions, where each function represents a letter of the alphabet (instead of a single function that takes a letter as an argument).\\n\\nThe object is to evaluate the ability of use the functions to repeat the given string.\\n\\nFor example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order.\\n\\nThe dataset includes examples of varying difficulty. The difficulty is measured by the length of the string.\\n\", create_environment=<function get_environment at 0x7f3413805280>, instructions=\"Repeat the given string by using the provided tools. Do not write anything else or provide any explanations. For example, if the string is 'abc', you must invoke the tools 'a', 'b', and 'c' in that order. 
Please invoke the functions without any arguments.\"), ToolUsageTask(name='Tool Usage - Relational Data', dataset_id='e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5', description='Environment with fake data about users and their locations and favorite foods.\\n\\nThe environment provides a set of tools that can be used to query the data.\\n\\nThe objective of this task is to evaluate the ability to use the provided tools to answer questions about relational data.\\n\\nThe dataset contains 21 examples of varying difficulty. The difficulty is measured by the number of tools that need to be used to answer the question.\\n\\nEach example is composed of a question, a reference answer, and information about the sequence in which tools should be used to answer the question.\\n\\nSuccess is measured by the ability to answer the question correctly, and efficiently.\\n', create_environment=<function get_environment at 0x7f34137f4e50>, instructions=\"Please answer the user's question by using the tools provided. Do not guess the answer. Keep in mind that entities like users,foods and locations have both a name and an ID, which are not the same.\"), ToolUsageTask(name='Multiverse Math', dataset_id='placeholder', description='An environment that contains a few basic math operations, but with altered results.\\n\\nFor example, multiplication of 5*3 will be re-interpreted as 5*3*1.1. The basic operations retain some basic properties, such as commutativity, associativity, and distributivity; however, the results are different than expected.\\n\\nThe objective of this task is to evaluate the ability to use the provided tools to solve simple math questions and ignore any innate knowledge about math.\\n', create_environment=<function get_environment at 0x7f3413805820>, instructions='You are requested to solve math questions in an alternate mathematical universe. 
The rules of association, commutativity, and distributivity still apply, but the operations have been altered to yield different results than expected. Solve the given math questions using the provided tools. Do not guess the answer.'), ExtractionTask(name='Email Extraction', dataset_id='https://smith.langchain.com/public/36bdfe7d-3cd1-4b36-b957-d12d95810a2b/d', description='A dataset of 42 real emails deduped from a spam folder, with semantic HTML tags removed, as well as a script for initial extraction and formatting of other emails from an arbitrary .mbox file like the one exported by Gmail.\\n\\nSome additional cleanup of the data was done by hand after the initial pass.\\n\\nSee https://github.com/jacoblee93/oss-model-extraction-evals.\\n ', schema=<class 'langchain_benchmarks.extraction.tasks.email_task.Email'>, instructions=ChatPromptTemplate(input_variables=['email'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are an expert researcher.')), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['email'], template='What can you tell me about the following email? Make sure to extract the question in the correct format. Here is the email:\\n ```\\n{email}\\n```'))]))])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"registry"
]
},
{
"cell_type": "code",
"execution_count": 5,
@@ -1,25 +1,10 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "ac938479-e489-4b04-80d9-b1b550154122",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"id": "60bb467d-861d-4b07-a48d-8e5aa177c969",
"metadata": {
"tags": [
"remove-cell"
]
"tags": []
},
"source": [
"# Multiverse Math\n",
@@ -37,6 +22,21 @@
" Please note that the modified operations are not guaranteed to even make sense in the real world since not all properties will be retained (e.g., distributive property)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8df805c7-02b2-4c59-8b15-507015f5a284",
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 18,
@@ -5,7 +5,7 @@
"id": "60bb467d-861d-4b07-a48d-8e5aa177c969",
"metadata": {},
"source": [
"# Tool Usage\n",
"# Relational Data \n",
"\n",
"Let's see how to evaluate an agent's ability to use tools."
]
@@ -96,8 +96,7 @@
"\n",
"Each example is composed of a question, a reference answer, and information about the sequence in which tools should be used to answer the question.\n",
"\n",
"Success is measured by the ability to answer the question correctly, and efficiently.\n",
"\n"
"Success is measured by the ability to answer the question correctly, and efficiently.\n"
]
}
],
@@ -1157,7 +1156,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.9.6"
}
},
"nbformat": 4,
@@ -1,25 +1,10 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "ac938479-e489-4b04-80d9-b1b550154122",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"id": "60bb467d-861d-4b07-a48d-8e5aa177c969",
"metadata": {
"tags": [
"remove-cell"
]
"tags": []
},
"source": [
"# Typewriter: Single Tool\n",
@@ -32,6 +17,21 @@
" that takes a letter as an argument."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d4ef88c-f478-4cb9-8fa9-f30eff422558",
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -1,25 +1,10 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "ac938479-e489-4b04-80d9-b1b550154122",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"id": "60bb467d-861d-4b07-a48d-8e5aa177c969",
"metadata": {
"tags": [
"remove-cell"
]
"tags": []
},
"source": [
"# Typewriter: 26 Tools\n",
@@ -32,6 +17,17 @@
" each representing a letter of the alphabet."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce1f4cc3-4160-43b5-8822-b8da25988a86",
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
31 changes: 28 additions & 3 deletions docs/source/toc.segment
@@ -1,8 +1,33 @@
```{toctree}
:maxdepth: 2
:caption: Introduction

./notebooks/datasets
```


```{toctree}
:maxdepth: 2
:caption: Tool Usage

./notebooks/tool_usage/relational_data
./notebooks/tool_usage/multiverse_math
./notebooks/tool_usage/typewriter_1
./notebooks/tool_usage/typewriter_26
```

```{toctree}
:maxdepth: 2
:caption: Extraction

./notebooks/extraction/email
```

```{toctree}
:maxdepth: 2
:caption: Contents
:glob:
:caption: RAG

./notebooks/*
./notebooks/rag_langchain_docs
./notebooks/rag_semi_structured
./notebooks/rag_evaluations
```
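
Several hunks above move the `%load_ext autoreload` cell below the title cell and tag it `remove-cell`. As a rough illustration of that pattern, the sketch below applies the same tag to a notebook's JSON using only the standard library. The helper name `tag_autoreload_cells` and the in-memory `demo` notebook are hypothetical and not part of this PR; it also assumes the docs build honors the `remove-cell` tag (as MyST-NB does), which the diff itself does not state.

```python
import json


def tag_autoreload_cells(notebook: dict, tag: str = "remove-cell") -> int:
    """Add `tag` to each code cell that mentions autoreload; return how many cells changed.

    Hypothetical helper for illustration only; it is not part of this PR.
    """
    changed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # Notebook JSON stores source either as a list of lines or as one string.
        text = "".join(source) if isinstance(source, list) else source
        if "autoreload" not in text:
            continue
        tags = cell.setdefault("metadata", {}).setdefault("tags", [])
        if tag not in tags:
            tags.append(tag)
            changed += 1
    return changed


# Made-up in-memory notebook shaped like the cells shown in the diff.
demo = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "source": ["%load_ext autoreload\n", "%autoreload 2"],
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Typewriter: 26 Tools\n"],
        },
    ]
}

changed = tag_autoreload_cells(demo)
print(json.dumps(demo["cells"][0]["metadata"]))
```

Running a check like this across the notebooks would flag cells such as the autoreload cell in the "Typewriter: 26 Tools" hunk, which appears in this diff with empty `metadata` rather than the `remove-cell` tag.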