-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
core[minor]: Add LangSmith doc loader (#6568)
* core[minor]: Add LangSmith doc loader * tests and nits * add test and docs * rename
- Loading branch information
1 parent
af25a44
commit 465bbfb
Showing
6 changed files
with
569 additions
and
0 deletions.
There are no files selected for viewing
302 changes: 302 additions & 0 deletions
302
docs/core_docs/docs/integrations/document_loaders/web_loaders/langsmith.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,302 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"sidebar_label: FireCrawl\n", | ||
"\n", | ||
"---" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# LangSmithLoader\n", | ||
"\n", | ||
"This notebook provides a quick overview for getting started with the [LangSmithLoader](/docs/integrations/document_loaders/). For detailed documentation of all `LangSmithLoader` features and configurations head to the [API reference](https://api.js.langchain.com/classes/_langchain_core.document_loaders_langsmith.LangSmithLoader.html).\n", | ||
"\n", | ||
"## Overview\n", | ||
"### Integration details\n", | ||
"\n", | ||
"| Class | Package | Local | Serializable | [PY support](https://python.langchain.com/docs/integrations/document_loaders/langsmith)|\n", | ||
"| :--- | :--- | :---: | :---: | :---: |\n", | ||
"| [LangSmithLoader](https://api.js.langchain.com/classes/_langchain_core.document_loaders_langsmith.LangSmithLoader.html) | [@langchain/community](https://api.js.langchain.com/classes/_langchain_core.html) | ✅ | beta | ✅ | \n", | ||
"### Loader features\n", | ||
"| Source | Web Loader | Node Envs Only\n", | ||
"| :---: | :---: | :---: | \n", | ||
"| LangSmithLoader | ✅ | ❌ | \n", | ||
"\n", | ||
"[FireCrawl](https://firecrawl.dev) crawls and convert any website into LLM-ready data. It crawls all accessible sub-pages and give you clean markdown and metadata for each. No sitemap required.\n", | ||
"\n", | ||
"FireCrawl handles complex tasks such as reverse proxies, caching, rate limits, and content blocked by JavaScript. Built by the [mendable.ai](https://mendable.ai) team.\n", | ||
"\n", | ||
"This guide shows how to scrap and crawl entire websites and load them using the `LangSmithLoader` in LangChain.\n", | ||
"\n", | ||
"## Setup\n", | ||
"\n", | ||
"To access the LangSmith document loader you'll need to install `@langchain/core`, create a [LangSmith](https://langsmith.com/) account and get an API key.\n", | ||
"\n", | ||
"### Credentials\n", | ||
"\n", | ||
"Sign up at https://langsmith.com and generate an API key. Once you've done this set the `LANGSMITH_API_KEY` environment variable:\n", | ||
"\n", | ||
"```bash\n", | ||
"export LANGSMITH_API_KEY=\"your-api-key\"\n", | ||
"```\n", | ||
"\n", | ||
"### Installation\n", | ||
"\n", | ||
"The `LangSmithLoader` integration lives in the `@langchain/core` package:\n", | ||
"\n", | ||
"```{=mdx}\n", | ||
"import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n", | ||
"import Npm2Yarn from \"@theme/Npm2Yarn\";\n", | ||
"\n", | ||
"<IntegrationInstallTooltip></IntegrationInstallTooltip>\n", | ||
"\n", | ||
"<Npm2Yarn>\n", | ||
" @langchain/core\n", | ||
"</Npm2Yarn>\n", | ||
"\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Create example dataset\n", | ||
"\n", | ||
"For this example, we'll create a new dataset which we'll use in our document loader." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import { Client as LangSmithClient } from 'langsmith';\n", | ||
"import { faker } from \"@faker-js/faker\";\n", | ||
"\n", | ||
"const lsClient = new LangSmithClient();\n", | ||
"\n", | ||
"const datasetName = \"LangSmith Few Shot Datasets Notebook\";\n", | ||
"\n", | ||
"const exampleInputs = Array.from({ length: 10 }, (_, i) => ({\n", | ||
" input: faker.lorem.paragraph(),\n", | ||
"}));\n", | ||
"const exampleOutputs = Array.from({ length: 10 }, (_, i) => ({\n", | ||
" output: faker.lorem.sentence(),\n", | ||
"}));\n", | ||
"const exampleMetadata = Array.from({ length: 10 }, (_, i) => ({\n", | ||
" companyCatchPhrase: faker.company.catchPhrase(),\n", | ||
"}));\n", | ||
"\n", | ||
"await lsClient.deleteDataset({\n", | ||
" datasetName,\n", | ||
"})\n", | ||
"\n", | ||
"const dataset = await lsClient.createDataset(datasetName);\n", | ||
"\n", | ||
"const examples = await lsClient.createExamples({\n", | ||
" inputs: exampleInputs,\n", | ||
" outputs: exampleOutputs,\n", | ||
" metadata: exampleMetadata,\n", | ||
" datasetId: dataset.id,\n", | ||
"});" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import { LangSmithLoader } from \"@langchain/core/document_loaders/langsmith\"\n", | ||
"\n", | ||
"const loader = new LangSmithLoader({\n", | ||
" datasetName: \"LangSmith Few Shot Datasets Notebook\",\n", | ||
" // Instead of a datasetName, you can alternatively provide a datasetId\n", | ||
" // datasetId: dataset.id,\n", | ||
" contentKey: \"input\",\n", | ||
" limit: 5,\n", | ||
" // formatContent: (content) => content,\n", | ||
" // ... other options\n", | ||
"})" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Load" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"{\n", | ||
" pageContent: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.',\n", | ||
" metadata: {\n", | ||
" id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',\n", | ||
" created_at: '2024-08-20T17:01:38.984045+00:00',\n", | ||
" modified_at: '2024-08-20T17:01:38.984045+00:00',\n", | ||
" name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',\n", | ||
" dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',\n", | ||
" source_run_id: null,\n", | ||
" metadata: {\n", | ||
" dataset_split: [Array],\n", | ||
" companyCatchPhrase: 'Integrated solution-oriented secured line'\n", | ||
" },\n", | ||
" inputs: {\n", | ||
" input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'\n", | ||
" },\n", | ||
" outputs: {\n", | ||
" output: 'Excepturi adeptio spectaculum bis volaticus accusamus.'\n", | ||
" }\n", | ||
" }\n", | ||
"}\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"const docs = await loader.load()\n", | ||
"docs[0]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 9, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"{\n", | ||
" id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',\n", | ||
" created_at: '2024-08-20T17:01:38.984045+00:00',\n", | ||
" modified_at: '2024-08-20T17:01:38.984045+00:00',\n", | ||
" name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',\n", | ||
" dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',\n", | ||
" source_run_id: null,\n", | ||
" metadata: {\n", | ||
" dataset_split: [ 'base' ],\n", | ||
" companyCatchPhrase: 'Integrated solution-oriented secured line'\n", | ||
" },\n", | ||
" inputs: {\n", | ||
" input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'\n", | ||
" },\n", | ||
" outputs: { output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }\n", | ||
"}\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"console.log(docs[0].metadata)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 10, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"{\n", | ||
" input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'\n", | ||
"}\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"console.log(docs[0].metadata.inputs)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 11, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"{ output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"console.log(docs[0].metadata.outputs)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 12, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"[\n", | ||
" 'id',\n", | ||
" 'created_at',\n", | ||
" 'modified_at',\n", | ||
" 'name',\n", | ||
" 'dataset_id',\n", | ||
" 'source_run_id',\n", | ||
" 'metadata',\n", | ||
" 'inputs',\n", | ||
" 'outputs'\n", | ||
"]\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"console.log(Object.keys(docs[0].metadata))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## API reference\n", | ||
"\n", | ||
"For detailed documentation of all `LangSmithLoader` features and configurations head to the [API reference](https://api.js.langchain.com/classes/_langchain_core.document_loaders_langsmith.LangSmithLoader.html)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "TypeScript", | ||
"language": "typescript", | ||
"name": "tslab" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"mode": "typescript", | ||
"name": "javascript", | ||
"typescript": true | ||
}, | ||
"file_extension": ".ts", | ||
"mimetype": "text/typescript", | ||
"name": "typescript", | ||
"version": "3.7.2" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.