Harrison/apify #2215

Merged
merged 2 commits into from
Mar 31, 2023

Conversation

hwchase17
Contributor

No description provided.

jirimoravcik and others added 2 commits March 30, 2023 20:47
Adds an integration with the [Apify](https://apify.com/) platform. Apify
is a web scraping and data extraction platform. You can use it to get
content from documentation, knowledge bases, help centers, or blogs.

Apify welcomes all developers, who can sign up for a free account
[here](https://console.apify.com/sign-up).

## Usage
```python
import os
from langchain.document_loaders.base import Document
from langchain.indexes import VectorstoreIndexCreator
from langchain.utilities import ApifyWrapper

os.environ["OPENAI_API_KEY"] = "Your OpenAI API key"
os.environ["APIFY_API_TOKEN"] = "Your Apify API token"

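# The wrapper reads the API token from the APIFY_API_TOKEN environment variable.
apify = ApifyWrapper()

# Run the crawler Actor and get a loader over the dataset it produces.
loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={"startUrls": [{"url": "https://python.langchain.com/en/latest/"}]},
    # Map each dataset item to a Document; fall back to "" when the text is empty.
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)

# Build a vector index from the crawled documents and query it.
index = VectorstoreIndexCreator().from_loaders([loader])
query = "What is LangChain?"
result = index.query_with_sources(query)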

print(result["answer"])
print(result["sources"])
```
### Output
```
LangChain is a standard interface through which you can interact with a variety of large language models (LLMs). It provides modules that can be used to build language model applications, and it also provides chains and agents with memory capabilities.

https://python.langchain.com/en/latest/modules/models/llms.html, https://python.langchain.com/en/latest/getting_started/getting_started.html
```
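
The `dataset_mapping_function` in the usage example is the piece most likely to need adapting, so here is a minimal sketch of the same mapping written as a named function. It makes a few assumptions that are not part of this PR: `Document` below is a small stand-in dataclass (so the snippet runs without LangChain installed), `map_item` is a hypothetical helper name, and the sample items are made up to resemble crawler output. Unlike the lambda above, it uses `.get()` so that a missing key does not raise.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Document:
    """Stand-in for LangChain's Document (assumption: only these two fields matter here)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def map_item(item: Dict[str, Any]) -> Document:
    """Convert one dataset item into a Document (hypothetical helper)."""
    # `or ""` covers both a missing "text" key and an explicit None value.
    text = item.get("text") or ""
    return Document(page_content=text, metadata={"source": item.get("url", "")})


# Made-up dataset items, shaped like the ones the usage example reads.
items = [
    {"url": "https://example.com/a", "text": "Hello"},
    {"url": "https://example.com/b", "text": None},
]
docs = [map_item(item) for item in items]
```

With the real `Document` class, a function like this could presumably be passed as `dataset_mapping_function=map_item` in place of the lambda.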

This PR includes all the required code with docstrings, and
documentation with examples. If you have any suggestions on how to
improve this PR, feel free to comment here.
@hwchase17 hwchase17 merged commit 2eeaccf into master Mar 31, 2023
@hwchase17 hwchase17 deleted the harrison/apify branch March 31, 2023 03:58