docs: Add guide for running crawler in web server #1174


Status: Open. Wants to merge 1 commit into `master`.

Conversation

Pijukatel (Collaborator)

Description

Add guide for running crawler in web server

Issues

@Pijukatel Pijukatel added documentation Improvements or additions to documentation. t-tooling Issues with this label are in the ownership of the tooling team. labels Apr 25, 2025
@github-actions github-actions bot added this to the 113rd sprint - Tooling team milestone Apr 25, 2025
@Pijukatel Pijukatel requested a review from Copilot April 25, 2025 12:17
@Copilot Copilot AI (Contributor) left a comment:

Pull Request Overview

This PR adds a guide for running the crawler in a web server by including new FastAPI server and crawler code examples along with configuration updates.

  • Updated pyproject.toml to include new file paths and disable specific error codes for the web server examples.
  • Added a FastAPI server example (server.py) to illustrate how to run the crawler from a web endpoint.
  • Introduced an asynchronous crawler implementation (crawler.py) with lifecycle management using an async context manager.
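The lifecycle pattern mentioned in the last bullet, where the crawler starts and stops together with the web app, can be sketched with stdlib tools alone. This is an illustrative sketch, not the PR's actual `crawler.py`; the names `crawler_lifespan` and `events` are made up here:

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []

@asynccontextmanager
async def crawler_lifespan():
    # Start-up: this is where the real example would initialize the crawler
    # and kick off its background run.
    events.append("crawler started")
    try:
        yield
    finally:
        # Shut-down: stop the crawler when the server exits.
        events.append("crawler stopped")

async def main() -> None:
    async with crawler_lifespan():
        # The web server would serve requests for the lifetime of this block.
        events.append("serving requests")

asyncio.run(main())
print(events)
```

FastAPI supports exactly this shape via its `lifespan` parameter, which accepts an async context manager, so the same function can manage the crawler in a real app.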

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `pyproject.toml` | Updated configuration to include new file mappings for docs examples and added mypy overrides. |
| `docs/guides/code_examples/running_in_web_server/server.py` | Introduces a FastAPI server with endpoints for running and interacting with a crawler. |
| `docs/guides/code_examples/running_in_web_server/crawler.py` | Adds an asynchronous crawler setup with a default request handler and lifecycle management. |

Files not reviewed (1):
- `docs/guides/running_in_web_server.mdx`: Language not supported

@Pijukatel Pijukatel requested review from vdusek and Mantisus April 25, 2025 12:20
@Pijukatel Pijukatel marked this pull request as ready for review April 25, 2025 12:20
@Mantisus Mantisus (Collaborator) left a comment:

LGTM


# Set up a web server

There are many popular web server frameworks for Python, such as [Flask](https://flask.palletsprojects.com/), [Django](https://www.djangoproject.com/), and [Pyramid](https://trypyramid.com/). In this guide, we will use [FastAPI](https://fastapi.tiangolo.com/) to keep things simple.
Comment (Collaborator):

links to the mentioned projects?

- `/` - The index endpoint returns a short description of the server, with an example link to the second endpoint.
- `/scrape` - This endpoint receives a `url` parameter and returns the page title scraped from that URL.
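The real example scrapes the title with `ParselCrawler`; the core of what `/scrape` computes can be sketched with the stdlib alone. The `TitleParser`/`extract_title` names below are illustrative, not the PR's code:

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_title(html: str) -> str:
    """Return the trimmed contents of the page's <title> element."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

print(extract_title("<html><head><title>Example Domain</title></head></html>"))
```

In the actual guide this extraction is done by the crawler's request handler, which then makes the title available to the endpoint's response.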

To run the example server, make sure you have installed [fastapi[standard]](https://fastapi.tiangolo.com/#installation), then run `fastapi dev server.py` from the directory where the example code is located.
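For reference, the steps above as shell commands (assuming the `fastapi[standard]` extra provides the `fastapi` CLI, as the FastAPI installation docs describe):

```shell
# Install FastAPI together with its CLI and dev server.
pip install 'fastapi[standard]'

# Start the example server in development mode.
fastapi dev server.py
```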
Comment (Collaborator):

could we have a separate triple-backticks (```) command here for executing the server?


This will be our core server setup:

<CodeBlock className="language-python">
Comment (Collaborator):
since we have 2 files here, could we use filename arg for code block?


We will create a standard <ApiLink to="class/ParselCrawler">`ParselCrawler`</ApiLink> and use the `keep_alive=True` option to keep the crawler running even if there are no requests currently in the <ApiLink to="class/RequestQueue">`RequestQueue`</ApiLink>. This way it will always be waiting for new requests to come in.
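Conceptually, `keep_alive` turns the crawler into a long-running worker that idles on an empty queue instead of finishing. A stdlib-only sketch of that behavior (illustrative only, not Crawlee's implementation; the sentinel shutdown is an artifact of the sketch):

```python
import asyncio

async def worker(queue: asyncio.Queue, results: list[str]) -> None:
    # With keep_alive, the loop does not exit on an empty queue;
    # it simply waits for the next request to arrive.
    while True:
        url = await queue.get()
        if url is None:  # Sentinel used only to end this sketch.
            break
        results.append(f"processed {url}")
        queue.task_done()

async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list[str] = []
    task = asyncio.create_task(worker(queue, results))

    # Requests can be enqueued at any time, e.g. from a web endpoint.
    await queue.put("https://example.com")
    await queue.join()  # Wait until the enqueued request is handled.

    await queue.put(None)  # Shut the sketch worker down.
    await task
    return results

results = asyncio.run(main())
print(results)
```

This is why the web endpoints can hand URLs to the crawler at any point during the server's lifetime: the crawler never considers its work finished.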

<CodeBlock className="language-python">
Comment (Collaborator):
since we have 2 files here, could we use filename arg for code block?

```
@@ -244,8 +247,15 @@ module = [
    "cookiecutter.*", # Untyped and stubs not available
    "inquirer.*", # Untyped and stubs not available
]
disable_error_code = ["misc"]
```
Comment (Collaborator):
sorry - what is this?

import Crawler from '!!raw-loader!./code_examples/running_in_web_server/crawler.py';
import Server from '!!raw-loader!./code_examples/running_in_web_server/server.py';

# Running in web server
Comment (Collaborator):
This should not be here, as titles are rendered based on the title field in the --- header.

Suggested change: remove the `# Running in web server` heading.

We will build a simple HTTP server that receives a page URL and returns the page title in the response.

# Set up a web server
Comment (Collaborator):
2nd level heading (1st only for page title)


To run the example server, make sure you have installed [fastapi[standard]](https://fastapi.tiangolo.com/#installation), then run `fastapi dev server.py` from the directory where the example code is located.

# Create a crawler
Comment (Collaborator):
2nd level heading (1st only for page title)

Successfully merging this pull request may close these issues.

Feature parity: Support for running Crawlee in a web server environment
3 participants