Stealth Crawler

An asynchronous, headless-Chrome web crawler that discovers same-host links and optionally saves HTML, Markdown, PDF snapshots, or PNG screenshots. Built for scripting, inspection, and automation; use it as a Python library or via the stealth-crawler CLI.



Features

  • Asynchronous, headless Chrome browsing via pydoll
  • Discovers internal links starting from a root URL
  • Optional content saving:
    • HTML
    • Markdown (via html2text)
    • PDF snapshots
    • PNG screenshots
  • Rich progress bars with rich
  • Configurable URL filtering (base, exclude)
  • Pure-Python API and CLI

Installation

Install the latest stable release:

pip install stealth-crawler

Or install it in an isolated environment with pipx:

pipx install stealth-crawler

Or via other tools:

  • uv

    uv venv .venv
    source .venv/bin/activate
    uv pip install stealth-crawler
  • Poetry

    poetry add stealth-crawler
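
After any of the above installs, you can confirm that the console script is on your PATH (the --help flag is described under Quickstart below):

stealth-crawler --help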

Quickstart

Command-Line

# Discover URLs only
stealth-crawler crawl https://example.com --urls-only

# Crawl and save HTML + Markdown
stealth-crawler crawl https://example.com \
  --save-html --save-md \
  --output-dir ./output

# Exclude specific paths
stealth-crawler crawl https://example.com \
  --exclude /private,/logout
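
A sketch combining --base and --urls-only (both listed under Configuration below) to limit discovery to a single subtree:

stealth-crawler crawl https://example.com/docs \
  --base https://example.com/docs \
  --urls-only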

Run stealth-crawler --help for full options.

Python API

import asyncio
from stealthcrawler import StealthCrawler

crawler = StealthCrawler(
    base="https://example.com",
    exclude=["/admin"],
    save_html=True,
    save_md=True,
    output_dir="export"
)
urls = asyncio.run(crawler.crawl("https://example.com"))
print(urls)
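
If you only need the discovered URLs, the urls_only parameter from the Configuration table can be passed instead of the save flags (a minimal sketch; it assumes, as in the example above, that crawl returns an iterable of URLs):

import asyncio
from stealthcrawler import StealthCrawler

# Discover URLs only; nothing is written to disk
crawler = StealthCrawler(base="https://example.com", urls_only=True)
urls = asyncio.run(crawler.crawl("https://example.com"))
for url in sorted(urls):
    print(url)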

Configuration

Option           CLI flag        API param     Default
Base URL(s)      --base          base          start URL
Exclude paths    --exclude       exclude       none
Save HTML        --save-html     save_html     False
Save Markdown    --save-md       save_md       False
URLs only        --urls-only     urls_only     False
Output folder    --output-dir    output_dir    ./output
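
The CLI flags and API parameters mirror each other; as a sketch (an assumption based on the Quickstart examples), comma-separated CLI values correspond to Python lists:

from stealthcrawler import StealthCrawler

# Roughly equivalent to: stealth-crawler crawl https://example.com --exclude /private,/logout
crawler = StealthCrawler(base="https://example.com", exclude=["/private", "/logout"])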

Testing & Quality

  • Run tests:

    pytest
  • Check formatting & linting:

    black src tests
    ruff check src tests

Contributing

  1. Fork the repository and create a feature branch.

  2. Set up your development environment:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -e ".[dev]"

    Or with uv:

    uv venv .venv
    source .venv/bin/activate
    uv pip install -e ".[dev]"
  3. Implement your changes, add tests, and run:

    black src tests
    ruff check src tests
    pytest
  4. Open a pull request against main.


License

This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later). You are free to use, modify, and redistribute under the terms of the GPL. See LICENSE for full details.
