A headless-Chrome web crawler that discovers same-host links and optionally saves HTML, Markdown, PDF, or screenshots. Use it as a library or via the stealth-crawler CLI.
- Asynchronous, headless Chrome browsing via pydoll
- Discovers internal links starting from a root URL
- Optional content saving:
  - HTML
  - Markdown (via html2text)
  - PDF snapshots
  - PNG screenshots
- Rich progress bars with rich
- Configurable URL filtering (base, exclude)
- Pure-Python API and CLI
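The browser work is done by pydoll, but the core idea behind same-host link discovery can be illustrated with the standard library alone. This is a minimal sketch, not stealth-crawler's actual implementation: it assumes links come from `<a href>` tags in the page HTML, and `LinkCollector` and `same_host_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(page_url: str, html: str) -> list[str]:
    """Return absolute, fragment-free http(s) links on page_url's host, deduplicated in order."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    seen: set[str] = set()
    result: list[str] = []
    for href in parser.hrefs:
        # Resolve relative links against the page, then drop any "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


html = (
    '<a href="/about">About</a> <a href="https://other.org/x">External</a> '
    '<a href="docs#intro">Docs</a> <a href="mailto:hi@example.com">Mail</a>'
)
print(same_host_links("https://example.com/", html))
# ['https://example.com/about', 'https://example.com/docs']
```

External hosts and non-HTTP schemes such as mailto: are dropped, and fragments are stripped so that `docs` and `docs#intro` count as one page.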
Install the latest stable release:
pip install stealth-crawler
Or install it in an isolated environment with pipx:
pipx install stealth-crawler
Or via other tools:
- uv:
  uv venv .venv
  source .venv/bin/activate
  uv pip install stealth-crawler
- Poetry:
  poetry add stealth-crawler
# Discover URLs only
stealth-crawler crawl https://example.com --urls-only
# Crawl and save HTML + Markdown
stealth-crawler crawl https://example.com \
--save-html --save-md \
--output-dir ./output
# Exclude specific paths
stealth-crawler crawl https://example.com \
--exclude /private,/logout
Run stealth-crawler --help for the full list of options.
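How --exclude is matched is not spelled out here. A minimal sketch, assuming the comma-separated value is split into path prefixes and that a URL is kept only if it starts with one of the base URLs and its path falls under none of the excluded paths. `parse_exclude` and `is_allowed` are hypothetical helpers, and the segment-aware matching is a design choice of this sketch, not documented behavior.

```python
from urllib.parse import urlparse


def parse_exclude(value: str) -> list[str]:
    """Split a comma-separated --exclude value such as "/private,/logout"."""
    return [part.strip() for part in value.split(",") if part.strip()]


def _under(path: str, prefix: str) -> bool:
    # Segment-aware match: "/private" covers "/private" and "/private/..." but not "/privateer".
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def is_allowed(url: str, bases: list[str], excludes: list[str]) -> bool:
    """Keep URLs under some base URL whose path is not under any excluded prefix."""
    if not any(url.startswith(base) for base in bases):
        return False
    path = urlparse(url).path or "/"
    return not any(_under(path, prefix) for prefix in excludes)


excludes = parse_exclude("/private,/logout")
bases = ["https://example.com"]
print(is_allowed("https://example.com/blog/post", bases, excludes))   # True
print(is_allowed("https://example.com/private/data", bases, excludes))  # False
```

Plain string-prefix matching would also exclude paths like /privateer; checking whole path segments avoids that.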
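import asyncio
from stealthcrawler import StealthCrawler

# Restrict the crawl to the base URL, skip /admin, and save HTML + Markdown under ./export
crawler = StealthCrawler(
    base="https://example.com",
    exclude=["/admin"],
    save_html=True,
    save_md=True,
    output_dir="export",
)

# crawl() is a coroutine; asyncio.run() drives it and returns the discovered URLs
urls = asyncio.run(crawler.crawl("https://example.com"))
print(urls)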
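Conceptually, discovering internal links from a root URL is a breadth-first traversal. The sketch below illustrates that idea with a pluggable coroutine `fetch_links` and an in-memory site map standing in for pydoll; both are hypothetical, and this is not the library's implementation.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable

# Hypothetical fetcher: given a URL, return the same-host links found on that page.
FetchLinks = Callable[[str], Awaitable[list[str]]]


async def crawl(start: str, fetch_links: FetchLinks) -> list[str]:
    """Breadth-first traversal from start; returns each URL once, in discovery order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for link in await fetch_links(url):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                order.append(link)
                queue.append(link)
    return order


# In-memory stand-in for a real site, including cycles back to earlier pages.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/", "https://example.com/c"],
    "https://example.com/b": [],
    "https://example.com/c": ["https://example.com/a"],
}


async def fake_fetch(url: str) -> list[str]:
    await asyncio.sleep(0)  # simulate an await point such as a page load
    return SITE.get(url, [])


print(asyncio.run(crawl("https://example.com/", fake_fetch)))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b', 'https://example.com/c']
```

The `seen` set is what keeps cyclic links (such as /a linking back to /) from being crawled twice.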
| Option | CLI flag | API param | Default |
|---|---|---|---|
| Base URL(s) | --base | base | start URL |
| Exclude paths | --exclude | exclude | none |
| Save HTML | --save-html | save_html | False |
| Save Markdown | --save-md | save_md | False |
| URLs only | --urls-only | urls_only | False |
| Output folder | --output-dir | output_dir | ./output |
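Each CLI flag in the table has a matching API parameter. As a hedged sketch of that correspondence (not the project's actual CLI code), this argparse parser mirrors the documented flags and converts them into keyword arguments named after the API param column. It assumes --base may be repeated and that base accepts a list; `build_parser` and `to_api_kwargs` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented flags and defaults.
    parser = argparse.ArgumentParser(prog="stealth-crawler crawl")
    parser.add_argument("url", help="Start URL")
    parser.add_argument("--base", action="append", help="Base URL(s); defaults to the start URL")
    parser.add_argument("--exclude", default="", help="Comma-separated paths to skip")
    parser.add_argument("--save-html", action="store_true")
    parser.add_argument("--save-md", action="store_true")
    parser.add_argument("--urls-only", action="store_true")
    parser.add_argument("--output-dir", default="./output")
    return parser


def to_api_kwargs(args: argparse.Namespace) -> dict:
    """Translate parsed flags into the API parameter names from the table."""
    return {
        "base": args.base or [args.url],  # default: the start URL
        "exclude": [p for p in args.exclude.split(",") if p],  # default: none
        "save_html": args.save_html,
        "save_md": args.save_md,
        "urls_only": args.urls_only,
        "output_dir": args.output_dir,
    }


args = build_parser().parse_args(["https://example.com", "--save-md", "--exclude", "/private,/logout"])
print(to_api_kwargs(args))
```

argparse turns dashed flags such as --save-html into underscored attributes (args.save_html), which lines up naturally with the API parameter names.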
- Run tests:
  pytest
- Check formatting & linting:
  black src tests
  ruff check src tests
- Fork the repository and create a feature branch.
- Set up your development environment:
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -e ".[dev]"
  Or with uv:
  uv venv .venv
  source .venv/bin/activate
  uv pip install -e ".[dev]"
- Implement your changes, add tests, and run:
  black src tests
  ruff check src tests
  pytest
- Open a pull request against main.
This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later). You are free to use, modify, and redistribute it under the terms of the GPL. See LICENSE for full details.