A fast, idiomatic Rust tool for scraping and converting official Rust documentation sites into Markdown — with automatic attribution headers and offline-friendly output.
Built for maintainers of AI agents, documentation tools, or GPTs that use content from rust-lang.org, docs.rs, or other community-authored Rust books and sites.
- 🔍 Scrapes HTML pages from Rust ecosystem documentation sites
- 📄 Converts to Markdown using customizable rules
- 🖋️ Injects attribution headers automatically
- 📂 Outputs Markdown to structured folders
- 🦀 100% Rust-native, fast and parallelizable
The following sources are currently scraped:
| Doc Name | Source URL | 
|---|---|
| The Rust Book | https://doc.rust-lang.org/book/ | 
| Rust by Example | https://doc.rust-lang.org/rust-by-example/ | 
| The Cargo Book | https://doc.rust-lang.org/cargo/ | 
| The Rustonomicon | https://doc.rust-lang.org/nomicon/ | 
| The Async Book | https://rust-lang.github.io/async-book/ | 
| The Clippy Book | https://rust-lang.github.io/rust-clippy/current/ | 
| Error Index | https://doc.rust-lang.org/error_codes/ | 
| Rust API Guidelines | https://rust-lang.github.io/api-guidelines/ | 
| The Rust and WebAssembly Book | https://rustwasm.github.io/book/ | 
| Tokio Documentation | https://docs.rs/tokio/latest/tokio/ | 
| Axum Documentation | https://docs.rs/axum/latest/axum/ | 
| Leptos Book | https://book.leptos.dev/ | 
| Embedded Rust Book | https://docs.rust-embedded.org/book/ | 
| The Little Book of Rust Macros | https://danielkeep.github.io/tlborm/book/ | 
| Too Many Linked Lists | https://rust-unofficial.github.io/too-many-lists/ | 
You can customize which documentation sites the scraper pulls from by editing the source list in:
`src/targets.rs`
Inside, you'll find a function like:
```rust
pub fn get_scrape_targets() -> HashMap<String, String> {
    HashMap::from([
        ("The Rust Programming Language Book".into(), "https://doc.rust-lang.org/book/".into()),
        ("Tokio Documentation".into(), "https://docs.rs/tokio/latest/tokio/".into()),
        // ...
    ])
}
```

You can:
- ✅ Add new entries to scrape new Rust documentation sites
- ❌ Remove entries if you don’t need certain sources
- ✏️ Rename entries (keys are just used for folder names)
Changes take effect next time you run the scraper.
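For example, adding a new source is just one more entry in the map. The "rustc Book" entry below is purely illustrative and is not part of the default list:

```rust
use std::collections::HashMap;

pub fn get_scrape_targets() -> HashMap<String, String> {
    HashMap::from([
        ("The Rust Programming Language Book".into(), "https://doc.rust-lang.org/book/".into()),
        ("Tokio Documentation".into(), "https://docs.rs/tokio/latest/tokio/".into()),
        // Illustrative addition -- the key doubles as the output folder name:
        ("The rustc Book".into(), "https://doc.rust-lang.org/rustc/".into()),
    ])
}
```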
Build with:

```bash
cargo build --release
```

To scrape all configured sources and output Markdown into `output/`:

```bash
cargo run --release
```

If you only want to run specific modules, you can comment out the others in `main.rs`.
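To illustrate (the module entry points shown here, such as `scrape::run()`, are assumptions made for the example, not the project's exact API):

```rust
mod attribute_md;
mod attribute_rs;
mod scrape;
mod utils;

fn main() {
    // Comment out any step you don't need for a given run.
    scrape::run();          // scrape all configured sources
    attribute_md::run();    // prepend attribution headers to .md files
    // attribute_rs::run(); // skipped in this run
}
```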
- Markdown files will be saved in the folder `./scraped_docs/`
- Attribution headers are prepended like this:
```html
<!--
Source: The Rust Book - https://doc.rust-lang.org/book/
License: MIT OR Apache-2.0
-->
```

- All scraped content includes the source URL and license attribution in each `.md` or `.rs` file (a sketch of this step follows this list).
- All sources currently use dual MIT OR Apache-2.0 licenses.
- You can find complete references in ATTRIBUTION.md.
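For contributors, here is a minimal sketch of how such a header could be prepended to a generated Markdown file. The function name and the way the source details are passed in are assumptions for illustration, not the project's exact code:

```rust
use std::{fs, io};

/// Hypothetical helper: prepend an attribution comment to a Markdown file.
fn prepend_attribution(path: &str, source_name: &str, source_url: &str) -> io::Result<()> {
    let original = fs::read_to_string(path)?;
    let header = format!(
        "<!--\nSource: {source_name} - {source_url}\nLicense: MIT OR Apache-2.0\n-->\n\n"
    );
    fs::write(path, format!("{header}{original}"))
}
```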
```
src/
├── main.rs             # Entry point
├── scrape.rs           # Web scraping and HTML-to-Markdown logic
├── attribute_md.rs     # Attribution for .md files
├── attribute_rs.rs     # Attribution for .rs files
└── utils.rs            # Helper functions
output/                 # Final Markdown output
```
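For orientation, the per-target flow in `scrape.rs` is roughly: fetch the page, convert the HTML to Markdown, and write the result into the output folder. The sketch below uses the `reqwest` (blocking) and `html2md` crates purely to illustrate that flow; the crates and signatures actually used by the project may differ:

```rust
use std::fs;

/// Illustrative sketch, not the project's exact code: fetch one page
/// and save it as Markdown under the output directory.
fn scrape_page(name: &str, url: &str, out_dir: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Fetch the raw HTML.
    let html = reqwest::blocking::get(url)?.text()?;

    // Convert HTML to Markdown.
    let markdown = html2md::parse_html(&html);

    // Write to <out_dir>/<name>.md, creating the directory if needed.
    fs::create_dir_all(out_dir)?;
    fs::write(format!("{out_dir}/{name}.md"), markdown)?;
    Ok(())
}
```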
- Rust 1.72+ (tested)
- OpenSSL (for crates using `reqwest` on some systems)
On Debian/Ubuntu:

```bash
sudo apt install pkg-config libssl-dev
```

Standard Cargo tooling is used for formatting, linting, and testing:

```bash
cargo fmt     # Format
cargo clippy  # Lint
cargo test    # (Future: Add test suite)
```

This project is dual-licensed under either:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
You may choose either license.
Scraped documentation content retains the license of its original source (typically MIT OR Apache-2.0).
See ATTRIBUTION.md for source-specific license references.
PRs welcome — especially for:
- New doc sources
- Better markdown cleaning
- Language-specific scraping (i18n)
Last updated: 2025-05-25