
hansanaT/CrawlDoc

πŸ•·οΈ CrawlDoc

A Chrome extension that automatically crawls and summarizes entire documentation sites using AI. No more reading through dozens of pages: get a comprehensive summary in minutes!

License: MIT

✨ Features

  • 🤖 Multi-AI Support - Use Claude, OpenAI, or OpenRouter
  • 🔗 Web Crawling - Automatically crawls a documentation site, following same-domain links up to configurable depth and page limits
  • 📝 Comprehensive Summaries - Generates detailed summaries covering purpose, setup, core concepts, API reference, examples, and more
  • 🎯 Focus Areas - Optionally emphasize specific topics (API, Setup, Examples, etc.)
  • 📎 One-Click URL Capture - Grab the current browser tab's URL with a single button click
  • 💾 Multiple Export Options - Download as text, copy to clipboard, or generate shareable links
  • 🔐 Your Own API Keys - Use your own API credentials for full control and privacy
  • ⚡ Automatic Fallback - If one AI provider fails, the extension automatically tries the next one
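The automatic fallback can be pictured as a loop over the configured providers. This is an illustrative sketch, not the extension's actual `callAI()`: `callWithFallback` and the provider object shape `{ name, apiKey, call }` are hypothetical, and each `call` is assumed to be an async function that resolves with summary text or throws.

```javascript
// Hypothetical sketch of provider fallback: try each configured provider in
// order and return the first successful result.
async function callWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    // Skip providers the user has not configured with an API key.
    if (!provider.apiKey) continue;
    try {
      const text = await provider.call(prompt, provider.apiKey);
      return { provider: provider.name, text };
    } catch (err) {
      // Remember why this provider failed, then move on to the next one.
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(
    errors.length > 0
      ? `All providers failed. ${errors.join("; ")}`
      : "No API keys configured."
  );
}
```

Skipping providers without a key avoids spurious errors for users who configured only one provider, and collecting every failure message makes a total failure easier to diagnose.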

🚀 Quick Start

Installation

  1. Clone or download this repository

    git clone https://github.com/hansanaT/CrawlDoc.git
    cd CrawlDoc
  2. Open Chrome Extensions

    • Go to chrome://extensions/
    • Enable "Developer mode" (toggle in top right)
    • Click "Load unpacked"
    • Select the cloned folder (the one containing manifest.json)
  3. Configure API Keys

    • Click the extension icon
    • Click on βš™οΈ API Configuration to expand
    • Add your API key(s) from one or more providers:
  4. Start Summarizing!

    • Click the extension icon
    • Click "📎 Current" to capture your current page, or paste a documentation URL
    • (Optional) Add focus areas like "API, Setup, Examples"
    • Click "Generate Full Summary"

📖 How It Works

1. URL Input

Choose between:

  • 📎 Current Button - Auto-captures your active browser tab's URL
  • Manual Input - Paste any documentation URL
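Capturing the active tab's URL can be sketched with the `chrome.tabs.query` API. This assumes Manifest V3 (where `chrome.tabs.query` returns a Promise) and a manifest that grants `activeTab` or `tabs` so `tab.url` is populated; `getCurrentTabUrl` is a hypothetical name, not taken from popup.js.

```javascript
// Hypothetical popup helper: read the URL of the active tab in the current
// window and reject anything that is not an http(s) page.
async function getCurrentTabUrl() {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  if (!tab || !tab.url) {
    throw new Error("Could not read the active tab's URL.");
  }
  // This sketch only accepts http(s) pages (not chrome://, file://, etc.).
  const { protocol } = new URL(tab.url);
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`Unsupported URL scheme: ${protocol}`);
  }
  return tab.url;
}
```

In the popup, a click handler on the 📎 Current button would call this and write the result into the URL input; the element IDs involved depend on popup.html.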

2. Web Crawling

The extension crawls the documentation site as follows:

  • Starts from the given URL and visits pages up to a configurable maximum number of pages
  • Follows links several levels deep, up to a configurable maximum depth
  • Follows only same-domain links, so the crawl stays on the documentation site
  • Extracts clean, readable content from each page
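The crawl rules above can be sketched as a breadth-first traversal. This is not the extension's `crawlDocumentation()`: `crawlSite` is hypothetical, and the page fetcher (`fetchHtml`) and link extractor (`extractLinks`) are injected placeholders. Depth is counted as in the settings section, with the starting page at depth 1.

```javascript
// Hypothetical breadth-first crawler illustrating the page, depth, and
// same-domain limits. fetchHtml(url) -> Promise<string> and
// extractLinks(html, baseUrl) -> string[] are supplied by the caller.
async function crawlSite(startUrl, { maxPages = 50, maxDepth = 3, fetchHtml, extractLinks }) {
  const start = new URL(startUrl);
  start.hash = "";
  const startHost = start.hostname;
  const queue = [{ url: start.href, depth: 1 }]; // depth 1 = the starting page
  const seen = new Set([start.href]);
  const pages = [];

  while (queue.length > 0 && pages.length < maxPages) {
    const { url, depth } = queue.shift();
    let html;
    try {
      html = await fetchHtml(url);
    } catch {
      continue; // Skip pages that fail to load instead of aborting the crawl.
    }
    pages.push({ url, html });

    // Pages at the maximum depth are kept, but their links are not followed.
    if (depth >= maxDepth) continue;
    for (const link of extractLinks(html, url)) {
      let next;
      try {
        next = new URL(link, url);
      } catch {
        continue; // Ignore malformed hrefs.
      }
      next.hash = ""; // Treat page#a and page#b as the same page.
      if (next.hostname !== startHost || seen.has(next.href)) continue;
      seen.add(next.href);
      queue.push({ url: next.href, depth: depth + 1 });
    }
  }
  return pages;
}
```

A breadth-first queue spends the page budget on shallow pages first, which on most documentation sites tend to be the overview and index pages.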

3. AI Summarization

Generates comprehensive summaries covering:

  • Main purpose and overview
  • Key features and capabilities
  • Installation/setup instructions
  • Core concepts and terminology
  • API reference and methods
  • Common use cases and examples
  • Best practices and tips
  • Important warnings or notes
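One way to turn crawled pages and optional focus areas into a single request is a prompt builder like the one below. The README does not show the extension's actual prompt, so the wording, `buildSummaryPrompt`, and the `maxChars` truncation limit are all assumptions; only the list of sections comes from the topics above.

```javascript
// Hypothetical prompt builder; the section list mirrors the topics above.
const SUMMARY_SECTIONS = [
  "Main purpose and overview",
  "Key features and capabilities",
  "Installation/setup instructions",
  "Core concepts and terminology",
  "API reference and methods",
  "Common use cases and examples",
  "Best practices and tips",
  "Important warnings or notes",
];

function buildSummaryPrompt(pages, focusAreas = "", maxChars = 100000) {
  // Concatenate page text with a URL header per page, then truncate so the
  // prompt stays within an (assumed) context budget.
  let content = pages.map((p) => `## ${p.url}\n${p.text}`).join("\n\n");
  if (content.length > maxChars) content = content.slice(0, maxChars);

  // "API, Setup, Examples" -> ["API", "Setup", "Examples"], dropping empties.
  const focus = focusAreas
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);

  const lines = [
    "Summarize the following documentation. Cover:",
    ...SUMMARY_SECTIONS.map((s, i) => `${i + 1}. ${s}`),
  ];
  if (focus.length > 0) {
    lines.push(`Give extra detail on: ${focus.join(", ")}.`);
  }
  lines.push("", "Documentation:", content);
  return lines.join("\n");
}
```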

4. Export Options

  • 📄 Download - Save as a plain text file
  • 🔗 Share - Generate a shareable link with the summary encoded in it
  • 📋 Copy - Copy the summary to the clipboard
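A shareable link "with encoded summary" could work by packing the text into the URL fragment. This is a sketch under assumptions: `SHARE_BASE_URL` is a placeholder (the README does not say where shared links point), and UTF-8 plus URL-safe base64 is one reasonable encoding, not necessarily the one the extension uses.

```javascript
// Hypothetical share-link helpers: UTF-8 encode the summary, convert it to
// URL-safe base64, and store it in the URL fragment.
const SHARE_BASE_URL = "https://example.com/share"; // placeholder

function encodeSummary(summary) {
  const bytes = new TextEncoder().encode(summary);
  let binary = "";
  // btoa only accepts Latin-1 strings, hence the byte-by-byte conversion.
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function decodeSummary(encoded) {
  let b64 = encoded.replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) b64 += "="; // restore stripped padding
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

function buildShareLink(summary) {
  return `${SHARE_BASE_URL}#${encodeSummary(summary)}`;
}
```

Browsers do not send the fragment to the server in the HTTP request, which keeps the summary out of server logs; the trade-off is that long summaries produce very long links, which some apps may truncate.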

βš™οΈ Configuration

API Providers

The extension supports three AI providers. You can use one or all three:

Claude (Anthropic)

  • Model: Claude 3.5 Sonnet
  • Best for: Detailed, nuanced summaries
  • Get API Key

OpenAI

  • Models: GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo
  • Best for: Fast, accurate summaries
  • Get API Key

OpenRouter

  • Models: 5+ models, including Claude, GPT, Llama, and Mistral
  • Best for: Choice and flexibility
  • Get API Key
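The provider-specific calls (`callClaude()`, `callOpenAI()`, `callOpenRouter()`) presumably build requests along these lines. The endpoints, headers, and response fields follow each provider's public REST API, but `buildProviderRequest`, `extractResponseText`, and the model IDs are example values, not taken from background.js. Anthropic may additionally require an `anthropic-dangerous-direct-browser-access` header for requests made from a browser context; check Anthropic's documentation.

```javascript
// Hypothetical request builders for the three providers.
function buildProviderRequest(provider, apiKey, prompt) {
  const messages = [{ role: "user", content: prompt }];
  switch (provider) {
    case "claude":
      return {
        url: "https://api.anthropic.com/v1/messages",
        init: {
          method: "POST",
          headers: {
            "content-type": "application/json",
            "x-api-key": apiKey,
            "anthropic-version": "2023-06-01",
          },
          // Model ID is an example value.
          body: JSON.stringify({ model: "claude-3-5-sonnet-20241022", max_tokens: 4096, messages }),
        },
      };
    case "openai":
    case "openrouter": {
      // OpenRouter exposes an OpenAI-compatible chat completions endpoint.
      const isOpenAI = provider === "openai";
      return {
        url: isOpenAI
          ? "https://api.openai.com/v1/chat/completions"
          : "https://openrouter.ai/api/v1/chat/completions",
        init: {
          method: "POST",
          headers: {
            "content-type": "application/json",
            authorization: `Bearer ${apiKey}`,
          },
          body: JSON.stringify({
            model: isOpenAI ? "gpt-4o" : "anthropic/claude-3.5-sonnet", // example IDs
            messages,
          }),
        },
      };
    }
    default:
      throw new Error(`Unknown provider: ${provider}`);
  }
}

// Pull the generated text out of each provider's JSON response.
function extractResponseText(provider, data) {
  if (provider === "claude") return data.content[0].text;
  return data.choices[0].message.content;
}
```

Usage: `const { url, init } = buildProviderRequest("openai", key, prompt);` then `const text = extractResponseText("openai", await (await fetch(url, init)).json());`.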

Crawling Settings

In the βš™οΈ API Configuration section, customize:

  • Max Pages to Crawl (default: 50)

    • Limits the number of pages crawled from the documentation
    • Lower = faster but less comprehensive
    • Higher = more thorough but slower
  • Max Depth Level (default: 3)

    • How many levels of links to follow from the starting page
    • Depth 1: only the starting page
    • Depth 3: main page → sub-pages → sub-sub-pages
    • Higher depths take longer
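These two settings could be persisted with `chrome.storage.local`, which in Manifest V3 returns Promises and fills in defaults when `get()` is passed an object. The storage keys, helper names, and the upper bounds (500 pages, depth 10) are hypothetical; only the defaults of 50 pages and depth 3 come from this section.

```javascript
// Hypothetical settings helpers backed by chrome.storage.local.
const DEFAULT_SETTINGS = { maxPages: 50, maxDepth: 3 };

// Parse an integer and clamp it to [min, max]; fall back on invalid input.
function clampInt(value, min, max, fallback) {
  const n = Number.parseInt(value, 10);
  if (Number.isNaN(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

async function loadCrawlSettings() {
  // Passing an object to get() returns stored values, falling back to these defaults.
  const stored = await chrome.storage.local.get(DEFAULT_SETTINGS);
  return {
    maxPages: clampInt(stored.maxPages, 1, 500, DEFAULT_SETTINGS.maxPages),
    maxDepth: clampInt(stored.maxDepth, 1, 10, DEFAULT_SETTINGS.maxDepth),
  };
}

async function saveCrawlSettings({ maxPages, maxDepth }) {
  const settings = {
    maxPages: clampInt(maxPages, 1, 500, DEFAULT_SETTINGS.maxPages),
    maxDepth: clampInt(maxDepth, 1, 10, DEFAULT_SETTINGS.maxDepth),
  };
  await chrome.storage.local.set(settings);
  return settings;
}
```

Clamping on both save and load guards against form input like "0" or "abc" reaching the crawler.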

💡 Usage Examples

Example 1: Summarize Next.js Documentation

  1. Click "📎 Current" while on nextjs.org/docs
  2. Leave focus areas blank for comprehensive summary
  3. Click "Generate Full Summary"
  4. Get a complete overview of Next.js in minutes!

Example 2: Focus on Specific Topics

  1. Paste documentation URL
  2. Add focus areas: "Installation, API, Examples"
  3. Click "Generate Full Summary"
  4. Summary prioritizes these topics

Example 3: Quick Reference

  1. Capture the URL with "📎 Current"
  2. Click "Generate Full Summary"
  3. Copy with "📋 Copy"
  4. Paste into your notes or documents

πŸ“ Project Structure

CrawlDoc/
├── manifest.json          # Chrome extension config
├── popup.html             # UI layout and styles
├── popup.js               # Frontend logic and UI interactions
├── background.js          # Backend logic, crawling, AI calls
└── README.md              # This file

🔧 Technical Details

Technologies Used

  • Frontend: HTML, CSS, Vanilla JavaScript
  • Backend: Chrome Service Worker (background.js)
  • APIs: Anthropic, OpenAI, OpenRouter REST APIs
  • Storage: Chrome Local Storage API

Browser Compatibility

  • ✅ Chrome 88+
  • ✅ Chromium-based browsers (Brave, Edge, Opera)
  • ⚠️ Firefox (requires manifest adaptation)

Key Functions

popup.js (Frontend)

  • Event listener setup
  • API key management
  • Settings management
  • UI state management
  • Export functionality
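The popup (popup.js) and the service worker (background.js) run in separate contexts, so they need to exchange messages. A common pattern is `chrome.runtime.sendMessage` plus an `onMessage` listener that returns `true` to answer asynchronously. This is a sketch: the message shape `{ type: "summarize", url, focusAreas }`, `requestSummary`, `registerSummarizeHandler`, and the injected `summarize` pipeline are hypothetical, and the Promise form of `sendMessage` assumes Manifest V3 on a recent Chrome version.

```javascript
// Popup side (popup.js): ask the service worker to run a summary.
async function requestSummary(url, focusAreas) {
  const response = await chrome.runtime.sendMessage({ type: "summarize", url, focusAreas });
  if (!response || !response.ok) {
    throw new Error(response ? response.error : "No response from background script.");
  }
  return response.summary;
}

// Background side (background.js): register a listener that runs the
// injected summarize(url, focusAreas) pipeline asynchronously.
function registerSummarizeHandler(summarize) {
  chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
    if (message.type !== "summarize") return false;
    summarize(message.url, message.focusAreas)
      .then((summary) => sendResponse({ ok: true, summary }))
      .catch((err) => sendResponse({ ok: false, error: err.message }));
    // Returning true keeps the message channel open for the async sendResponse.
    return true;
  });
}
```

If the popup closes before a long crawl finishes, its pending response is lost, so an implementation might also store the latest result in `chrome.storage` for the popup to reload.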

background.js (Backend)

  • crawlDocumentation() - Crawls same-domain pages within the configured page and depth limits
  • extractTextFromHtml() - HTML parsing and text extraction
  • extractLinksFromHtml() - Extracts the links to follow from each page
  • callAI() - AI provider abstraction
  • callClaude(), callOpenRouter(), callOpenAI() - Provider-specific API calls
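Manifest V3 service workers have no DOM and no `DOMParser`, so HTML must be processed without one. The helpers below are a rough regex-based sketch of what functions like `extractTextFromHtml()` and `extractLinksFromHtml()` might do; the names `htmlToPlainText` and `collectLinks` are hypothetical, this is not the extension's code, and regexes will miss edge cases a real HTML parser handles.

```javascript
// Hypothetical regex-based text extraction.
function htmlToPlainText(html) {
  return html
    .replace(/<(script|style|noscript)[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop non-content blocks
    .replace(/<[^>]+>/g, " ") // strip remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&") // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// Hypothetical link extraction: resolve hrefs against the page URL, drop
// fragments, keep only http(s) links, and de-duplicate.
function collectLinks(html, baseUrl) {
  const links = new Set();
  for (const match of html.matchAll(/<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi)) {
    try {
      const url = new URL(match[1], baseUrl);
      url.hash = "";
      if (url.protocol === "http:" || url.protocol === "https:") links.add(url.href);
    } catch {
      // Ignore hrefs that cannot be parsed as URLs.
    }
  }
  return [...links];
}
```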

πŸ” Privacy & Security

  • Your API Keys: Stored locally in Chrome's storage and sent only to the corresponding API provider
  • Content Processing: Documentation content is sent to your chosen AI provider for processing
  • No Tracking: The extension contains no analytics, tracking, or telemetry
  • Open Source: Code is transparent and auditable

πŸ› Troubleshooting

"Summarization failed: Failed to fetch documentation content"

  • The website might require authentication
  • Try adjusting max pages/depth in settings
  • Check that the URL is valid and publicly accessible
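The causes listed above map to checks a page fetcher can make explicitly, which turns a generic "Failed to fetch" into a specific message. This is a sketch: `fetchPageHtml` and its 15-second default timeout are hypothetical, not taken from background.js.

```javascript
// Hypothetical fetch helper: report HTTP errors (e.g. 401/403 on
// login-protected docs), non-HTML responses, and timeouts distinctly.
async function fetchPageHtml(url, timeoutMs = 15000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) {
      throw new Error(`HTTP ${res.status} for ${url}`);
    }
    const type = res.headers.get("content-type") || "";
    if (!type.includes("text/html")) {
      throw new Error(`Not an HTML page (${type || "unknown type"}): ${url}`);
    }
    return await res.text();
  } catch (err) {
    // fetch rejects with an AbortError when the controller aborts.
    if (err.name === "AbortError") {
      throw new Error(`Timed out after ${timeoutMs} ms: ${url}`);
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```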

API Key Not Working

  • Verify the API key is correct
  • Check that your API account has available credits/quota
  • Ensure you're using the right API key for the provider
  • Try a different provider to test

Extension Not Loading

  • Clear Chrome cache: chrome://settings/clearBrowserData
  • Reload the extension from chrome://extensions
  • Check for errors in Chrome DevTools: right-click the popup and choose Inspect, or open the service worker's DevTools from chrome://extensions

Slow Performance

  • Reduce "Max Pages to Crawl" in settings
  • Reduce "Max Depth Level" in settings
  • Use a faster AI model (GPT-3.5 Turbo instead of GPT-4)
  • Try a different AI provider

📊 Performance Tips

  1. Start with smaller crawls - Begin with max 20 pages and depth 2
  2. Use faster models - GPT-3.5 Turbo is faster than GPT-4
  3. Add focus areas - Focused summaries process faster than comprehensive ones
  4. Cache results - Export and save frequently used summaries

🤝 Contributing

Contributions are welcome! Feel free to:

  • Report bugs by opening an issue
  • Suggest features and improvements
  • Submit pull requests
  • Improve documentation

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Built with Chrome Extensions API
  • Powered by Claude, OpenAI, and OpenRouter
  • Inspired by the need to quickly understand complex documentation

📧 Support

Have questions or issues?

  • Open an issue on GitHub
  • Check existing issues for solutions
  • Review the Troubleshooting section above

🚦 Roadmap

  • Firefox extension version
  • Batch processing multiple URLs
  • Summary scheduling/automation
  • Integration with popular note-taking apps (Notion, Obsidian)
  • Custom prompt templates
  • Local AI model support (Ollama)
  • Comparison summaries (before/after docs versions)

Made with ❤️ to help developers save time and understand documentation faster.
