A web-based multi-tenant crawler for SEO analysis and website auditing.
LibreCrawl crawls websites and reports detailed information about pages, links, SEO elements, and performance. It's built as a Python Flask web application with a modern interface that supports multiple concurrent users.
- 🚀 Multi-tenancy - Multiple users can crawl simultaneously with isolated sessions
- 🎨 Custom CSS styling - Personalize the UI with your own CSS themes
- 💾 Browser localStorage persistence - Settings saved per browser
- 🔄 JavaScript rendering for dynamic content (React, Vue, Angular, etc.)
- 📊 SEO analysis - Extract titles, meta descriptions, headings, etc.
- 🔗 Link analysis - Track internal and external links with detailed relationship mapping
- 📈 PageSpeed Insights integration - Analyze Core Web Vitals
- 💾 Multiple export formats - CSV, JSON, or XML
- 🔍 Issue detection - Automated SEO issue identification
- ⚡ Real-time crawling progress with live statistics
- Python 3.8 or later
- Modern web browser (Chrome, Firefox, Safari, Edge)
1. Clone or download this repository

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. For JavaScript rendering support (optional):

   ```bash
   playwright install chromium
   ```

4. Run the application:

   ```bash
   python main.py
   ```

5. Open your browser and navigate to:
   - Local: `http://localhost:5000`
   - Network: `http://<your-ip>:5000`
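Once the app is running, a quick way to confirm the server is reachable is a plain HTTP request to the local address (a minimal sketch that only assumes the default port 5000 shown above):

```python
import requests

# Confirm the LibreCrawl server is serving the UI on the default port.
response = requests.get("http://localhost:5000", timeout=5)
print(response.status_code)  # expect 200 once the app is up
```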
- Enter a website URL in the input field
- Click "Start" to begin crawling
- View results in the different tabs (Overview, Internal, External, Links, Issues, PageSpeed)
- Use "Export" to save data or "Save Crawl" to resume later
- Customize the UI appearance in Settings > Custom CSS
Click "Settings" to configure:
- Crawler settings: crawl depth and URL limits (up to 5M URLs), request delays, external link handling
- Request settings: user agent, timeouts, proxy, robots.txt
- JavaScript rendering: browser engine, wait times, viewport size (see the Playwright sketch after this list)
- Filters: file types and URL patterns to include/exclude
- Export options: formats and fields to export
- Custom CSS: personalize the UI appearance with custom styles
- Issue exclusion: patterns to exclude from SEO issue detection
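The JavaScript rendering options map onto how a headless browser fetch works in general. As an illustration of the mechanism (not LibreCrawl's internal code), here is how wait time and viewport size factor into rendering a dynamic page with Playwright, the engine installed in the optional step above:

```python
from playwright.sync_api import sync_playwright

# Illustrative sketch: how viewport size and wait time shape JS rendering.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    page.goto("https://example.com", wait_until="networkidle")
    page.wait_for_timeout(2000)  # extra wait for late-loading scripts
    html = page.content()        # the fully rendered DOM, not raw HTML
    browser.close()
```

Longer waits catch more late-loading content at the cost of crawl speed, which is why these values are configurable.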
For PageSpeed analysis, add a Google API key in Settings > Requests to raise the rate limit (25,000 requests/day with a key, versus a much lower anonymous quota).
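Under the hood this corresponds to Google's public PageSpeed Insights v5 endpoint. A minimal sketch of a keyed request (the endpoint and parameters are Google's documented API; the URL being tested is a placeholder):

```python
import requests

API_URL = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {
    "url": "https://example.com",   # page to analyze
    "key": "YOUR_GOOGLE_API_KEY",   # raises the quota to 25k requests/day
    "strategy": "mobile",           # or "desktop"
}
data = requests.get(API_URL, params=params).json()
# The overall performance score sits in the Lighthouse results.
print(data["lighthouseResult"]["categories"]["performance"]["score"])
```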
- CSV: Spreadsheet-friendly format
- JSON: Structured data with all details (see the parsing sketch below)
- XML: Markup format for other tools
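Exported data is easy to post-process. A hedged sketch of filtering a JSON export for broken pages (field names such as `url` and `status_code` are assumptions about the schema, so adjust them to match your actual export):

```python
import json

with open("crawl_export.json") as f:
    pages = json.load(f)

# Hypothetical field names: check your export for the real keys.
broken = [p for p in pages if p.get("status_code", 200) >= 400]
for page in broken:
    print(page["url"], page["status_code"])
```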
LibreCrawl supports multiple concurrent users with isolated sessions (a sketch of the isolation pattern follows this list):
- Each browser session gets its own crawler instance and data
- Settings are stored in browser localStorage (persistent across restarts)
- Custom CSS themes are per-browser
- Sessions expire after 1 hour of inactivity
- Crawl data is isolated between users
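Conceptually, this isolation amounts to keying crawler state by session ID instead of sharing one global instance, and evicting idle sessions. A simplified illustration of the pattern (not LibreCrawl's actual implementation):

```python
import time

SESSION_TTL = 3600  # seconds; mirrors the 1-hour inactivity expiry

class SessionRegistry:
    """Maps each browser session to its own isolated crawler state."""

    def __init__(self):
        self._sessions = {}  # session_id -> (state, last_seen)

    def get(self, session_id):
        self._expire_stale()
        state, _ = self._sessions.get(session_id, ({"crawled": []}, 0.0))
        self._sessions[session_id] = (state, time.time())  # refresh activity
        return state

    def _expire_stale(self):
        now = time.time()
        self._sessions = {
            sid: entry
            for sid, entry in self._sessions.items()
            if now - entry[1] < SESSION_TTL
        }

registry = SessionRegistry()
alice = registry.get("session-a")  # fresh, isolated state per session
bob = registry.get("session-b")
assert alice is not bob
```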
- PageSpeed API has rate limits (works better with an API key)
- Large sites may take time to crawl completely
- JavaScript rendering is slower than HTTP-only crawling
- Settings stored in localStorage (cleared if browser data is cleared)
- `main.py` - Main application and Flask server
- `src/crawler.py` - Core crawling engine
- `src/settings_manager.py` - Configuration management
- `web/` - Frontend interface files
MIT License - see LICENSE file for details.