A lightweight, self-hosted headless browser automation platform. Designed as an alternative to Browserless, built for speed, privacy, and scalability.

rubyco/HeadlessX


🚀 HeadlessX v1.2.0

Open Source Browserless Web Scraping API with Human-like Behavior

🎯 Unified Solution: Website + API on a single domain
🧠 Human-like Behavior: 40+ anti-detection techniques
🚀 Deploy Anywhere: Docker, Node.js+PM2, or Development


✨ Key Features

  • 🌐 Unified Architecture: Website and API on one domain
  • 🧠 Human-like Intelligence: Natural mouse movements, smart scrolling, behavioral randomization
  • 📊 Multiple Formats: HTML, text, screenshots, PDFs
  • ⚡ Batch Processing: Handle multiple URLs efficiently
  • 🔒 Production Ready: Docker, PM2, Nginx, SSL support
  • 🛡️ Anti-Detection: 40+ stealth techniques for reliable scraping

🎯 Quick Start

# 1. Clone and configure
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX

# Quick setup (makes scripts executable + creates .env)
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh
# Then edit: nano .env  # Update DOMAIN, SUBDOMAIN, and AUTH_TOKEN

Choose your deployment:

Method         | Command                                              | Best For
🐳 Docker      | docker-compose up -d                                 | Production, easy deployment
🔧 Auto Setup  | chmod +x scripts/setup.sh && sudo ./scripts/setup.sh | VPS/Server with full control
💻 Development | npm install && npm start                             | Local development, testing

Access your HeadlessX:

🌐 Website:  https://your-subdomain.yourdomain.com
🔧 Health:   https://your-subdomain.yourdomain.com/api/health
📊 Status:   https://your-subdomain.yourdomain.com/api/status?token=YOUR_AUTH_TOKEN

🏗️ New Modular Architecture v1.2.0

HeadlessX v1.2.0 introduces a completely refactored modular architecture for better maintainability, scalability, and development experience.

Key Improvements:

  • 🔧 Separation of Concerns: Distinct modules for configuration, services, controllers, and middleware
  • 🚀 Better Performance: Optimized browser management and resource usage
  • 🛠️ Developer Experience: Clear module boundaries and dependency injection
  • 📦 Production Ready: Enhanced error handling and logging with correlation IDs
  • 🔒 Security: Improved authentication and rate limiting
  • 📊 Monitoring: Structured logging and health monitoring

Architecture Overview:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Routes        │───▶│   Controllers   │───▶│   Services      │
│   (api.js)      │    │   (rendering.js)│    │   (browser.js)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Middleware    │    │   Utils         │    │   Config        │
│   (auth.js)     │    │   (logger.js)   │    │   (index.js)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Quick Migration from v1.1.0:

  • The original src/server.js (3079 lines) has been broken down into 20+ focused modules
  • Environment variable TOKEN is now AUTH_TOKEN
  • PM2 config moved from config/ecosystem.config.js to ecosystem.config.js
  • All functionality preserved with improved performance and maintainability
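
The TOKEN to AUTH_TOKEN rename can be applied to an existing .env file mechanically. The following is a minimal sketch, assuming the old file contains a plain `TOKEN=...` line; the function name `migrate_env_text` is hypothetical and not part of HeadlessX.

```python
import re


def migrate_env_text(text: str) -> str:
    """Rename the v1.1.0 TOKEN variable to AUTH_TOKEN, leaving other lines untouched."""
    # Match TOKEN only when it is the whole variable name at the start of a line,
    # so names such as API_TOKEN or an already-migrated AUTH_TOKEN are not rewritten.
    return re.sub(r"^([ \t]*)TOKEN([ \t]*=)", r"\1AUTH_TOKEN\2", text, flags=re.MULTILINE)


# Example: migrate the contents of an old .env file held in a string
old_env = "TOKEN=abc123\nDOMAIN=example.com\n"
new_env = migrate_env_text(old_env)
```

Writing the result back to disk is left out on purpose; reviewing the migrated text before overwriting .env is the safer path.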

📖 Detailed Documentation: MODULAR_ARCHITECTURE.md


🚀 Deployment Guide

🐳 Docker Deployment (Recommended)

# Install Docker (if needed)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER

# Deploy HeadlessX
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Configure DOMAIN, SUBDOMAIN, AUTH_TOKEN

# Start services
docker-compose up -d

# Optional: Setup SSL
sudo apt install certbot
sudo certbot certonly --standalone -d your-subdomain.yourdomain.com  # standalone mode needs port 80 free

Docker Management:

docker-compose ps              # Check status
docker-compose logs headlessx  # View logs
docker-compose restart         # Restart services
docker-compose down            # Stop services

🔧 Node.js + PM2 Deployment

# Automated setup (recommended)
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Configure environment
chmod +x scripts/setup.sh
sudo ./scripts/setup.sh  # Installs dependencies, builds website, starts PM2

🌐 Nginx Configuration (Auto-handled by setup script):

The setup script configures Nginx automatically. To configure it manually instead:

# Copy and configure nginx site
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace placeholders with your actual domain
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/your-subdomain.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Enable the site
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default

# Test and reload nginx
sudo nginx -t && sudo systemctl reload nginx

Manual setup (if not using setup script):

sudo apt update && sudo apt upgrade -y
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs build-essential
npm install && npm run build
sudo npm install -g pm2
npm run pm2:start

PM2 Management:

npm run pm2:status     # Check status
npm run pm2:logs       # View logs
npm run pm2:restart    # Restart server
npm run pm2:stop       # Stop server

💻 Development Setup

git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env  # Set AUTH_TOKEN, DOMAIN=localhost, SUBDOMAIN=headlessx

# Make scripts executable
chmod +x scripts/*.sh

# Install dependencies
npm install
cd website && npm install && npm run build && cd ..

# Start development server
npm start  # Access at http://localhost:3000

🌐 API Routes & Structure

HeadlessX Routes:
├── /favicon.ico        → Favicon
├── /robots.txt         → SEO robots file
├── /api/health         → Health check (no auth required)
├── /api/status         → Server status (requires token)
├── /api/render         → Full page rendering
├── /api/html           → HTML extraction
├── /api/content        → Clean text extraction
├── /api/screenshot     → Screenshot generation
├── /api/pdf            → PDF generation
└── /api/batch          → Batch URL processing

🔄 Request Flow:

  1. Nginx receives request on port 80/443
  2. Proxies to Node.js server on port 3000
  3. Server routes based on path:
    • /api/* → API endpoints
    • /* → Website files (built Next.js app)
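
The routing rule in step 3 can be sketched as a small pure function. This is an illustration in Python, not the project's Express routing code; `classify_path` is a hypothetical name, and treating a bare `/api` as an API path is an assumption.

```python
def classify_path(path: str) -> str:
    """Return 'api' for /api/* requests and 'website' for everything else."""
    # Drop any query string first, so /api/status?token=... still counts as an API call.
    route = path.split("?", 1)[0]
    # Assumption: a bare "/api" is routed like "/api/*".
    if route == "/api" or route.startswith("/api/"):
        return "api"
    return "website"
```

Matching on the `/api/` prefix rather than `/api` alone keeps website paths such as `/apiary` out of the API branch.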

🚀 API Examples & HTTP Integrations

Quick Health Check (No Auth)

curl https://your-subdomain.yourdomain.com/api/health

🔧 cURL Examples

Extract HTML Content

curl -X POST "https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "timeout": 30000}'

Generate Screenshot

curl "https://your-subdomain.yourdomain.com/api/screenshot?token=YOUR_AUTH_TOKEN&url=https://example.com&fullPage=true" \
  -o screenshot.png
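
Because the GET form passes the target page as a query parameter, a target URL that has its own `?` or `&` should be percent-encoded first. Below is a minimal sketch using only the Python standard library; the parameter names `token`, `url`, and `fullPage` come from the curl example above, while `screenshot_url` and the base URL are placeholders.

```python
from urllib.parse import urlencode


def screenshot_url(base: str, token: str, target: str, full_page: bool = True) -> str:
    """Build a GET URL for /api/screenshot with the target URL percent-encoded."""
    query = urlencode({
        "token": token,
        "url": target,  # urlencode escapes ':', '/', '?', '&' and '=' inside the target
        "fullPage": "true" if full_page else "false",
    })
    return f"{base.rstrip('/')}/api/screenshot?{query}"


# Example with a target that carries its own query string
example = screenshot_url(
    "https://your-subdomain.yourdomain.com",
    "YOUR_AUTH_TOKEN",
    "https://example.com/?a=1&b=2",
)
```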

Extract Text Only

curl -X POST "https://your-subdomain.yourdomain.com/api/content?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "waitForSelector": "main"}'

Generate PDF

curl -X POST "https://your-subdomain.yourdomain.com/api/pdf?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "A4"}' \
  -o document.pdf

🤖 Make.com (Integromat) Integration

HTTP Request Module Configuration:

{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json"
  },
  "qs": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "body": {
    "url": "{{url_to_scrape}}",
    "timeout": 30000,
    "waitForSelector": "{{optional_selector}}"
  }
}

⚡ Zapier Integration

Webhooks by Zapier Setup:

  • URL: https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN
  • Method: POST
  • Headers: Content-Type: application/json
  • Body:
{
  "url": "{{url_from_trigger}}",
  "timeout": 30000,
  "humanBehavior": true
}

🔗 n8n Integration

HTTP Request Node:

{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "authentication": "queryAuth",
  "query": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "headers": {
    "Content-Type": "application/json"
  },
  "body": {
    "url": "={{$json.url}}",
    "timeout": 30000,
    "humanBehavior": true
  }
}

Available via n8n Community Node:

🐍 Python Example

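import requests

def scrape_with_headlessx(url, token):
    """Fetch the rendered HTML of `url` from a HeadlessX instance."""
    response = requests.post(
        "https://your-subdomain.yourdomain.com/api/html",
        params={"token": token},
        json={
            "url": url,
            "timeout": 30000,
            "humanBehavior": True
        },
        timeout=60,  # client-side timeout in seconds, so a stalled request cannot hang forever
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()

# Usage
result = scrape_with_headlessx("https://example.com", "YOUR_TOKEN")
print(result['html'])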

🟨 JavaScript/Node.js Example

const axios = require('axios');

async function scrapeWithHeadlessX(url, token) {
  try {
    const response = await axios.post(
      `https://your-subdomain.yourdomain.com/api/html?token=${token}`,
      {
        url: url,
        timeout: 30000,
        humanBehavior: true
      }
    );
    return response.data;
  } catch (error) {
    console.error('Scraping failed:', error.message);
    throw error;
  }
}

// Usage
scrapeWithHeadlessX('https://example.com', 'YOUR_TOKEN')
  .then(result => console.log(result.html))
  .catch(error => console.error(error));

🔄 Batch Processing Example

curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example1.com",
      "https://example2.com",
      "https://example3.com"
    ],
    "timeout": 30000,
    "humanBehavior": true
  }'

Batch Processing

curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com", "https://httpbin.org"],
    "format": "text",
    "options": {"timeout": 30000}
  }'
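
For long URL lists it can help to split the work into several batch requests. The sketch below builds request bodies in the shape of the first batch example (`urls`, `timeout`, `humanBehavior`); the `batch_size` limit is a client-side assumption, not a documented server limit, and `batch_payloads` is a hypothetical helper name.

```python
import json
from typing import Iterator


def batch_payloads(urls: list[str], batch_size: int = 10, timeout_ms: int = 30000) -> Iterator[str]:
    """Yield JSON request bodies for /api/batch, each holding at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield json.dumps({
            "urls": urls[start:start + batch_size],  # one slice of the full list per request
            "timeout": timeout_ms,
            "humanBehavior": True,
        })
```

Each yielded string can be sent as the body of one POST to `/api/batch` with `Content-Type: application/json`.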

📁 Project Structure

HeadlessX v1.2.0 - Modular Architecture/
├── 📂 src/                         # Modular application source
│   ├── 📂 config/                  # Configuration management
│   │   ├── index.js               # Main configuration loader
│   │   └── browser.js             # Browser-specific settings
│   ├── 📂 utils/                   # Utility functions
│   │   ├── errors.js              # Error handling & categorization
│   │   ├── logger.js              # Structured logging
│   │   └── helpers.js             # Common utilities
│   ├── 📂 services/                # Business logic services
│   │   ├── browser.js             # Browser lifecycle management
│   │   ├── stealth.js             # Anti-detection techniques
│   │   ├── interaction.js         # Human-like behavior
│   │   └── rendering.js           # Core rendering logic
│   ├── 📂 middleware/              # Express middleware
│   │   ├── auth.js                # Authentication
│   │   └── error.js               # Error handling
│   ├── 📂 controllers/             # Request handlers
│   │   ├── system.js              # Health & status endpoints
│   │   ├── rendering.js           # Main rendering endpoints
│   │   ├── batch.js               # Batch processing
│   │   └── get.js                 # GET endpoints & docs
│   ├── 📂 routes/                  # Route definitions
│   │   ├── api.js                 # API route mappings
│   │   └── static.js              # Static file serving
│   ├── app.js                     # Main application setup
│   ├── server.js                  # Entry point for PM2
│   └── rate-limiter.js            # Rate limiting implementation
├── 📂 website/                     # Next.js website (unchanged)
│   ├── app/                        # Next.js 13+ app directory
│   ├── components/                 # React components
│   ├── .env.example               # Website environment template
│   ├── next.config.js             # Next.js configuration
│   └── package.json               # Website dependencies
├── 📂 scripts/                     # Deployment & management scripts
│   ├── setup.sh                   # Automated installation (updated)
│   ├── update_server.sh           # Server update script (updated)
│   ├── verify-domain.sh           # Domain verification
│   └── test-routing.sh            # Integration testing
├── 📂 nginx/                       # Nginx configuration
│   └── headlessx.conf             # Nginx proxy config
├── 📂 docker/                      # Docker deployment (updated)
│   ├── Dockerfile                 # Container definition
│   └── docker-compose.yml         # Docker Compose setup
├── ecosystem.config.js            # PM2 configuration (moved to root)
├── .env.example                   # Environment template (updated)
├── package.json                   # Server dependencies (updated)
├── MODULAR_ARCHITECTURE.md        # Architecture documentation
└── README.md                      # This file

🛠️ Development

Local Development

# 1. Install dependencies
npm install

# 2. Build website
cd website
npm install
npm run build
cd ..

# 3. Set environment variables
export AUTH_TOKEN="development_token_123"
export DOMAIN="localhost"
export SUBDOMAIN="headlessx"

# 4. Start server
npm start  # Uses src/app.js

# 5. Access locally
# Website: http://localhost:3000
# API: http://localhost:3000/api/health

Testing Integration

# Test server and website integration
bash scripts/test-routing.sh localhost

# Test with environment variables
bash scripts/verify-domain.sh

⚙️ Configuration

🌐 Environment Variables (.env)

Create your .env file from the template:

cp .env.example .env
nano .env

Required configuration:

# Security Token (Generate a secure random string)
AUTH_TOKEN=your_secure_token_here

# Domain Configuration  
DOMAIN=yourdomain.com
SUBDOMAIN=headlessx

# Optional: Browser Settings
BROWSER_TIMEOUT=60000
MAX_CONCURRENT_BROWSERS=5

# Optional: Server Settings
PORT=3000
NODE_ENV=production
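
A secure AUTH_TOKEN can be generated, and a .env file checked for the required keys, with a few lines of standard-library Python. This is a sketch under the assumption that .env uses simple `KEY=value` lines; the helper names are hypothetical, and the required key list mirrors the template above.

```python
import secrets

# Keys listed as required in the template above; the rest are optional.
REQUIRED_KEYS = ("AUTH_TOKEN", "DOMAIN", "SUBDOMAIN")


def generate_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random token suitable for AUTH_TOKEN."""
    return secrets.token_urlsafe(num_bytes)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Running `missing_keys(parse_env(open(".env").read()))` before deployment gives an empty list when the required settings are all present.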

🌐 Nginx Domain Setup

Option 1: Automatic (Recommended)

# The setup script automatically replaces domain placeholders
sudo ./scripts/setup.sh

Option 2: Manual Configuration

# Copy nginx configuration
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx

# Replace the domain placeholder with your actual host name
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/headlessx.yourdomain.com/g' /etc/nginx/sites-available/headlessx

# Example: if your host name is "api.example.com", run this instead of the command above
# sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/api.example.com/g' /etc/nginx/sites-available/headlessx

# Enable site and reload nginx
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Your final URLs will be:

  • Website: https://your-subdomain.yourdomain.com
  • API Health: https://your-subdomain.yourdomain.com/api/health
  • API Endpoints: https://your-subdomain.yourdomain.com/api/*
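
The URLs above follow directly from the SUBDOMAIN and DOMAIN values in .env. A small sketch of that composition, with `headlessx_urls` as a hypothetical helper name:

```python
def headlessx_urls(subdomain: str, domain: str, scheme: str = "https") -> dict[str, str]:
    """Compose the public URLs from the SUBDOMAIN and DOMAIN settings."""
    base = f"{scheme}://{subdomain}.{domain}"
    return {
        "website": base,
        "health": f"{base}/api/health",
        "api": f"{base}/api",
    }
```

With `SUBDOMAIN=headlessx` and `DOMAIN=yourdomain.com`, the health URL comes out as `https://headlessx.yourdomain.com/api/health`.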

📊 API Reference

🔧 Core Endpoints

Endpoint        | Method   | Description                | Auth Required
/api/health     | GET      | Health check               | No
/api/status     | GET      | Server status              | Yes
/api/render     | POST     | Full page rendering (JSON) | Yes
/api/html       | GET/POST | Raw HTML extraction        | Yes
/api/content    | GET/POST | Clean text extraction      | Yes
/api/screenshot | GET      | Screenshot generation      | Yes
/api/pdf        | GET      | PDF generation             | Yes
/api/batch      | POST     | Batch URL processing       | Yes

🔑 Authentication

All endpoints (except /api/health) require a token via:

  • Query parameter: ?token=YOUR_TOKEN
  • Header: X-Token: YOUR_TOKEN
  • Header: Authorization: Bearer YOUR_TOKEN
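
The three token placements can be sketched as one helper that returns the URL and headers to send. The header names are the ones listed above; `with_auth` and its `mode` values are hypothetical names chosen for illustration.

```python
from urllib.parse import quote


def with_auth(url: str, token: str, mode: str = "bearer") -> tuple[str, dict[str, str]]:
    """Return (url, headers) carrying the token in one of the three listed ways."""
    if mode == "bearer":
        return url, {"Authorization": f"Bearer {token}"}
    if mode == "x-token":
        return url, {"X-Token": token}
    if mode == "query":
        # Append to an existing query string if there is one, and escape the token.
        separator = "&" if "?" in url else "?"
        return f"{url}{separator}token={quote(token, safe='')}", {}
    raise ValueError(f"unknown auth mode: {mode!r}")
```

Header-based modes keep the token out of the request URL, which matters because URLs commonly end up in web server access logs.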

📖 Complete Documentation

Visit your HeadlessX website for full API documentation with examples, or check:


📊 Monitoring & Troubleshooting

🔍 Health Checks

curl https://your-subdomain.yourdomain.com/api/health
curl "https://your-subdomain.yourdomain.com/api/status?token=YOUR_TOKEN"
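
After a deploy or restart, the health endpoint can be polled until the server responds. The sketch below assumes a healthy server answers `/api/health` with HTTP 200, which the README does not state explicitly; the function names and retry defaults are illustrative.

```python
import time
from typing import Callable
from urllib.request import urlopen


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to the health URL answers with HTTP 200 (assumed healthy status)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: refused, timed out, or non-2xx.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be `wait_until_healthy(lambda: is_healthy("https://your-subdomain.yourdomain.com/api/health"))`. Passing `check` and `sleep` as arguments keeps the retry loop testable without network access.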

📋 Log Management

# PM2 logs
npm run pm2:logs
pm2 logs headlessx --lines 100

# Docker logs
docker-compose logs -f headlessx

# Nginx logs
sudo tail -f /var/log/nginx/access.log

🔄 Updates

git pull origin main
npm run build          # Rebuild website
npm run pm2:restart     # PM2
# OR
docker-compose restart  # Docker

🔧 Common Issues

"npm ci" Error (missing package-lock.json):

chmod +x scripts/generate-lockfiles.sh
./scripts/generate-lockfiles.sh  # Generate lock files
# OR
npm install --production  # Use install instead

"Cannot find module 'express'":

npm install  # Install dependencies

System dependency errors (Ubuntu):

sudo apt update && sudo apt install -y \
  libatk1.0-0t64 libatk-bridge2.0-0t64 libcups2t64 \
  libatspi2.0-0t64 libasound2t64 libxcomposite1

PM2 not starting:

sudo npm install -g pm2
chmod +x scripts/setup.sh  # Make script executable
pm2 start ecosystem.config.js  # config lives in the project root since v1.2.0
pm2 logs headlessx  # Check errors

Script permission errors:

# Make all scripts executable
chmod +x scripts/*.sh

# Or use the quick setup
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh

Playwright browser installation errors:

# Use dedicated Playwright setup script
chmod +x scripts/setup-playwright.sh
./scripts/setup-playwright.sh

# Or install manually:
sudo apt update && sudo apt install -y \
  libgtk-3-0t64 libpangocairo-1.0-0 libcairo-gobject2 \
  libgdk-pixbuf-2.0-0 libdrm2 libxss1 libxrandr2 \
  libasound2t64 libatk1.0-0t64 libnss3

# Install only Chromium (most stable)
npx playwright install chromium

# Alternative: Use Docker (avoids dependency issues)
docker-compose up -d

🔐 Security Features

  • Token Authentication: Secure API access with custom tokens
  • Rate Limiting: Nginx-level request throttling
  • Security Headers: XSS, CSRF, and clickjacking protection
  • Bot Protection: Common attack vector blocking
  • SSL/TLS: Automatic HTTPS with Let's Encrypt

🤝 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🆘 Support


🎯 Built by SaifyXPRO

HeadlessX v1.2.0 - The most advanced open-source browserless web scraping solution.

Made with ❤️ for the developer community.
