Open Source Browserless Web Scraping API with Human-like Behavior
🎯 Unified Solution: Website + API on a single domain
🧠 Human-like Behavior: 40+ anti-detection techniques
🚀 Deploy Anywhere: Docker, Node.js+PM2, or Development
- 🌐 Unified Architecture: Website and API on one domain
- 🧠 Human-like Intelligence: Natural mouse movements, smart scrolling, behavioral randomization
- 📊 Multiple Formats: HTML, text, screenshots, PDFs
- ⚡ Batch Processing: Handle multiple URLs efficiently
- 🔒 Production Ready: Docker, PM2, Nginx, SSL support
- 🛡️ Anti-Detection: 40+ stealth techniques for reliable scraping
# 1. Clone and configure
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
# Quick setup (makes scripts executable + creates .env)
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh
# Then edit: nano .env # Update DOMAIN, SUBDOMAIN, and AUTH_TOKEN
Choose your deployment:
| Method | Command | Best For |
|---|---|---|
| 🐳 Docker | docker-compose up -d | Production, easy deployment |
| 🔧 Auto Setup | chmod +x scripts/setup.sh && sudo ./scripts/setup.sh | VPS/Server with full control |
| 💻 Development | npm install && npm start | Local development, testing |
Access your HeadlessX:
🌐 Website: https://your-subdomain.yourdomain.com
🔧 Health: https://your-subdomain.yourdomain.com/api/health
📊 Status: https://your-subdomain.yourdomain.com/api/status?token=YOUR_AUTH_TOKEN
HeadlessX v1.2.0 introduces a completely refactored modular architecture for better maintainability, scalability, and development experience.
- 🔧 Separation of Concerns: Distinct modules for configuration, services, controllers, and middleware
- 🚀 Better Performance: Optimized browser management and resource usage
- 🛠️ Developer Experience: Clear module boundaries and dependency injection
- 📦 Production Ready: Enhanced error handling and logging with correlation IDs
- 🔒 Security: Improved authentication and rate limiting
- 📊 Monitoring: Structured logging and health monitoring
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Routes │───▶│ Controllers │───▶│ Services │
│ (api.js) │ │ (rendering.js)│ │ (browser.js) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Middleware │ │ Utils │ │ Config │
│ (auth.js) │ │ (logger.js) │ │ (index.js) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Quick Migration from v1.1.0:
- The original src/server.js (3079 lines) has been broken down into 20+ focused modules
- Environment variable TOKEN is now AUTH_TOKEN
- PM2 config moved from config/ecosystem.config.js to ecosystem.config.js
- All functionality preserved with improved performance and maintainability
📖 Detailed Documentation: MODULAR_ARCHITECTURE.md
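For the TOKEN to AUTH_TOKEN rename in the migration notes, a small helper can update an existing .env file. The following is a minimal sketch and not part of HeadlessX: the name migrate_env_text is hypothetical, and it assumes plain KEY=value lines without an export prefix.

```python
def migrate_env_text(text: str) -> str:
    """Rename the v1.1.0 TOKEN variable to AUTH_TOKEN, leaving other lines untouched."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # Rewrite only an assignment to exactly TOKEN; comments and keys
        # such as API_TOKEN do not start with "TOKEN=" and are kept as-is.
        if stripped.startswith("TOKEN="):
            indent = line[: len(line) - len(stripped)]
            line = indent + "AUTH_" + stripped
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


if __name__ == "__main__":
    # Hypothetical v1.1.0-style content; real usage would read and rewrite .env.
    print(migrate_env_text("TOKEN=abc123\nPORT=3000"))
```

If the file already defines AUTH_TOKEN, review the result by hand, since this sketch does not merge duplicate keys.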
# Install Docker (if needed)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Deploy HeadlessX
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env # Configure DOMAIN, SUBDOMAIN, AUTH_TOKEN
# Start services
docker-compose up -d
# Optional: Setup SSL
sudo apt install certbot
sudo certbot certonly --standalone -d your-subdomain.yourdomain.com
Docker Management:
docker-compose ps # Check status
docker-compose logs headlessx # View logs
docker-compose restart # Restart services
docker-compose down # Stop services
# Automated setup (recommended)
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env # Configure environment
chmod +x scripts/setup.sh
sudo ./scripts/setup.sh # Installs dependencies, builds website, starts PM2
🌐 Nginx Configuration (Auto-handled by setup script):
The setup script configures Nginx automatically. To configure it manually instead:
# Copy and configure nginx site
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx
# Replace placeholders with your actual domain
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/your-subdomain.yourdomain.com/g' /etc/nginx/sites-available/headlessx
# Enable the site
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default
# Test and reload nginx
sudo nginx -t && sudo systemctl reload nginx
Manual setup (if not using setup script):
sudo apt update && sudo apt upgrade -y
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs build-essential
npm install && npm run build
sudo npm install -g pm2
npm run pm2:start
PM2 Management:
npm run pm2:status # Check status
npm run pm2:logs # View logs
npm run pm2:restart # Restart server
npm run pm2:stop # Stop server
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env # Set AUTH_TOKEN, DOMAIN=localhost, SUBDOMAIN=headlessx
# Make scripts executable
chmod +x scripts/*.sh
# Install dependencies
npm install
cd website && npm install && npm run build && cd ..
# Start development server
npm start # Access at http://localhost:3000
HeadlessX Routes:
├── /favicon.ico → Favicon
├── /robots.txt → SEO robots file
├── /api/health → Health check (no auth required)
├── /api/status → Server status (requires token)
├── /api/render → Full page rendering
├── /api/html → HTML extraction
├── /api/content → Clean text extraction
├── /api/screenshot → Screenshot generation
├── /api/pdf → PDF generation
└── /api/batch → Batch URL processing
🔄 Request Flow:
- Nginx receives request on port 80/443
- Proxies to Node.js server on port 3000
- Server routes based on path:
  - /api/* → API endpoints
  - /* → Website files (built Next.js app)
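The path-based split in the request flow can be illustrated in a few lines of Python. This is only a sketch of the rule as described above, not the server's actual routing code (which lives in the Node.js app); route_target is a hypothetical name.

```python
def route_target(path: str) -> str:
    """Classify a request path using the two rules from the request flow."""
    # Anything under /api/ is handled by the API endpoints.
    if path.startswith("/api/"):
        return "api"
    # Every other path falls through to the built website files.
    return "website"


if __name__ == "__main__":
    for p in ("/api/health", "/", "/robots.txt"):
        print(p, "->", route_target(p))
```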
# Health check (no auth required)
curl https://your-subdomain.yourdomain.com/api/health
# Extract HTML
curl -X POST "https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "timeout": 30000}'
# Take a full-page screenshot
curl "https://your-subdomain.yourdomain.com/api/screenshot?token=YOUR_AUTH_TOKEN&url=https://example.com&fullPage=true" \
  -o screenshot.png
# Extract clean text
curl -X POST "https://your-subdomain.yourdomain.com/api/content?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "waitForSelector": "main"}'
# Generate a PDF
curl -X POST "https://your-subdomain.yourdomain.com/api/pdf?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "A4"}' \
  -o document.pdf
HTTP Request Module Configuration:
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json"
  },
  "qs": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "body": {
    "url": "{{url_to_scrape}}",
    "timeout": 30000,
    "waitForSelector": "{{optional_selector}}"
  }
}
Webhooks by Zapier Setup:
- URL: https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN
- Method: POST
- Headers: Content-Type: application/json
- Body:
{
  "url": "{{url_from_trigger}}",
  "timeout": 30000,
  "humanBehavior": true
}
HTTP Request Node:
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "authentication": "queryAuth",
  "query": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "headers": {
    "Content-Type": "application/json"
  },
  "body": {
    "url": "={{$json.url}}",
    "timeout": 30000,
    "humanBehavior": true
  }
}
Available via n8n Community Node:
- Install: npm install n8n-nodes-headlessx
- GitHub Repository
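import requests

def scrape_with_headlessx(url, token):
    """Fetch rendered HTML for `url` through the HeadlessX /api/html endpoint."""
    response = requests.post(
        "https://your-subdomain.yourdomain.com/api/html",
        params={"token": token},
        json={
            "url": url,
            "timeout": 30000,
            "humanBehavior": True
        },
        timeout=60,  # client-side timeout in seconds, so a stalled server cannot hang the call
    )
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    return response.json()

# Usage
result = scrape_with_headlessx("https://example.com", "YOUR_TOKEN")
print(result['html'])

const axios = require('axios');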
async function scrapeWithHeadlessX(url, token) {
  try {
    const response = await axios.post(
      `https://your-subdomain.yourdomain.com/api/html?token=${token}`,
      {
        url: url,
        timeout: 30000,
        humanBehavior: true
      }
    );
    return response.data;
  } catch (error) {
    console.error('Scraping failed:', error.message);
    throw error;
  }
}
// Usage
scrapeWithHeadlessX('https://example.com', 'YOUR_TOKEN')
  .then(result => console.log(result.html))
  .catch(error => console.error(error));

curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example1.com",
"https://example2.com",
"https://example3.com"
],
"timeout": 30000,
"humanBehavior": true
}'
curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com", "https://httpbin.org"],
"format": "text",
"options": {"timeout": 30000}
}'
HeadlessX v1.2.0 - Modular Architecture/
├── 📂 src/ # Modular application source
│ ├── 📂 config/ # Configuration management
│ │ ├── index.js # Main configuration loader
│ │ └── browser.js # Browser-specific settings
│ ├── 📂 utils/ # Utility functions
│ │ ├── errors.js # Error handling & categorization
│ │ ├── logger.js # Structured logging
│ │ └── helpers.js # Common utilities
│ ├── 📂 services/ # Business logic services
│ │ ├── browser.js # Browser lifecycle management
│ │ ├── stealth.js # Anti-detection techniques
│ │ ├── interaction.js # Human-like behavior
│ │ └── rendering.js # Core rendering logic
│ ├── 📂 middleware/ # Express middleware
│ │ ├── auth.js # Authentication
│ │ └── error.js # Error handling
│ ├── 📂 controllers/ # Request handlers
│ │ ├── system.js # Health & status endpoints
│ │ ├── rendering.js # Main rendering endpoints
│ │ ├── batch.js # Batch processing
│ │ └── get.js # GET endpoints & docs
│ ├── 📂 routes/ # Route definitions
│ │ ├── api.js # API route mappings
│ │ └── static.js # Static file serving
│ ├── app.js # Main application setup
│ ├── server.js # Entry point for PM2
│ └── rate-limiter.js # Rate limiting implementation
├── 📂 website/ # Next.js website (unchanged)
│ ├── app/ # Next.js 13+ app directory
│ ├── components/ # React components
│ ├── .env.example # Website environment template
│ ├── next.config.js # Next.js configuration
│ └── package.json # Website dependencies
├── 📂 scripts/ # Deployment & management scripts
│ ├── setup.sh # Automated installation (updated)
│ ├── update_server.sh # Server update script (updated)
│ ├── verify-domain.sh # Domain verification
│ └── test-routing.sh # Integration testing
├── 📂 nginx/ # Nginx configuration
│ └── headlessx.conf # Nginx proxy config
├── 📂 docker/ # Docker deployment (updated)
│ ├── Dockerfile # Container definition
│ └── docker-compose.yml # Docker Compose setup
├── ecosystem.config.js # PM2 configuration (moved to root)
├── .env.example # Environment template (updated)
├── package.json # Server dependencies (updated)
├── MODULAR_ARCHITECTURE.md # Architecture documentation
└── README.md # This file
# 1. Install dependencies
npm install
# 2. Build website
cd website
npm install
npm run build
cd ..
# 3. Set environment variables
export AUTH_TOKEN="development_token_123"
export DOMAIN="localhost"
export SUBDOMAIN="headlessx"
# 4. Start server
npm start # Uses src/app.js
# 5. Access locally
# Website: http://localhost:3000
# API: http://localhost:3000/api/health
# Test server and website integration
bash scripts/test-routing.sh localhost
# Test with environment variables
bash scripts/verify-domain.sh
Create your .env file from the template:
cp .env.example .env
nano .env
Required configuration:
# Security Token (Generate a secure random string)
AUTH_TOKEN=your_secure_token_here
# Domain Configuration
DOMAIN=yourdomain.com
SUBDOMAIN=headlessx
# Optional: Browser Settings
BROWSER_TIMEOUT=60000
MAX_CONCURRENT_BROWSERS=5
# Optional: Server Settings
PORT=3000
NODE_ENV=production
Option 1: Automatic (Recommended)
# The setup script automatically replaces domain placeholders
sudo ./scripts/setup.sh
Option 2: Manual Configuration
# Copy nginx configuration
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx
# Replace domain placeholders (replace with your actual domain)
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/headlessx.yourdomain.com/g' /etc/nginx/sites-available/headlessx
# Example: If your domain is "api.example.com"
sudo sed -i 's/SUBDOMAIN.DOMAIN.COM/api.example.com/g' /etc/nginx/sites-available/headlessx
# Enable site and reload nginx
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
Your final URLs will be:
- Website: https://your-subdomain.yourdomain.com
- API Health: https://your-subdomain.yourdomain.com/api/health
- API Endpoints: https://your-subdomain.yourdomain.com/api/*
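The URLs above follow from the DOMAIN and SUBDOMAIN values in .env. As a rough sketch, they could be assembled in Python as shown here. The helper names base_url and endpoint_url are hypothetical, and the sketch covers only the deployed https form, not the local http://localhost:3000 setup. It also percent-encodes query values, which matters when the target URL passed as the url parameter contains its own query string.

```python
from urllib.parse import urlencode


def base_url(domain: str, subdomain: str, scheme: str = "https") -> str:
    """Join SUBDOMAIN and DOMAIN into the public base URL."""
    return f"{scheme}://{subdomain}.{domain}"


def endpoint_url(base: str, path: str, **params: str) -> str:
    """Append an API path and percent-encoded query parameters to the base URL."""
    query = urlencode(params)  # encodes ':', '/', '?' and '=' inside values
    return f"{base}{path}?{query}" if query else f"{base}{path}"


if __name__ == "__main__":
    base = base_url("yourdomain.com", "headlessx")
    print(endpoint_url(base, "/api/health"))
    print(endpoint_url(base, "/api/screenshot",
                       token="YOUR_AUTH_TOKEN",
                       url="https://example.com/?q=1",
                       fullPage="true"))
```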
| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| /api/health | GET | Health check | ❌ |
| /api/status | GET | Server status | ✅ |
| /api/render | POST | Full page rendering (JSON) | ✅ |
| /api/html | GET/POST | Raw HTML extraction | ✅ |
| /api/content | GET/POST | Clean text extraction | ✅ |
| /api/screenshot | GET | Screenshot generation | ✅ |
| /api/pdf | GET | PDF generation | ✅ |
| /api/batch | POST | Batch URL processing | ✅ |
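For the /api/batch row, the request body used in this README's batch curl example is a JSON object with urls, timeout, and humanBehavior. The sketch below builds such a payload in Python and rejects obviously malformed URLs before anything is sent. The function name build_batch_payload and the de-duplication step are assumptions for illustration, not documented HeadlessX behavior.

```python
from urllib.parse import urlsplit


def build_batch_payload(urls, timeout=30000, human_behavior=True):
    """Return a JSON-serializable body for a POST to /api/batch."""
    seen = set()
    cleaned = []
    for u in urls:
        parts = urlsplit(u)
        # Require an absolute http(s) URL with a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {u!r}")
        # Drop exact duplicates while keeping the original order.
        if u not in seen:
            seen.add(u)
            cleaned.append(u)
    return {"urls": cleaned, "timeout": timeout, "humanBehavior": human_behavior}


if __name__ == "__main__":
    print(build_batch_payload(["https://example1.com", "https://example2.com"]))
```

The returned dictionary can be passed as the json argument of requests.post, in the same way as the single-URL Python example in this README.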
All endpoints (except /api/health) require a token via:
- Query parameter: ?token=YOUR_TOKEN
- Header: X-Token: YOUR_TOKEN
- Header: Authorization: Bearer YOUR_TOKEN
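The three token options listed above can be sketched as one small helper that returns the URL and headers to use for a request. The name with_token and its method labels ("query", "x-token", "bearer") are hypothetical; only the parameter and header names come from the list above.

```python
from urllib.parse import urlencode


def with_token(url: str, token: str, method: str = "bearer"):
    """Return (url, headers) carrying the token in one of the three supported ways."""
    headers = {}
    if method == "query":
        # Append ?token=... (or &token=... if the URL already has a query string).
        sep = "&" if "?" in url else "?"
        url = f"{url}{sep}{urlencode({'token': token})}"
    elif method == "x-token":
        headers["X-Token"] = token
    elif method == "bearer":
        headers["Authorization"] = f"Bearer {token}"
    else:
        raise ValueError(f"unknown auth method: {method!r}")
    return url, headers


if __name__ == "__main__":
    status = "https://your-subdomain.yourdomain.com/api/status"
    for m in ("query", "x-token", "bearer"):
        print(m, with_token(status, "YOUR_TOKEN", m))
```

Sending the token in a header rather than the query string keeps it out of request URLs, which are commonly written to access logs.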
Visit your HeadlessX website for full API documentation with examples, or check:
curl https://your-subdomain.yourdomain.com/api/health
curl "https://your-subdomain.yourdomain.com/api/status?token=YOUR_TOKEN"
# PM2 logs
npm run pm2:logs
pm2 logs headlessx --lines 100
# Docker logs
docker-compose logs -f headlessx
# Nginx logs
sudo tail -f /var/log/nginx/access.log
git pull origin main
npm run build # Rebuild website
npm run pm2:restart # PM2
# OR
docker-compose restart # Docker
"npm ci" Error (missing package-lock.json):
chmod +x scripts/generate-lockfiles.sh
./scripts/generate-lockfiles.sh # Generate lock files
# OR
npm install --production # Use install instead
"Cannot find module 'express'":
npm install # Install dependencies
System dependency errors (Ubuntu):
sudo apt update && sudo apt install -y \
libatk1.0-0t64 libatk-bridge2.0-0t64 libcups2t64 \
  libatspi2.0-0t64 libasound2t64 libxcomposite1
PM2 not starting:
sudo npm install -g pm2
chmod +x scripts/setup.sh # Make script executable
pm2 start ecosystem.config.js # Config lives in the project root as of v1.2.0
pm2 logs headlessx # Check errors
Script permission errors:
# Make all scripts executable
chmod +x scripts/*.sh
# Or use the quick setup
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh
Playwright browser installation errors:
# Use dedicated Playwright setup script
chmod +x scripts/setup-playwright.sh
./scripts/setup-playwright.sh
# Or install manually:
sudo apt update && sudo apt install -y \
libgtk-3-0t64 libpangocairo-1.0-0 libcairo-gobject2 \
libgdk-pixbuf-2.0-0 libdrm2 libxss1 libxrandr2 \
libasound2t64 libatk1.0-0t64 libnss3
# Install only Chromium (most stable)
npx playwright install chromium
# Alternative: Use Docker (avoids dependency issues)
docker-compose up -d
- Token Authentication: Secure API access with custom tokens
- Rate Limiting: Nginx-level request throttling
- Security Headers: XSS, CSRF, and clickjacking protection
- Bot Protection: Common attack vector blocking
- SSL/TLS: Automatic HTTPS with Let's Encrypt
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- 📖 Documentation: Visit your deployed website for full API docs
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
HeadlessX v1.2.0 - The most advanced open-source browserless web scraping solution.
Made with ❤️ for the developer community.