Cloudflare Workers URL Scraper

A Cloudflare Workers monorepo that validates URLs, enqueues jobs, checks liveness, and stores results in a D1 database.

🏗️ Architecture

This project consists of three main services:

API Service (api-service/) - HTTP Worker that handles job submission and status checking
Consumer Service (workers/) - Queue consumer that performs URL liveness checks
Frontend Service (frontend-service/) - Cloudflare Worker that serves the web interface

🚀 Services

API Service

POST /jobs - Submit a URL for processing
GET /status/:id - Check job status
GET /result/:id - Get processing results
CORS Support - Proper cross-origin request handling
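
The validation rules applied by POST /jobs are not spelled out here, so the following is only a minimal sketch of what such a check could look like. The helper name validateJobUrl and the rule "absolute http or https URLs only" are assumptions for illustration, not taken from api.ts.

```typescript
// Hypothetical helper: the name and the exact rules are assumptions.
type UrlCheck =
  | { ok: true; url: string }
  | { ok: false; reason: string };

function validateJobUrl(input: unknown): UrlCheck {
  if (typeof input !== "string" || input.trim() === "") {
    return { ok: false, reason: "url must be a non-empty string" };
  }
  let parsed: URL;
  try {
    // The URL constructor throws on anything that is not an absolute URL.
    parsed = new URL(input.trim());
  } catch {
    return { ok: false, reason: "url is not a valid absolute URL" };
  }
  // Assumption: only web URLs are worth a liveness check.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: "only http and https URLs are accepted" };
  }
  // Return the normalized form so the same string is stored and enqueued.
  return { ok: true, url: parsed.toString() };
}
```

A handler built on this would typically answer with a 400 status when ok is false and only create the job otherwise.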

Consumer Service

Processes jobs from the queue
Performs HEAD→GET liveness checks
Updates job status and stores results
Handles errors gracefully with retry logic
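
As a rough illustration of the HEAD→GET check, the sketch below tries HEAD first and falls back to GET. The fallback condition (any thrown error or any status of 400 and above) and the function name checkLiveness are assumptions; the actual logic in consumer.ts may differ. The result shape mirrors the is_live, http_status, and error columns of the results table.

```typescript
// Minimal fetch shape so the sketch does not depend on Workers type packages.
type FetchLike = (url: string, init: { method: string }) => Promise<{ status: number }>;

interface LivenessResult {
  is_live: 0 | 1;              // stored as INTEGER in the results table
  http_status: number | null;  // status of the last attempt, if any
  error: string | null;        // message of the last failure, if any
}

// Hypothetical helper: try HEAD, then GET, and report the last attempt.
async function checkLiveness(url: string, fetchFn: FetchLike): Promise<LivenessResult> {
  let lastStatus: number | null = null;
  let lastError: string | null = null;
  for (const method of ["HEAD", "GET"]) {
    try {
      const res = await fetchFn(url, { method });
      if (res.status < 400) {
        return { is_live: 1, http_status: res.status, error: null };
      }
      // Assumption: a 4xx/5xx answer to HEAD is not final, because some
      // servers reject HEAD while serving GET normally.
      lastStatus = res.status;
      lastError = null;
    } catch (err) {
      lastStatus = null;
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { is_live: 0, http_status: lastStatus, error: lastError };
}
```

Inside a Worker, the global fetch can be passed in as `(u, init) => fetch(u, init)`; injecting it keeps the function easy to exercise with a fake.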

Frontend Service

GET / - Submit URL form
GET /status.html - Job status page
GET /result.html - Results page
No CORS Issues - Serves HTML directly from the same domain

📁 Project Structure

├── api-service/
│   ├── src/
│   │   └── api.ts          # API routes and D1 access
│   └── tsconfig.json
├── workers/
│   ├── src/
│   │   └── consumer.ts     # Queue handler and liveness check logic
│   └── tsconfig.json
├── frontend-service/
│   ├── src/
│   │   └── frontend.ts     # Frontend service with HTML templates
│   └── tsconfig.json
├── migrations/
│   └── 001_init.sql        # D1 database schema
├── wrangler.api.jsonc      # API service configuration
├── wrangler.consumer.jsonc # Consumer service configuration
├── wrangler.frontend.jsonc # Frontend service configuration
└── package.json

🛠️ Setup

Prerequisites

Node.js 20.18.1+
Cloudflare account
Wrangler CLI

Step-by-Step Installation

1. Clone and Install Dependencies

git clone https://github.com/SectemTechnologies/CloudflareServicesTestScrapExample.git
cd CloudflareServicesTestScrapExample
npm install

2. Create D1 Database

npx wrangler d1 create scraper_test_database

Expected Output:

✅ Successfully created DB 'scraper_test_database' in region UNKNOWN
Created your new D1 database.

To access your new D1 Database in your Worker, add the following snippet to your configuration file:
{
  "d1_databases": [
    {
      "binding": "scraper_test_database",
      "database_name": "scraper_test_database",
      "database_id": "3e41f9a7-133b-4ac6-b84e-0931397acf96"
    }
  ]
}

3. Update Database ID in Configuration Files

⚠️ IMPORTANT: Copy the database_id from the output above and update it in these files:

Update wrangler.api.jsonc:

{
  "d1_databases": [
    {
      "binding": "DB",
      "database_name": "scraper_test_database",
      "database_id": "YOUR_DATABASE_ID_HERE"  // ← Replace this
    }
  ]
}

Update wrangler.consumer.jsonc:

{
  "d1_databases": [
    {
      "binding": "DB", 
      "database_name": "scraper_test_database",
      "database_id": "YOUR_DATABASE_ID_HERE"  // ← Replace this
    }
  ]
}

4. Apply Database Migrations

# Apply to local database (for development)
npx wrangler d1 migrations apply scraper_test_database --config wrangler.api.jsonc

# Apply to remote database (for production)
npx wrangler d1 migrations apply scraper_test_database --config wrangler.api.jsonc --remote

5. Create Queue

npx wrangler queues create scraper-test-jobs-queue

Expected Output:

✅ Successfully created queue 'scraper-test-jobs-queue'

6. Verify Setup

# List databases
npx wrangler d1 list

# List queues  
npx wrangler queues list

# Check database tables
npx wrangler d1 execute scraper_test_database --command "SELECT name FROM sqlite_master WHERE type='table';" --config wrangler.api.jsonc

Common Setup Issues

Database ID Not Updated

Error: Database not found or Invalid database ID

Solution: Make sure you copied the correct database_id from step 2 and updated it in both wrangler.api.jsonc and wrangler.consumer.jsonc.

Queue Already Exists

Error: Queue already exists

Solution: Use a different queue name or delete the existing one:

npx wrangler queues delete scraper-test-jobs-queue
npx wrangler queues create scraper-test-jobs-queue

Migration Failed

Error: Migration failed

Solution: Check your database ID is correct and try again:

npx wrangler d1 migrations apply scraper_test_database --config wrangler.api.jsonc --remote

🏗️ Build Services

# Build individual services
npm run build:api
npm run build:consumer
npm run build:frontend

# Build all services
npm run build:all

🚀 Deployment

Deploy Individual Services

# Deploy API service (scraper_test_api)
npm run deploy:api

# Deploy consumer service (scraper_test_consumer)
npm run deploy:consumer

# Deploy frontend service (scraper_test_frontend)
npm run deploy:frontend

Deploy All Services

npm run deploy:all

Deployment Order

1. API Service (scraper_test_api) - Core functionality
2. Consumer Service (scraper_test_consumer) - Queue processing
3. Frontend Service (scraper_test_frontend) - User interface

📊 Database Schema

Jobs Table

CREATE TABLE jobs (
  id TEXT PRIMARY KEY,
  url TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'queued',
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

Results Table

CREATE TABLE results (
  job_id TEXT PRIMARY KEY,
  url TEXT NOT NULL,
  is_live INTEGER NOT NULL,
  http_status INTEGER,
  error TEXT,
  checked_at TEXT NOT NULL DEFAULT (datetime('now'))
);

🔌 API Endpoints

POST /jobs

Submit a URL for processing.

Request:

{
  "url": "https://example.com"
}

Response:

{
  "id": "uuid-string"
}

GET /status/:id

Check job processing status.

Response:

{
  "id": "uuid-string",
  "url": "https://example.com",
  "status": "queued|processing|done|failed",
  "updated_at": "2025-01-15T10:30:00Z"
}

GET /result/:id

Get processing results.

Response:

{
  "id": "uuid-string",
  "url": "https://example.com",
  "is_live": 1,
  "http_status": 200,
  "error": null,
  "checked_at": "2025-01-15T10:30:00Z"
}
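
Because jobs are processed asynchronously, a client has to poll GET /status/:id until the status settles before fetching the result. The sketch below shows one way to do that. The function name waitForJob, the polling interval, and the attempt limit are illustrative assumptions; only the endpoint path and the status values come from the responses documented above.

```typescript
type JobStatus = "queued" | "processing" | "done" | "failed";

interface StatusResponse {
  id: string;
  url: string;
  status: JobStatus;
  updated_at: string;
}

// Injected so the sketch runs without a network; see the usage note.
type GetJson = (url: string) => Promise<unknown>;

// Hypothetical client helper: poll until the job is done or failed.
async function waitForJob(
  apiBase: string,
  id: string,
  getJson: GetJson,
  opts: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<StatusResponse> {
  const { intervalMs = 1000, maxAttempts = 30 } = opts;
  const base = apiBase.replace(/\/+$/, ""); // tolerate a trailing slash
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const body = (await getJson(`${base}/status/${encodeURIComponent(id)}`)) as StatusResponse;
    if (body.status === "done" || body.status === "failed") {
      return body;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job ${id} did not settle after ${maxAttempts} attempts`);
}
```

With a real API, getJson could be `(u) => fetch(u).then((r) => r.json())`. Once the returned status is "done", GET /result/:id holds the liveness outcome.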

🔧 Configuration

Environment Variables

CORS_ORIGIN - CORS origin for API requests (default: "*")
API_BASE_URL - API service URL for frontend service
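
To illustrate how CORS_ORIGIN might feed into responses, here is a minimal sketch of a header builder that falls back to "*" when the variable is unset. The helper name corsHeaders and the allowed methods and headers are assumptions inferred from the documented endpoints, not a copy of the API service code.

```typescript
// Only the variable documented above; other bindings are omitted.
interface CorsEnv {
  CORS_ORIGIN?: string;
}

// Hypothetical helper: build the CORS headers for every API response.
function corsHeaders(env: CorsEnv): Record<string, string> {
  return {
    // Documented default: "*" when CORS_ORIGIN is not configured.
    "Access-Control-Allow-Origin": env.CORS_ORIGIN || "*",
    // Assumption: the API only serves GET and POST, plus preflight requests.
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    // Assumption: clients only send a JSON Content-Type header.
    "Access-Control-Allow-Headers": "Content-Type",
  };
}
```

Setting CORS_ORIGIN to the deployed frontend origin instead of leaving the default restricts browser access to that one site.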

Queue Configuration

Queue Name: scraper-test-jobs-queue
Max Batch Size: 10 messages
Producer Binding: JOBS
Consumer Binding: Auto-configured
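
With a maximum batch size of 10, the consumer can receive several jobs in one invocation. The sketch below shows one way to acknowledge or retry each message individually, so a single failing URL does not cause the whole batch to be redelivered. The interfaces mirror only the body, ack(), and retry() members of a Cloudflare Queues message; the JobMessage payload shape and the handleBatch name are assumptions.

```typescript
// Minimal shapes covering just the parts of the Queues message API used here.
interface QueueMessage<T> {
  body: T;
  ack(): void;
  retry(): void;
}

// Assumed payload: the real message format in consumer.ts may differ.
interface JobMessage {
  id: string;
  url: string;
}

// Hypothetical helper: process each message and settle it on its own.
async function handleBatch(
  messages: readonly QueueMessage<JobMessage>[],
  processJob: (job: JobMessage) => Promise<void>,
): Promise<void> {
  for (const msg of messages) {
    try {
      await processJob(msg.body);
      msg.ack();   // success: do not deliver this message again
    } catch {
      msg.retry(); // failure: ask the queue to redeliver only this message
    }
  }
}
```

In a Worker, the queue handler would pass batch.messages to a function like this, with processJob performing the liveness check and the D1 writes.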

🧪 Testing

Manual Testing

Frontend Service: Visit your deployed frontend URL
Submit a URL for processing
Check status using the status page
View results using the results page

API Testing

# Test job submission
curl -X POST https://your-api.workers.dev/jobs \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Test status check
curl https://your-api.workers.dev/status/JOB_ID

# Test result retrieval
curl https://your-api.workers.dev/result/JOB_ID

Database Queries

# Query jobs table
npx wrangler d1 execute scraper_test_database --command "SELECT * FROM jobs" --config wrangler.api.jsonc

# Query results table
npx wrangler d1 execute scraper_test_database --command "SELECT * FROM results" --config wrangler.api.jsonc

# Check job status distribution
npx wrangler d1 execute scraper_test_database --command "SELECT status, COUNT(*) FROM jobs GROUP BY status" --config wrangler.api.jsonc

📝 Development Notes

TypeScript: Strict mode enabled
Web APIs Only: No Node.js APIs (uses fetch, crypto, etc.)
Parameterized SQL: All queries use prepared statements
Error Handling: Graceful error handling with proper HTTP status codes
Idempotency: Results are upserted by job_id
CORS Support: Proper cross-origin request handling
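
The idempotency note can be made concrete with SQLite's ON CONFLICT clause, which D1 supports. The following is a sketch of how a parameterized upsert keyed on job_id might be built. The helper name buildResultUpsert is hypothetical, and the statement is written against the results table schema shown earlier rather than copied from the consumer.

```typescript
interface CheckResult {
  job_id: string;
  url: string;
  is_live: 0 | 1;
  http_status: number | null;
  error: string | null;
}

type SqlParam = string | number | null;

// Hypothetical helper: returns the statement text and its bound values.
function buildResultUpsert(r: CheckResult): { sql: string; params: SqlParam[] } {
  const sql = `
    INSERT INTO results (job_id, url, is_live, http_status, error, checked_at)
    VALUES (?, ?, ?, ?, ?, datetime('now'))
    ON CONFLICT(job_id) DO UPDATE SET
      url = excluded.url,
      is_live = excluded.is_live,
      http_status = excluded.http_status,
      error = excluded.error,
      checked_at = excluded.checked_at
  `;
  // Values are bound in placeholder order, never interpolated into the SQL.
  return { sql, params: [r.job_id, r.url, r.is_live, r.http_status, r.error] };
}
```

With the DB binding from the wrangler config, this would be executed as `await env.DB.prepare(sql).bind(...params).run()`. If a queue message is redelivered, the second write overwrites the first row instead of failing on the primary key.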

🚨 Troubleshooting

Common Issues

Database ID Not Found

Error: Database not found or Invalid database ID

Solution:

# Check your database ID
npx wrangler d1 list

# Update config files with correct ID
# Then redeploy services
npm run deploy:all

Queue Not Found

Error: Queue not found

Solution:

# Create the queue
npx wrangler queues create scraper-test-jobs-queue

# Or check existing queues
npx wrangler queues list

CORS Errors

Error: Access to fetch blocked by CORS policy

Solution:

# Confirm CORS_ORIGIN allows your frontend origin, then rebuild and redeploy the API service
npm run build:api
npm run deploy:api

Migration Issues

Error: Migration failed

Solution:

# Apply migrations to remote database
npx wrangler d1 migrations apply scraper_test_database --config wrangler.api.jsonc --remote

# Check migration status
npx wrangler d1 migrations list scraper_test_database --config wrangler.api.jsonc

Build Errors

Error: TypeScript compilation errors

Solution:

# Rebuild each service to see which one reports TypeScript errors
npm run build:api
npm run build:consumer
npm run build:frontend

# Install missing dependencies
npm install

Deployment Failures

Error: Authentication or permission errors

Solution:

# Check authentication
npx wrangler whoami

# Re-authenticate if needed
npx wrangler login

Service Cleanup

Remove All Services

# Delete queue first (removes consumer binding)
npx wrangler queues delete scraper-test-jobs-queue

# Delete services
npx wrangler delete scraper_test_api --config wrangler.api.jsonc
npx wrangler delete scraper_test_consumer --config wrangler.consumer.jsonc
npx wrangler delete scraper_test_frontend --config wrangler.frontend.jsonc

# Delete database (optional - removes all data)
npx wrangler d1 delete scraper_test_database

Verify Cleanup

# List remaining services
npx wrangler list

# List remaining databases
npx wrangler d1 list

# List remaining queues
npx wrangler queues list

🌐 Live URLs

After deployment, your services will be available at:

Frontend: https://scraper_test_frontend.your-subdomain.workers.dev
API: https://scraper_test_api.your-subdomain.workers.dev
Consumer: https://scraper_test_consumer.your-subdomain.workers.dev

📄 License

This project is part of a Cloudflare Workers demonstration.

🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
