Cloudflare Workers URL Scraper

A Cloudflare Workers monorepo that validates submitted URLs, enqueues them as jobs, checks each URL's liveness, and stores the results in a D1 database.

πŸ—οΈ Architecture

This project consists of three main services:

  • API Service (api-service/) - HTTP Worker that handles job submission and status checking
  • Consumer Service (workers/) - Queue consumer that performs URL liveness checks
  • Frontend Service (frontend-service/) - Cloudflare Worker that serves the web interface

🚀 Services

API Service

  • POST /jobs - Submit a URL for processing
  • GET /status/:id - Check job status
  • GET /result/:id - Get processing results
  • CORS Support - Proper cross-origin request handling

Consumer Service

  • Processes jobs from the queue
  • Performs HEAD→GET liveness checks
  • Updates job status and stores results
  • Handles errors gracefully with retry logic
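The HEAD→GET check described above could look roughly like the following. This is a minimal sketch, not the code in consumer.ts: the names checkLiveness, LivenessResult, and FetchLike are hypothetical, and treating any 2xx or 3xx status as "live" is an assumption.

```typescript
// Hypothetical result shape, loosely mirroring the columns of the results table.
interface LivenessResult {
  isLive: boolean;
  httpStatus: number | null;
  error: string | null;
}

// Minimal fetch-like signature so the check can be exercised with a stub.
type FetchLike = (url: string, init?: { method?: string }) => Promise<{ status: number }>;

// Try HEAD first; fall back to GET when HEAD throws or returns a status
// outside 200-399 (some servers reject HEAD even though GET works).
async function checkLiveness(url: string, fetchImpl: FetchLike): Promise<LivenessResult> {
  let lastStatus: number | null = null;
  let lastError: string | null = null;
  for (const method of ["HEAD", "GET"]) {
    try {
      const res = await fetchImpl(url, { method });
      lastStatus = res.status;
      lastError = null;
      if (res.status >= 200 && res.status < 400) {
        return { isLive: true, httpStatus: res.status, error: null };
      }
    } catch (err) {
      // Network-level failure: remember the message and try the next method.
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  // Neither method succeeded; report the last status and error seen.
  return { isLive: false, httpStatus: lastStatus, error: lastError };
}
```

In a Worker, the global fetch can be passed as fetchImpl; a stub makes the fallback path easy to test locally.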

Frontend Service

  • GET / - Submit URL form
  • GET /status.html - Job status page
  • GET /result.html - Results page
  • No CORS Issues - Serves HTML directly from the same domain

πŸ“ Project Structure

├── api-service/
│   ├── src/
│   │   └── api.ts          # API routes and D1 access
│   └── tsconfig.json
├── workers/
│   ├── src/
│   │   └── consumer.ts     # Queue handler and liveness check logic
│   └── tsconfig.json
├── frontend-service/
│   ├── src/
│   │   └── frontend.ts     # Frontend service with HTML templates
│   └── tsconfig.json
├── migrations/
│   └── 001_init.sql        # D1 database schema
├── wrangler.api.jsonc      # API service configuration
├── wrangler.consumer.jsonc # Consumer service configuration
├── wrangler.frontend.jsonc # Frontend service configuration
└── package.json

πŸ› οΈ Setup

Prerequisites

  • Node.js 20.18.1+
  • Cloudflare account
  • Wrangler CLI

Step-by-Step Installation

1. Clone and Install Dependencies

git clone https://github.com/SectemTechnologies/CloudflareServicesTestScrapExample.git
cd CloudflareServicesTestScrapExample
npm install

2. Create D1 Database

npx wrangler d1 create scraper_test_database

Expected Output:

✅ Successfully created DB 'scraper_test_database' in region UNKNOWN
Created your new D1 database.

To access your new D1 Database in your Worker, add the following snippet to your configuration file:
{
  "d1_databases": [
    {
      "binding": "scraper_test_database",
      "database_name": "scraper_test_database",
      "database_id": "3e41f9a7-133b-4ac6-b84e-0931397acf96"
    }
  ]
}

3. Update Database ID in Configuration Files

⚠️ IMPORTANT: Copy the database_id from the output above into both files below. Keep the binding name as DB, as shown; only the database_id value changes.

Update wrangler.api.jsonc:

{
  "d1_databases": [
    {
      "binding": "DB",
      "database_name": "scraper_test_database",
      "database_id": "YOUR_DATABASE_ID_HERE"  // ← Replace this
    }
  ]
}

Update wrangler.consumer.jsonc:

{
  "d1_databases": [
    {
      "binding": "DB", 
      "database_name": "scraper_test_database",
      "database_id": "YOUR_DATABASE_ID_HERE"  // ← Replace this
    }
  ]
}

4. Apply Database Migrations

# Apply to local database (for development)
npx wrangler d1 migrations apply scraper_test_database --config wrangler.api.jsonc

# Apply to remote database (for production)
npx wrangler d1 migrations apply scraper_test_database --config wrangler.api.jsonc --remote

5. Create Queue

npx wrangler queues create scraper-test-jobs-queue

Expected Output:

✅ Successfully created queue 'scraper-test-jobs-queue'

6. Verify Setup

# List databases
npx wrangler d1 list

# List queues  
npx wrangler queues list

# Check database tables
npx wrangler d1 execute scraper_test_database --command "SELECT name FROM sqlite_master WHERE type='table';" --config wrangler.api.jsonc

Common Setup Issues

Database ID Not Updated

  • Error: Database not found or Invalid database ID
  • Solution: Make sure you copied the correct database_id from step 2 and updated both config files

Queue Already Exists

  • Error: Queue already exists
  • Solution: Use a different queue name or delete the existing one:
    npx wrangler queues delete scraper-test-jobs-queue
    npx wrangler queues create scraper-test-jobs-queue

Migration Failed

  • Error: Migration failed
  • Solution: Check your database ID is correct and try again:
    npx wrangler d1 migrations apply scraper_test_database --config wrangler.api.jsonc --remote

πŸ—οΈ Build Services

# Build individual services
npm run build:api
npm run build:consumer
npm run build:frontend

# Build all services
npm run build:all

🚀 Deployment

Deploy Individual Services

# Deploy API service (scraper_test_api)
npm run deploy:api

# Deploy consumer service (scraper_test_consumer)
npm run deploy:consumer

# Deploy frontend service (scraper_test_frontend)
npm run deploy:frontend

Deploy All Services

npm run deploy:all

Deployment Order

  1. API Service (scraper_test_api) - Core functionality
  2. Consumer Service (scraper_test_consumer) - Queue processing
  3. Frontend Service (scraper_test_frontend) - User interface

📊 Database Schema

Jobs Table

CREATE TABLE jobs (
  id TEXT PRIMARY KEY,
  url TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'queued',
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

Results Table

CREATE TABLE results (
  job_id TEXT PRIMARY KEY,
  url TEXT NOT NULL,
  is_live INTEGER NOT NULL,
  http_status INTEGER,
  error TEXT,
  checked_at TEXT NOT NULL DEFAULT (datetime('now'))
);

🔌 API Endpoints

POST /jobs

Submit a URL for processing.

Request:

{
  "url": "https://example.com"
}

Response:

{
  "id": "uuid-string"
}
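Before a job is enqueued, the submitted URL has to be validated. The following is a minimal sketch of one way to do that with the standard URL class; the name validateJobUrl, the error messages, and the restriction to http and https are assumptions, not taken from api.ts.

```typescript
// Discriminated union so callers must check `ok` before using `url`.
type Validation = { ok: true; url: string } | { ok: false; error: string };

// Accept only absolute http(s) URLs; return the normalized form on success.
function validateJobUrl(input: unknown): Validation {
  if (typeof input !== "string" || input.trim() === "") {
    return { ok: false, error: "url must be a non-empty string" };
  }
  let parsed: URL;
  try {
    // The URL constructor throws on relative or malformed input.
    parsed = new URL(input.trim());
  } catch {
    return { ok: false, error: "url is not a valid absolute URL" };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, error: "only http and https URLs are accepted" };
  }
  return { ok: true, url: parsed.toString() };
}
```

A handler could respond with a 400 status when ok is false, and otherwise generate an id (for example with crypto.randomUUID()) and enqueue the job.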

GET /status/:id

Check job processing status.

Response:

{
  "id": "uuid-string",
  "url": "https://example.com",
  "status": "queued|processing|done|failed",
  "updated_at": "2025-01-15T10:30:00Z"
}
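Because jobs are processed asynchronously, a client typically polls this endpoint until the status reaches done or failed. The sketch below shows one way to do that; waitForJob, GetJson, and the attempt and delay defaults are hypothetical, and the response shape is the one documented above.

```typescript
type JobStatus = "queued" | "processing" | "done" | "failed";

interface StatusResponse {
  id: string;
  url: string;
  status: JobStatus;
  updated_at: string;
}

// Injected JSON fetcher so the loop can be tested without a network.
type GetJson = (url: string) => Promise<StatusResponse>;

// Poll GET /status/:id until the job reaches a terminal state or attempts run out.
async function waitForJob(
  apiBase: string,
  id: string,
  getJson: GetJson,
  maxAttempts = 10,
  delayMs = 1000,
): Promise<StatusResponse> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getJson(`${apiBase}/status/${encodeURIComponent(id)}`);
    if (status.status === "done" || status.status === "failed") {
      return status;
    }
    if (attempt < maxAttempts) {
      // Wait before the next poll; no wait after the final attempt.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`job ${id} did not finish after ${maxAttempts} attempts`);
}
```

In a browser or Worker, getJson could be as small as (u) => fetch(u).then((r) => r.json()).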

GET /result/:id

Get processing results.

Response:

{
  "id": "uuid-string",
  "url": "https://example.com",
  "is_live": 1,
  "http_status": 200,
  "error": null,
  "checked_at": "2025-01-15T10:30:00Z"
}

🔧 Configuration

Environment Variables

  • CORS_ORIGIN - CORS origin for API requests (default: "*")
  • API_BASE_URL - API service URL for frontend service

Queue Configuration

  • Queue Name: scraper-test-jobs-queue
  • Max Batch Size: 10 messages
  • Producer Binding: JOBS
  • Consumer Binding: Auto-configured
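With this configuration the consumer receives messages in batches of up to 10. The sketch below shows a per-message ack/retry loop over such a batch. The local interfaces only mirror the parts of the Cloudflare Queues message API used here (body, ack, retry), and JobMessage and handleBatch are hypothetical names, not the contents of consumer.ts.

```typescript
// Assumed message body; the real payload shape may differ.
interface JobMessage {
  id: string;
  url: string;
}

// Local stand-ins for the subset of the Queues types this sketch relies on.
interface QueueMessage<T> {
  body: T;
  ack(): void;
  retry(): void;
}

interface QueueBatch<T> {
  messages: readonly QueueMessage<T>[];
}

// Process each message independently: acknowledge on success, request
// redelivery on failure, so one bad job does not force the whole batch to retry.
async function handleBatch(
  batch: QueueBatch<JobMessage>,
  processJob: (job: JobMessage) => Promise<void>,
): Promise<void> {
  for (const message of batch.messages) {
    try {
      await processJob(message.body);
      message.ack();
    } catch {
      message.retry();
    }
  }
}
```

Inside a Worker's queue(batch, env) handler, processJob would run the liveness check and write the result to D1.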

🧪 Testing

Manual Testing

  1. Frontend Service: Visit your deployed frontend URL
  2. Submit a URL for processing
  3. Check status using the status page
  4. View results using the results page

API Testing

# Test job submission
curl -X POST https://your-api.workers.dev/jobs \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Test status check
curl https://your-api.workers.dev/status/JOB_ID

# Test result retrieval
curl https://your-api.workers.dev/result/JOB_ID

Database Queries

# Query jobs table
npx wrangler d1 execute scraper_test_database --command "SELECT * FROM jobs" --config wrangler.api.jsonc

# Query results table
npx wrangler d1 execute scraper_test_database --command "SELECT * FROM results" --config wrangler.api.jsonc

# Check job status distribution
npx wrangler d1 execute scraper_test_database --command "SELECT status, COUNT(*) FROM jobs GROUP BY status" --config wrangler.api.jsonc

πŸ“ Development Notes

  • TypeScript: Strict mode enabled
  • Web APIs Only: No Node.js APIs (uses fetch, crypto, etc.)
  • Parameterized SQL: All queries use prepared statements
  • Error Handling: Graceful error handling with proper HTTP status codes
  • Idempotency: Results are upserted by job_id
  • CORS Support: Proper cross-origin request handling
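The parameterized-SQL and idempotency notes can be combined in a single upsert. Below is a minimal sketch against the results table: the prepare/bind/run chain follows the D1 binding API, while saveResult, the local DatabaseLike and PreparedLike interfaces, and the exact SQL are assumptions rather than the project's actual query.

```typescript
// Local stand-ins for the subset of the D1 binding used here.
interface PreparedLike {
  bind(...values: unknown[]): PreparedLike;
  run(): Promise<unknown>;
}

interface DatabaseLike {
  prepare(query: string): PreparedLike;
}

// Hypothetical in-memory shape of a liveness result.
interface ResultRow {
  jobId: string;
  url: string;
  isLive: boolean;
  httpStatus: number | null;
  error: string | null;
}

// Insert a result, or overwrite the existing row when the job_id is already
// present, so a redelivered queue message cannot create a duplicate.
const UPSERT_RESULT_SQL = `
  INSERT INTO results (job_id, url, is_live, http_status, error)
  VALUES (?, ?, ?, ?, ?)
  ON CONFLICT(job_id) DO UPDATE SET
    url = excluded.url,
    is_live = excluded.is_live,
    http_status = excluded.http_status,
    error = excluded.error,
    checked_at = datetime('now')
`;

async function saveResult(db: DatabaseLike, row: ResultRow): Promise<void> {
  await db
    .prepare(UPSERT_RESULT_SQL)
    // is_live is an INTEGER column, so the boolean is stored as 1 or 0.
    .bind(row.jobId, row.url, row.isLive ? 1 : 0, row.httpStatus, row.error)
    .run();
}
```

Binding values instead of interpolating them into the SQL string keeps user-supplied URLs from altering the query.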

🚨 Troubleshooting

Common Issues

Database ID Not Found

  • Error: Database not found or Invalid database ID
  • Solution:
    # Check your database ID
    npx wrangler d1 list
    
    # Update config files with correct ID
    # Then redeploy services
    npm run deploy:all

Queue Not Found

  • Error: Queue not found
  • Solution:
    # Create the queue
    npx wrangler queues create scraper-test-jobs-queue
    
    # Or check existing queues
    npx wrangler queues list

CORS Errors

  • Error: Access to fetch blocked by CORS policy
  • Solution:
    # Redeploy API service with latest CORS fixes
    npm run build:api
    npm run deploy:api

Migration Issues

  • Error: Migration failed
  • Solution:
    # Apply migrations to remote database
    npx wrangler d1 migrations apply scraper_test_database --config wrangler.api.jsonc --remote
    
    # Check migration status
    npx wrangler d1 migrations list scraper_test_database --config wrangler.api.jsonc

Build Errors

  • Error: TypeScript compilation errors
  • Solution:
    # Check TypeScript configuration
    npm run build:api
    npm run build:consumer
    npm run build:frontend
    
    # Install missing dependencies
    npm install

Deployment Failures

  • Error: Authentication or permission errors
  • Solution:
    # Check authentication
    npx wrangler whoami
    
    # Re-authenticate if needed
    npx wrangler login

Service Cleanup

Remove All Services

# Delete queue first (removes consumer binding)
npx wrangler queues delete scraper-test-jobs-queue

# Delete services
npx wrangler delete scraper_test_api --config wrangler.api.jsonc
npx wrangler delete scraper_test_consumer --config wrangler.consumer.jsonc
npx wrangler delete scraper_test_frontend --config wrangler.frontend.jsonc

# Delete database (optional - removes all data)
npx wrangler d1 delete scraper_test_database

Verify Cleanup

# List remaining services
npx wrangler list

# List remaining databases
npx wrangler d1 list

# List remaining queues
npx wrangler queues list

🌐 Live URLs

After deployment, your services will be available at:

  • Frontend: https://scraper_test_frontend.your-subdomain.workers.dev
  • API: https://scraper_test_api.your-subdomain.workers.dev
  • Consumer: https://scraper_test_consumer.your-subdomain.workers.dev

📄 License

This project is part of a Cloudflare Workers demonstration.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📚 Additional Resources
