Ichabod Engine 🎃

Headless browser scraping microservice. Accepts a JSON recipe, executes it against a live page using Playwright, and returns structured data.

Powers Ichabod Browser.

Licensed under MIT — free to self-host and modify. Cannot be used to offer a competing commercial service.

How It Works

POST /scrape  →  Navigate  →  Act  →  Extract  →  Return JSON
POST /discover →  Fetch  →  Sanitize HTML  →  Return for LLM analysis
GET  /health  →  Browser + server status

Deploy to Railway

Fork or push this repo to GitHub
Create a new project in Railway
Connect your GitHub repo
Set environment variables in the Railway dashboard (see below)
Railway will build the Dockerfile and deploy automatically

Your service URL will be something like https://ichabod-engine.up.railway.app.

Environment Variables

Set these in your Railway dashboard under Variables.

Variable	Required	Default	Description
`ICHABOD_API_KEY`	Yes (production)	`null`	Bearer token for all authenticated endpoints
`PORT`	No	`3000`	Port the server listens on
`NODE_ENV`	No	`development`	Set to `production` on Railway
`BROWSER_HEADLESS`	No	`true`	Run Chromium headless
`BROWSER_TIMEOUT_MS`	No	`30000`	Max time for any browser operation
`BROWSER_NAV_TIMEOUT_MS`	No	`15000`	Max time for page navigation
`BROWSER_MAX_PAGES`	No	`5`	Max concurrent pages
`SCRAPE_SCROLL_PAUSE_MS`	No	`800`	Pause between scroll actions
`SCRAPE_WAIT_TIMEOUT_MS`	No	`10000`	Max time waiting for a selector
`SCRAPE_MAX_PAGES`	No	`10`	Max pages in a paginated scrape
`DISCOVERY_MAX_HTML_LENGTH`	No	`100000`	Max HTML chars returned to LLM
`LOG_LEVEL`	No	`info`	Pino log level (debug/info/warn/error)
`LOG_PRETTY`	No	`true`	Pretty-print logs (set to `false` in production)

Local Development

npm install
npx playwright install chromium
cp .env.local.example .env
npm run dev

.env is for local development only. Never use it in production.

API Reference

`POST /scrape`

Execute a scrape recipe.

Headers

Authorization: Bearer <ICHABOD_API_KEY>
Content-Type: application/json

Body

{
  "url": "https://www.upwork.com/search/jobs/?q=english+teacher&sort=recency",
  "waitFor": ".job-tile",
  "actions": [
    { "type": "scroll", "times": 3 }
  ],
  "extract": [
    {
      "name": "jobs",
      "selector": ".job-tile",
      "attribute": "selector",
      "multiple": true,
      "fields": [
        { "name": "title", "selector": ".job-title a", "attribute": "text" },
        { "name": "budget", "selector": ".budget", "attribute": "text" },
        { "name": "link", "selector": ".job-title a", "attribute": "href" }
      ]
    }
  ],
  "options": {
    "waitUntil": "networkidle"
  }
}

Response

{
  "ok": true,
  "data": [
    { "title": "English Tutor Needed", "budget": "$25/hr", "link": "/jobs/..." }
  ],
  "meta": {
    "url": "https://...",
    "itemCount": 12,
    "duration": 4821,
    "scrapedAt": "2026-02-23T10:00:00.000Z",
    "timestamp": "2026-02-23T10:00:00.000Z"
  }
}

`POST /discover`

Fetch a URL and return sanitized HTML for LLM analysis.

Body — single URL

{
  "url": "https://www.upwork.com/search/jobs/?q=english+teacher",
  "options": { "waitUntil": "networkidle" }
}

Body — multiple candidate URLs

{
  "candidates": [
    "https://www.upwork.com/search/jobs/?q=english+teacher",
    "https://www.upwork.com/freelance-jobs/english-teaching/"
  ],
  "options": { "waitUntil": "networkidle" }
}

`GET /health`

No authentication required.

Response

{
  "ok": true,
  "data": {
    "status": "ok",
    "browser": { "connected": true, "activePages": 0, "maxPages": 5 },
    "uptime": 3600,
    "memory": { "rss": 12345678 }
  }
}

Adding a New Action

Create src/actions/yourAction.js
Export a single async function (page, action) => {}
Register it in src/actions/index.js

Adding a New Route

Create src/routes/yourRoute.js
Create src/controllers/yourController.js
Add validation schema to src/middleware/validate.js
Mount it in src/routes/index.js

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
config		config
images		images
src		src
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ichabod Engine 🎃

How It Works

Deploy to Railway

Environment Variables

Local Development

API Reference

`POST /scrape`

Headers

Body

Response

`POST /discover`

Body — single URL

Body — multiple candidate URLs

`GET /health`

Response

Adding a New Action

Adding a New Route

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Ichabod Engine 🎃

How It Works

Deploy to Railway

Environment Variables

Local Development

API Reference

POST /scrape

Headers

Body

Response

POST /discover

Body — single URL

Body — multiple candidate URLs

GET /health

Response

Adding a New Action

Adding a New Route

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /scrape`

`POST /discover`

`GET /health`

Packages