# mdbrowser

Convert any webpage to clean markdown, with Chrome cookie auth support.
## Install

```sh
git clone git@github.com:fiale-plus/mdbrowser.git
cd mdbrowser && npm install
```

## Usage

```
mdbrowser <url> [options]
```

### Options

```
--render              Use headless Puppeteer for JS-rendered pages
--chrome              Use Chrome cookies for authentication
--profile <name>      Chrome profile name (default: "Default")
-o, --output <file>   Write to file instead of stdout
--raw                 Full-body markdown (skip article extraction)
--no-readability      Same as --raw (backward compat)
--meta                Prepend YAML frontmatter with page metadata
--json                Output structured JSON
--selector <css>      Extract a specific CSS selector (implies --render)
--batch               Process multiple URLs from args or stdin
--no-auto-render      Disable automatic render fallback on thin content
--no-cache            Bypass the URL cache
--cache-ttl <sec>     Cache TTL in seconds (default: 900)
--timeout <ms>        Fetch/render timeout (default: 30000)
-h, --help            Show help
-v, --version         Show version
```
## Examples

```sh
# Basic — fetch and convert to markdown
mdbrowser https://example.com

# Save to file
mdbrowser https://example.com -o page.md

# Authenticated page via Chrome cookies (macOS)
mdbrowser https://github.com/settings/profile --chrome

# JS-rendered SPA
mdbrowser https://some-spa.com --render

# YAML frontmatter with metadata
mdbrowser https://example.com --meta

# Structured JSON output
mdbrowser https://example.com --json

# Extract a specific element
mdbrowser https://some-spa.com --selector "main"

# Batch mode — multiple URLs
echo "https://a.com
https://b.com" | mdbrowser --batch

# Raw body (skip article extraction)
mdbrowser https://example.com --raw

# Pipe to other tools
mdbrowser https://example.com | head -20
```

## Interactive sessions

```
mdbrowser open <url> [options]       # Start browser session
mdbrowser click <ref|"text">         # Click element
mdbrowser type <ref|"text"> <value>  # Type into input
mdbrowser read                       # Re-read page state
mdbrowser close                      # End session
```

## Modes

| Mode | Flag | How it works |
|---|---|---|
| Basic | (default) | fetch() → Defuddle → Markdown |
| Chrome cookies | --chrome | Extracts cookies from Chrome's local DB, injects into fetch |
| Headless render | --render | Puppeteer renders JS, then same pipeline |
| Auto-render | (automatic) | If extracted content < 50 words, retries with --render |
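The auto-render fallback in the table above can be sketched as follows. This is an illustration, not mdbrowser's actual code: the function names are hypothetical, and the fetchers are injected so the sketch stays self-contained.

```javascript
// Count whitespace-separated words in extracted markdown.
function wordCount(markdown) {
  return markdown.split(/\s+/).filter(Boolean).length;
}

// basicFetch/renderFetch are passed in (hypothetical signatures: url -> Promise<string>).
async function fetchMarkdown(url, { basicFetch, renderFetch }) {
  const md = await basicFetch(url);
  // Thin content (< 50 words) usually means the page needs JS; retry rendered.
  if (wordCount(md) < 50) {
    return renderFetch(url);
  }
  return md;
}
```

The `--no-auto-render` flag would simply skip the word-count check.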
## Features

- Content extraction via Defuddle — better article detection, metadata, and markdown output
- Metadata extraction — title, author, published date, domain, word count
- YAML frontmatter (`--meta`) — prepends metadata as YAML
- JSON output (`--json`) — structured output with all fields
- Token estimation — always printed to stderr
- URL caching — 15-minute TTL file cache for repeated fetches (8x speedup)
- Batch mode (`--batch`) — process multiple URLs from stdin
- CSS selector (`--selector`) — extract specific page elements via Puppeteer
- Auto-render fallback — thin content auto-retries with headless rendering
- Interactive sessions — open, click, type, and read pages with Puppeteer
## Optional dependencies

Core install is lightweight (defuddle, jsdom). Two features need extra packages:

```sh
# For --chrome (Chrome cookie extraction)
npm install better-sqlite3

# For --render (headless browser rendering)
npm install puppeteer
```

If you use `--chrome` or `--render` without the corresponding package, you'll get a clear error with the install command.
## Chrome cookie auth

The `--chrome` flag reads cookies from Chrome's local SQLite database and injects them as a `Cookie` header. This lets you fetch authenticated pages (GitHub, Google Docs, internal tools) without configuring API tokens.
- Requires macOS (Keychain access for the encryption key)
- Copies the DB to a temp file to avoid locking conflicts with Chrome
- Handles Chrome 130+ format (DB version ≥ 24) with domain hash prefix
- First run triggers a Keychain access prompt — allow it once
## License

MIT