mdbrowser

Convert any webpage to clean markdown, with Chrome cookie auth support.

Install

git clone git@github.com:fiale-plus/mdbrowser.git
cd mdbrowser && npm install

Usage

mdbrowser <url> [options]

Options

--render             Use headless Puppeteer for JS-rendered pages
--chrome             Use Chrome cookies for authentication
--profile <name>     Chrome profile name (default: "Default")
-o, --output <file>  Write to file instead of stdout
--raw                Full body markdown (skip article extraction)
--no-readability     Same as --raw (backward compat)
--meta               Prepend YAML frontmatter with page metadata
--json               Output structured JSON
--selector <css>     Extract specific CSS selector (implies --render)
--batch              Process multiple URLs from args or stdin
--no-auto-render     Disable automatic render fallback on thin content
--no-cache           Bypass URL cache
--cache-ttl <sec>    Cache TTL in seconds (default: 900)
--timeout <ms>       Fetch/render timeout (default: 30000)
-h, --help           Show help
-v, --version        Show version

Examples

# Basic — fetch and convert to markdown
mdbrowser https://example.com

# Save to file
mdbrowser https://example.com -o page.md

# Authenticated page via Chrome cookies (macOS)
mdbrowser https://github.com/settings/profile --chrome

# JS-rendered SPA
mdbrowser https://some-spa.com --render

# YAML frontmatter with metadata
mdbrowser https://example.com --meta

# Structured JSON output
mdbrowser https://example.com --json

# Extract specific element
mdbrowser https://some-spa.com --selector "main"

# Batch mode — multiple URLs
echo "https://a.com
https://b.com" | mdbrowser --batch

# Raw body (skip article extraction)
mdbrowser https://example.com --raw

# Pipe to other tools
mdbrowser https://example.com | head -20

Interactive Mode

mdbrowser open <url> [options]      # Start browser session
mdbrowser click <ref|"text">        # Click element
mdbrowser type <ref|"text"> <value> # Type into input
mdbrowser read                      # Re-read page state
mdbrowser close                     # End session

How It Works

Mode	Flag	How it works
Basic	(default)	`fetch()` → Defuddle → Markdown
Chrome cookies	`--chrome`	Extracts cookies from Chrome's local DB, injects into fetch
Headless render	`--render`	Puppeteer renders JS, then same pipeline
Auto-render	(automatic)	If extracted content < 50 words, retries with `--render`

Features

Content extraction via Defuddle — better article detection, metadata, and markdown output
Metadata extraction — title, author, published date, domain, word count
YAML frontmatter (--meta) — prepends metadata as YAML
JSON output (--json) — structured output with all fields
Token estimation — always printed to stderr
URL caching — 15-minute TTL file cache for repeated fetches (8x speedup)
Batch mode (--batch) — process multiple URLs from stdin
CSS selector (--selector) — extract specific page elements via puppeteer
Auto-render fallback — thin content auto-retries with headless rendering
Interactive sessions — open, click, type, read pages with puppeteer

Optional Dependencies

Core install is lightweight (defuddle, jsdom). Two features need extra packages:

# For --chrome (Chrome cookie extraction)
npm install better-sqlite3

# For --render (headless browser rendering)
npm install puppeteer

If you use --chrome or --render without the corresponding package, you'll get a clear error with the install command.

Chrome Cookie Auth (macOS)

The --chrome flag reads cookies from Chrome's local SQLite database and injects them as a Cookie header. This lets you fetch authenticated pages (GitHub, Google Docs, internal tools) without configuring API tokens.

Requires macOS (Keychain access for the encryption key)
Copies the DB to a temp file to avoid locking conflicts with Chrome
Handles Chrome 130+ format (DB version ≥ 24) with domain hash prefix
First run triggers a Keychain access prompt — allow it once

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src		src
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mdbrowser

Install

Usage

Options

Examples

Interactive Mode

How It Works

Features

Optional Dependencies

Chrome Cookie Auth (macOS)

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

fiale-plus/mdbrowser

Folders and files

Latest commit

History

Repository files navigation

mdbrowser

Install

Usage

Options

Examples

Interactive Mode

How It Works

Features

Optional Dependencies

Chrome Cookie Auth (macOS)

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages