mdclip

A command-line tool that clips web pages to Markdown files with YAML frontmatter. Think of it as the Obsidian Web Clipper browser extension, but for the terminal.

Features

  • Template-based routing: URLs are matched to templates that control output folder, tags, and frontmatter
  • Built-in category filters: 9 smart filters (@academic, @docs, @news, @wiki, etc.) automatically categorize URLs from 400+ domains using domain and path scoring
  • YAML frontmatter: Automatic metadata including title, author (if available), source URL, creation date, published date (if available), description, and tags
  • Multiple input formats: Single URLs, bookmark exports, markdown files with links, or text files with URL lists
  • Clipboard support: Automatically reads URLs from clipboard when no input is provided (macOS)
  • Obsidian integration: Auto-opens clipped notes in Obsidian after single-URL clipping
  • Bookmark section selection: For large bookmark files, interactively select which folder to process
  • Skip existing: URLs that already have a clipped file with the same source are skipped by default (use --force to re-download)
  • Rate limiting: Configurable delay between requests to the same domain (default 3s) with a smart deferred queue
  • Optional formatting: Post-process output with mdformat for consistent styling
  • Batch processing: Clip multiple URLs in one command with confirmation for large batches
  • Shell completion: Bash completion with mdclip completion bash --install
  • Rich console output: Colored status messages and progress spinners

Installation

Prerequisites

Node.js (required) - Used for content extraction via defuddle:

# macOS with Homebrew
brew install node

# Or download from https://nodejs.org/

mdformat (optional) - Auto-formats output Markdown:

brew install mdformat

Install mdclip

# Clone the repository
git clone https://github.com/jdmonaco/mdclip.git
cd mdclip

# Install Node.js dependencies
npm install

# Install Python package
pip install -e .

# Or with uv
uv pip install -e .
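
To confirm the install, the documented --version and --list-templates flags give a quick sanity check:

# Verify the CLI is on PATH and report its version
mdclip --version

# List the templates from the auto-created config
mdclip --list-templates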

Quick Start

# Clip a URL (config file is auto-created on first run)
mdclip "https://github.com/kepano/defuddle"

# Edit ~/.mdclip.yml to customize vault path and templates
vim ~/.mdclip.yml

Usage

usage: mdclip [-h] [-o FOLDER] [-t NAME] [--tags TAGS] [--force] [-n]
              [-y] [--all-sections] [--no-format] [--no-open]
              [--rate-limit SECONDS] [--cookies FILE] [--vault PATH]
              [--config FILE] [--list-templates] [--verbose] [--version]
              [INPUT ...]

Clip web pages to Markdown with YAML frontmatter.

positional arguments:
  INPUT                 URL, bookmarks HTML file, or text file with URLs
                        (reads clipboard if omitted)

options:
  -h, --help            show this help message and exit
  -o, --output FOLDER   Output folder (relative to vault or absolute path)
  -t, --template NAME   Use named template (bypasses URL pattern matching)
  --tags TAGS           Additional tags, comma-separated (e.g., --tags foo,bar,baz)
  --force               Force re-download even if file with same source URL exists
  -n, --dry-run         Show what would be done without writing files
  -y, --yes             Skip confirmation prompts
  --all-sections        Process all bookmark sections without prompting
  --no-format           Skip mdformat post-processing
  --no-open             Don't open note after clipping
  --rate-limit SECONDS  Seconds between requests to same domain (default: 3.0, 0 to disable)
  --cookies FILE        Load cookies from Netscape cookies.txt for authenticated requests
  --vault PATH          Override vault path from config
  --config FILE         Use alternate config file
  --list-templates      List configured templates and exit
  --verbose             Show detailed output
  --version             show program's version number and exit

Shell completion:
  mdclip completion bash            Output completion script
  mdclip completion bash --install  Install to user completions directory
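
For a single shell session, the emitted script can also be sourced directly instead of installed (a standard bash pattern around the documented subcommand):

# Load completions into the current bash session only
source <(mdclip completion bash)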

Examples

# Clip a single URL
mdclip "https://docs.python.org/3/library/pathlib.html"

# Clip from clipboard (copy a URL, then run without arguments)
mdclip

# Clip multiple URLs
mdclip "https://github.com/..." "https://stackoverflow.com/..."

# Preview without saving
mdclip --dry-run "https://example.com"

# Override output folder
mdclip -o "Projects/Research" "https://arxiv.org/abs/..."

# Add extra tags (comma-separated)
mdclip --tags python,tutorial "https://realpython.com/..."

# Clip from a bookmarks export (prompts to select a section if >10 URLs)
mdclip ~/Downloads/bookmarks.html

# Process all bookmark sections without prompting
mdclip --all-sections ~/Downloads/bookmarks.html

# Skip confirmation for large batches
mdclip -y bookmarks.html

# Clip from a markdown file (extracts URLs from [text](url) links)
mdclip links.md

# Clip from a text file (one URL per line; see the batch sketch after these examples)
mdclip urls.txt

# Force a specific template
mdclip -t documentation "https://example.com/docs"

# Force re-download even if file exists (default is to skip)
mdclip --force urls.txt

# Clip without opening in Obsidian
mdclip --no-open "https://example.com"
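
For file-based input, a minimal end-to-end sketch (the URLs and filename are placeholders) that previews a batch before clipping it:

# Build a URL list, preview the routing, then clip without prompts
cat > urls.txt <<'EOF'
https://example.com/article-one
https://example.com/article-two
EOF
mdclip --dry-run urls.txt
mdclip -y urls.txt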

Configuration

Configuration is stored in ~/.mdclip.yml. The config file is automatically created with sensible defaults on first run.

Key Settings

# Path to your notes vault
vault: ~/Documents/Obsidian/Notes

# Date format (for frontmatter 'created' and filenames)
date_format: "%Y-%m-%d"

# Default output folder (relative to vault)
default_folder: Inbox/Clips

# Enable mdformat post-processing
auto_format: false

# Open clipped note after single-URL processing
# Inside vault: opens in Obsidian; outside vault: opens in glow/less
open_in_obsidian: true

# Rate limiting: seconds between requests to the same domain
# Set to 0 to disable; override with --rate-limit flag
rate_limit_seconds: 3.0
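
Any of these settings can be overridden for a single invocation with the corresponding flags from Usage (the vault path below is just an example):

# One-off override: different vault, longer rate limit, no mdformat pass
mdclip --vault ~/Documents/Obsidian/Work --rate-limit 10 --no-format "https://example.com"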

Templates

Templates match URLs by pattern and control how they're saved:

templates:
  - name: github
    triggers:
      - "https://github.com/"
    folder: Reference/Software
    tags:
      - webclip
      - github
    filename: "{{title}}"
    properties:
      type: repository

  - name: default
    folder: Inbox/Clips
    tags:
      - webclip
    filename: "{{title}}"
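
To check how a URL will be routed without writing anything, the documented --list-templates and --dry-run flags can be combined:

# Show the configured templates, then preview which one a URL matches
mdclip --list-templates
mdclip --dry-run "https://github.com/kepano/defuddle"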

Trigger types:

  • Substring match: "https://github.com/"
  • Regex pattern: "^https://[\\w-]+\\.github\\.io/"
  • Built-in filter: "@academic" (see below)

Filename variables: {{title}}, {{date}}, {{slug}}, {{domain}}
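
For instance, the variables can be combined in one filename pattern (a sketch; the template below is illustrative and, having no triggers, would be selected explicitly with -t dated-clips):

templates:
  - name: dated-clips
    folder: Inbox/Clips
    tags:
      - webclip
    filename: "{{date}} {{title}}"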

Built-in Triggers

mdclip includes built-in triggers for common content types using smart URL matching with domain and path scoring.

Trigger      Description                              Domains
@academic    Scientific journals and publishers       113
@docs        Software documentation and references    35
@edu         Educational content and .edu sites       15+
@gov         Government sites (.gov/.mil)             45+
@longform    Magazines and longform journalism        75
@news        US-focused news sources                  50
@scitech     Science & technology publications        35
@social      Social media & discussion platforms      35
@wiki        Wikis and encyclopedias                  25
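
Filters are used wherever a trigger string is accepted, and since triggers is a list, one template should be able to route several categories to a shared folder (a sketch, assuming multiple filters may be listed together):

templates:
  - name: reading
    triggers:
      - "@longform"
      - "@scitech"
    folder: Reference/Reading
    tags: [reading, webclip]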

@academic

Matches academic and scientific journal article URLs from Nature, arXiv, PubMed, IEEE, ACM, Springer, Elsevier, and more. Requires domain + article path for high precision.

templates:
  - name: papers
    triggers:
      - "@academic"
    folder: Reference/Papers
    tags: [paper, research]

@docs

Matches software documentation URLs from official language docs (Python, MDN, Rust), documentation platforms (Read the Docs, GitHub Pages), cloud providers, and spec references. Requires domain + path for precision.

templates:
  - name: documentation
    triggers:
      - "@docs"
    folder: Reference/Docs
    tags: [docs, reference]

@edu

Matches educational content from online learning platforms (Coursera, Khan Academy, edX) and any .edu domain. The .edu TLD is matched automatically.

templates:
  - name: education
    triggers:
      - "@edu"
    folder: Reference/Education
    tags: [education, learning]

@gov

Matches government and official sites via .gov and .mil TLDs plus known agency domains. Works for federal, state, and international government sites.

templates:
  - name: government
    triggers:
      - "@gov"
    folder: Reference/Government
    tags: [government, official]

@longform

Matches general interest magazines and longform journalism sites including The Atlantic, New Yorker, Economist, Harper's, literary reviews (NYRB, LRB, Paris Review), and political magazines. Sources curated from aldaily.com.

templates:
  - name: longform
    triggers:
      - "@longform"
    folder: Reference/Longform
    tags: [longform, magazine]

@news

Matches news article URLs from major US newspapers, TV networks, wire services, and digital-native outlets. Trusts domain alone (lower threshold) for known news sources.

templates:
  - name: news
    triggers:
      - "@news"
    folder: Reference/News
    tags: [news, current-events]

@scitech

Matches popular science and technology publication URLs from Wired, Ars Technica, The Verge, Scientific American, and more. Trusts domain alone for known sources.

templates:
  - name: scitech
    triggers:
      - "@scitech"
    folder: Reference/SciTech
    tags: [scitech, reading]

@social

Matches social media and discussion platform URLs from Twitter/X, Reddit, Hacker News, YouTube, Mastodon, LinkedIn, and more. Trusts domain alone for known platforms.

templates:
  - name: social
    triggers:
      - "@social"
    folder: Reference/Social
    tags: [social, discussion]

@wiki

Matches wiki and encyclopedia URLs from Wikipedia, Fandom, Britannica, and software project wikis. Trusts domain alone for known wiki platforms.

templates:
  - name: wiki
    triggers:
      - "@wiki"
    folder: Reference/Wiki
    tags: [wiki, reference]

Authenticated/Paywalled Content

For content behind logins or paywalls, you can provide browser cookies:

  1. Install a browser extension that can export cookies

  2. Export cookies for the target site (Netscape format)

  3. Use with mdclip:

    mdclip --cookies ~/Downloads/cookies.txt "https://nature.com/articles/..."

Note: Cookies are used only for the HTTP request and are not stored or transmitted elsewhere.
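
For reference, Netscape cookies.txt is a tab-separated plain-text format: one cookie per line with domain, include-subdomains flag, path, secure flag, expiry (Unix epoch seconds), name, and value. An illustrative entry with fabricated values:

# Netscape HTTP Cookie File
.nature.com	TRUE	/	TRUE	1767225600	session_token	abc123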

Output

Each clipped page creates a Markdown file with YAML frontmatter:

---
title: "Page Title"
source: https://example.com/article
author:
  - "Author Name"
created: 2024-01-15
published: 2024-01-10
description: "A brief description of the page content"
tags:
  - webclip
  - docs
---

# Page Title

Content extracted from the web page...

Metadata is automatically extracted using defuddle, the same library used by Obsidian Web Clipper.

License

MIT
