Secure URL sanitization and hardening utilities for Markdown and HTML — includes `rehype-harden-urls` and `harden-react-markdown-urls` for safe rendering.

tiny-md/harden-urls

🛡️ harden-urls Monorepo: Comprehensive URL Security for Content Pipelines

A robust, multi-layered security suite for sanitizing and hardening URLs within user-generated content, markdown, and modern web application pipelines.

This monorepo provides the core harden-urls utility and a set of integration packages for popular ecosystems, ensuring your application is protected against malicious links, tracking parameters, and protocol evasion techniques.


🚀 The Security Gap We Fill

Basic URL sanitizers often fail against modern threats because they rely on simple prefix checks. This suite offers a multi-layered defense that includes:

  1. Strict Protocol Control: Allows only safe schemes (e.g., https:, mailto:).
  2. Tracking Cleanup: Strips common query parameters (utm_, fbclid, gclid).
  3. Obfuscation Defense: Applies Unicode normalization (NFKC) to mitigate homoglyph and control-character obfuscation.
  4. Pattern-Based Filtering: Allows explicit domain whitelisting or blocklisting.
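As a rough illustration of how these four layers can fit together, here is a dependency-free sketch built on the standard `URL` API. It is not the harden-urls implementation: `SketchPolicy`, `sketchParseUrl`, `sketchSanitize`, and `demoPolicy` are hypothetical names, and the order of the layers is an assumption made for this example.

```typescript
// Illustrative sketch only; every name here is hypothetical, not the harden-urls API.
interface SketchPolicy {
  allowedProtocols: string[];   // e.g. ["https:"]
  allowedHosts: RegExp[];       // an empty list means "any host"
  stripParamPrefixes: string[]; // e.g. ["utm_", "fbclid", "gclid"]
}

// Parse an absolute URL, returning null instead of throwing.
function sketchParseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

function sketchSanitize(raw: string, policy: SketchPolicy): string | null {
  // Layer 3: fold compatibility characters (NFKC), then drop control characters
  // that could hide a scheme, as in "java\tscript:".
  const normalized = raw
    .normalize("NFKC")
    .replace(/[\u0000-\u001F\u007F-\u009F]/g, "")
    .trim();

  const url = sketchParseUrl(normalized);
  if (url === null) return null; // relative or malformed input is rejected here

  // Layer 1: strict protocol control.
  if (!policy.allowedProtocols.includes(url.protocol)) return null;

  // Layer 4: pattern-based host allowlist.
  const host = url.hostname;
  if (policy.allowedHosts.length > 0 && !policy.allowedHosts.some(re => re.test(host))) {
    return null;
  }

  // Layer 2: collect the tracking keys first, then delete them.
  const doomed: string[] = [];
  url.searchParams.forEach((_value, key) => {
    if (policy.stripParamPrefixes.some(prefix => key.startsWith(prefix))) doomed.push(key);
  });
  for (const key of doomed) url.searchParams.delete(key);

  return url.toString();
}

const demoPolicy: SketchPolicy = {
  allowedProtocols: ["https:"],
  allowedHosts: [/^([a-z0-9-]+\.)*mycorp\.com$/i], // anchored, and no g flag
  stripParamPrefixes: ["utm_", "fbclid", "gclid"],
};

sketchSanitize("https://sub.mycorp.com/path?utm_source=email&id=7", demoPolicy);
// → "https://sub.mycorp.com/path?id=7"
sketchSanitize("javascript:alert(1)", demoPolicy);
// → null
```

Note that NFKC only folds compatibility forms such as fullwidth letters; look-alike characters from other scripts survive normalization, which is one reason the host allowlist in this sketch is anchored at both ends.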

📦 Packages

All packages live under the libs/ directory. You can use them independently or together for end-to-end security.

| Package | Description | Ecosystem | Status |
| --- | --- | --- | --- |
| harden-urls | The core utility for deep cleaning and sanitizing individual URL strings. Dependency-free and highly performant. | Core Utility | Available |
| rehype-harden-urls | A rehype plugin to enforce policies on `<a>` and `<img>` URLs within HTML Abstract Syntax Trees (ASTs). | Unified / Rehype | Available |
| harden-react-markdown-urls | A Higher-Order Component (HOC) for react-markdown that transparently integrates rehype-harden-urls. | React / Markdown | Available |
| remark-harden-urls | A remark plugin for hardening URLs directly within the Markdown AST before conversion to HTML. | Unified / Remark | Coming Soon |

🛠️ Installation & Setup

Install the core library and any integration package you need:

pnpm add harden-urls rehype-harden-urls

or

# Core utility and a popular integration
npm install harden-urls rehype-harden-urls

📖 Quick Usage Examples

1. Core Utility: harden-urls

Configure once, use everywhere. This is the foundation for the entire suite.

import { createUrlSanitizer, toRegexps } from "harden-urls";

const trustedDomains = toRegexps(["*.mycorp.com", "partner-api.io"]);

const sanitizer = createUrlSanitizer({
  allowedProtocols: ["https:", "mailto:"],
  allowedPatterns: trustedDomains,
  stripParams: ["utm_", "fbclid"],
});

sanitizer("https://sub.mycorp.com/path?utm_source=email");
// → "https://sub.mycorp.com/path" (cleaned tracking param)

sanitizer("javascript:alert('xss')");
// → null (blocked by protocol whitelist)

2. Rehype Plugin: rehype-harden-urls

Use in your Node.js or build-time pipelines (e.g., Gatsby, Next.js).

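import { unified } from "unified";
import rehypeParse from "rehype-parse";
import rehypeStringify from "rehype-stringify";
import { rehypeHardenUrls } from "rehype-harden-urls";
import { presets } from "rehype-harden-urls/utils";

// Use the 'balanced' preset for links and 'strict' for images
const processor = unified()
  .use(rehypeParse) // parse HTML into a hast tree
  .use(rehypeHardenUrls, {
    link: presets.balanced,
    image: presets.strict,
  })
  .use(rehypeStringify); // serialize the hardened tree back to HTML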

Key features: Automatically adds rel="noopener noreferrer" to external links. Can be configured to remove elements entirely (prune: true) or replace them with a safer placeholder (prune: false).
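To make the two behaviours above concrete, here is a small dependency-free sketch of a tree pass that adds `rel` to external links and either prunes or replaces unsafe ones. It is only an approximation of the idea, not the plugin's code: the node shapes are simplified stand-ins for hast nodes (real hast stores `rel` as an array), the `<span>` placeholder is an assumption, and `sketchIsSafe`, `sketchHardenLinks`, and the `siteHost` parameter are hypothetical.

```typescript
// Simplified hast-like node shapes, declared locally so the sketch has no dependencies.
interface SketchText {
  type: "text";
  value: string;
}
interface SketchElement {
  type: "element";
  tagName: string;
  properties: Record<string, string>;
  children: SketchNode[];
}
type SketchNode = SketchText | SketchElement;

// Hypothetical stand-in for a real URL policy: absolute https: URLs only.
function sketchIsSafe(href: string): boolean {
  try {
    return new URL(href).protocol === "https:";
  } catch {
    return false; // relative or malformed URLs count as unsafe in this sketch
  }
}

function sketchHardenLinks(nodes: SketchNode[], siteHost: string, prune: boolean): SketchNode[] {
  const out: SketchNode[] = [];
  for (const node of nodes) {
    if (node.type === "text") {
      out.push(node);
      continue;
    }
    // Harden descendants first so nested links are covered too.
    const children = sketchHardenLinks(node.children, siteHost, prune);
    if (node.tagName !== "a") {
      out.push({ ...node, children });
      continue;
    }
    const href = node.properties.href ?? "";
    if (!sketchIsSafe(href)) {
      // prune: true drops the element; prune: false keeps its content in an inert <span>.
      if (!prune) {
        out.push({ type: "element", tagName: "span", properties: {}, children });
      }
      continue;
    }
    // A link is treated as external when its host differs from siteHost.
    const external = new URL(href).hostname !== siteHost;
    const properties: Record<string, string> = external
      ? { ...node.properties, rel: "noopener noreferrer" }
      : node.properties;
    out.push({ ...node, properties, children });
  }
  return out;
}
```

Returning a new array instead of mutating the input keeps the pass easy to test; a real rehype plugin would more likely mutate the tree in place during a visitor walk.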


3. React Integration: harden-react-markdown-urls

A drop-in Higher-Order Component for the popular react-markdown library.

import ReactMarkdown from "react-markdown";
import { hardenReactMarkdown } from "harden-react-markdown-urls";
import { presets } from "rehype-harden-urls/utils";

// Wrap ReactMarkdown with the desired default policy
const HardenedMarkdown = hardenReactMarkdown(ReactMarkdown, presets.balanced);

function MyComponent({ markdownText }) {
  return (
    <HardenedMarkdown
      // Instance-level override for maximum control
      hardenedOptions={{
        link: { allowedProtocols: new Set(["https:"]) },
        onUnsafeUrl: url => console.warn("Blocked:", url),
      }}>
      {markdownText}
    </HardenedMarkdown>
  );
}

⚠️ Best Practices and Gotchas

1. The Security Stack is Crucial

Crucial: These packages primarily focus on sanitizing the URL value (href or src). They do not replace a general HTML sanitizer.

If you process untrusted content or allow embedded HTML (e.g., using rehype-raw in your markdown pipeline), you MUST pair this with an HTML structure sanitizer.

The recommended secure chain is:

  1. rehype-raw (if allowing raw HTML)
  2. rehype-harden-urls (Deep URL Content Cleaning)
  3. rehype-sanitize (Structural Guardrail)

2. Protocol Whitelisting is Not Enough 🤯

Your strongest defense starts with a minimal, explicit list of allowedProtocols, but security risks persist even within whitelisted protocols:

A brief overview:

| Protocol | Basic Risk (Protocol Evasion) | Advanced Risk (Phishing/Malicious Data) | harden-urls Mitigation |
| --- | --- | --- | --- |
| javascript: | XSS (Cross-Site Scripting) | Malicious code execution. | Blocks entirely by default. |
| mailto: | Spambot links, mail client exploits. | Targeted Phishing: Malicious headers (BCC, Subject, Body) can be injected via query parameters to trick users into sending unwanted/damaging emails. | Strips malicious query parameters (subject, body, etc.) and non-mail protocols via Query Param Cleaning. |
| data: | XSS, large payload DDoS. | Can carry executable scripts or large, resource-consuming payloads disguised as images. | Requires explicit whitelisting and should be tightly restricted to specific media types (e.g., data:image/png). |

Action: Only allow what you absolutely need (e.g., https: and mailto:). If you allow mailto:, rely on this library's stripParams feature to neutralize potential phishing payloads embedded in the query string.
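To show what a mailto: payload looks like and what neutralizing it can mean, here is a minimal sketch using the standard `URL` API. It drops every header field instead of filtering selected ones, which is stricter than the configurable stripping described above; `sketchStripMailtoHeaders` is a hypothetical helper, not part of harden-urls.

```typescript
// Hypothetical helper, not the harden-urls API: keep only the recipient of a mailto: URL.
function sketchStripMailtoHeaders(raw: string): string | null {
  let parsed: URL | null = null;
  try {
    parsed = new URL(raw);
  } catch {
    return null; // malformed input
  }
  if (parsed.protocol !== "mailto:") return null;

  // Everything after "?" carries header fields (subject, body, cc, bcc, ...); drop it all.
  parsed.search = "";
  parsed.hash = "";
  return parsed.toString();
}

sketchStripMailtoHeaders("mailto:help@example.com?subject=Hi&bcc=attacker@example.net");
// → "mailto:help@example.com"
```

Dropping the whole query is the bluntest option. A policy that still needs a prefilled subject would have to allow specific header names explicitly and reject the rest.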


3. Regex Flags

Gotcha: When providing custom RegExp objects for patterns, do not use the global (g) flag. A global regex stores its lastIndex between test() calls, so repeated checks against the same pattern can give different answers and a security check can be incorrectly skipped. Use toRegexps from harden-urls to convert patterns safely.
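The statefulness is standard JavaScript behaviour and easy to reproduce. The snippet below is a standalone demonstration with made-up variable names; it does not call harden-urls.

```typescript
const sampleUrl = "https://sub.mycorp.com";

// With the g flag, test() resumes from lastIndex, which the previous call left behind.
const statefulPattern = /mycorp\.com$/g;
const firstCheck = statefulPattern.test(sampleUrl);  // true, lastIndex now points past the match
const secondCheck = statefulPattern.test(sampleUrl); // false, nothing matches from there; lastIndex resets to 0

// Without the g flag, every call starts from the beginning of the string.
const statelessPattern = /mycorp\.com$/;
const repeatA = statelessPattern.test(sampleUrl); // true
const repeatB = statelessPattern.test(sampleUrl); // true
```

If such a pattern sits in a blocklist, the `false` on the second call means a blocked URL would pass every other check.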


🤝 Contribution and Support

We welcome contributions of all kinds—from reporting bugs and suggesting new features to submitting code. Your feedback helps make web content safer for everyone!

  1. Fork this repository.
  2. Open an Issue to discuss the feature or fix.
  3. Submit a Pull Request against the main branch.

💖 Adopt and Support: If this suite helps secure your application, please give us a star on GitHub!


⚖️ License

This project is licensed under the MIT License.

MIT © Mayank Chaudhari

Inspired by the Unified, Rehype, and Vercel Labs communities.


with 💖 by Mayank Kumar Chaudhari
