pd2md

A JavaScript/TypeScript library for converting PDF documents to Markdown. It is built with extensibility and configurability in mind and supports all major JavaScript runtimes (Node.js, Bun, Deno).

Features

High-quality conversion: Preserves document structure, headings, lists, links, and formatting
Multiple input sources: Buffer, file path, URL, or readable stream
Configurable output: Customize heading detection, list handling, link extraction
Plugin architecture: Extend with custom transformers and post-processors
Multi-runtime support: Works with Node.js, Bun, and Deno
TypeScript-first: Full TypeScript definitions included
CLI tool: Command-line interface for batch processing
100% test coverage: Comprehensive test suite
Zero configuration: Works out of the box with sensible defaults
Memory efficient: Stream-based processing for large files

Installation

# npm
npm install pd2md

# yarn
yarn add pd2md

# pnpm
pnpm add pd2md

# bun
bun add pd2md

Quick Start

import { pdfToMarkdown } from 'pd2md';
import fs from 'fs';

// From file path
const fromPath = await pdfToMarkdown('./document.pdf');
console.log(fromPath);

// From buffer
const buffer = fs.readFileSync('./document.pdf');
const fromBuffer = await pdfToMarkdown(buffer);

// With options
const withOptions = await pdfToMarkdown('./document.pdf', {
  preserveLinks: true,
  detectHeadings: true,
  pageBreaks: true,
});

API Reference

`pdfToMarkdown(input, options?)`

Converts a PDF to Markdown format.

Parameters:

input (string | Buffer | Uint8Array | URL): PDF source - file path, buffer (Buffer or Uint8Array), or URL. For readable streams, use pdfStreamToMarkdown (see Stream Processing).
options (object, optional): Conversion options

Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `preserveLinks` | `boolean` | `true` | Extract and preserve hyperlinks |
| `detectHeadings` | `boolean` | `true` | Detect headings based on font size |
| `pageBreaks` | `boolean` | `false` | Add page break markers (`---`) |
| `preserveLists` | `boolean` | `true` | Detect and format lists |
| `maxPages` | `number` | `0` | Maximum pages to process (0 = all) |
| `startPage` | `number` | `1` | First page to process |
| `endPage` | `number` | `0` | Last page to process (0 = last) |
| `verbose` | `boolean` | `false` | Enable verbose logging |
| `plugins` | `Plugin[]` | `[]` | Custom transformation plugins |
| `headingFontSize` | `object` | `{}` | Custom font size thresholds for heading levels |
| `lineSpacing` | `number` | `1.5` | Line spacing threshold for paragraph separation |

Returns: Promise<string> - The converted Markdown content
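
The page options above interact: `startPage` and `endPage` bound the range, and `maxPages` caps how many pages are processed. The sketch below shows one plausible way these could combine. `resolvePageRange` is a hypothetical helper, not part of the pd2md API, and the clamping rules are assumptions.

```javascript
// Hypothetical helper: resolves the documented page options against a
// document's page count. The name and the clamping rules are assumptions.
function resolvePageRange(totalPages, { startPage = 1, endPage = 0, maxPages = 0 } = {}) {
  if (totalPages < 1) return [];
  // Clamp the first page into [1, totalPages].
  const first = Math.min(Math.max(startPage, 1), totalPages);
  // endPage 0 means "through the last page".
  let last = endPage === 0 ? totalPages : Math.min(endPage, totalPages);
  // maxPages 0 means "no cap on the number of pages".
  if (maxPages > 0) {
    last = Math.min(last, first + maxPages - 1);
  }
  // An inverted range yields no pages.
  if (last < first) return [];
  const pages = [];
  for (let page = first; page <= last; page += 1) pages.push(page);
  return pages;
}

// Start at page 3 of a 10-page document and process at most 4 pages.
console.log(resolvePageRange(10, { startPage: 3, endPage: 0, maxPages: 4 }));
```

Under these assumptions, leaving all three options at their defaults selects every page.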

`createConverter(options?)`

Creates a reusable converter instance with preset options.

import { createConverter } from 'pd2md';

const converter = createConverter({
  preserveLinks: true,
  pageBreaks: true,
});

const md1 = await converter.convert('./doc1.pdf');
const md2 = await converter.convert('./doc2.pdf');
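
To illustrate the idea of preset options, here is a minimal sketch of how a reusable converter could carry its options from call to call. `makeConverter`, the per-call `overrides` argument, and the `convertFn` stand-in are all hypothetical; the README only documents `createConverter(options?)` and `converter.convert(input)`.

```javascript
// Sketch only: `makeConverter` and the per-call `overrides` argument are
// hypothetical; `convertFn` stands in for the real conversion function.
function makeConverter(presetOptions, convertFn) {
  // Freeze a copy so later mutation of the caller's object has no effect.
  const preset = Object.freeze({ ...presetOptions });
  return {
    convert(input, overrides = {}) {
      // Per-call values win over the preset.
      return convertFn(input, { ...preset, ...overrides });
    },
  };
}

// Stand-in conversion function that just reports what it was called with.
const fakeConvert = (input, options) => ({ input, options });
const sketchConverter = makeConverter({ preserveLinks: true, pageBreaks: true }, fakeConvert);
const call = sketchConverter.convert('./doc1.pdf', { pageBreaks: false });
console.log(call.options);
```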

Plugins

Create custom plugins to extend the conversion process:

import { pdfToMarkdown } from 'pd2md';

const myPlugin = {
  name: 'custom-headers',
  // Transform text items before markdown generation
  transformItems: (items) => {
    return items.map((item) => {
      if (item.text.startsWith('Chapter')) {
        return { ...item, isHeading: true, level: 1 };
      }
      return item;
    });
  },
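  // Post-process the generated markdown; the ^ anchor (with the m flag)
  // only matches lines that start with "Chapter", so lines already
  // rendered as headings are not prefixed a second time
  postProcess: (markdown) => {
    return markdown.replace(/^Chapter (\d+)/gm, '# Chapter $1');
  },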
};

const markdown = await pdfToMarkdown('./book.pdf', {
  plugins: [myPlugin],
});
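
The two hooks run at different stages: `transformItems` sees extracted text items, while `postProcess` sees the finished Markdown string. The sketch below models that ordering in isolation. `runPlugins` and `renderItem` are hypothetical stand-ins for the library's internals, and the item shape is assumed from the example above.

```javascript
// Sketch only: `runPlugins` and `renderItem` are hypothetical stand-ins for
// the library's internals; only the plugin hook names come from the README.
function renderItem(item) {
  // Render heading items with leading '#' characters, everything else as text.
  return item.isHeading ? `${'#'.repeat(item.level)} ${item.text}` : item.text;
}

function runPlugins(items, plugins) {
  // 1. Let each plugin rewrite the extracted text items, in order.
  const transformed = plugins.reduce(
    (acc, plugin) => (plugin.transformItems ? plugin.transformItems(acc) : acc),
    items,
  );
  // 2. Generate Markdown from the transformed items.
  const markdown = transformed.map(renderItem).join('\n\n');
  // 3. Let each plugin post-process the generated Markdown, in order.
  return plugins.reduce(
    (acc, plugin) => (plugin.postProcess ? plugin.postProcess(acc) : acc),
    markdown,
  );
}

const chapterPlugin = {
  name: 'custom-headers',
  transformItems: (items) =>
    items.map((item) =>
      item.text.startsWith('Chapter') ? { ...item, isHeading: true, level: 1 } : item,
    ),
};
const trimPlugin = { name: 'trim', postProcess: (markdown) => markdown.trim() + '\n' };

const output = runPlugins(
  [{ text: 'Chapter 1' }, { text: 'It was a dark night.' }],
  [chapterPlugin, trimPlugin],
);
console.log(output);
```

Both hooks are treated as optional here, so a plugin can implement either one on its own.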

CLI Usage

# Convert single file
npx pd2md document.pdf -o output.md

# Convert directory
npx pd2md ./pdfs/ -o ./markdown/ --recursive

# With options
npx pd2md document.pdf --no-links --page-breaks

# Pipe from stdin
cat document.pdf | npx pd2md > output.md

CLI Options:

Usage: pd2md [input] [options]

Options:
  -o, --output <path>     Output file or directory
  -r, --recursive         Process directories recursively
  --no-links              Don't preserve hyperlinks
  --no-headings           Don't detect headings
  --page-breaks           Add page break markers
  --start-page <n>        First page to process
  --end-page <n>          Last page to process
  --verbose               Enable verbose output
  -h, --help              Show help
  -v, --version           Show version
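
The CLI flags appear to mirror the library options listed in the API Reference. As a rough guide for moving between the two, the sketch below maps the documented flags onto the documented option names. `flagsToOptions` is hypothetical, and the one-to-one correspondence is an assumption rather than something the README states.

```javascript
// Sketch only: `flagsToOptions` is hypothetical. It maps the documented CLI
// flags onto the documented option names, assuming a one-to-one correspondence.
function flagsToOptions(argv) {
  const options = {};
  for (let i = 0; i < argv.length; i += 1) {
    const flag = argv[i];
    if (flag === '--no-links') options.preserveLinks = false;
    else if (flag === '--no-headings') options.detectHeadings = false;
    else if (flag === '--page-breaks') options.pageBreaks = true;
    else if (flag === '--verbose') options.verbose = true;
    // Flags that take a value consume the next argument.
    else if (flag === '--start-page') options.startPage = Number(argv[++i]);
    else if (flag === '--end-page') options.endPage = Number(argv[++i]);
  }
  return options;
}

console.log(flagsToOptions(['--no-links', '--page-breaks', '--start-page', '2']));
```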

Advanced Usage

Stream Processing

import { createReadStream } from 'fs';
import { pdfStreamToMarkdown } from 'pd2md';

const stream = createReadStream('./large-document.pdf');
const markdown = await pdfStreamToMarkdown(stream);

Progress Callbacks

import { pdfToMarkdown } from 'pd2md';

const markdown = await pdfToMarkdown('./document.pdf', {
  onProgress: ({ currentPage, totalPages }) => {
    console.log(`Processing page ${currentPage}/${totalPages}`);
  },
  onPageComplete: ({ page, content }) => {
    console.log(`Page ${page} converted: ${content.length} chars`);
  },
});
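
For long documents, logging every callback can be noisy. The sketch below builds an `onProgress` handler that reports a percentage and skips repeated values. `makeProgressReporter` is a hypothetical helper; only the `{ currentPage, totalPages }` argument shape is taken from the example above.

```javascript
// Sketch only: `makeProgressReporter` is hypothetical; the callback argument
// shape ({ currentPage, totalPages }) is taken from the README example.
function makeProgressReporter(log) {
  let lastPercent = -1;
  return ({ currentPage, totalPages }) => {
    const percent = totalPages > 0 ? Math.floor((currentPage / totalPages) * 100) : 0;
    // Skip duplicate reports so long documents do not flood the log.
    if (percent === lastPercent) return;
    lastPercent = percent;
    log(`Processing page ${currentPage}/${totalPages} (${percent}%)`);
  };
}

// Exercise the handler directly, collecting messages instead of printing them.
const lines = [];
const onProgress = makeProgressReporter((line) => lines.push(line));
onProgress({ currentPage: 1, totalPages: 4 });
onProgress({ currentPage: 1, totalPages: 4 }); // same percentage, ignored
onProgress({ currentPage: 4, totalPages: 4 });
console.log(lines);
```

The returned function could then be passed as the `onProgress` option.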

Custom Font Size Mapping

const markdown = await pdfToMarkdown('./document.pdf', {
  headingFontSize: {
    h1: 24, // Font size >= 24 becomes H1
    h2: 20, // Font size >= 20 becomes H2
    h3: 16, // Font size >= 16 becomes H3
    h4: 14, // Font size >= 14 becomes H4
  },
});
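
The comments above describe each threshold as a lower bound. One way to read that is sketched below: a font size is checked against `h1` first, then `h2`, and so on, and the first threshold it meets decides the level. `headingLevelFor` is a hypothetical helper and this matching order is an assumption about the library's behavior.

```javascript
// Sketch only: `headingLevelFor` is hypothetical and assumes the most
// prominent heading whose threshold the font size meets decides the level.
function headingLevelFor(fontSize, thresholds) {
  const levels = Object.entries(thresholds)
    // Turn a key such as 'h2' into the numeric level 2.
    .map(([key, minSize]) => ({ level: Number(key.slice(1)), minSize }))
    // Check the most prominent heading first.
    .sort((a, b) => a.level - b.level);
  const match = levels.find(({ minSize }) => fontSize >= minSize);
  // 0 means "body text, not a heading".
  return match ? match.level : 0;
}

const sampleThresholds = { h1: 24, h2: 20, h3: 16, h4: 14 };
for (const size of [30, 21, 14, 12]) {
  console.log(size, headingLevelFor(size, sampleThresholds));
}
```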

Competitors & Alternatives

This library was designed to be the most feature-complete PDF to Markdown converter. Here's how it compares to alternatives:

| Library | Language | Stars | License | Plugins | CLI | TypeScript | Active |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pd2md (this) | JavaScript | - | Unlicense | Yes | Yes | Yes | Yes |
| @opendocsg/pdf2md | JavaScript | 463 | MIT | No | Yes | No | Yes |
| jzillmann/pdf-to-markdown | JavaScript | 1514 | MIT | No | No | No | Yes |
| zyocum/pdf2md | Python | - | - | No | Yes | N/A | Yes |
| leoneversberg/pdf2md_llm | Python | - | - | No | Yes | N/A | Yes |
| FutureUnreal/mcp-pdf2md | Python | - | - | No | Yes | N/A | Yes |
| Diselorya/pdf2md | Python | - | - | No | Yes | N/A | Yes |
| VikParuchuri/marker | Python | - | GPL | No | Yes | N/A | Yes |

Feature Comparison

| Feature | pd2md | @opendocsg/pdf2md | jzillmann/pdf-to-markdown |
| --- | --- | --- | --- |
| Heading detection | Yes | Yes | Yes |
| Link preservation | Yes | Yes | Yes |
| List detection | Yes | Partial | Partial |
| Table extraction | Yes | No | No |
| Plugin system | Yes | No | No |
| Streaming API | Yes | No | No |
| Progress callbacks | Yes | No | No |
| Custom font thresholds | Yes | No | No |
| Page range selection | Yes | No | No |
| Multi-runtime | Yes | Node.js only | Browser only |

Alternative Package Names

In case pd2md is not available or you prefer a different name, here are the 29 alternatives considered for this project:

pdfmark - PDF to Markdown
pdftomd - PDF to MD
pdf2mark - PDF to Markdown
pdfconv - PDF Converter
mdconv - Markdown Converter
pdf-mark - PDF Mark
pd2mk - PD to MK
p2md - P to MD
pdf2m - PDF to M
p2m - P to M
to-md - To Markdown
d2md - Document to MD
dmkd - Document Markdown
depdf - De-PDF
re-pdf - Re-PDF
pdf-doc - PDF Document
doc-mark - Document Mark
pdftext - PDF Text
pdfread - PDF Read
readpdf - Read PDF
textpdf - Text PDF
makemark - Make Markdown
doc-to-md - Document to MD
pdftex - PDF Text
pdfmarkdown - PDF Markdown
pdf-parse-md - PDF Parse MD
md-from-pdf - MD from PDF
unmake-pdf - Unmake PDF
pdf-md-converter - PDF MD Converter

Project Structure

pd2md/
├── src/
│   ├── index.js           # Main entry point
│   ├── index.d.ts         # TypeScript definitions
│   ├── converter.js       # Core converter logic
│   ├── parser.js          # PDF parsing utilities
│   ├── transformer.js     # Text transformation
│   ├── plugins/           # Built-in plugins
│   │   ├── headings.js    # Heading detection
│   │   ├── links.js       # Link extraction
│   │   ├── lists.js       # List detection
│   │   └── tables.js      # Table extraction
│   └── cli.js             # CLI implementation
├── tests/
│   ├── converter.test.js  # Converter tests
│   ├── parser.test.js     # Parser tests
│   ├── plugins.test.js    # Plugin tests
│   └── fixtures/          # Test PDF files
├── examples/
│   ├── basic.js           # Basic usage
│   ├── advanced.js        # Advanced features
│   └── plugins.js         # Custom plugins
└── docs/
    ├── api.md             # API documentation
    └── plugins.md         # Plugin development guide

Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Make your changes
4. Create a changeset: `bun run changeset`
5. Commit your changes (pre-commit hooks will run automatically)
6. Push and create a Pull Request

License

Unlicense - Public Domain

This is free and unencumbered software released into the public domain. You can copy, modify, publish, use, compile, sell, or distribute this software for any purpose, commercial or non-commercial.

link-foundation/pdf-to-markdown
