A JavaScript/TypeScript library for converting PDF documents to Markdown. It is built for extensibility and configurability and supports the major JavaScript runtimes (Node.js, Bun, and Deno).
- High-quality conversion: Preserves document structure, headings, lists, links, and formatting
- Multiple input sources: Buffer, file path, URL, or readable stream
- Configurable output: Customize heading detection, list handling, link extraction
- Plugin architecture: Extend with custom transformers and post-processors
- Multi-runtime support: Works with Node.js, Bun, and Deno
- TypeScript-first: Full TypeScript definitions included
- CLI tool: Command-line interface for batch processing
- 100% test coverage: Comprehensive test suite
- Zero configuration: Works out of the box with sensible defaults
- Memory efficient: Stream-based processing for large files
# npm
npm install pd2md
# yarn
yarn add pd2md
# pnpm
pnpm add pd2md
# bun
bun add pd2md

import { pdfToMarkdown } from 'pd2md';
import fs from 'fs';
// From file path
const markdown = await pdfToMarkdown('./document.pdf');
console.log(markdown);
// From buffer
const buffer = fs.readFileSync('./document.pdf');
const fromBuffer = await pdfToMarkdown(buffer);
// With options
const withOptions = await pdfToMarkdown('./document.pdf', {
  preserveLinks: true,
  detectHeadings: true,
  pageBreaks: true,
});

pdfToMarkdown(input, options) converts a PDF to Markdown format.
Parameters:
- input (string | Buffer | Uint8Array | URL): PDF source (file path, buffer, or URL)
- options (object, optional): Conversion options
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| preserveLinks | boolean | true | Extract and preserve hyperlinks |
| detectHeadings | boolean | true | Detect headings based on font size |
| pageBreaks | boolean | false | Add page break markers (---) |
| preserveLists | boolean | true | Detect and format lists |
| maxPages | number | 0 | Maximum pages to process (0 = all) |
| startPage | number | 1 | First page to process |
| endPage | number | 0 | Last page to process (0 = last) |
| verbose | boolean | false | Enable verbose logging |
| plugins | Plugin[] | [] | Custom transformation plugins |
| headingFontSize | object | {} | Custom font size thresholds for heading levels |
| lineSpacing | number | 1.5 | Line spacing threshold for paragraph separation |
Returns: Promise<string> - The converted Markdown content
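To make the font-size rule behind detectHeadings and headingFontSize more concrete, here is a minimal sketch of how thresholds could map a text item's font size to a heading level. It is an illustration under assumptions, not pd2md's actual implementation: headingLevelFor and renderLine are hypothetical helpers, the threshold values are example numbers, and the sketch assumes a font size gets the largest heading whose threshold it meets.

```javascript
// Hypothetical helper (not part of pd2md): pick a heading level from a font size.
// `thresholds` has the same shape as the `headingFontSize` option,
// e.g. { h1: 24, h2: 20, h3: 16, h4: 14 }.
function headingLevelFor(fontSize, thresholds) {
  // Turn { h1: 24, ... } into [{ level: 1, minSize: 24 }, ...],
  // ordered so the largest heading (h1) is tested first.
  const levels = Object.entries(thresholds)
    .map(([key, minSize]) => ({ level: Number(key.slice(1)), minSize }))
    .sort((a, b) => a.level - b.level);

  for (const { level, minSize } of levels) {
    if (fontSize >= minSize) return level;
  }
  return 0; // 0 means body text, not a heading
}

// Hypothetical helper: render one text item as a Markdown line.
function renderLine(text, fontSize, thresholds) {
  const level = headingLevelFor(fontSize, thresholds);
  return level > 0 ? `${'#'.repeat(level)} ${text}` : text;
}

const thresholds = { h1: 24, h2: 20, h3: 16, h4: 14 }; // example values
console.log(renderLine('Introduction', 26, thresholds)); // # Introduction
console.log(renderLine('Background', 20, thresholds)); // ## Background
console.log(renderLine('Plain paragraph text.', 11, thresholds)); // unchanged
```

Testing the largest heading first means each threshold acts as a lower bound, so a size of 20 falls through h1 and lands on h2.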
createConverter(options) creates a reusable converter instance with preset options.
import { createConverter } from 'pd2md';
const converter = createConverter({
  preserveLinks: true,
  pageBreaks: true,
});
const md1 = await converter.convert('./doc1.pdf');
const md2 = await converter.convert('./doc2.pdf');

Create custom plugins to extend the conversion process:
import { pdfToMarkdown } from 'pd2md';
const myPlugin = {
  name: 'custom-headers',
  // Transform text items before markdown generation
  transformItems: (items) => {
    return items.map((item) => {
      if (item.text.startsWith('Chapter')) {
        return { ...item, isHeading: true, level: 1 };
      }
      return item;
    });
  },
  // Post-process the generated markdown: promote any remaining line that
  // starts with "Chapter N" (anchored, so existing headings are not doubled)
  postProcess: (markdown) => {
    return markdown.replace(/^Chapter (\d+)/gm, '# Chapter $1');
  },
};
const markdown = await pdfToMarkdown('./book.pdf', {
  plugins: [myPlugin],
});

# Convert single file
npx pd2md document.pdf -o output.md
# Convert directory
npx pd2md ./pdfs/ -o ./markdown/ --recursive
# With options
npx pd2md document.pdf --no-links --page-breaks
# Pipe from stdin
cat document.pdf | npx pd2md > output.md

CLI Options:
Usage: pd2md [input] [options]
Options:
  -o, --output <path>   Output file or directory
  -r, --recursive       Process directories recursively
  --no-links            Don't preserve hyperlinks
  --no-headings         Don't detect headings
  --page-breaks         Add page break markers
  --start-page <n>      First page to process
  --end-page <n>        Last page to process
  --verbose             Enable verbose output
  -h, --help            Show help
  -v, --version         Show version
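The flags above line up closely with the conversion options in the API table, so it may help to see one way that mapping could look. The sketch below is an assumption-laden illustration, not the actual pd2md CLI code: parseCliArgs is a hypothetical function, the flag-to-option pairing (for example --no-links to preserveLinks: false) is inferred from the names, and -h and -v are left out for brevity.

```javascript
// Hypothetical sketch (not the real pd2md CLI): translate command-line
// flags into the options object accepted by pdfToMarkdown.
function parseCliArgs(argv) {
  const result = { input: undefined, output: undefined, recursive: false, options: {} };

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case '-o':
      case '--output':
        result.output = argv[++i]; // the next token is the output path
        break;
      case '-r':
      case '--recursive':
        result.recursive = true;
        break;
      case '--no-links':
        result.options.preserveLinks = false;
        break;
      case '--no-headings':
        result.options.detectHeadings = false;
        break;
      case '--page-breaks':
        result.options.pageBreaks = true;
        break;
      case '--verbose':
        result.options.verbose = true;
        break;
      case '--start-page':
      case '--end-page': {
        // The next token must be a positive page number.
        const value = Number.parseInt(argv[++i], 10);
        if (!Number.isInteger(value) || value < 1) {
          throw new Error(`${arg} expects a positive integer`);
        }
        result.options[arg === '--start-page' ? 'startPage' : 'endPage'] = value;
        break;
      }
      default:
        if (arg.startsWith('-')) throw new Error(`Unknown option: ${arg}`);
        result.input = arg; // the first bare token is treated as the input path
    }
  }
  return result;
}

// In a real CLI the array would come from process.argv.slice(2).
const parsed = parseCliArgs(['document.pdf', '--no-links', '--page-breaks', '--start-page', '2']);
console.log(parsed);
```

Only flags that were actually passed end up in options, so the library defaults from the table still apply to everything else.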
import { createReadStream } from 'fs';
import { pdfStreamToMarkdown } from 'pd2md';
const stream = createReadStream('./large-document.pdf');
const markdown = await pdfStreamToMarkdown(stream);

import { pdfToMarkdown } from 'pd2md';
const markdown = await pdfToMarkdown('./document.pdf', {
  onProgress: ({ currentPage, totalPages }) => {
    console.log(`Processing page ${currentPage}/${totalPages}`);
  },
  onPageComplete: ({ page, content }) => {
    console.log(`Page ${page} converted: ${content.length} chars`);
  },
});

const markdown = await pdfToMarkdown('./document.pdf', {
  headingFontSize: {
    h1: 24, // Font size >= 24 becomes H1
    h2: 20, // Font size >= 20 becomes H2
    h3: 16, // Font size >= 16 becomes H3
    h4: 14, // Font size >= 14 becomes H4
  },
});

This library was designed to be the most feature-complete PDF to Markdown converter. Here's how it compares to alternatives:
| Library | Language | Stars | License | Plugins | CLI | TypeScript | Active |
|---|---|---|---|---|---|---|---|
| pd2md (this) | JavaScript | - | Unlicense | Yes | Yes | Yes | Yes |
| @opendocsg/pdf2md | JavaScript | 463 | MIT | No | Yes | No | Yes |
| jzillmann/pdf-to-markdown | JavaScript | 1514 | MIT | No | No | No | Yes |
| zyocum/pdf2md | Python | - | - | No | Yes | N/A | Yes |
| leoneversberg/pdf2md_llm | Python | - | - | No | Yes | N/A | Yes |
| FutureUnreal/mcp-pdf2md | Python | - | - | No | Yes | N/A | Yes |
| Diselorya/pdf2md | Python | - | - | No | Yes | N/A | Yes |
| VikParuchuri/marker | Python | - | GPL | No | Yes | N/A | Yes |
| Feature | pd2md | @opendocsg/pdf2md | jzillmann/pdf-to-markdown |
|---|---|---|---|
| Heading detection | Yes | Yes | Yes |
| Link preservation | Yes | Yes | Yes |
| List detection | Yes | Partial | Partial |
| Table extraction | Yes | No | No |
| Plugin system | Yes | No | No |
| Streaming API | Yes | No | No |
| Progress callbacks | Yes | No | No |
| Custom font thresholds | Yes | No | No |
| Page range selection | Yes | No | No |
| Multi-runtime | Yes | Node.js only | Browser only |
In case pd2md is not available or you prefer a different name, here are 25+ alternatives considered for this project:
- pdfmark - PDF to Markdown
- pdftomd - PDF to MD
- pdf2mark - PDF to Markdown
- pdfconv - PDF Converter
- mdconv - Markdown Converter
- pdf-mark - PDF Mark
- pd2mk - PD to MK
- p2md - P to MD
- pdf2m - PDF to M
- p2m - P to M
- to-md - To Markdown
- d2md - Document to MD
- dmkd - Document Markdown
- depdf - De-PDF
- re-pdf - Re-PDF
- pdf-doc - PDF Document
- doc-mark - Document Mark
- pdftext - PDF Text
- pdfread - PDF Read
- readpdf - Read PDF
- textpdf - Text PDF
- makemark - Make Markdown
- doc-to-md - Document to MD
- pdftex - PDF Text
- pdfmarkdown - PDF Markdown
- pdf-parse-md - PDF Parse MD
- md-from-pdf - MD from PDF
- unmake-pdf - Unmake PDF
- pdf-md-converter - PDF MD Converter
pd2md/
├── src/
│ ├── index.js # Main entry point
│ ├── index.d.ts # TypeScript definitions
│ ├── converter.js # Core converter logic
│ ├── parser.js # PDF parsing utilities
│ ├── transformer.js # Text transformation
│ ├── plugins/ # Built-in plugins
│ │ ├── headings.js # Heading detection
│ │ ├── links.js # Link extraction
│ │ ├── lists.js # List detection
│ │ └── tables.js # Table extraction
│ └── cli.js # CLI implementation
├── tests/
│ ├── converter.test.js # Converter tests
│ ├── parser.test.js # Parser tests
│ ├── plugins.test.js # Plugin tests
│ └── fixtures/ # Test PDF files
├── examples/
│ ├── basic.js # Basic usage
│ ├── advanced.js # Advanced features
│ └── plugins.js # Custom plugins
└── docs/
├── api.md # API documentation
└── plugins.md # Plugin development guide
- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Make your changes
- Create a changeset: bun run changeset
- Commit your changes (pre-commit hooks will run automatically)
- Push and create a Pull Request
Unlicense - Public Domain
This is free and unencumbered software released into the public domain. You can copy, modify, publish, use, compile, sell, or distribute this software for any purpose, commercial or non-commercial.