Pandoc Batch Processor
The pandoc_batch.py script is a utility built into the write-book-template workflow for automating Pandoc conversions across all Markdown (.md) files in a book project.
It handles:
- Batch processing of all .md files (recursively from a root folder)
- Parallel conversions for faster builds
- Output path mirroring (maintains the same directory structure as the source)
- Optional auto-patching of Markdown before conversion to fix common Pandoc pitfalls
- Safe test mode (--test-only) for checking files without generating outputs
Pandoc is strict about Markdown syntax. Certain patterns can cause build errors or unexpected rendering.
One frequent issue in book manuscripts is a horizontal rule (---, ***, ___) immediately followed by text without a blank line:

---
*This text may be misinterpreted by Pandoc*

This can break EPUB, PDF, and HTML builds because Pandoc merges the text into the horizontal rule block.
Solution: Always insert a blank line after thematic breaks:

---

*This is now correctly parsed as a new paragraph.*

The auto-patching system detects and fixes these issues automatically during the build.
- Scans a root folder (default: manuscript/) for all .md files.
- Processes them in parallel (--jobs option) for speed.
- Supports output formats: epub, html, pdf, docx, odt, rtf.
- Reads default settings from [tool.pandoc_batch] in pyproject.toml.
- Allows running poetry run pandoc-batch with no extra flags.
Example pyproject.toml block:
[tool.pandoc_batch]
root = "manuscript"
outdir = "output"
to = "epub"
metadata_file = "config/metadata.yaml"
resource_path = ["assets"]
lang = "en"
jobs = 4
verbose = true
standalone = true
test_only = false
patch_md = true
fix_inplace = false

By default (patch_md = true), the script will:
- Remove UTF-8 BOMs (Byte Order Marks) from the start of files.
- Normalize line endings to \n (Unix-style).
- Insert blank lines after thematic breaks if missing.

Pattern detection:
- Horizontal rules: ---, ***, ___
- Regex used: ^(?:-{3}|(?:\*\s*){3}|(?:_\s*){3})\s*\n(?!\s*\n)
| Before (will cause issues) | After (auto-patched) |
|---|---|
| `---`<br>`*Dieses Buch ist keine Schlussfolgerung.*` | `---`<br><br>`*Dieses Buch ist keine Schlussfolgerung.*` |
| `***`<br>`**Chapter End**` | `***`<br><br>`**Chapter End**` |
| `___`<br>`Text starts immediately` | `___`<br><br>`Text starts immediately` |
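The three patch steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script's own code: `patch_markdown` and `HR_MISSING_BLANK` are hypothetical names, and the pattern is a variant of the documented regex. It matches horizontal whitespace with `[ \t]*` instead of `\s*` (in Python, `\s` also matches newlines) and requires visible text on the following line, so a rule that already has a blank line after it is left alone. YAML front matter is not special-cased here.

```python
import re

# Variant of the documented pattern (assumption, see the note above).
# Group 1 captures the rule line including its newline; the lookahead
# only succeeds when the next line starts with visible text.
HR_MISSING_BLANK = re.compile(
    r"^((?:-{3}|(?:\*[ \t]*){3}|(?:_[ \t]*){3})[ \t]*\n)(?=[ \t]*\S)",
    re.MULTILINE,
)


def patch_markdown(text: str) -> str:
    """Strip a leading BOM, normalize line endings, add blank lines after rules."""
    text = text.removeprefix("\ufeff")                     # a UTF-8 BOM decodes to U+FEFF
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # CRLF and bare CR become LF
    return HR_MISSING_BLANK.sub(r"\1\n", text)             # rule line + one extra newline


print(repr(patch_markdown("\ufeff---\r\n*Chapter End*\r\n")))
# '---\n\n*Chapter End*\n'
```

A line such as `--- text` is not matched because the rule must be alone on its line, and running the function twice gives the same result as running it once.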
 ┌───────────────────┐
 │  Markdown Files   │
 │  (manuscript/)    │
 └─────────┬─────────┘
           │
           ▼
 ┌───────────────────┐
 │  Auto-Patch Step  │
 │  - Strip BOM      │
 │  - Normalize \n   │
 │  - Fix HR blocks  │
 └─────────┬─────────┘
           │
           ▼
 ┌───────────────────┐
 │  Pandoc Convert   │
 │  - Format: EPUB   │
 │  - Resources      │
 │  - Metadata       │
 └─────────┬─────────┘
           │
           ▼
 ┌───────────────────┐
 │  Output Files     │
 │  (output/...)     │
 └───────────────────┘
poetry run pandoc-batch
Uses settings from pyproject.toml.

poetry run pandoc-batch --test-only
Checks all .md files for Pandoc parsing errors without writing output.

poetry run pandoc-batch --to pdf --extra --pdf-engine xelatex
Converts to PDF, overriding the configured output format and passing --pdf-engine xelatex through to Pandoc.

poetry run pandoc-batch --no-patch-md
Disables auto-patching for this run.

poetry run pandoc-batch --fix-inplace
When used with --fix-inplace, the script writes patched Markdown back to the original files instead of only using a temporary patched copy for Pandoc.
What gets fixed:
- Adds a blank line after horizontal rules (---, ***, ___) if the next line is not blank.
- Removes any UTF-8 BOM at the start of the file.
- Normalizes all line endings to \n (Unix style).
Important notes:
- Only files where a change is actually detected will be modified.
- If your file has --- but it's not on a line by itself (e.g., --- text), it will not be changed; that's intentional.
- If you want to verify which files will be changed before running --fix-inplace, you can run:
  poetry run pandoc-batch --test-only
  This will use the in-memory patched version to test the build without altering your sources.
Troubleshooting:
If --fix-inplace makes no changes when you expect it to, check that your horizontal rule is exactly on its own line with no extra text.
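The "only modified when a change is detected" rule can be pictured with the following sketch. `fix_inplace` is a hypothetical helper, not the script's real function, and the patch step is passed in as a callable so the example stays self-contained.

```python
from pathlib import Path
from typing import Callable


def fix_inplace(path: Path, patch: Callable[[str], str]) -> bool:
    """Rewrite `path` with patch(text) only if patching changes it.

    Returns True when the file was written, False when it was left alone.
    """
    # Work on bytes so Python's newline translation cannot hide \r\n endings.
    original = path.read_bytes().decode("utf-8")
    patched = patch(original)
    if patched == original:
        return False  # nothing to fix, so the file is not touched
    path.write_bytes(patched.encode("utf-8"))
    return True
```

A file whose rule line reads `--- text` passes through any rule-only patch unchanged, so a helper like this returns False for it, which matches the troubleshooting hint above.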
- File Collection
  Finds all .md files under --root using Path.rglob().
- Output Path Mapping
  Builds a mirrored path under --outdir with the correct file extension.
- Auto-Patching (Optional)
  - Reads the file in memory
  - Runs the patching regex and BOM remover
  - Writes the result to a temporary file (or in-place if --fix-inplace is set)
  - Passes the patched file to Pandoc
- Pandoc Invocation
  - Uses flags from CLI or pyproject.toml
  - Supports extra arguments via --extra
- Parallel Execution
  - Uses ThreadPoolExecutor to run multiple Pandoc processes simultaneously
  - Job count is configurable with --jobs
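Taken together, the collection, path mapping, and parallel steps might look like the sketch below. All function names are hypothetical, the auto-patch step is left out for brevity, and the Pandoc call is deliberately minimal: it relies on Pandoc inferring the output format from the file extension instead of reproducing the script's real flag handling.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def collect_markdown(root: Path) -> list[Path]:
    """All .md files below root, in a stable order."""
    return sorted(p for p in root.rglob("*.md") if p.is_file())


def mirrored_output(src: Path, root: Path, outdir: Path, to: str) -> Path:
    """Same relative path under outdir, with the target format as extension."""
    return (outdir / src.relative_to(root)).with_suffix(f".{to}")


def run_pandoc(src: Path, dst: Path, extra: list[str]) -> int:
    """Convert one file and return Pandoc's exit code."""
    dst.parent.mkdir(parents=True, exist_ok=True)  # create missing output folders
    # Pandoc infers the output format from the extension of the -o target.
    cmd = ["pandoc", str(src), "--standalone", "-o", str(dst), *extra]
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def convert_all(root: Path, outdir: Path, to: str = "epub", jobs: int = 4,
                extra: tuple[str, ...] = (), convert=run_pandoc) -> dict[Path, int]:
    """Run `convert` for every Markdown file using a pool of `jobs` threads."""
    sources = collect_markdown(root)
    targets = [mirrored_output(s, root, outdir, to) for s in sources]

    def job(pair: tuple[Path, Path]) -> int:
        src, dst = pair
        return convert(src, dst, list(extra))

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() preserves input order, so exit codes line up with targets.
        return dict(zip(targets, pool.map(job, zip(sources, targets))))
```

Threads are sufficient here because each worker mostly waits on an external Pandoc process. The `convert` parameter exists only so the pipeline can be exercised without Pandoc installed.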
- Prepare Manuscript
  Place all .md files in manuscript/, with subfolders for front-matter/, chapters/, back-matter/.
- Check for Pandoc Errors Without Output
  poetry run pandoc-batch --test-only
- Convert to EPUB (using defaults in pyproject.toml)
  poetry run pandoc-batch
- Convert to PDF (override defaults)
  poetry run pandoc-batch --to pdf --extra --pdf-engine xelatex

- Pandoc "withBinaryFile: does not exist" error
  → Usually caused by missing output directories. The script now creates them automatically.
- Strange formatting after ---
  → Caused by missing blank lines; the auto-patch fixes this automatically.
- Encoding errors
  → Auto-patching removes BOMs and ensures UTF-8 compliance.
- Want to disable patching for performance?
  → Run with --no-patch-md or set patch_md = false in pyproject.toml.
The enhanced pandoc_batch.py is designed for robust, automated, and error-resistant Pandoc builds in book projects.
Its auto-patching ensures consistent formatting and eliminates one of the most common causes of EPUB/PDF generation failures.
Tip: Keep patch_md enabled for all production builds; it's a safety net that costs almost no performance.