A simple Node.js script that splits a PDF into individual pages and creates JPEG thumbnails. Each page is saved as a separate file whose name is inherited from the original PDF. In some business cases it is beneficial to chunk documents by individual page, especially for semantic search, where the goal is to show the user the specific page that matches rather than compile a response to their request. PDF Splitter is the right tool for this initial PDF processing step. After splitting, you can use another of my tools, MistralOCR, for text and image extraction.
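Since the script depends on `pdf-lib`, the page-splitting step can be sketched with that library's `PDFDocument` API: load the source, then copy each page into a fresh one-page document. This is a minimal sketch, not the actual `split_pdf.js`; the `<name>_page_<n>.pdf` naming pattern and the helper names `pageFileName` and `splitPdf` are hypothetical.

```javascript
// Minimal sketch: split a PDF into single-page PDFs with pdf-lib.
// Assumption: the "<name>_page_<n>.pdf" output pattern is hypothetical;
// the real split_pdf.js may name its files differently.
const fs = require('fs/promises');
const path = require('path');

// Derive the output file name from the input file name (hypothetical pattern).
function pageFileName(inputPath, pageIndex) {
  const base = path.basename(inputPath, path.extname(inputPath));
  return `${base}_page_${pageIndex + 1}.pdf`;
}

// Write one PDF per page next to the input (or into outDir) and return the paths.
async function splitPdf(inputPath, outDir = path.dirname(inputPath)) {
  // Load pdf-lib only when splitting is actually requested.
  const { PDFDocument } = require('pdf-lib');
  const srcDoc = await PDFDocument.load(await fs.readFile(inputPath));
  const written = [];
  for (let i = 0; i < srcDoc.getPageCount(); i++) {
    // Copy page i into a brand-new document that contains only that page.
    const pageDoc = await PDFDocument.create();
    const [page] = await pageDoc.copyPages(srcDoc, [i]);
    pageDoc.addPage(page);
    const outPath = path.join(outDir, pageFileName(inputPath, i));
    await fs.writeFile(outPath, await pageDoc.save());
    written.push(outPath);
  }
  return written;
}
```

Under the assumed naming pattern, `await splitPdf('report.pdf')` would write `report_page_1.pdf`, `report_page_2.pdf`, and so on, and return their paths. Creating a fresh document per page keeps every output file self-contained, which suits page-level chunking.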
- Node.js & npm: Must be installed on your system.
- Homebrew: Required for installing Poppler on macOS.
- Poppler: The script uses the `pdftocairo` command-line tool. Install via Homebrew: `brew install poppler`
- Node.js Dependencies: The script uses `pdf-lib`.
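The JPEG thumbnails come from Poppler's `pdftocairo`. A minimal sketch of calling it from Node.js follows, assuming `pdftocairo` is on `PATH`, a 200 px bounding box, and one rendered page per file; `thumbnailArgs` and `makeThumbnail` are hypothetical helpers, and the real script's options may differ. The flags used (`-jpeg`, `-f`, `-l`, `-scale-to`, `-singlefile`) are standard `pdftocairo` options.

```javascript
// Minimal sketch: render a JPEG thumbnail of one PDF page with pdftocairo.
// Assumptions: pdftocairo is on PATH; size and page defaults are illustrative.
const { execFile } = require('child_process');
const path = require('path');

// Build the pdftocairo argument list. -singlefile stops pdftocairo from
// appending a page number, so the output file is "<outPrefix>.jpg".
function thumbnailArgs(pdfPath, outPrefix, { page = 1, size = 200 } = {}) {
  return [
    '-jpeg',
    '-f', String(page), // first page to render
    '-l', String(page), // last page to render
    '-scale-to', String(size), // fit the page into a size x size pixel box
    '-singlefile',
    pdfPath,
    outPrefix,
  ];
}

// Run pdftocairo and resolve with the path of the generated JPEG.
function makeThumbnail(pdfPath, options) {
  const outPrefix = path.join(
    path.dirname(pdfPath),
    path.basename(pdfPath, path.extname(pdfPath))
  );
  return new Promise((resolve, reject) => {
    execFile('pdftocairo', thumbnailArgs(pdfPath, outPrefix, options), (err) => {
      if (err) reject(err);
      else resolve(`${outPrefix}.jpg`);
    });
  });
}
```

Because each split file holds a single page, rendering page 1 of every output is enough, and the thumbnail keeps the same base name as its PDF (for example `report_page_1.pdf` becomes `report_page_1.jpg`).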
- Save the script (e.g., `split_pdf.js`) in a folder.
- Place the PDF you want to process in the same folder.
- Open your terminal and navigate (`cd`) into that folder.
- Install the necessary Node.js package with `npm install pdf-lib` (if you have a `package.json` file, just run `npm install`).
Execute the script using Node.js, providing the name of your PDF file as a command-line argument:
`node split_pdf.js your_file_name.pdf`
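The file name arrives in `process.argv`, after the Node binary and the script path. A minimal sketch of reading and validating it follows; the `parseArgs` helper, its checks, and its messages are assumptions, not necessarily what `split_pdf.js` does.

```javascript
// Minimal sketch: read the PDF name from `node split_pdf.js your_file_name.pdf`.
// The validation rules and messages below are illustrative assumptions.
const fs = require('fs');
const path = require('path');

// Return { pdfPath } on success or { error } describing the problem.
function parseArgs(argv) {
  const [fileArg] = argv.slice(2); // argv[0] = node, argv[1] = script path
  if (!fileArg) {
    return { error: 'Usage: node split_pdf.js <file.pdf>' };
  }
  if (path.extname(fileArg).toLowerCase() !== '.pdf') {
    return { error: `Not a PDF file: ${fileArg}` };
  }
  const pdfPath = path.resolve(fileArg); // relative names resolve against the cwd
  if (!fs.existsSync(pdfPath)) {
    return { error: `File not found: ${pdfPath}` };
  }
  return { pdfPath };
}
```

A caller would exit with a non-zero status when `error` is set and otherwise pass `pdfPath` on to the splitting step, e.g. `const { pdfPath, error } = parseArgs(process.argv);`.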