Extract bookmarks from PDFs and split documents using a companion CSV, served as a FastAPI web service and shipped as a Podman container.
- Bookmark Export: Parse PDF outlines and generate per-level CSV files containing start/end page ranges.
- PDF Splitting: Accept a CSV with the columns split, name, from, to and split the original PDF into multiple fragments.
- Containerized API: A lightweight container runs a FastAPI app on port 8080.
- CLI Client (Bash): Includes a script to upload PDFs, retrieve ZIPs, and save results to a dedicated per-file directory.
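The "start/end page ranges" in the exported CSVs can be illustrated with a small sketch. It assumes, based only on the sample CSV in this README, that each bookmark ends on the page before the next bookmark of the same level starts; the function name `page_ranges` and its arguments are hypothetical and not taken from the project code.

```python
from typing import List, Tuple


def page_ranges(starts: List[Tuple[str, int]], last_page: int) -> List[Tuple[str, int, int]]:
    """Turn (title, start_page) pairs into (title, from, to) rows.

    Pages are 1-based and inclusive. Assumption: a bookmark runs until the
    page before the next bookmark on the same level, and the last bookmark
    runs to the end of the document.
    """
    rows = []
    for i, (title, start) in enumerate(starts):
        if i + 1 < len(starts):
            # Never end before the start page (two bookmarks on one page).
            end = max(start, starts[i + 1][1] - 1)
        else:
            end = last_page
        rows.append((title, start, end))
    return rows


if __name__ == "__main__":
    for row in page_ranges([("A", 1), ("B", 5), ("C", 5)], last_page=9):
        print(row)
```

The `max(...)` guard is a design choice for bookmarks that share a page; the actual service may resolve that case differently.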
split-pdf-bookmarks/
├── app
│   ├── __init__.py
│   ├── main.py
│   ├── routers
│   │   ├── bookmarks.py
│   │   ├── split.py
│   │   └── utils.py
│   └── services
│       ├── bookmarks.py
│       ├── exceptions.py
│       ├── split.py
│       └── utils.py
├── LICENSE
├── podman
│   ├── app
│   │   ├── Containerfile
│   │   ├── entrypoint.sh
│   │   └── requirements.txt
│   └── tests
│       ├── Containerfile
│       └── requirements.txt
├── README.md
├── split-pdf-bookmarks.sh
└── tests
    ├── conftest.py
    ├── __init__.py
    └── test_bookmarks_zip.py
ln -s "$(realpath split-pdf-bookmarks.sh)" ~/.local/bin/split-pdf-bookmarks
podman run -d -p 8080:8080 docker.io/pjfsu/split-pdf-bookmarks:latest
You can map a different host port (the container port is 8080).
split-pdf-bookmarks "Effective DevOps.pdf"
ls -1 "Effective DevOps"/
bookmarks.zip
unzip "Effective DevOps"/bookmarks.zip -d "Effective DevOps"/
ls -1 "Effective DevOps"/
bookmarks_level_0.csv
bookmarks_level_1.csv
bookmarks_level_2.csv
bookmarks_level_3.csv
bookmarks.zip
Set "split" to "y" for the entries you want to extract:
vim "Effective DevOps"/bookmarks_level_1.csv
"split","name","from","to"
"n","Introducing Effective Devops",22,22
...
"y","Chapter 1. The Big Picture",33,42
"y","Chapter 2. What Is Devops?",43,48
...
"n","Chapter 20. Further Resources",387,392
split-pdf-bookmarks "Effective DevOps.pdf" "Effective DevOps/bookmarks_level_1.csv"
ls -1 "Effective DevOps"/*zip
'Effective DevOps/bookmarks.zip'
'Effective DevOps/pdfs.zip'
unzip "Effective DevOps"/pdfs.zip -d "Effective DevOps"
ls -1 "Effective DevOps"/Chapter*
'Effective DevOps/Chapter 1. The Big Picture.pdf'
'Effective DevOps/Chapter 2. What Is Devops.pdf'
POST a PDF --> returns a ZIP of per-level bookmark CSV files.
POST a PDF + CSV file --> returns a ZIP of PDF fragments.
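As an illustration of the first response, the sketch below packs per-level rows into an in-memory ZIP using only the standard library. The file names follow the `bookmarks_level_N.csv` pattern shown earlier; the function name `bookmarks_zip`, the input shape, and the default `"n"` in the split column are assumptions, not the service's confirmed behavior.

```python
import csv
import io
import zipfile


def bookmarks_zip(levels):
    """Build ZIP bytes from {level: [(name, from, to), ...]}.

    Hypothetical helper: one CSV per outline level, every row written
    with split="n" so the user can opt in by editing it to "y".
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for level, rows in sorted(levels.items()):
            text = io.StringIO()
            # QUOTE_NONNUMERIC quotes strings and leaves integers bare,
            # matching the sample rows in this README.
            writer = csv.writer(text, quoting=csv.QUOTE_NONNUMERIC)
            writer.writerow(["split", "name", "from", "to"])
            for name, start, end in rows:
                writer.writerow(["n", name, start, end])
            zf.writestr(f"bookmarks_level_{level}.csv", text.getvalue())
    return buf.getvalue()


if __name__ == "__main__":
    data = bookmarks_zip({1: [("Chapter 1. The Big Picture", 33, 42)]})
    print(len(data), "bytes")
```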
split,name,from,to
- split: "y"/"n" means the row will/won't be used
- name: filename for the generated PDF
- from, to: start/end page (inclusive)
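The column rules above can be sketched as a small parser that keeps only the rows marked "y" and converts the inclusive 1-based range into a zero-based half-open range, which is what most PDF libraries index by. This is a minimal illustration; `rows_to_split` and `SAMPLE` are hypothetical names, and the validation is an assumption about what a reasonable implementation would reject.

```python
import csv
import io

# Rows taken from the sample CSV in this README (elided rows omitted).
SAMPLE = '''"split","name","from","to"
"n","Introducing Effective Devops",22,22
"y","Chapter 1. The Big Picture",33,42
"y","Chapter 2. What Is Devops?",43,48
"n","Chapter 20. Further Resources",387,392
'''


def rows_to_split(csv_text):
    """Return (name, first_index, stop_index) for rows with split == "y".

    first_index is zero-based; stop_index is exclusive, so an inclusive
    range 33..42 becomes indices 32..42 (ten pages).
    """
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["split"].strip().lower() != "y":
            continue
        start, end = int(row["from"]), int(row["to"])
        if start < 1 or end < start:
            raise ValueError(f"bad page range for {row['name']!r}: {start}-{end}")
        selected.append((row["name"], start - 1, end))
    return selected


if __name__ == "__main__":
    for name, first, stop in rows_to_split(SAMPLE):
        print(f"{name}: {stop - first} pages")
```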
Use split-pdf-bookmarks.sh to send requests locally:
- Automatically detects running container
- Determines endpoint based on arguments
- Creates a dedicated output folder named after the input PDF
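The actual client is a Bash script; the sketch below restates its argument handling in Python only to make the logic concrete. The route paths `/bookmarks` and `/split` are guesses based on the router file names, the default base URL is an assumption, and placing the output folder next to the input PDF is likewise assumed.

```python
from pathlib import Path


def plan_request(args, base_url="http://localhost:8080"):
    """Pick an endpoint and output folder from the CLI arguments.

    One argument (PDF) means bookmark export; two (PDF, CSV) mean split.
    Route names are hypothetical, not confirmed by the project.
    """
    if len(args) == 1:
        route = "/bookmarks"
    elif len(args) == 2:
        route = "/split"
    else:
        raise ValueError("expected: PDF [CSV]")
    pdf = Path(args[0])
    # "Effective DevOps.pdf" -> folder "Effective DevOps"
    return base_url + route, pdf.with_suffix("")


if __name__ == "__main__":
    print(plan_request(["book.pdf"]))
    print(plan_request(["book.pdf", "bookmarks.csv"]))
```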
Usage:
./split-pdf-bookmarks.sh book.pdf # Export bookmarks
./split-pdf-bookmarks.sh book.pdf bookmarks.csv # Split PDF
podman network create testnet
podman build -t split-pdf-bookmarks-tests -f ./podman/tests/Containerfile .
podman run -d --rm --network testnet -p 8080:8080 --name split-pdf-bookmarks docker.io/pjfsu/split-pdf-bookmarks:latest
podman run --rm --network testnet -e API_URL=http://split-pdf-bookmarks:8080 split-pdf-bookmarks-tests:latest
GPLv3 License. See LICENSE for terms.
- Web UI front-end for preview and interaction
I hope this program is useful to you. Thank you very much for visiting this repository!