Benchmarking PDF libraries
Updated Jul 2, 2025 - Python
pdfgui_tools is a graphical tool built with Qt and Python that integrates with poppler-utils and PyPDF2 for PDF document management. It is simple, user-friendly, and bundles various utilities.
A simple pdftotext conversion tool for Windows 8.1/10/11 and Fedora-, Ubuntu-, Debian-, and Arch-based Linux distributions, using poppler-utils and Google's tesseract-ocr.
Use of stylometry and machine learning in computer forensics: real tools used in 2019 by the Polish police. Everything is in and for the Polish language.
Create a searchable PDF from a scanned PDF
Multi-Modal RAG: An AI-powered pipeline for extracting, chunking, and summarizing content from PDF documents using advanced chunking strategies and generative models. Includes support for text, tables, and images, with vector search and retrieval via ChromaDB.
Split PDF pages horizontally into two separate images using Pillow image processing and poppler-utils.
Extracts data from PDF files of resumes written in English. Libraries used: spaCy, pdf2image, EasyOCR, poppler-utils.
Digitizes and structures voter information from scanned electoral roll PDFs using computer vision and OCR with OpenCV, Tesseract, and Pandas.
A Jekyll plugin to generate thumbnails for your PDF files
Extracts information from a random sample of web-scraped passports in both PDF and JPG formats.