Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Updated Feb 3, 2025 - HTML
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Aspose.PDF for JavaScript via C++
Extract structured text and data from documents such as invoices, book pages, and tables using OpenCV and Tesseract OCR.