A self-hosted search engine for documents.
Updated Apr 22, 2025 - Java
Bachelor Thesis | Text extraction from complex video scenes
Tess4J CLI OCR Tool is a command-line application that extracts text from images and PDFs using the Tess4J library, with support for multiple languages. The extracted text is automatically copied to the clipboard for easy access.
Tika per page PDF extractor server returning content as JSON.
Simple server to extract text from a PDF
Arachnio client library for Java 11+
A Spring Boot-based OCR Exporter tool that extracts text from image or PDF files using the OCR Space API and exports the results to various formats such as PDF, text, Word, or a database.
Yet Another Document 2 Text for pdf/doc/html/rtf/etc - Extract text, or convert to simplified HTML to retain layout information
Run Apache Tika as a service in AWS Lambda by scanning documents in S3 and storing the extracted text back to S3
Detect and extract text from captured images and from images selected from the gallery.
A Cloud-Native Infrastructure for License Plate Recognition and Text Extraction with Python Integration
Text extraction: a highway to systematically process car reviews