Viewers for statistics and dashboarding of Domain Search Engine data
Updated Jan 19, 2016 - Python
Tika-Similarity uses the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features.
A modern Python REST client for Apache Tika server
Benchmarking unstructured data extraction libraries
Processing system for the search engine service in Liquid Investigations.
Helps parse bank statements (PDF)
Tesseract OCR wrapper for Apache Tika and/or Open Semantic ETL that caches OCR results, so Tika-Server or Open Semantic ETL does not have to rerun slow and expensive OCR on the same images
Google Translator API + Qt
Directory tree metadata parser using Apache Tika
Flask application for OCR and extraction of text from documents with support for repository applications
EphemerAl is a local-only AI chat server for Windows PCs (WSL2 + Docker) using Ollama (Gemma 3), Apache Tika and Streamlit. Runs on your LAN only: no cloud APIs, no RAG index, no user accounts and no persistent logs. Built for schools and small teams that need private, disposable conversations.
This is a proxy for Apache Tika that splits large documents into pages for parallel processing
PDF parser component (Apache Tika) for PCU project
Sample pipeline for parsing PDF and performing text processing
Extracting information from PDF files.