🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐
Updated Dec 5, 2025 - Python
Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval
Python program for searching PDF text, ranking the results, and exporting highlighted search results to PDF. Uses a trie, stack, heap, and page graph. Converts queries to postfix notation. Supports logical expressions and phrases. Offers "did you mean" suggestions.
DocuVisQA (Document Visual Question Answering) is a Python project that leverages Google's Generative AI and LangChain for document processing, text splitting, and question answering. It also supports image processing, with Streamlit providing an interactive UI.
A web interface that allows searching for PDFs by their content
Use semantic search on PDFs locally
CLI for merging PDF contexts.
In Development
Given a set of PDFs and a query, finds the most relevant PDF using TF-IDF. The TF-IDF computation is implemented from scratch, without any library.
A tool to search for text in PDF files using multiple methods, including OCR (Optical Character Recognition).
Cognivia AI is a powerful AI-powered PDF search and question-answering system built with LangChain, Pinecone Vector Store, OpenAI, and Supabase. Upload PDFs, ask questions, and get intelligent answers with persistent conversation memory.
Program that searches for a list of names of parties to legal proceedings in the PDFs of the Diário Oficial.
Short on time? Can't search through all your PDFs one by one for the content you want? Well, PDF-Founder is here...
Repository for the Indexing, Search and Evaluation of UniChemFinder
This Python script allows users to search through PDF documents located in predefined directories for specific keywords. It uses PyPDF2 to extract text from PDFs and supports single or dual keyword searches.
A high-performance RAG system for PDFs using multi-vector embeddings (ColPali / ColQwen / ColSmol) with vector search in Qdrant, prefetch optimization, and reranking for improved relevance. Designed for speed, accuracy, and scalability, this system is ideal for building intelligent search, document understanding, and QA applications.
Python console app that performs smart search over a provided PDF. It showcases the use of tries for word searching.
📄 PDF Search Engine – Advanced keyword-based PDF search with logical operators, graph-based ranking, autocomplete, and highlighted exports.
An AI-powered Streamlit app for PDF and web-based Q&A using RAG (Retrieval-Augmented Generation), Groq’s Mixtral LLM, and DeepAI image generation.
Build a workflow using CrewAI tools to scrape the content from the docs and then perform RAG on it.