A standalone server that lets users upload PDF documents. Once documents are parsed and processed, users can call /v1/tools-execute, passing a conversation with pending tool calls. The server answers the tool calls, producing new Messages with results: RAG Messages, Helper Messages, or Text Messages.
Please visit http://localhost:8011/docs to access the API documentation. All models that endpoints accept or return are also available there. The documentation is always up to date, as it is generated at the app's startup.
Hint: in the documentation, the "tick" toggle shows the Response Schema. If it is too long to read, copy-paste it into an LLM and ask it to create the structures in your language/library.
Alternatively, the raw OpenAPI document can be accessed at http://localhost:8011/v1/openapi.json -- useful for alternative API clients: Yaak, Postman, etc.
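Following the hint above, the OpenAPI document can also be processed programmatically. A minimal sketch that lists the model names from a spec, assuming the standard OpenAPI 3.x layout where models live under `components/schemas`:

```python
def list_schema_names(spec: dict) -> list[str]:
    """Return the names of all models defined under components/schemas (OpenAPI 3.x)."""
    return sorted(spec.get("components", {}).get("schemas", {}).keys())

# With the server running, the spec could be fetched with e.g.
#   import json, urllib.request
#   spec = json.load(urllib.request.urlopen("http://localhost:8011/v1/openapi.json"))
# The model names below are made up for the demo:
demo_spec = {"components": {"schemas": {"FileItem": {}, "ChatMessage": {}}}}
print(list_schema_names(demo_spec))  # -> ['ChatMessage', 'FileItem']
```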
This application provides a complete pipeline for:
- Uploading PDF files OR placing PDF files into a shared folder
- Extracting text content from PDFs
- Processing the extracted content and uploading it to OpenAI's File and Vector Store APIs
- Providing an API interface to list and execute tools
- File Management: Upload, list, and delete PDF files
- Asynchronous Processing: Background workers handle resource-intensive tasks
- Vector Search: Semantic search capabilities using OpenAI's vector stores
- Stateful Processing: Track processing status of documents from upload to completion
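The "Asynchronous Processing" point above boils down to a queue fed by the web server and drained by worker threads. A minimal sketch of that pattern; the names (`worker_loop`, the `processed:` marker) are hypothetical and not the project's actual worker code:

```python
import queue
import threading

def worker_loop(jobs: "queue.Queue[str | None]", results: list[str]) -> None:
    """Poll the job queue and process items until a None sentinel arrives."""
    while True:
        path = jobs.get()
        if path is None:          # shutdown signal
            break
        # Placeholder for the CPU/IO-heavy work (PDF extraction, uploads, ...)
        results.append(f"processed:{path}")

jobs: "queue.Queue[str | None]" = queue.Queue()
results: list[str] = []
t = threading.Thread(target=worker_loop, args=(jobs, results), daemon=True)
t.start()
jobs.put("report.pdf")   # the web server would enqueue uploads like this
jobs.put(None)           # tell the worker to stop
t.join()
print(results)           # -> ['processed:report.pdf']
```

The web server thread returns immediately after `jobs.put(...)`, which is what keeps uploads from blocking on extraction.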
Download docker-compose-stack.yaml
```shell
wget -O docker-compose-stack.yaml https://raw.githubusercontent.com/valaises/pdf-chat/refs/heads/main/docker-compose-stack.yaml
```
Set the following variables in your environment:
OPENAI_API_KEY and OPENROUTER_API_KEY
Read more about OpenRouter -- a unified API interface to LLMs.
verify that:
```shell
printenv | grep -E 'OPENAI_API_KEY|OPENROUTER_API_KEY'
```
Start Docker Compose:
```shell
docker compose -f docker-compose-stack.yaml up -d
```
Clone the repository:
```shell
git clone -b eval https://github.com/valaises/pdf-chat.git
```
cd into the directory:
```shell
cd pdf-chat
```
Set the following variables in your environment:
OPENAI_API_KEY and OPENROUTER_API_KEY
Read more about OpenRouter -- a unified API interface to LLMs.
verify that:
```shell
printenv | grep -E 'OPENAI_API_KEY|OPENROUTER_API_KEY'
```
Start Docker Compose:
```shell
docker compose -f docker-compose-stack-dev.yaml up -d
```
In your browser, open http://localhost:5173. Click Settings and specify:
API Endpoint: http://localhost:7016/v1
API KEY: admin1234
Add Server: http://pdf-chat:8011/v1
Head back to the chat and under "Select Model" choose a model, e.g. gpt-4o.
Make a test request: "what documents do I have?"
Expected output:
- Telescope emoji + tool name -- the model decided to call that tool
- Paperclip emoji + tool name -- the tool call has completed and its results are attached to the chat
Hint: click on those elements to expand them and view their contents.
Evaluation has a dedicated README.md
Visit http://localhost:8011/v1/experiments to access an interactive view of finished evaluations.

The application follows a modular architecture with these key components:
- FastAPI Web Server: Handles HTTP requests for file uploads, listing, and chat interactions
- Background Workers: additional threads that handle CPU/IO-intensive tasks, processing files without blocking the web server
- Repository Layer: Abstracts database operations for file metadata
- PDF Extraction Library: Custom approaches for extracting structured text from PDFs
- OpenAI Integration: Wrappers around OpenAI's API for vector stores and file uploads
- SQLite Database: stores file metadata including:
  - Original and hashed filenames
  - User ID
  - Creation timestamp
  - Processing status
  - Vector store ID
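The table backing this metadata can be pictured roughly as follows. The column names are assumptions for illustration only; the real schema lives in the repository layer:

```python
import sqlite3

# Hypothetical schema mirroring the metadata fields listed above
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    file_name_orig    TEXT NOT NULL,     -- original filename
    file_name         TEXT PRIMARY KEY,  -- hashed filename
    user_id           INTEGER NOT NULL,
    created_at        TEXT NOT NULL,     -- creation timestamp
    processing_status TEXT DEFAULT '',   -- '', 'extracted', 'processing', ...
    vector_store_id   TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
    ("report.pdf", "ab12cd.pdf", 1, "2024-01-01T00:00:00Z", "", None),
)
row = conn.execute("SELECT processing_status FROM files").fetchone()
print(row)  # -> ('',)  -- newly created records start with an empty status
```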
- File System Storage:
  - Uploaded PDFs stored with hashed filenames
  - Extracted text stored in JSONL format
  - Visualization of highlights of extracted paragraphs (optional, hardcoded in w_extractor.py)
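JSONL here means one JSON object per line, e.g. one per extracted paragraph. A small sketch of reading such a file; the field names (`paragraph`, `page`) are assumptions, not the project's actual schema:

```python
import io
import json

# Stand-in for an extracted-text .jsonl file: one JSON object per line
sample = io.StringIO(
    '{"paragraph": "Introduction...", "page": 1}\n'
    '{"paragraph": "Methods...", "page": 2}\n'
)

paragraphs = [json.loads(line) for line in sample if line.strip()]
print(len(paragraphs), paragraphs[0]["page"])  # -> 2 1
```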
- OpenAI Vector Stores:
  - Semantic search capabilities using OpenAI's embeddings
  - Enables natural language querying of document content
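Under the hood this maps to OpenAI's vector store search endpoint. A sketch that only builds the request URL and JSON body without making a network call; verify the parameter names (e.g. `max_num_results`) against OpenAI's current API reference before relying on them:

```python
def build_search_request(vector_store_id: str, query: str, max_results: int = 5):
    """Build URL and body for OpenAI's POST /v1/vector_stores/{id}/search."""
    url = f"https://api.openai.com/v1/vector_stores/{vector_store_id}/search"
    body = {"query": query, "max_num_results": max_results}
    return url, body

url, body = build_search_request("vs_123", "payment terms")
print(url)   # -> https://api.openai.com/v1/vector_stores/vs_123/search
# Sending it would be e.g. requests.post(url, json=body, headers=auth_headers)
```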
- Upload Phase:
  - User uploads a PDF file via the /v1/file-upload endpoint, OR the file is placed into the shared folder and /v1/file-create is called to mark it for processing
  - The file is saved with a hashed filename
  - A database record for the file is created with an empty processing status
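The /v1/file-upload path can be exercised from a client. A sketch that only assembles the arguments for a multipart POST (the form field name "file" is an assumption); actually sending it would be a single `requests.post(url, files=files)` call:

```python
def build_upload_request(base_url: str, path: str):
    """Assemble URL and multipart files dict for POST /v1/file-upload."""
    url = f"{base_url}/v1/file-upload"
    # (filename, file bytes, content type) -- bytes are a stand-in here
    files = {"file": (path.rsplit("/", 1)[-1], b"%PDF-1.4 ...", "application/pdf")}
    return url, files

url, files = build_upload_request("http://localhost:8011", "docs/report.pdf")
print(url)               # -> http://localhost:8011/v1/file-upload
print(files["file"][0])  # -> report.pdf
```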
- Extraction Phase:
  - The extractor worker monitors for new files
  - When a new file is detected, text is extracted from the PDF
  - Extraction parses paragraphs and sections and assigns highlight coordinates
  - Status is updated to "extracted" when complete
- Processing Phase:
  - The processor worker monitors for files with "extracted" status
  - Status is updated to "processing" during this phase
  - A vector store is created if it does not exist
  - Paragraphs are uploaded as files to OpenAI's File API
  - Those files are attached to the created vector store
  - Status is updated to "complete" when finished
  - Any orphaned files are cleaned up from OpenAI's File and Vector Store APIs
- Query Phase:
  - User calls /v1/tools-execute with messages in OpenAI format
  - If the messages (after the latest user message) contain unanswered tool calls, those tools are executed: list_documents or search_in_doc
  - Tool calls are validated, then executed, and the resulting tool answer messages are returned in the output
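A tools-execute request therefore carries an OpenAI-format conversation whose last assistant message holds unanswered tool calls. A sketch of such a payload; any request fields beyond `messages` are assumptions:

```python
# OpenAI chat format: an assistant message with one pending tool call
messages = [
    {"role": "user", "content": "what documents do I have?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "list_documents", "arguments": "{}"},
        }],
    },
]

payload = {"messages": messages}
# The server executes tools only if a pending (unanswered) tool call exists:
has_pending = any(m.get("tool_calls") for m in messages)
print(has_pending)  # -> True
# e.g. requests.post("http://localhost:8011/v1/tools-execute", json=payload)
```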
Files progress through these statuses:
- Empty status: newly uploaded, awaiting processing
- "extracted": text has been extracted from the PDF
- "processing": currently being processed by the processor worker
- "incomplete": processing was interrupted and needs to be resumed
- "complete": fully processed
- "error: [message]": an error occurred during processing
- Sentence-level / smaller-than-paragraph highlights (without coords) (~Easy-Moderate)
- RAG re-ranking (with OR without summarizations) (~Moderate)
- Documents' summarization pipeline (~Moderate)
- Other Document Storing Options -- e.g. S3 API (~Moderate)
- Better PDF object detection using CV Model (~Difficult, Research needed)
- Support for non-text PDFs using a CV model for object detection, then text extraction via OCR (~Moderate, after the CV model is implemented)
- Questions about drawings (~Difficult-Very Difficult, after the CV model is implemented)
