A standalone server that lets users upload PDF documents. Once documents are parsed and processed, users can call /v1/tools-execute, passing a conversation with pending tool calls. The server answers the tool calls, producing new Messages with results: RAG Messages, Helper Messages, or Text Messages.
Please visit http://localhost:8011/docs to access the API documentation. All models that endpoints accept or return are also available there. The documentation is always up to date, as it is generated at the app's startup.
Hint: in the documentation, the "tick" toggle shows the Response Schema. If it is too long to read, copy-paste it into an LLM and ask it to create the structures in your language/library.
Alternatively, the raw OpenAPI document can be accessed at http://localhost:8011/v1/openapi.json -- useful for alternative API clients: Yaak, Postman, etc.
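Following the hint above, the OpenAPI document can also be processed programmatically. A minimal sketch that lists the model names from a spec, assuming the standard OpenAPI 3.x layout where models live under `components/schemas`:

```python
def list_schema_names(spec: dict) -> list[str]:
    """Return the names of all models defined under components/schemas (OpenAPI 3.x)."""
    return sorted(spec.get("components", {}).get("schemas", {}).keys())

# With the server running, the spec could be fetched with e.g.
#   import json, urllib.request
#   spec = json.load(urllib.request.urlopen("http://localhost:8011/v1/openapi.json"))
# The model names below are made up for the demo:
demo_spec = {"components": {"schemas": {"FileItem": {}, "ChatMessage": {}}}}
print(list_schema_names(demo_spec))  # -> ['ChatMessage', 'FileItem']
```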
This application provides a complete pipeline for:
- Uploading PDF files OR placing PDF files into a shared folder
- Extracting text content from PDFs
- Processing the extracted content and uploading it to OpenAI's File and Vector Store APIs
- Providing an API interface to list and execute tools
- File Management: Upload, list, and delete PDF files
- Asynchronous Processing: Background workers handle resource-intensive tasks
- Vector Search: Semantic search capabilities using OpenAI's vector stores
- Stateful Processing: Track processing status of documents from upload to completion
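The "Asynchronous Processing" point above boils down to a queue fed by the web server and drained by worker threads. A minimal sketch of that pattern; the names (`worker_loop`, the `processed:` marker) are hypothetical and not the project's actual worker code:

```python
import queue
import threading

def worker_loop(jobs: "queue.Queue[str | None]", results: list[str]) -> None:
    """Poll the job queue and process items until a None sentinel arrives."""
    while True:
        path = jobs.get()
        if path is None:          # shutdown signal
            break
        # Placeholder for the CPU/IO-heavy work (PDF extraction, uploads, ...)
        results.append(f"processed:{path}")

jobs: "queue.Queue[str | None]" = queue.Queue()
results: list[str] = []
t = threading.Thread(target=worker_loop, args=(jobs, results), daemon=True)
t.start()
jobs.put("report.pdf")   # the web server would enqueue uploads like this
jobs.put(None)           # tell the worker to stop
t.join()
print(results)           # -> ['processed:report.pdf']
```

The web server thread returns immediately after `jobs.put(...)`, which is what keeps uploads from blocking on extraction.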
Download docker-compose-stack.yaml
```shell
wget -O docker-compose-stack.yaml https://raw.githubusercontent.com/valaises/pdf-chat/refs/heads/main/docker-compose-stack.yaml
```
Set the following variables in your environment:
OPENAI_API_KEY and OPENROUTER_API_KEY
Read more about OpenRouter -- a unified API interface to LLMs.
verify that:
```shell
printenv | grep -E 'OPENAI_API_KEY|OPENROUTER_API_KEY'
```
Start Docker Compose:
```shell
docker compose -f docker-compose-stack.yaml up -d
```
Clone the repository:
```shell
git clone -b eval https://github.com/valaises/pdf-chat.git
```
cd into the directory:
```shell
cd pdf-chat
```
Set the following variables in your environment:
OPENAI_API_KEY and OPENROUTER_API_KEY
Read more about OpenRouter -- a unified API interface to LLMs.
verify that:
```shell
printenv | grep -E 'OPENAI_API_KEY|OPENROUTER_API_KEY'
```
Start Docker Compose:
```shell
docker compose -f docker-compose-stack-dev.yaml up -d
```
In your browser, open http://localhost:5173. Click Settings and specify:
API Endpoint: http://localhost:7016/v1
API KEY: admin1234
Add Server: http://pdf-chat:8011/v1
Head back to the chat and under "Select Model" choose a model, e.g. gpt-4o.
Make a test request: "what documents do I have?"
Expected output:
- Telescope emoji + tool name -- the model decided to call that tool
- Paperclip emoji + tool name -- the tool call has completed and its results are attached to the chat
Hint: click on those elements to expand them and view their contents.
Evaluation has a dedicated README.md
Visit http://localhost:8011/v1/experiments to access an interactive view of finished evaluations.

The application follows a modular architecture with these key components:
- FastAPI Web Server: Handles HTTP requests for file uploads, listing, and chat interactions
- Background Workers: additional threads that handle CPU/IO-intensive tasks, processing files without blocking the web server
- Repository Layer: Abstracts database operations for file metadata
- PDF Extraction Library: Custom approaches for extracting structured text from PDFs
- OpenAI Integration: Wrappers around OpenAI's API for vector stores and file uploads
- SQLite Database: stores file metadata including:
  - Original and hashed filenames
  - User ID
  - Creation timestamp
  - Processing status
  - Vector store ID
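The table backing this metadata can be pictured roughly as follows. The column names are assumptions for illustration only; the real schema lives in the repository layer:

```python
import sqlite3

# Hypothetical schema mirroring the metadata fields listed above
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    file_name_orig    TEXT NOT NULL,     -- original filename
    file_name         TEXT PRIMARY KEY,  -- hashed filename
    user_id           INTEGER NOT NULL,
    created_at        TEXT NOT NULL,     -- creation timestamp
    processing_status TEXT DEFAULT '',   -- '', 'extracted', 'processing', ...
    vector_store_id   TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
    ("report.pdf", "ab12cd.pdf", 1, "2024-01-01T00:00:00Z", "", None),
)
row = conn.execute("SELECT processing_status FROM files").fetchone()
print(row)  # -> ('',)  -- newly created records start with an empty status
```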
- File System Storage:
  - Uploaded PDFs stored with hashed filenames
  - Extracted text stored in JSONL format
  - Visualization of highlights of extracted paragraphs (optional, hardcoded in w_extractor.py)
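JSONL here means one JSON object per line, e.g. one per extracted paragraph. A small sketch of reading such a file; the field names (`paragraph`, `page`) are assumptions, not the project's actual schema:

```python
import io
import json

# Stand-in for an extracted-text .jsonl file: one JSON object per line
sample = io.StringIO(
    '{"paragraph": "Introduction...", "page": 1}\n'
    '{"paragraph": "Methods...", "page": 2}\n'
)

paragraphs = [json.loads(line) for line in sample if line.strip()]
print(len(paragraphs), paragraphs[0]["page"])  # -> 2 1
```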
- OpenAI Vector Stores:
  - Semantic search capabilities using OpenAI's embeddings
  - Enables natural language querying of document content
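Under the hood this maps to OpenAI's vector store search endpoint. A sketch that only builds the request URL and JSON body without making a network call; verify the parameter names (e.g. `max_num_results`) against OpenAI's current API reference before relying on them:

```python
def build_search_request(vector_store_id: str, query: str, max_results: int = 5):
    """Build URL and body for OpenAI's POST /v1/vector_stores/{id}/search."""
    url = f"https://api.openai.com/v1/vector_stores/{vector_store_id}/search"
    body = {"query": query, "max_num_results": max_results}
    return url, body

url, body = build_search_request("vs_123", "payment terms")
print(url)   # -> https://api.openai.com/v1/vector_stores/vs_123/search
# Sending it would be e.g. requests.post(url, json=body, headers=auth_headers)
```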
- Upload Phase:
  - User uploads a PDF file via the /v1/file-upload endpoint, OR the file is placed into the shared folder and /v1/file-create is called to mark it for processing
  - The file is saved with a hashed filename
  - A database record for the file is created with an empty processing status
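The /v1/file-upload path can be exercised from a client. A sketch that only assembles the arguments for a multipart POST (the form field name "file" is an assumption); actually sending it would be a single `requests.post(url, files=files)` call:

```python
def build_upload_request(base_url: str, path: str):
    """Assemble URL and multipart files dict for POST /v1/file-upload."""
    url = f"{base_url}/v1/file-upload"
    # (filename, file bytes, content type) -- bytes are a stand-in here
    files = {"file": (path.rsplit("/", 1)[-1], b"%PDF-1.4 ...", "application/pdf")}
    return url, files

url, files = build_upload_request("http://localhost:8011", "docs/report.pdf")
print(url)               # -> http://localhost:8011/v1/file-upload
print(files["file"][0])  # -> report.pdf
```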
- Extraction Phase:
  - The extractor worker monitors for new files
  - When a new file is detected, text is extracted from the PDF
  - Extraction parses paragraphs and sections and assigns highlight coordinates
  - Status is updated to "extracted" when complete
- Processing Phase:
  - The processor worker monitors for files with "extracted" status
  - Status is updated to "processing" during this phase
  - A vector store is created if it does not exist
  - Paragraphs are uploaded as files to OpenAI's File API
  - Those files are attached to the created vector store
  - Status is updated to "complete" when finished
  - Any orphaned files are cleaned up from OpenAI's File and Vector Store APIs
- Query Phase:
  - User calls /v1/tools-execute with messages in OpenAI format
  - If the messages (after the latest user message) contain unanswered tool calls, those tools are executed: list_documents or search_in_doc
  - Tool calls are validated, then executed, and the resulting tool answer messages are returned in the output
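A tools-execute request therefore carries an OpenAI-format conversation whose last assistant message holds unanswered tool calls. A sketch of such a payload; any request fields beyond `messages` are assumptions:

```python
# OpenAI chat format: an assistant message with one pending tool call
messages = [
    {"role": "user", "content": "what documents do I have?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "list_documents", "arguments": "{}"},
        }],
    },
]

payload = {"messages": messages}
# The server executes tools only if a pending (unanswered) tool call exists:
has_pending = any(m.get("tool_calls") for m in messages)
print(has_pending)  # -> True
# e.g. requests.post("http://localhost:8011/v1/tools-execute", json=payload)
```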
Files progress through these statuses:
- Empty status: newly uploaded, awaiting processing
- "extracted": text has been extracted from the PDF
- "processing": currently being processed by the processor worker
- "incomplete": processing was interrupted and needs to be resumed
- "complete": fully processed
- "error: [message]": an error occurred during processing
- Sentence-level / smaller-than-paragraph highlights (without coords) (~Easy-Moderate)
- RAG re-ranking (with OR without summarizations) (~Moderate)
- Documents' summarization pipeline (~Moderate)
- Other Document Storing Options -- e.g. S3 API (~Moderate)
- Better PDF object detection using CV Model (~Difficult, Research needed)
- Support for non-text PDFs using a CV model for object detection, then text extraction via OCR (~Moderate, after the CV model is implemented)
- Questions about drawings (~Difficult-Very Difficult, after the CV model is implemented)
