This project is a sophisticated, multi-functional chat service designed to interact with your documents. It leverages a modular architecture to provide context-aware answers using Retrieval-Augmented Generation (RAG), and can use external tools like web search and a calculator to enhance its capabilities. It supports a wide variety of Large Language Model (LLM) backends and is built with a focus on robustness and modularity.
Note: This project is under active development. The API server implementation and Docker containerization are currently works in progress and not yet production-ready.
- Retrieval-Augmented Generation (RAG): Implemented as a core tool, it ingests documents to provide answers based on their content.
- Multi-LLM Support: A pluggable architecture allows for using various LLM providers, including OpenAI, Google AI, Hugging Face, Together AI, and a local Llama.cpp client, configured via `config.yaml` (an illustrative sketch of the client interface appears after this list).
- Tool Integration: Extensible agent-like capabilities with tools for document retrieval, web search, and calculations.
- Robust Engineering: Includes production-ready features like a circuit breaker, error recovery, and retry mechanisms for external API calls.
- Flexible Configuration: Easily configure the application, including the LLM client, model parameters, and tools, via a central `config.yaml` file managed with Pydantic.
- Containerization: (In Progress) Dockerfiles are provided but are currently under development.
- API Server: (In Progress) A FastAPI application is being developed but is not yet complete.
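To illustrate how the pluggable LLM support can work, the sketch below shows a minimal abstract client interface with one hypothetical provider implementation. The names are assumptions made for illustration; the project's actual interface lives in `documentchatter/base/base_llm_client.py` and may differ.

```python
from abc import ABC, abstractmethod


class BaseLLMClient(ABC):
    """Illustrative base class: every provider client exposes the same interface."""

    def __init__(self, model: str, temperature: float = 0.7, max_tokens: int = 512):
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's completion for the given prompt."""


class OpenAIClient(BaseLLMClient):
    """Hypothetical provider implementation, selected via config.yaml."""

    def generate(self, prompt: str) -> str:
        # A real implementation would call the OpenAI API here.
        raise NotImplementedError
```

Because the orchestration layer depends only on the base interface, switching providers is a configuration change rather than a code change.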
The service is built using a modern Python stack, centered around a modular architecture that facilitates Retrieval-Augmented Generation and tool use.
- API Layer: (In Development) The service will include a FastAPI application (`documentchatter/controller.py`) to provide a robust, high-performance asynchronous API for handling user chat requests.
- RAG Pipeline: The RAG capability is implemented through a dedicated `retrieval_tool` and a `vector_store` manager (a simplified sketch of the retrieval flow appears after this list):
  - Document Loading & Processing: Documents are loaded, split into chunks, and converted into numerical vector embeddings.
  - Vector Storage: The embeddings and their corresponding text chunks are stored and managed by `vector_store/manager.py`, which supports vector databases such as ChromaDB and FAISS.
  - Retrieval as a Tool: When a user asks a question, the `retrieval_tool` is invoked. It embeds the user's query and searches the vector store for the most relevant text chunks from the documents.
- LLM and Agent Orchestration:
  - An orchestration layer within the `controller` manages the interaction between the user query, the available tools, and the selected LLM.
  - It uses an agent-like framework to decide whether to answer directly, use the `retrieval_tool` to get document context, or use other tools such as the `web_search_tool` or `calculator_tool`.
  - Multi-LLM support is achieved through an abstraction layer (`documentchatter/base/base_llm_client.py`) with specific implementations for each provider in `documentchatter/llm_clients/`.
- Resilience:
  - The `documentchatter/utils/` module contains robust engineering components (a simplified sketch appears after this list).
  - Retries: Network requests to external services are wrapped in a retry mechanism (`retry_config.py`) to handle transient failures.
  - Circuit Breaker: A circuit breaker pattern (`circuit_breaker.py`) prevents the application from repeatedly calling a failing service, improving system stability.
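To make the retrieval step concrete, here is a minimal sketch of how a retrieval tool can embed a query and search a vector store, using ChromaDB as an example backend. The function and collection names are assumptions; the project's actual `retrieval_tool` and `vector_store/manager.py` may be structured differently.

```python
import chromadb

# Illustrative sketch: assumes documents were already chunked, embedded, and
# stored in a ChromaDB collection named "documents" during ingestion.
client = chromadb.Client()
collection = client.get_or_create_collection(name="documents")


def retrieve_context(question: str, k: int = 4) -> str:
    """Embed the user's query, search the vector store, and join the top chunks."""
    results = collection.query(query_texts=[question], n_results=k)
    return "\n\n".join(results["documents"][0])
```

The orchestration layer would then pass the returned chunks to the selected LLM as context, typically through a prompt template.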
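The snippet below sketches the two resilience patterns in simplified form. It is not the project's actual `retry_config.py` or `circuit_breaker.py`; names, thresholds, and timeouts are illustrative.

```python
import time


def call_with_retries(func, attempts: int = 3, base_delay: float = 1.0):
    """Illustrative retry helper: retries transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


class CircuitBreaker:
    """Illustrative circuit breaker: fail fast after repeated failures, retry later."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        # While the circuit is open, skip the call until the reset timeout elapses.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("Circuit open: not calling the failing service")
            self.opened_at, self.failures = None, 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```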
The project is organized within the `documentchatter` Python package for modularity and scalability.
```
document-chat-service/
├── documentchatter/          # Main application source code
│   ├── controller.py         # FastAPI application entry point (in development)
│   ├── bin/main_v2.py        # Application runner script
│   ├── base/                 # Abstract base classes for core components
│   ├── config/               # Configuration loading (pydantic_config.py)
│   ├── llm_clients/          # Implementations for different LLM providers
│   ├── tools/                # Implementations for tools (retrieval, web search, etc.)
│   ├── utils/                # Resilience and utility modules (circuit breaker, retry)
│   ├── vector_store/         # Vector database management for RAG
│   ├── prompt_templates/     # Prompt templates for the LLM
│   ├── Dockerfile            # Docker configuration (in development)
│   └── requirements.txt      # Python dependencies
├── documents/                # Directory for storing your documents (needs to be created)
└── README.md                 # This file
```
- Python 3.11+
- Docker (optional, for containerized deployment - not yet fully implemented)
- API keys for the LLM providers and tools you plan to use.
- Clone the repository:

  ```bash
  git clone https://github.com/singultek/document-chat-service.git
  cd document-chat-service
  ```

- Install the required Python packages:

  ```bash
  pip install -r documentchatter/requirements.txt
  ```

- Create a `documents` directory in the project root and place your files (e.g., PDF, TXT) inside it:

  ```bash
  mkdir documents
  ```
The application is configured through the `config.yaml` file in the project root. Here you can set:

- The LLM provider (`openai`, `google`, `huggingface`, `together`, `local`)
- Model parameters (e.g., `temperature`, `max_tokens`)
- Tool settings (e.g., enable/disable web search)
- Vector store and embedding model configurations
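As an illustration only, a `config.yaml` for this kind of setup might look like the example below. The actual keys are defined by the Pydantic models in `documentchatter/config/pydantic_config.py`, so treat these field names and values as assumptions rather than the project's real schema.

```yaml
# Illustrative example; the real schema is defined by the project's Pydantic config models.
llm:
  provider: openai            # one of: openai, google, huggingface, together, local
  model: gpt-4o-mini
  temperature: 0.2
  max_tokens: 1024

tools:
  web_search: true
  calculator: true

vector_store:
  backend: chromadb           # e.g. chromadb or faiss
  embedding_model: all-MiniLM-L6-v2
  persist_directory: ./vector_store_data
```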
Note: The API server functionality is still under development. The current implementation provides basic capabilities but is not yet feature-complete.
Run the application by executing the main script from the project's root directory:
```bash
python documentchatter/bin/main.py
```

The application will process documents in the `documents` directory and initialize the necessary components based on your configuration.

To stop the application, press `CTRL+C` in your terminal.