This project provides a Retrieval-Augmented Generation (RAG) API that acts as an intelligent chatbot named Meera, capable of answering user queries over your custom data. It combines LlamaIndex for efficient retrieval, ChromaDB for vector storage, and Edge TTS for generating lifelike speech responses.
- RAG Chatbot (Meera): Answers user queries using vector-based retrieval and large language models (LLMs).
- Vector Store Indexing: Uses HuggingFace embeddings and ChromaDB for storing document vectors and enabling fast similarity searches.
- Chat Engine: Supports conversational memory, enabling context-aware responses.
- Text-to-Speech (TTS): Converts responses into natural-sounding audio with Microsoft Edge TTS.
- Gradio Demo: Interactive UI for voice input and response playback.
- Run the API Server: The API provides endpoints to interact with the chat engine and generate speech.

  ```bash
  uvicorn app:app --port 8000 --reload
  ```

  Endpoints:
  - `POST /query` – Retrieve a query response.
  - `POST /tts` – Generate speech from text.
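A minimal client sketch for the two endpoints, using only the standard library. The JSON field names (`"query"`, `"text"`) are assumptions — check the actual route definitions in `app.py` before relying on them.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"

def json_request(endpoint: str, payload: dict) -> request.Request:
    """Build (but do not send) a JSON POST request to the API."""
    return request.Request(
        f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical payload shapes — adjust to the FastAPI route signatures.
query_req = json_request("/query", {"query": "What is Meera?"})
tts_req = json_request("/tts", {"text": "Hello from Meera"})
# To send while the server is running: request.urlopen(query_req)
```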
- Run the Gradio Voice Assistant Demo: Launch the Gradio demo to test the end-to-end functionality.

  ```bash
  python main_demo.py
  ```

  or:

  ```bash
  uvicorn main_demo:app --port 8000 --reload
  ```

  The demo:
  - Accepts voice input from the user via microphone
  - Transcribes the audio using speech recognition
  - Retrieves a response from the query/chat engine
  - Speaks the response in the selected AI-generated voice
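The four steps above can be sketched as a single pipeline function, with the heavy components (speech recognition, chat engine, TTS) injected as callables. The names here are illustrative, not the actual `main_demo.py` API:

```python
def voice_turn(audio_bytes, transcribe, answer, speak):
    """One demo turn: microphone audio in, spoken reply out."""
    text = transcribe(audio_bytes)   # steps 1-2: accept + transcribe audio
    reply = answer(text)             # step 3: query / chat engine response
    return speak(reply)              # step 4: synthesize the chosen voice

# Wiring with stub components to show the data flow:
result = voice_turn(
    b"\x00\x01",  # placeholder audio bytes
    transcribe=lambda audio: "what is meera",
    answer=lambda q: f"Meera answers: {q}",
    speak=lambda text: ("audio.mp3", text),
)
```

Keeping the components injectable like this also makes each stage easy to swap or unit-test in isolation.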
- Run the Speech Recognition Test: Try out the speech recognition functionality using Gradio.

  ```bash
  python gradio_sr.py
  ```

  or:

  ```bash
  uvicorn gradio_sr:app --port 8000 --reload
  ```
- Indexing Data
  - Documents in `./data` are embedded using `BAAI/bge-base-en-v1.5`.
  - Vectors are stored in ChromaDB (`./indexes/chroma`).
  - Collections allow fast retrieval of similar content during queries.
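At its core, the "fast retrieval of similar content" a vector store performs is nearest-neighbor search over embeddings, commonly by cosine similarity. A toy in-memory stand-in (not the ChromaDB API) makes the idea concrete:

```python
import math

class TinyVectorStore:
    """Toy stand-in for a vector collection: store (id, vector) pairs
    and return the nearest ids by cosine similarity."""

    def __init__(self):
        self.items = {}  # doc_id -> vector

    def add(self, doc_id, vector):
        self.items[doc_id] = vector

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)

    def query(self, vector, top_k=2):
        scored = sorted(
            self.items.items(),
            key=lambda kv: self._cosine(vector, kv[1]),
            reverse=True,
        )
        return [doc_id for doc_id, _ in scored[:top_k]]

store = TinyVectorStore()
store.add("doc_a", [1.0, 0.0])
store.add("doc_b", [0.0, 1.0])
store.add("doc_c", [0.9, 0.1])
nearest = store.query([1.0, 0.0], top_k=2)  # ["doc_a", "doc_c"]
```

In production, ChromaDB replaces the linear scan with an approximate nearest-neighbor index, and the vectors come from the `BAAI/bge-base-en-v1.5` embedding model rather than hand-written lists.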
- Retrieval and Chat
  - Queries use `VectorIndexRetriever` to fetch relevant chunks.
  - Responses are generated via LLMs (ChatGPT, Mistral AI, Hugging Face models, etc.).
  - The system supports conversational memory for natural dialogue.
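Conversational memory typically means keeping a sliding window of recent turns and prepending it to each query so the engine can resolve follow-ups like "tell me more". A minimal sketch of the idea (illustrative, not the LlamaIndex memory API):

```python
from collections import deque

class ChatMemory:
    """Sliding-window chat memory: recent turns are prepended to each query."""

    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)  # oldest turns drop off

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def build_prompt(self, query: str) -> str:
        history = "\n".join(f"User: {u}\nMeera: {a}" for u, a in self.turns)
        return f"{history}\nUser: {query}" if history else f"User: {query}"

memory = ChatMemory(max_turns=2)
memory.add("Who are you?", "I am Meera.")
prompt = memory.build_prompt("What can you do?")
```

The `maxlen` bound keeps the prompt within the LLM's context window at the cost of forgetting older turns.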
- Text-to-Speech
  - Responses can be synthesized into audio using Edge TTS with customizable voices.
- FastAPI – REST API framework
- LlamaIndex – Data indexing, retrieval & chat engine
- ChromaDB – Vector store for efficient similarity search
- HuggingFace Embeddings – To generate dense vector representations
- Edge TTS – Text-to-speech synthesis
- Gradio – For building interactive demos
- Quickly index and query your own document collections.
- Swap between LLMs (ChatGPT, HuggingFace, Mistral) with minimal changes.
- Supports multilingual TTS with lifelike voices.