This project provides a Retrieval-Augmented Generation (RAG) API that acts as an intelligent chatbot named Meera, capable of answering user queries over your custom data. It combines LlamaIndex for efficient retrieval, ChromaDB for vector storage, and Edge TTS for generating lifelike speech responses.
- RAG Chatbot (Meera): Answers user queries using vector-based retrieval and large language models (LLMs).
- Vector Store Indexing: Uses HuggingFace embeddings and ChromaDB for storing document vectors and enabling fast similarity searches.
- Chat Engine: Supports conversational memory, enabling context-aware responses.
- Text-to-Speech (TTS): Converts responses into natural-sounding audio with Microsoft Edge TTS.
- Gradio Demo: Interactive UI for voice input and response playback.
- Run the API Server: The API provides endpoints to interact with the chat engine and generate speech.

  ```bash
  uvicorn app:app --port 8000 --reload
  ```

  Endpoints:
  - `POST /query` – Retrieve a query response.
  - `POST /tts` – Generate speech from text.
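A minimal client sketch for the two endpoints, using only the standard library. The JSON field names (`"query"`, `"text"`) are assumptions — check the actual route definitions in `app.py` before relying on them.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"

def json_request(endpoint: str, payload: dict) -> request.Request:
    """Build (but do not send) a JSON POST request to the API."""
    return request.Request(
        f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical payload shapes — adjust to the FastAPI route signatures.
query_req = json_request("/query", {"query": "What is Meera?"})
tts_req = json_request("/tts", {"text": "Hello from Meera"})
# To send while the server is running: request.urlopen(query_req)
```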
- Run the Gradio Voice Assistant Demo: Launch the Gradio demo to test the end-to-end functionality.

  ```bash
  python main_demo.py
  ```

  or:

  ```bash
  uvicorn main_demo:app --port 8000 --reload
  ```

  The demo:
  - Accepts voice input from the user via microphone
  - Transcribes the audio using speech recognition
  - Retrieves a response from the query/chat engine
  - Speaks the response in the selected AI-generated voice
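The four steps above can be sketched as a single pipeline function, with the heavy components (speech recognition, chat engine, TTS) injected as callables. The names here are illustrative, not the actual `main_demo.py` API:

```python
def voice_turn(audio_bytes, transcribe, answer, speak):
    """One demo turn: microphone audio in, spoken reply out."""
    text = transcribe(audio_bytes)   # steps 1-2: accept + transcribe audio
    reply = answer(text)             # step 3: query / chat engine response
    return speak(reply)              # step 4: synthesize the chosen voice

# Wiring with stub components to show the data flow:
result = voice_turn(
    b"\x00\x01",  # placeholder audio bytes
    transcribe=lambda audio: "what is meera",
    answer=lambda q: f"Meera answers: {q}",
    speak=lambda text: ("audio.mp3", text),
)
```

Keeping the components injectable like this also makes each stage easy to swap or unit-test in isolation.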
- Run the Speech Recognition Test: Try out the speech recognition functionality using Gradio.

  ```bash
  python gradio_sr.py
  ```

  or:

  ```bash
  uvicorn gradio_sr:app --port 8000 --reload
  ```
- Indexing Data
  - Documents in `./data` are embedded using `BAAI/bge-base-en-v1.5`.
  - Vectors are stored in ChromaDB (`./indexes/chroma`).
  - Collections allow fast retrieval of similar content during queries.
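At its core, the "fast retrieval of similar content" a vector store performs is nearest-neighbor search over embeddings, commonly by cosine similarity. A toy in-memory stand-in (not the ChromaDB API) makes the idea concrete:

```python
import math

class TinyVectorStore:
    """Toy stand-in for a vector collection: store (id, vector) pairs
    and return the nearest ids by cosine similarity."""

    def __init__(self):
        self.items = {}  # doc_id -> vector

    def add(self, doc_id, vector):
        self.items[doc_id] = vector

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)

    def query(self, vector, top_k=2):
        scored = sorted(
            self.items.items(),
            key=lambda kv: self._cosine(vector, kv[1]),
            reverse=True,
        )
        return [doc_id for doc_id, _ in scored[:top_k]]

store = TinyVectorStore()
store.add("doc_a", [1.0, 0.0])
store.add("doc_b", [0.0, 1.0])
store.add("doc_c", [0.9, 0.1])
nearest = store.query([1.0, 0.0], top_k=2)  # ["doc_a", "doc_c"]
```

In production, ChromaDB replaces the linear scan with an approximate nearest-neighbor index, and the vectors come from the `BAAI/bge-base-en-v1.5` embedding model rather than hand-written lists.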
- Retrieval and Chat
  - Queries use `VectorIndexRetriever` to fetch relevant chunks.
  - Responses are generated via LLMs (ChatGPT, Mistral AI, Hugging Face models, etc.).
  - The system supports conversational memory for natural dialogue.
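Conversational memory typically means keeping a sliding window of recent turns and prepending it to each query so the engine can resolve follow-ups like "tell me more". A minimal sketch of the idea (illustrative, not the LlamaIndex memory API):

```python
from collections import deque

class ChatMemory:
    """Sliding-window chat memory: recent turns are prepended to each query."""

    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)  # oldest turns drop off

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def build_prompt(self, query: str) -> str:
        history = "\n".join(f"User: {u}\nMeera: {a}" for u, a in self.turns)
        return f"{history}\nUser: {query}" if history else f"User: {query}"

memory = ChatMemory(max_turns=2)
memory.add("Who are you?", "I am Meera.")
prompt = memory.build_prompt("What can you do?")
```

The `maxlen` bound keeps the prompt within the LLM's context window at the cost of forgetting older turns.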
- Text-to-Speech
  - Responses can be synthesized into audio using Edge TTS with customizable voices.
- FastAPI – REST API framework
- LlamaIndex – Data indexing, retrieval & chat engine
- ChromaDB – Vector store for efficient similarity search
- HuggingFace Embeddings – To generate dense vector representations
- Edge TTS – Text-to-speech synthesis
- Gradio – For building interactive demos
- Quickly index and query your own document collections.
- Swap between LLMs (ChatGPT, HuggingFace, Mistral) with minimal changes.
- Supports multilingual TTS with lifelike voices.