🎙️ A bilingual, AI-powered voice assistant designed to simplify doctor appointment bookings at a clinic. The solution combines speech-to-text (STT), text-to-speech (TTS), and large language models (LLMs) to deliver intuitive, efficient interactions.
- Real-Time Transcription:
  - Powered by Whisper Streaming and AWS Transcribe.
  - Supports Deepgram for prerecorded audio files.
  - Multilingual capability: English and Arabic.
  - Real-time partial transcription for better interactivity.
- Text-to-Speech:
  - Natural voice responses using AWS Polly.
  - High-quality voices for both English and Arabic.
- Context-Aware Dialogues:
  - A ReAct LLM agent that can:
    - Collect patient details such as name, age, and insurance status.
    - Suggest available clinic locations and doctor specialties.
    - Schedule and book appointments with MongoDB integration.
  - Powered by GROQ or OpenAI GPT, offering flexible LLM backends.
- User Interface:
  - Built with Streamlit for a clean, responsive design.
  - Sidebar for language selection and session management.
  - Real-time chat display with user-friendly message bubbles.
- Debug Mode:
  - View intermediate tool outputs and chat history.
  - Separate debug panel for tracing.
- Database:
  - MongoDB for managing:
    - Patient records.
    - Doctor information (schedules, specialties, locations).
    - Appointment bookings.
  - Ready-to-use database seeding script with dummy data for testing.
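To give a feel for the data the agent works with, here is an illustrative sketch of the document shapes and a slot-finding helper. The field names and dummy values are assumptions for illustration only; the project's actual schema lives in `utils/create_db.py`.

```python
# Hypothetical document shapes -- see utils/create_db.py for the real schema.
doctor = {
    "name": "Dr. Sara Ahmed",        # illustrative dummy data
    "specialty": "Dermatology",
    "location": "Downtown Branch",
    "schedule": ["2025-01-06T09:00", "2025-01-06T09:30", "2025-01-06T10:00"],
}

# Slots already taken in the appointments collection (illustrative).
booked = {"2025-01-06T09:30"}

def free_slots(doctor: dict, booked: set) -> list:
    """Return schedule entries that are not yet booked, in chronological order."""
    return sorted(s for s in doctor["schedule"] if s not in booked)

print(free_slots(doctor, booked))  # → ['2025-01-06T09:00', '2025-01-06T10:00']
```

In the real app, a query like this would run against the MongoDB collections so the agent can offer only genuinely available appointments.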
- Frontend: Streamlit
- Backend: Python
- Database: MongoDB
- LLM: GROQ or OpenAI GPT
- Speech Processing:
  - Whisper Streaming
  - AWS Transcribe
  - Deepgram (prerecorded audio)
- Text-to-Speech: AWS Polly
- LangChain: Manages LLM conversation workflows and tools.
- LangSmith: For tracing and debugging LLMs.
- Boto3: AWS SDK for Polly and Transcribe integrations.
- Librosa & SoundDevice: Audio handling and preprocessing.
- Pymongo: MongoDB integration.
- Python 3.10+
- MongoDB (ensure it’s running locally or in the cloud).
- API Keys:
  - AWS Polly & Transcribe (or use Whisper Streaming).
  - GROQ or OpenAI.
  - Deepgram (optional, for prerecorded audio).
1. Clone the Repository:

   ```bash
   git clone https://github.com/Fatma-Moanes/voice-assistant.git
   cd voice-assistant
   ```

2. Install Dependencies:

   ```bash
   poetry install
   ```

3. Set Up Environment Variables:

   - Copy `.env.example` to `.env`:

     ```bash
     cp .env.example .env
     ```

   - Fill in your API keys and database credentials in the `.env` file.

4. Seed the Database: populate the database with dummy data:

   ```bash
   python utils/create_db.py
   ```

5. Run the Application: start the Streamlit app for live audio:

   ```bash
   poetry run streamlit run app/streamlit_app_streaming.py
   ```
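Once running, each conversational turn follows a transcribe → reason → speak loop. A minimal sketch of that pipeline with stub components (the real app wires in Whisper Streaming/AWS Transcribe, the LangChain ReAct agent, and AWS Polly; the function names here are illustrative, not the project's actual API):

```python
from typing import Callable

def run_turn(
    audio_chunk: bytes,
    transcribe: Callable[[bytes], str],   # e.g. Whisper Streaming / AWS Transcribe
    respond: Callable[[str], str],        # e.g. the LangChain ReAct agent
    synthesize: Callable[[str], bytes],   # e.g. AWS Polly via boto3
) -> bytes:
    """One conversational turn: STT -> LLM -> TTS."""
    text = transcribe(audio_chunk)
    reply = respond(text)
    return synthesize(reply)

# Stub components standing in for the real services:
audio_out = run_turn(
    b"...",
    transcribe=lambda a: "I need an appointment",
    respond=lambda t: f"Sure, let's book that: {t}",
    synthesize=lambda r: r.encode(),
)
```

Keeping the three stages behind simple callables is what makes the STT and LLM backends swappable via `config.yml`.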
- Dynamic Conversation: user and assistant messages are displayed in visually distinct bubbles.
- Audio Input: live microphone recording.
- Error Handling: clear notifications for failed transcription or processing.
- Language Selection: choose between English, Arabic, or Auto Detect.
- Session Controls:
  - Clear chat history.
  - Toggle Debug Mode.
- Real-time insight into:
  - Chat history used by the LLM.
  - Intermediate tool calls and responses.
All application settings are managed through:

- `config.yml`:
  - Speech-to-Text model selection (`WhisperStreaming`, `AWSStreaming`, or `Deepgram`).
  - Language preferences and model-specific configurations.
  - Text-to-Speech settings (AWS Polly voices and region).
  - LLM provider (GROQ or OpenAI).
- `.env`: stores sensitive credentials such as API keys and database connection strings. Use `.env.example` as a template.
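For illustration, a `config.yml` covering the options above might look like the following. The key names and voice choices here are assumptions, not the file's actual contents; check the repository's `config.yml` for the real keys.

```yaml
# Illustrative only -- see the repository's config.yml for the real keys.
stt:
  model: WhisperStreaming   # WhisperStreaming | AWSStreaming | Deepgram
  language: auto            # en | ar | auto
tts:
  provider: AWSPolly
  region: us-east-1
  voices:
    en: Joanna
    ar: Zeina
llm:
  provider: GROQ            # GROQ | OpenAI
```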
Below is a summary of the required `.env` variables:
```bash
# MongoDB
MONGODB_CONNECTION_STRING="your_connection_string"
DB_NAME="DoctorAppointmentDB"

# GROQ API
GROQ_API_KEY="your_groq_api_key"

# OpenAI API
OPENAI_API_KEY="your_openai_api_key"

# AWS Polly & Transcribe
AWS_ACCESS_KEY_ID="your_aws_access_key"
AWS_SECRET_ACCESS_KEY="your_aws_secret_key"

# Deepgram API (optional)
DEEPGRAM_API_KEY="your_deepgram_api_key"
```
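One way to load and sanity-check these variables at startup is to fail fast when required credentials are missing. This is a stdlib-only sketch, not the project's actual settings code; the variable names match the list above.

```python
import os

# Minimum required to start the app; API keys vary by chosen backend.
REQUIRED = ["MONGODB_CONNECTION_STRING", "DB_NAME"]

def load_settings(env=os.environ) -> dict:
    """Raise early if required .env variables are missing; optional keys default to None."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required .env variables: {', '.join(missing)}")
    return {
        "mongo_uri": env["MONGODB_CONNECTION_STRING"],
        "db_name": env["DB_NAME"],
        "groq_key": env.get("GROQ_API_KEY"),          # one of GROQ / OpenAI is needed
        "openai_key": env.get("OPENAI_API_KEY"),
        "deepgram_key": env.get("DEEPGRAM_API_KEY"),  # optional
    }

settings = load_settings({
    "MONGODB_CONNECTION_STRING": "mongodb://localhost:27017",
    "DB_NAME": "DoctorAppointmentDB",
})
```

Checking credentials once at startup gives a clearer error than a failed AWS or MongoDB call deep inside a conversation turn.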
- Speech-to-Text Enhancements:
  - Real-time and partial transcription.
  - Improved error handling for silent inputs.
- Streamlined Conversations:
  - Context-aware doctor-booking logic.
  - Integration with MongoDB for data persistence.
- Debug Mode displays:
  - Intermediate steps (tool calls).
  - Processed chat history used by the AI agent.
- Extensible Configuration:
  - Support for multiple STT and LLM models.
- LangChain for managing LLM integrations.
- Amazon Polly for high-quality voice synthesis.
- Whisper Streaming and Deepgram for advanced transcription.
- Streamlit for the responsive and interactive UI.
- GROQ and OpenAI GPT for powering the conversational AI.
If you have any questions or encounter issues, feel free to open an issue or reach out via email at fmoanesnoureldin@gmail.com.
Enjoy using the Voice Assistant for FM-Clinic! 🚀