🎙️ A bilingual, AI-powered voice assistant designed to simplify doctor appointment bookings at a clinic. The solution combines speech-to-text (STT), text-to-speech (TTS), and large language models (LLMs) to deliver intuitive, efficient interactions.
- Real-Time Transcription:
  - Powered by Whisper Streaming and AWS Transcribe.
  - Supports Deepgram for prerecorded audio files.
  - Multilingual capability: English and Arabic.
  - Real-time partial transcription for better interactivity.
- Text-to-Speech:
  - Natural voice responses using AWS Polly.
  - High-quality voices for both English and Arabic.
- Context-Aware Dialogues:
  - A ReAct LLM agent that can:
    - Collect patient details such as name, age, and insurance status.
    - Suggest available clinic locations and doctor specialties.
    - Schedule and book appointments with MongoDB integration.
  - Powered by GROQ or OpenAI GPT, offering flexible LLM backends.
- User Interface:
  - Built with Streamlit for a clean, responsive design.
  - Sidebar for language selection and session management.
  - Real-time chat display with user-friendly message bubbles.
- Debug Mode:
  - View intermediate tool outputs and chat history.
  - Separate debug panel for tracing.
- Database:
  - MongoDB for managing:
    - Patient records.
    - Doctor information (schedules, specialties, locations).
    - Appointment bookings.
  - Ready-to-use database seeding script with dummy data for testing.
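To give a feel for the data the agent works with, here is an illustrative sketch of the document shapes and a slot-finding helper. The field names and dummy values are assumptions for illustration only; the project's actual schema lives in `utils/create_db.py`.

```python
# Hypothetical document shapes -- see utils/create_db.py for the real schema.
doctor = {
    "name": "Dr. Sara Ahmed",        # illustrative dummy data
    "specialty": "Dermatology",
    "location": "Downtown Branch",
    "schedule": ["2025-01-06T09:00", "2025-01-06T09:30", "2025-01-06T10:00"],
}

# Slots already taken in the appointments collection (illustrative).
booked = {"2025-01-06T09:30"}

def free_slots(doctor: dict, booked: set) -> list:
    """Return schedule entries that are not yet booked, in chronological order."""
    return sorted(s for s in doctor["schedule"] if s not in booked)

print(free_slots(doctor, booked))  # → ['2025-01-06T09:00', '2025-01-06T10:00']
```

In the real app, a query like this would run against the MongoDB collections so the agent can offer only genuinely available appointments.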
- Frontend: Streamlit
- Backend: Python
- Database: MongoDB
- LLM: GROQ or OpenAI GPT
- Speech Processing:
  - Whisper Streaming
  - AWS Transcribe
  - Deepgram (prerecorded audio)
- Text-to-Speech: AWS Polly
- LangChain: Manages LLM conversation workflows and tools.
- LangSmith: For tracing and debugging LLMs.
- Boto3: AWS SDK for Polly and Transcribe integrations.
- Librosa & SoundDevice: Audio handling and preprocessing.
- Pymongo: MongoDB integration.
- Python 3.10+
- MongoDB (ensure it’s running locally or in the cloud).
- API Keys:
  - AWS Polly & Transcribe (or use Whisper Streaming).
  - GROQ or OpenAI.
  - Deepgram (optional, for prerecorded audio).
1. Clone the Repository:

   ```bash
   git clone https://github.com/Fatma-Moanes/voice-assistant.git
   cd voice-assistant
   ```

2. Install Dependencies:

   ```bash
   poetry install
   ```

3. Set Up Environment Variables:

   - Copy `.env.example` to `.env`:

     ```bash
     cp .env.example .env
     ```

   - Fill in your API keys and database credentials in the `.env` file.

4. Seed the Database: populate the database with dummy data:

   ```bash
   python utils/create_db.py
   ```

5. Run the Application: start the Streamlit app for live audio:

   ```bash
   poetry run streamlit run app/streamlit_app_streaming.py
   ```
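Once running, each conversational turn follows a transcribe → reason → speak loop. A minimal sketch of that pipeline with stub components (the real app wires in Whisper Streaming/AWS Transcribe, the LangChain ReAct agent, and AWS Polly; the function names here are illustrative, not the project's actual API):

```python
from typing import Callable

def run_turn(
    audio_chunk: bytes,
    transcribe: Callable[[bytes], str],   # e.g. Whisper Streaming / AWS Transcribe
    respond: Callable[[str], str],        # e.g. the LangChain ReAct agent
    synthesize: Callable[[str], bytes],   # e.g. AWS Polly via boto3
) -> bytes:
    """One conversational turn: STT -> LLM -> TTS."""
    text = transcribe(audio_chunk)
    reply = respond(text)
    return synthesize(reply)

# Stub components standing in for the real services:
audio_out = run_turn(
    b"...",
    transcribe=lambda a: "I need an appointment",
    respond=lambda t: f"Sure, let's book that: {t}",
    synthesize=lambda r: r.encode(),
)
```

Keeping the three stages behind simple callables is what makes the STT and LLM backends swappable via `config.yml`.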
- Dynamic Conversation: user and assistant messages are displayed in visually distinct bubbles.
- Audio Input: live microphone recording.
- Error Handling: clear notifications for failed transcription or processing.
- Language Selection: choose between English, Arabic, or Auto Detect.
- Session Controls:
  - Clear chat history.
  - Toggle Debug Mode.
- Real-time insight into:
  - Chat history used by the LLM.
  - Intermediate tool calls and responses.
All application settings are managed through:

- `config.yml`:
  - Speech-to-Text model selection (`WhisperStreaming`, `AWSStreaming`, or `Deepgram`).
  - Language preferences and model-specific configurations.
  - Text-to-Speech settings (AWS Polly voices and region).
  - LLM provider (GROQ or OpenAI).
- `.env`: stores sensitive credentials such as API keys and database connection strings. Use `.env.example` as a template.
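For illustration, a `config.yml` covering the options above might look like the following. The key names and voice choices here are assumptions, not the file's actual contents; check the repository's `config.yml` for the real keys.

```yaml
# Illustrative only -- see the repository's config.yml for the real keys.
stt:
  model: WhisperStreaming   # WhisperStreaming | AWSStreaming | Deepgram
  language: auto            # en | ar | auto
tts:
  provider: AWSPolly
  region: us-east-1
  voices:
    en: Joanna
    ar: Zeina
llm:
  provider: GROQ            # GROQ | OpenAI
```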
Below is a summary of the required `.env` variables:
```bash
# MongoDB
MONGODB_CONNECTION_STRING="your_connection_string"
DB_NAME="DoctorAppointmentDB"

# GROQ API
GROQ_API_KEY="your_groq_api_key"

# OpenAI API
OPENAI_API_KEY="your_openai_api_key"

# AWS Polly & Transcribe
AWS_ACCESS_KEY_ID="your_aws_access_key"
AWS_SECRET_ACCESS_KEY="your_aws_secret_key"

# Deepgram API (optional)
DEEPGRAM_API_KEY="your_deepgram_api_key"
```
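One way to load and sanity-check these variables at startup is to fail fast when required credentials are missing. This is a stdlib-only sketch, not the project's actual settings code; the variable names match the list above.

```python
import os

# Minimum required to start the app; API keys vary by chosen backend.
REQUIRED = ["MONGODB_CONNECTION_STRING", "DB_NAME"]

def load_settings(env=os.environ) -> dict:
    """Raise early if required .env variables are missing; optional keys default to None."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required .env variables: {', '.join(missing)}")
    return {
        "mongo_uri": env["MONGODB_CONNECTION_STRING"],
        "db_name": env["DB_NAME"],
        "groq_key": env.get("GROQ_API_KEY"),          # one of GROQ / OpenAI is needed
        "openai_key": env.get("OPENAI_API_KEY"),
        "deepgram_key": env.get("DEEPGRAM_API_KEY"),  # optional
    }

settings = load_settings({
    "MONGODB_CONNECTION_STRING": "mongodb://localhost:27017",
    "DB_NAME": "DoctorAppointmentDB",
})
```

Checking credentials once at startup gives a clearer error than a failed AWS or MongoDB call deep inside a conversation turn.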
- Speech-to-Text Enhancements:
  - Real-time and partial transcription.
  - Improved error handling for silent inputs.
- Streamlined Conversations:
  - Context-aware doctor-booking logic.
  - Integration with MongoDB for data persistence.
- Debug Mode displays:
  - Intermediate steps (tool calls).
  - Processed chat history used by the AI agent.
- Extensible Configuration:
  - Support for multiple STT and LLM models.
- LangChain for managing LLM integrations.
- Amazon Polly for high-quality voice synthesis.
- Whisper Streaming and Deepgram for advanced transcription.
- Streamlit for the responsive and interactive UI.
- GROQ and OpenAI GPT for powering the conversational AI.
If you have any questions or encounter issues, feel free to open an issue or reach out via email at fmoanesnoureldin@gmail.com.
Enjoy using the Voice Assistant for FM-Clinic! 🚀