This repository expands upon Pipecat's Python framework for building voice and multimodal conversational agents. Our implementation creates AI meeting agents that can join and participate in Google Meet and Microsoft Teams meetings with distinct personalities and capabilities defined in Markdown files.
This project extends Pipecat's WebSocket server implementation to create:
- Meeting agents that can join Google Meet or Microsoft Teams through the MeetingBaas API
- Customizable personas with unique context
- Support for running multiple instances via a simple API
- WebSocket-based communication for real-time interaction
Pipecat provides the foundational framework with:
- Real-time audio processing pipeline
- WebSocket communication
- Voice activity detection
- Message context management
In this implementation, Pipecat is integrated with Cartesia for speech generation (text-to-speech), Gladia or Deepgram for speech-to-text conversion, and OpenAI's GPT-4 as the underlying LLM.
The project follows a streamlined API-first approach with:
- A lightweight FastAPI server that handles bot management via direct MeetingBaas API calls
- WebSocket server for real-time communication between MeetingBaas and Pipecat
- Properly typed Pydantic models for request/response validation
- Clean separation of concerns with modular components
- Root endpoint (`GET /`):
  - Health check endpoint
  - Returns: `{"message": "MeetingBaas Bot API is running"}`
- Run Bots (`POST /run-bots`):

      {
        "meeting_url": "https://meet.google.com/xxx-yyyy-zzz",
        "personas": ["interviewer"],
        "meeting_baas_api_key": "your-api-key",
        "bot_image": "https://example.com/avatar.jpg",
        "entry_message": "Hello, I'm here to help!"
      }

  - Required fields: `meeting_url` and `meeting_baas_api_key`
  - The WebSocket URL is determined automatically (see WebSocket URL Resolution below)
  - Returns: MeetingBaas bot ID and client ID for WebSocket connections
- WebSocket endpoint (`/ws/{client_id}`):
  - Real-time communication channel for audio streaming
  - Carries binary audio data and control messages
- Pipecat WebSocket endpoint (`/pipecat/{client_id}`):
  - Connection point for Pipecat services
  - Bidirectional conversion between raw audio and Protobuf frames
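For illustration, the `/run-bots` request body shown above maps naturally onto a Pydantic model along these lines (a sketch only; the field types and defaults in the repository's actual models may differ):

```python
from typing import Optional

from pydantic import BaseModel, HttpUrl


class RunBotsRequest(BaseModel):
    """Hypothetical request model mirroring the POST /run-bots body shown above."""

    meeting_url: HttpUrl                    # required
    meeting_baas_api_key: str               # required
    personas: list[str] = []                # optional persona names, e.g. ["interviewer"]
    websocket_url: Optional[str] = None     # optional; resolved automatically when omitted
    bot_image: Optional[HttpUrl] = None     # optional avatar image URL
    entry_message: Optional[str] = None     # optional greeting spoken on join
    extra: Optional[dict] = None            # optional passthrough metadata
```

And a client of the `/ws/{client_id}` endpoint could stream binary audio roughly like this (hypothetical host, port, and client ID, using the third-party `websockets` package rather than anything specific to MeetingBaas):

```python
import asyncio

import websockets


async def stream_audio(port: int, client_id: str, audio_chunks: list[bytes]) -> None:
    # Hypothetical local address; in practice this is the resolved WebSocket URL.
    uri = f"ws://localhost:{port}/ws/{client_id}"
    async with websockets.connect(uri) as ws:
        for chunk in audio_chunks:
            await ws.send(chunk)      # raw binary audio
        reply = await ws.recv()       # audio or a control message coming back
        print(f"received {len(reply)} bytes")


# asyncio.run(stream_audio(8766, "your-client-id", [b"\x00" * 320]))
```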
The server determines the WebSocket URL to use in the following priority order:
1. User-provided URL in the request (the `websocket_url` field, if specified)
2. `BASE_URL` environment variable (recommended for production)
3. ngrok URL in local development mode
4. Auto-detection from request headers (fallback, not reliable in production)

For production deployments, it is strongly recommended to set the `BASE_URL` environment variable to your server's public domain (e.g., `https://your-server-domain.com`).
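In code, that priority order amounts to something like the following (an illustrative sketch, not the server's actual implementation; the `ngrok_url` argument is a placeholder for whatever local-dev detection is in use):

```python
import os
from typing import Mapping, Optional


def resolve_websocket_url(
    requested_url: Optional[str],
    headers: Mapping[str, str],
    ngrok_url: Optional[str] = None,
) -> Optional[str]:
    # 1. An explicit websocket_url in the request body always wins.
    if requested_url:
        return requested_url
    # 2. BASE_URL environment variable (recommended for production).
    base_url = os.getenv("BASE_URL")
    if base_url:
        return base_url.replace("https://", "wss://").replace("http://", "ws://")
    # 3. ngrok URL when running in local development mode.
    if ngrok_url:
        return ngrok_url
    # 4. Fall back to the request's Host header (unreliable behind proxies).
    host = headers.get("host")
    return f"wss://{host}" if host else None
```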
Building upon Pipecat, we've added:
- Persona system with Markdown-based configuration for:
- Core personality traits and behaviors
- Knowledge base and domain expertise
- Additional contextual information (websites formatted to MD, technical documentation, etc.)
- AI image generation via Replicate
- Image hosting through UploadThing (UTFS)
- MeetingBaas integration for video meeting platform support
- Multi-agent orchestration via API
Runtime conversation services:
- OpenAI (LLM)
- Cartesia (text-to-speech)
- Gladia or Deepgram (speech-to-text)
- MeetingBaas (video meeting platform integration)

CLI persona-generation services:
- OpenAI (LLM to complete the user prompt and match it to a Cartesia voice ID)
- Replicate (AI image generation)
- UploadThing (UTFS) (image hosting)
For speech-related services (TTS/STT) and LLM choice (like Claude, GPT-4, etc), you can freely choose and swap between any of the integrations available in Pipecat's supported services.
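As a sketch of how these services plug together in a Pipecat pipeline (illustrative only: class names, module paths, and constructor arguments vary between Pipecat releases, so treat every import and parameter below as an assumption to check against your installed version):

```python
# Illustrative Pipecat wiring; verify module paths against your Pipecat version.
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.network.websocket_server import WebsocketServerTransport

transport = WebsocketServerTransport(host="0.0.0.0", port=8765)
stt = DeepgramSTTService(api_key="...")                  # or Gladia, or any supported STT
llm = OpenAILLMService(api_key="...", model="gpt-4")     # or any supported LLM
tts = CartesiaTTSService(api_key="...", voice_id="...")  # or any supported TTS

# Audio in -> transcription -> LLM -> speech out, back over the same transport.
pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
```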
OpenAI's GPT-4, UploadThing (UTFS), and Replicate are currently hard-coded specifically for the CLI-based persona generation features: matching personas to available Cartesia voices, generating AI avatars, and creating initial personality descriptions and knowledge bases. You do not need a Replicate or UTFS API key to run the project if you skip the CLI-based persona creation feature and edit the persona Markdown files manually.
- Real-time audio processing pipeline
- WebSocket-based communication
- Tool integration (weather, time)
- Voice activity detection
- Message context management
- Dynamic persona loading from markdown files
- Customizable personality traits and behaviors
- Support for multiple languages
- Voice characteristic customization
- Image generation for persona avatars
- Metadata management for each persona
Each persona is defined in the `@personas` directory with:
- A README.md defining their personality
- Space for additional markdown files to expand knowledge and behaviour
@personas/
└── quantum_physicist/
    ├── README.md
    └── (additional behavior files)
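Loading such a directory can be as simple as the following (a hypothetical sketch, not the project's actual loader; the path and the returned structure are assumptions):

```python
from pathlib import Path


def load_persona(name: str, root: Path = Path("@personas")) -> dict:
    """Hypothetical loader: README.md is the core persona, other .md files add knowledge."""
    persona_dir = root / name
    core = (persona_dir / "README.md").read_text(encoding="utf-8")
    extras = sorted(p for p in persona_dir.glob("*.md") if p.name != "README.md")
    return {
        "name": name,
        "prompt": core,
        "knowledge": [p.read_text(encoding="utf-8") for p in extras],
    }


# persona = load_persona("quantum_physicist")
```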
- Python 3.11+
- `grpc_tools` for protocol buffer compilation
- ngrok (for local deployment)
- Poetry for dependency management
# Install Poetry (Unix/macOS)
curl -sSL https://install.python-poetry.org | python3 -
# Install Poetry (Windows)
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
The project requires certain system dependencies for scientific libraries:
# macOS (using Homebrew)
brew install llvm cython
# Ubuntu/Debian
sudo apt-get install llvm python3-dev cython
# Fedora/RHEL
sudo dnf install llvm-devel python3-devel Cython
# Clone the repository (if you haven't already)
git clone https://github.com/yourusername/speaking-meeting-bot.git
cd speaking-meeting-bot
# Configure Poetry to use Python 3.11+
poetry env use python3.11
# Install dependencies with LLVM config path
# On macOS:
LLVM_CONFIG=$(brew --prefix llvm)/bin/llvm-config poetry install
# On Linux (path may vary):
# LLVM_CONFIG=/usr/bin/llvm-config poetry install
# Activate virtual environment
poetry shell
poetry run python -m grpc_tools.protoc --proto_path=./protobufs --python_out=./protobufs frames.proto
cp env.example .env
Edit `.env` with your MeetingBaas credentials and add the `BASE_URL` variable for production deployments.
Example `.env` file:
MEETING_BAAS_API_KEY=your_api_key_here
BASE_URL=https://your-server-domain.com # For production
There are two ways to run the server:
# Standard mode
poetry run uvicorn app:app --reload --host 0.0.0.0 --port ${PORT}
# Local development mode with ngrok auto-configuration
poetry run python run.py --local-dev
The local development mode simplifies WebSocket setup by:
- Automatically detecting ngrok tunnels (see the sketch after this list)
- Handling WebSocket URL configuration for MeetingBaas
- Supporting up to 2 bots (limited by free ngrok tunnels)
- Providing clear warnings about limitations
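The tunnel auto-detection can be done by querying the local ngrok agent API, roughly as follows (an illustrative sketch; this is not necessarily how run.py implements it):

```python
import requests


def get_ngrok_tunnel_urls() -> list[str]:
    """Ask the local ngrok agent (default port 4040) for its active public tunnel URLs."""
    resp = requests.get("http://127.0.0.1:4040/api/tunnels", timeout=2)
    resp.raise_for_status()
    return [tunnel["public_url"] for tunnel in resp.json().get("tunnels", [])]
```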
1. Install ngrok if you haven't already:

       brew install ngrok  # macOS

2. Start ngrok tunnels for your bot connections:

       # Start ngrok with the provided configuration
       ngrok start --all --config config/ngrok/config.yml

3. Start the server in local development mode:

       poetry run python run.py --local-dev

4. When prompted, enter the ngrok URLs shown in the ngrok terminal.
The WebSocket URL is optional in all cases. The server determines the appropriate URL based on the priority list described in the WebSocket URL Resolution section:
curl -X POST http://localhost:${PORT}/run-bots \
-H "Content-Type: application/json" \
-d '{
"meeting_url": "https://meet.google.com/xxx-yyyy-zzz",
"personas": ["interviewer"],
"meeting_baas_api_key": "your-api-key"
}'
You can still manually specify a WebSocket URL if needed:
curl -X POST http://localhost:${PORT}/run-bots \
-H "Content-Type: application/json" \
-d '{
"meeting_url": "https://meet.google.com/xxx-yyyy-zzz",
"personas": ["interviewer"],
"websocket_url": "ws://your-custom-websocket-url:${PORT}",
"meeting_baas_api_key": "your-api-key"
}'
When deploying to production, always set the `BASE_URL` environment variable to ensure reliable WebSocket connections:
1. Set `BASE_URL` to your server's public domain:

       export BASE_URL=https://your-server-domain.com

2. Ensure your server is accessible on the public internet
3. Consider using HTTPS/WSS for secure connections in production
If you encounter issues with the local development mode:
- Make sure ngrok is running with the correct configuration
- Verify that you've entered the correct ngrok URLs when prompted
- Check that your ngrok URLs are accessible (try opening in a browser)
- Remember that the free tier of ngrok limits you to 2 concurrent tunnels
The persona architecture is designed to support:
- Scraping user-provided websites into Markdown for the bot's knowledge base
- Clean containerization of the project
- Verify Poetry environment is activated
- Check Ngrok connection status
- Validate environment variables
- Ensure unique Ngrok URLs for multiple agents
For more detailed information about specific personas or deployment options, check the respective documentation in the `@personas` directory.
Sometimes, due to WebSocket connection delays through ngrok, the MeetingBaas bots may join the meeting before your local bot connects. If this happens:
- Simply press `Enter` to respawn your bot
- This reinitiates the connection and allows your bot to join the meeting
This is a normal occurrence and can be easily resolved with a quick bot respawn.
# Install dependencies
poetry install
# Compile Protocol Buffers
poetry run python -m grpc_tools.protoc --proto_path=./protobufs --python_out=./protobufs frames.proto
# Run the API server with hot reload
poetry run uvicorn app:app --reload --host 0.0.0.0 --port ${PORT}
For local development and testing with multiple bots, you'll need two terminals:
# Terminal 1: Start the API server
poetry run uvicorn app:app --reload --host 0.0.0.0 --port ${PORT}
# Terminal 2: Start ngrok to expose your local server
ngrok http ${PORT}
Once ngrok is running, it will provide you with a public URL that the server will use for WebSocket connections in local development mode.
The API has been completely redesigned for simplicity and reliability:
- Direct integration with the MeetingBaas API without subprocess management
- Strongly typed Pydantic models with proper validation
- Cleaner WebSocket handling with better error management
- Improved logging with better visibility into the system
- Enhanced JSON message processing for debugging
- Intelligent WebSocket URL resolution with multiple fallback methods
- Support for explicit BASE_URL configuration for production environments
The direct API integration provides several benefits:
# Direct API call to MeetingBaas
meetingbaas_bot_id = create_meeting_bot(
meeting_url=request.meeting_url,
websocket_url=websocket_url, # Determined by the server via multiple methods
bot_id=bot_client_id,
persona_name=persona_name,
api_key=request.meeting_baas_api_key,
# Additional parameters
bot_image=request.bot_image,
entry_message=request.entry_message,
extra=request.extra,
)
This approach eliminates the complexity of subprocess management, provides immediate feedback on bot creation, and returns both the MeetingBaas bot ID and client ID for WebSocket connections.
For production deployment, always set the BASE_URL environment variable:
# Set the BASE_URL for WebSocket connections
export BASE_URL=https://your-server-domain.com
# Run the API server in production mode
poetry run uvicorn app:app --host 0.0.0.0 --port ${PORT}
Once the server is running, you can access:
- Interactive API docs:
http://localhost:${PORT}/docs
- OpenAPI specification:
http://localhost:${PORT}/openapi.json
The API-first approach enables several planned features:
- Parent API Integration:
  - Authentication and authorization
  - Rate limiting
  - User management
  - Billing integration
- Enhanced Bot Management:
  - Real-time bot status monitoring
  - Dynamic persona loading
  - Bot lifecycle management
  - Meeting recording and transcription
- WebSocket Features:
  - Real-time bot control
  - Live transcription streaming
  - Meeting analytics
  - Multi-bot coordination
- Persona Management:
  - Dynamic persona creation via API
  - Persona validation and testing
  - Knowledge base expansion
  - Voice characteristic customization