This repository expands upon Pipecat's Python framework for building voice and multimodal conversational agents. Our implementation creates AI meeting agents that can join and participate in Google Meet and Microsoft Teams meetings with distinct personalities and capabilities defined in Markdown files.
This project extends Pipecat's WebSocket server implementation to create:
- Meeting agents that can join Google Meet or Microsoft Teams through the MeetingBaas API
- Customizable personas with unique context
- Support for running multiple instances via a simple API
- WebSocket-based communication for real-time interaction
Pipecat provides the foundational framework with:
- Real-time audio processing pipeline
- WebSocket communication
- Voice activity detection
- Message context management
In this implementation, Pipecat is integrated with Cartesia for speech generation (text-to-speech), Gladia or Deepgram for speech-to-text conversion, and OpenAI's GPT-4 as the underlying LLM.
The project follows a streamlined API-first approach with:
- A lightweight FastAPI server that handles bot management via direct MeetingBaas API calls
- WebSocket server for real-time communication between MeetingBaas and Pipecat
- Properly typed Pydantic models for request/response validation
- Clean separation of concerns with modular components
- Root endpoint (`GET /`):
  - Health check endpoint
  - Returns: `{"message": "MeetingBaas Bot API is running"}`
- Run Bots (`POST /run-bots`):

      {
        "meeting_url": "https://meet.google.com/xxx-yyyy-zzz",
        "personas": ["interviewer"],
        "meeting_baas_api_key": "your-api-key",
        "bot_image": "https://example.com/avatar.jpg",
        "entry_message": "Hello, I'm here to help!"
      }

  - Required fields: `meeting_url` and `meeting_baas_api_key`
  - The WebSocket URL is determined automatically (see WebSocket URL Resolution below)
  - Returns: MeetingBaas bot ID and client ID for WebSocket connections
- WebSocket endpoint (`/ws/{client_id}`):
  - Real-time communication channel for audio streaming
  - Carries binary audio data and control messages
- Pipecat WebSocket endpoint (`/pipecat/{client_id}`):
  - Connection point for Pipecat services
  - Bidirectional conversion between raw audio and Protobuf frames
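For illustration, the `/run-bots` request body shown above maps naturally onto a Pydantic model along these lines (a sketch only; the field types and defaults in the repository's actual models may differ):

```python
from typing import Optional

from pydantic import BaseModel, HttpUrl


class RunBotsRequest(BaseModel):
    """Hypothetical request model mirroring the POST /run-bots body shown above."""

    meeting_url: HttpUrl                    # required
    meeting_baas_api_key: str               # required
    personas: list[str] = []                # optional persona names, e.g. ["interviewer"]
    websocket_url: Optional[str] = None     # optional; resolved automatically when omitted
    bot_image: Optional[HttpUrl] = None     # optional avatar image URL
    entry_message: Optional[str] = None     # optional greeting spoken on join
    extra: Optional[dict] = None            # optional passthrough metadata
```

And a client of the `/ws/{client_id}` endpoint could stream binary audio roughly like this (hypothetical host, port, and client ID, using the third-party `websockets` package rather than anything specific to MeetingBaas):

```python
import asyncio

import websockets


async def stream_audio(port: int, client_id: str, audio_chunks: list[bytes]) -> None:
    # Hypothetical local address; in practice this is the resolved WebSocket URL.
    uri = f"ws://localhost:{port}/ws/{client_id}"
    async with websockets.connect(uri) as ws:
        for chunk in audio_chunks:
            await ws.send(chunk)      # raw binary audio
        reply = await ws.recv()       # audio or a control message coming back
        print(f"received {len(reply)} bytes")


# asyncio.run(stream_audio(8766, "your-client-id", [b"\x00" * 320]))
```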
The server determines the WebSocket URL to use in the following priority order:
1. User-provided URL in the request (the `websocket_url` field, if specified)
2. `BASE_URL` environment variable (recommended for production)
3. ngrok URL in local development mode
4. Auto-detection from request headers (fallback, not reliable in production)

For production deployments, it is strongly recommended to set the `BASE_URL` environment variable to your server's public domain (e.g., `https://your-server-domain.com`).
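In code, that priority order amounts to something like the following (an illustrative sketch, not the server's actual implementation; the `ngrok_url` argument is a placeholder for whatever local-dev detection is in use):

```python
import os
from typing import Mapping, Optional


def resolve_websocket_url(
    requested_url: Optional[str],
    headers: Mapping[str, str],
    ngrok_url: Optional[str] = None,
) -> Optional[str]:
    # 1. An explicit websocket_url in the request body always wins.
    if requested_url:
        return requested_url
    # 2. BASE_URL environment variable (recommended for production).
    base_url = os.getenv("BASE_URL")
    if base_url:
        return base_url.replace("https://", "wss://").replace("http://", "ws://")
    # 3. ngrok URL when running in local development mode.
    if ngrok_url:
        return ngrok_url
    # 4. Fall back to the request's Host header (unreliable behind proxies).
    host = headers.get("host")
    return f"wss://{host}" if host else None
```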
Building upon Pipecat, we've added:
- Persona system with Markdown-based configuration for:
- Core personality traits and behaviors
- Knowledge base and domain expertise
- Additional contextual information (websites formatted to MD, technical documentation, etc.)
- AI image generation via Replicate
- Image hosting through UploadThing (UTFS)
- MeetingBaas integration for video meeting platform support
- Multi-agent orchestration via API
Runtime conversation services:
- OpenAI (LLM)
- Cartesia (text-to-speech)
- Gladia or Deepgram (speech-to-text)
- MeetingBaas (video meeting platform integration)

CLI persona-generation services:
- OpenAI (LLM to complete the user prompt and match it to a Cartesia voice ID)
- Replicate (AI image generation)
- UploadThing (UTFS) (image hosting)
For speech-related services (TTS/STT) and LLM choice (like Claude, GPT-4, etc), you can freely choose and swap between any of the integrations available in Pipecat's supported services.
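As a sketch of how these services plug together in a Pipecat pipeline (illustrative only: class names, module paths, and constructor arguments vary between Pipecat releases, so treat every import and parameter below as an assumption to check against your installed version):

```python
# Illustrative Pipecat wiring; verify module paths against your Pipecat version.
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.network.websocket_server import WebsocketServerTransport

transport = WebsocketServerTransport(host="0.0.0.0", port=8765)
stt = DeepgramSTTService(api_key="...")                  # or Gladia, or any supported STT
llm = OpenAILLMService(api_key="...", model="gpt-4")     # or any supported LLM
tts = CartesiaTTSService(api_key="...", voice_id="...")  # or any supported TTS

# Audio in -> transcription -> LLM -> speech out, back over the same transport.
pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])
```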
OpenAI's GPT-4, UploadThing (UTFS), and Replicate are currently hard-coded specifically for the CLI-based persona generation features: matching personas to available Cartesia voices, generating AI avatars, and creating initial personality descriptions and knowledge bases. You do not need a Replicate or UTFS API key to run the project if you skip the CLI-based persona creation feature and edit the persona Markdown files manually.
- Real-time audio processing pipeline
- WebSocket-based communication
- Tool integration (weather, time)
- Voice activity detection
- Message context management
- Dynamic persona loading from markdown files
- Customizable personality traits and behaviors
- Support for multiple languages
- Voice characteristic customization
- Image generation for persona avatars
- Metadata management for each persona
Each persona is defined in the `@personas` directory with:
- A README.md defining their personality
- Space for additional markdown files to expand knowledge and behaviour
@personas/
└── quantum_physicist/
    ├── README.md
    └── (additional behavior files)
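Loading such a directory can be as simple as the following (a hypothetical sketch, not the project's actual loader; the path and the returned structure are assumptions):

```python
from pathlib import Path


def load_persona(name: str, root: Path = Path("@personas")) -> dict:
    """Hypothetical loader: README.md is the core persona, other .md files add knowledge."""
    persona_dir = root / name
    core = (persona_dir / "README.md").read_text(encoding="utf-8")
    extras = sorted(p for p in persona_dir.glob("*.md") if p.name != "README.md")
    return {
        "name": name,
        "prompt": core,
        "knowledge": [p.read_text(encoding="utf-8") for p in extras],
    }


# persona = load_persona("quantum_physicist")
```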
- Python 3.11+
- `grpc_tools` for protocol buffer compilation
- ngrok (for local deployment)
- Poetry for dependency management
# Install Poetry (Unix/macOS)
curl -sSL https://install.python-poetry.org | python3 -
# Install Poetry (Windows)
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
The project requires certain system dependencies for scientific libraries:
# macOS (using Homebrew)
brew install llvm cython
# Ubuntu/Debian
sudo apt-get install llvm python3-dev cython
# Fedora/RHEL
sudo dnf install llvm-devel python3-devel Cython
# Clone the repository (if you haven't already)
git clone https://github.com/yourusername/speaking-meeting-bot.git
cd speaking-meeting-bot
# Configure Poetry to use Python 3.11+
poetry env use python3.11
# Install dependencies with LLVM config path
# On macOS:
LLVM_CONFIG=$(brew --prefix llvm)/bin/llvm-config poetry install
# On Linux (path may vary):
# LLVM_CONFIG=/usr/bin/llvm-config poetry install
# Activate virtual environment
poetry shell
poetry run python -m grpc_tools.protoc --proto_path=./protobufs --python_out=./protobufs frames.proto
cp env.example .env
Edit `.env` with your MeetingBaas credentials and add the `BASE_URL` variable for production deployments.
Example `.env` file:
MEETING_BAAS_API_KEY=your_api_key_here
BASE_URL=https://your-server-domain.com # For production
There are two ways to run the server:
# Standard mode
poetry run uvicorn app:app --reload --host 0.0.0.0 --port ${PORT}
# Local development mode with ngrok auto-configuration
poetry run python run.py --local-dev
The local development mode simplifies WebSocket setup by:
- Automatically detecting ngrok tunnels (see the sketch after this list)
- Handling WebSocket URL configuration for MeetingBaas
- Supporting up to 2 bots (limited by free ngrok tunnels)
- Providing clear warnings about limitations
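The tunnel auto-detection can be done by querying the local ngrok agent API, roughly as follows (an illustrative sketch; this is not necessarily how run.py implements it):

```python
import requests


def get_ngrok_tunnel_urls() -> list[str]:
    """Ask the local ngrok agent (default port 4040) for its active public tunnel URLs."""
    resp = requests.get("http://127.0.0.1:4040/api/tunnels", timeout=2)
    resp.raise_for_status()
    return [tunnel["public_url"] for tunnel in resp.json().get("tunnels", [])]
```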
1. Install ngrok if you haven't already:

       brew install ngrok  # macOS

2. Start ngrok tunnels for your bot connections:

       # Start ngrok with the provided configuration
       ngrok start --all --config config/ngrok/config.yml

3. Start the server in local development mode:

       poetry run python run.py --local-dev

4. When prompted, enter the ngrok URLs shown in the ngrok terminal.
The WebSocket URL is optional in all cases. The server determines the appropriate URL based on the priority list described in the WebSocket URL Resolution section:
curl -X POST http://localhost:${PORT}/run-bots \
-H "Content-Type: application/json" \
-d '{
"meeting_url": "https://meet.google.com/xxx-yyyy-zzz",
"personas": ["interviewer"],
"meeting_baas_api_key": "your-api-key"
}'
You can still manually specify a WebSocket URL if needed:
curl -X POST http://localhost:${PORT}/run-bots \
-H "Content-Type: application/json" \
-d '{
"meeting_url": "https://meet.google.com/xxx-yyyy-zzz",
"personas": ["interviewer"],
"websocket_url": "ws://your-custom-websocket-url:${PORT}",
"meeting_baas_api_key": "your-api-key"
}'
When deploying to production, always set the `BASE_URL` environment variable to ensure reliable WebSocket connections:
1. Set `BASE_URL` to your server's public domain:

       export BASE_URL=https://your-server-domain.com

2. Ensure your server is accessible on the public internet
3. Consider using HTTPS/WSS for secure connections in production
If you encounter issues with the local development mode:
- Make sure ngrok is running with the correct configuration
- Verify that you've entered the correct ngrok URLs when prompted
- Check that your ngrok URLs are accessible (try opening in a browser)
- Remember that the free tier of ngrok limits you to 2 concurrent tunnels
The persona architecture is designed to support:
- Scraping user-provided websites into Markdown for the bot's knowledge base
- Clean containerization of the project
- Verify Poetry environment is activated
- Check Ngrok connection status
- Validate environment variables
- Ensure unique Ngrok URLs for multiple agents
For more detailed information about specific personas or deployment options, check the respective documentation in the `@personas` directory.
Sometimes, due to WebSocket connection delays through ngrok, the MeetingBaas bots may join the meeting before your local bot connects. If this happens:
- Simply press `Enter` to respawn your bot
- This reinitiates the connection and allows your bot to join the meeting
This is a normal occurrence and can be easily resolved with a quick bot respawn.
# Install dependencies
poetry install
# Compile Protocol Buffers
poetry run python -m grpc_tools.protoc --proto_path=./protobufs --python_out=./protobufs frames.proto
# Run the API server with hot reload
poetry run uvicorn app:app --reload --host 0.0.0.0 --port ${PORT}
For local development and testing with multiple bots, you'll need two terminals:
# Terminal 1: Start the API server
poetry run uvicorn app:app --reload --host 0.0.0.0 --port ${PORT}
# Terminal 2: Start ngrok to expose your local server
ngrok http ${PORT}
Once ngrok is running, it will provide you with a public URL that the server will use for WebSocket connections in local development mode.
The API has been completely redesigned for simplicity and reliability:
- Direct integration with the MeetingBaas API without subprocess management
- Strongly typed Pydantic models with proper validation
- Cleaner WebSocket handling with better error management
- Improved logging with better visibility into the system
- Enhanced JSON message processing for debugging
- Intelligent WebSocket URL resolution with multiple fallback methods
- Support for explicit BASE_URL configuration for production environments
The direct API integration provides several benefits:
# Direct API call to MeetingBaas
meetingbaas_bot_id = create_meeting_bot(
meeting_url=request.meeting_url,
websocket_url=websocket_url, # Determined by the server via multiple methods
bot_id=bot_client_id,
persona_name=persona_name,
api_key=request.meeting_baas_api_key,
# Additional parameters
bot_image=request.bot_image,
entry_message=request.entry_message,
extra=request.extra,
)
This approach eliminates the complexity of subprocess management, provides immediate feedback on bot creation, and returns both the MeetingBaas bot ID and client ID for WebSocket connections.
For production deployment, always set the BASE_URL environment variable:
# Set the BASE_URL for WebSocket connections
export BASE_URL=https://your-server-domain.com
# Run the API server in production mode
poetry run uvicorn app:app --host 0.0.0.0 --port ${PORT}
Once the server is running, you can access:
- Interactive API docs:
http://localhost:${PORT}/docs
- OpenAPI specification:
http://localhost:${PORT}/openapi.json
The API-first approach enables several planned features:
- Parent API Integration:
  - Authentication and authorization
  - Rate limiting
  - User management
  - Billing integration
- Enhanced Bot Management:
  - Real-time bot status monitoring
  - Dynamic persona loading
  - Bot lifecycle management
  - Meeting recording and transcription
- WebSocket Features:
  - Real-time bot control
  - Live transcription streaming
  - Meeting analytics
  - Multi-bot coordination
- Persona Management:
  - Dynamic persona creation via API
  - Persona validation and testing
  - Knowledge base expansion
  - Voice characteristic customization