Meeting-BaaS/speaking-meeting-bot

Speaking Bot

This repository expands upon Pipecat's Python framework for building voice and multimodal conversational agents. Our implementation creates AI meeting agents that can join and participate in Google Meet and Microsoft Teams meetings with distinct personalities and capabilities defined in Markdown files.

Overview

This project extends Pipecat's WebSocket server implementation to create:

  • Meeting agents that can join Google Meet or Microsoft Teams through the MeetingBaas API
  • Customizable personas with unique context
  • Support for running multiple instances via a simple API
  • WebSocket-based communication for real-time interaction

Architecture

Core Framework: Pipecat Integration

Pipecat provides the foundational framework with:

  • Real-time audio processing pipeline
  • WebSocket communication
  • Voice activity detection
  • Message context management

In this implementation, Pipecat is integrated with Cartesia for speech generation (text-to-speech), Gladia or Deepgram for speech-to-text conversion, and OpenAI's GPT-4 as the underlying LLM.

API-First Architecture

The project follows a streamlined API-first approach with:

  • A lightweight FastAPI server that handles bot management via direct MeetingBaas API calls
  • WebSocket server for real-time communication between MeetingBaas and Pipecat
  • Properly typed Pydantic models for request/response validation
  • Clean separation of concerns with modular components

API Endpoints

  1. Root endpoint (GET /):

    • Health check endpoint
    • Returns: {"message": "MeetingBaas Bot API is running"}
  2. Run Bots (POST /run-bots):

    {
      "meeting_url": "https://meet.google.com/xxx-yyyy-zzz",
      "personas": ["interviewer"],
      "meeting_baas_api_key": "your-api-key",
      "bot_image": "https://example.com/avatar.jpg",
      "entry_message": "Hello, I'm here to help!"
    }
    • Required fields: meeting_url and meeting_baas_api_key
    • The WebSocket URL is determined automatically (see WebSocket URL Resolution below)
    • Returns: MeetingBaas bot ID and client ID for WebSocket connections
  3. WebSocket endpoint (/ws/{client_id}):

    • Real-time communication channel for audio streaming
    • Binary audio data and control messages
  4. Pipecat WebSocket endpoint (/pipecat/{client_id}):

    • Connection point for Pipecat services
    • Bidirectional conversion between raw audio and Protobuf frames
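The /run-bots body shown above maps naturally onto a typed request model. The project itself uses Pydantic for this; the sketch below uses only standard-library dataclasses to illustrate the same validation idea. The class name RunBotsRequest, the helper parse_run_bots, the HTTPS check, and the treatment of personas as optional are assumptions made for illustration, not the project's actual model.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class RunBotsRequest:
    """Hypothetical stand-in for the project's Pydantic request model."""

    meeting_url: str
    meeting_baas_api_key: str
    personas: List[str] = field(default_factory=list)  # assumed optional
    bot_image: Optional[str] = None
    entry_message: Optional[str] = None
    websocket_url: Optional[str] = None

    def __post_init__(self) -> None:
        # The two documented required fields must be non-empty.
        if not self.meeting_url or not self.meeting_baas_api_key:
            raise ValueError("meeting_url and meeting_baas_api_key are required")
        # Assumption: meeting links are HTTPS URLs with a host part.
        parsed = urlparse(self.meeting_url)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError(f"not a valid meeting URL: {self.meeting_url!r}")


def parse_run_bots(payload: dict) -> RunBotsRequest:
    """Build a request object from a decoded JSON body."""
    try:
        return RunBotsRequest(**payload)
    except TypeError as exc:  # missing required field or unexpected key
        raise ValueError(str(exc)) from exc
```

A body that omits meeting_baas_api_key, or that carries an unknown key, is rejected with a ValueError before any bot is created.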

WebSocket URL Resolution

The server determines the WebSocket URL to use in the following priority order:

  1. User-provided URL in the request (if specified in the websocket_url field)
  2. BASE_URL environment variable (recommended for production)
  3. ngrok URL in local development mode
  4. Auto-detection from request headers (fallback, not reliable in production)

For production deployments, it's strongly recommended to set the BASE_URL environment variable to your server's public domain (e.g., https://your-server-domain.com).
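The priority order above can be sketched as a single function. This is a minimal illustration, not the server's actual code: the function names, the lowercase header keys, and the conversion from http(s):// to ws(s):// are all assumptions.

```python
from typing import Mapping, Optional


def to_ws_scheme(url: str) -> str:
    """Map http(s):// to ws(s)://; leave ws:// and wss:// URLs untouched."""
    if url.startswith("https://"):
        return "wss://" + url[len("https://"):]
    if url.startswith("http://"):
        return "ws://" + url[len("http://"):]
    return url


def resolve_websocket_url(
    requested_url: Optional[str],
    env: Mapping[str, str],
    ngrok_url: Optional[str],
    headers: Mapping[str, str],
) -> str:
    """Return the first available URL, following the documented priority order."""
    # 1. URL supplied by the caller in the websocket_url field
    if requested_url:
        return to_ws_scheme(requested_url)
    # 2. BASE_URL environment variable (recommended for production)
    base_url = env.get("BASE_URL")
    if base_url:
        return to_ws_scheme(base_url.rstrip("/"))
    # 3. ngrok tunnel, available only in local development mode
    if ngrok_url:
        return to_ws_scheme(ngrok_url.rstrip("/"))
    # 4. Fallback: guess from request headers (unreliable behind proxies)
    host = headers.get("host")
    if not host:
        raise RuntimeError("cannot determine a WebSocket URL")
    proto = headers.get("x-forwarded-proto", "http")
    return to_ws_scheme(f"{proto}://{host}")
```

Passing the environment and headers as plain mappings keeps the function easy to test without a running server.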

Project Extensions

Building upon Pipecat, we've added:

  • Persona system with Markdown-based configuration for:
    • Core personality traits and behaviors
    • Knowledge base and domain expertise
    • Additional contextual information (websites formatted to MD, technical documentation, etc.)
  • AI image generation via Replicate
  • Image hosting through UploadThing (UTFS)
  • MeetingBaas integration for video meeting platform support
  • Multi-agent orchestration via API

Required API Keys

For Pipecat-related Services

For Project-specific Add-ons

  • OpenAI (LLM to complete the user prompt and match to a Cartesia Voice ID)
  • Replicate (AI image generation)
  • UploadThing (UTFS) (image hosting)

For speech services (TTS/STT) and the LLM (Claude, GPT-4, etc.), you can freely swap between any of the integrations that Pipecat supports.

Important Note

OpenAI's GPT-4, UploadThing (UTFS), and Replicate are currently hard-coded for the CLI-based persona generation features: matching personas to available Cartesia voices, generating AI avatars, and creating initial personality descriptions and knowledge bases. You do not need a Replicate or UTFS API key to run the project if you skip CLI-based persona creation and edit the Markdown files manually.

Persona System

Bot Service

  • Real-time audio processing pipeline
  • WebSocket-based communication
  • Tool integration (weather, time)
  • Voice activity detection
  • Message context management
  • Dynamic persona loading from Markdown files
  • Customizable personality traits and behaviors
  • Support for multiple languages
  • Voice characteristic customization
  • Image generation for persona avatars
  • Metadata management for each persona

Persona Structure

Each persona is defined in the @personas directory with:

  • A README.md defining their personality
  • Space for additional Markdown files to expand knowledge and behavior

Example Persona Structure

@personas/
└── quantum_physicist/
    ├── README.md
    └── (additional behavior files)
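Given that layout, loading a persona amounts to reading README.md and collecting any sibling Markdown files. The following is a minimal sketch under that assumption; the function name load_persona and the keys of the returned dictionary are hypothetical and do not reflect the project's actual loader.

```python
from pathlib import Path


def load_persona(personas_root: Path, name: str) -> dict:
    """Read a persona's README.md plus any sibling Markdown files."""
    persona_dir = personas_root / name
    readme = persona_dir / "README.md"
    if not readme.is_file():
        raise FileNotFoundError(f"persona {name!r} has no README.md")
    # Every other .md file is treated as extra knowledge or behavior context.
    extras = sorted(p for p in persona_dir.glob("*.md") if p.name != "README.md")
    return {
        "name": name,
        "personality": readme.read_text(encoding="utf-8"),
        "additional_context": [p.read_text(encoding="utf-8") for p in extras],
    }
```

The personas root is passed in explicitly, so the same function works wherever the persona directory lives in a given checkout.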

Prerequisites

  • Python 3.11 or later (the setup steps below configure Poetry with python3.11)
  • grpc_tools for protocol buffer compilation
  • Ngrok (for local deployment)
  • Poetry for dependency management

Installation

1. Set Up Poetry Environment

# Install Poetry (Unix/macOS)
curl -sSL https://install.python-poetry.org | python3 -

# Install Poetry (Windows)
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -

2. Install System Dependencies

The project requires certain system dependencies for scientific libraries:

# macOS (using Homebrew)
brew install llvm cython

# Ubuntu/Debian
sudo apt-get install llvm python3-dev cython

# Fedora/RHEL
sudo dnf install llvm-devel python3-devel Cython

3. Set up Project with Poetry

# Clone the repository (if you haven't already)
git clone https://github.com/Meeting-BaaS/speaking-meeting-bot.git
cd speaking-meeting-bot

# Configure Poetry to use Python 3.11+
poetry env use python3.11

# Install dependencies with LLVM config path
# On macOS:
LLVM_CONFIG=$(brew --prefix llvm)/bin/llvm-config poetry install

# On Linux (path may vary):
# LLVM_CONFIG=/usr/bin/llvm-config poetry install

# Activate virtual environment
poetry shell

4. Compile Protocol Buffers

poetry run python -m grpc_tools.protoc --proto_path=./protobufs --python_out=./protobufs frames.proto

5. Configure Environment

cp env.example .env

Edit .env with your MeetingBaas credentials and add the BASE_URL variable for production deployments.

Example .env file:

MEETING_BAAS_API_KEY=your_api_key_here
BASE_URL=https://your-server-domain.com  # For production
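A small pre-flight check can catch a missing variable before the server starts. The sketch below is an illustration only: the function name is hypothetical, and treating BASE_URL as mandatory in production is an assumption (the README recommends it but does not say the server enforces it).

```python
from typing import List, Mapping


def missing_settings(env: Mapping[str, str], production: bool = False) -> List[str]:
    """Return the names of expected variables that are unset or blank."""
    required = ["MEETING_BAAS_API_KEY"]
    if production:
        # Assumption: BASE_URL is treated as required for production runs.
        required.append("BASE_URL")
    return [name for name in required if not env.get(name, "").strip()]
```

Calling missing_settings(os.environ, production=True) returns an empty list when both variables are set. Note that values in .env only reach os.environ if something loads the file first.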

Running Meeting Agents

API Server Setup

There are two ways to run the server:

# Standard mode
poetry run uvicorn app:app --reload --host 0.0.0.0 --port ${PORT}

# Local development mode with ngrok auto-configuration
poetry run python run.py --local-dev

The local development mode simplifies WebSocket setup by:

  • Automatically detecting ngrok tunnels
  • Handling WebSocket URL configuration for MeetingBaas
  • Supporting up to 2 bots (limited by free ngrok tunnels)
  • Providing clear warnings about limitations
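One way to detect running tunnels is the ngrok agent's local inspection API, which by default listens on port 4040 and lists tunnels at /api/tunnels. Whether run.py uses this endpoint is not confirmed by this README; the sketch below only shows how such detection could work, and the function names are hypothetical.

```python
import json
from typing import List
from urllib.request import urlopen

NGROK_API = "http://127.0.0.1:4040/api/tunnels"  # ngrok agent's default local API


def public_urls(tunnels_payload: dict) -> List[str]:
    """Extract the HTTPS public URLs from an /api/tunnels response body."""
    return [
        tunnel["public_url"]
        for tunnel in tunnels_payload.get("tunnels", [])
        if tunnel.get("public_url", "").startswith("https://")
    ]


def detect_ngrok_urls(api_url: str = NGROK_API, timeout: float = 2.0) -> List[str]:
    """Query a running ngrok agent; raises URLError if none is listening."""
    with urlopen(api_url, timeout=timeout) as response:
        return public_urls(json.load(response))
```

With two tunnels running, detect_ngrok_urls() would yield two URLs, one per bot, which matches the two-bot limit noted above.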

Setting Up ngrok for Local Development

  1. Install ngrok if you haven't already:

    brew install ngrok  # macOS
  2. Start ngrok tunnels for your bot connections:

    # Start ngrok with the provided configuration
    ngrok start --all --config config/ngrok/config.yml
  3. Start the server in local development mode:

    poetry run python run.py --local-dev
  4. When prompted, enter the ngrok URLs shown in the ngrok terminal.

Creating Bots via API

The WebSocket URL is optional in all cases. The server determines the appropriate URL based on the priority list described in the WebSocket URL Resolution section:

curl -X POST http://localhost:${PORT}/run-bots \
  -H "Content-Type: application/json" \
  -d '{
    "meeting_url": "https://meet.google.com/xxx-yyyy-zzz",
    "personas": ["interviewer"],
    "meeting_baas_api_key": "your-api-key"
  }'

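You can still manually specify a WebSocket URL if needed. Note that ${PORT} inside the single-quoted JSON body is not expanded by the shell, so write the port number out literally there: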

curl -X POST http://localhost:${PORT}/run-bots \
  -H "Content-Type: application/json" \
  -d '{
    "meeting_url": "https://meet.google.com/xxx-yyyy-zzz",
    "personas": ["interviewer"],
    "websocket_url": "ws://your-custom-websocket-url:${PORT}",
    "meeting_baas_api_key": "your-api-key"
  }'
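The same request can be issued from Python. This is a minimal client sketch using only the standard library; the function names, the 30-second timeout, and the assumption that the response body is JSON are illustrative choices, and the field names are taken from the curl examples above.

```python
import json
from typing import List, Optional
from urllib.request import Request, urlopen


def build_run_bots_request(
    base: str,
    meeting_url: str,
    api_key: str,
    personas: List[str],
    websocket_url: Optional[str] = None,
) -> Request:
    """Assemble the POST /run-bots request without sending it."""
    body = {
        "meeting_url": meeting_url,
        "personas": personas,
        "meeting_baas_api_key": api_key,
    }
    if websocket_url:  # omit the field to let the server resolve the URL
        body["websocket_url"] = websocket_url
    return Request(
        f"{base.rstrip('/')}/run-bots",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_bots(request: Request) -> dict:
    """Send the request and decode the JSON response."""
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect or log before any bot is launched.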

Production Deployment Considerations

When deploying to production, always set the BASE_URL environment variable to ensure reliable WebSocket connections:

  1. Set BASE_URL to your server's public domain:

    export BASE_URL=https://your-server-domain.com
    
  2. Ensure your server is accessible on the public internet

  3. Consider using HTTPS/WSS for secure connections in production

Troubleshooting Local Development

If you encounter issues with the local development mode:

  1. Make sure ngrok is running with the correct configuration
  2. Verify that you've entered the correct ngrok URLs when prompted
  3. Check that your ngrok URLs are accessible (try opening in a browser)
  4. Remember that the free tier of ngrok limits you to 2 concurrent tunnels

Future Extensibility

The persona architecture is designed to support:

  • Scraping user-supplied websites into Markdown for the bot's knowledge base
  • Containerizing the project

Troubleshooting

  • Verify Poetry environment is activated
  • Check Ngrok connection status
  • Validate environment variables
  • Ensure unique Ngrok URLs for multiple agents

For more detailed information about specific personas or deployment options, check the respective documentation in the @personas directory.

Troubleshooting WebSocket Connections

Handling Timing Issues with ngrok and MeetingBaas Bots

Because WebSocket connections through ngrok can be slow to establish, the MeetingBaas bot may join the meeting before your local bot connects. If this happens:

  • Simply press Enter to respawn your bot
  • This will reinitiate the connection and allow your bot to join the meeting

This is a normal occurrence and can be easily resolved with a quick bot respawn.

Running the API Server

Local Development

# Install dependencies
poetry install

# Compile Protocol Buffers
poetry run python -m grpc_tools.protoc --proto_path=./protobufs --python_out=./protobufs frames.proto

# Run the API server with hot reload
poetry run uvicorn app:app --reload --host 0.0.0.0 --port ${PORT}

Local Testing with Multiple Bots

For local development and testing with multiple bots, you'll need two terminals:

# Terminal 1: Start the API server
poetry run uvicorn app:app --reload --host 0.0.0.0 --port ${PORT}

# Terminal 2: Start ngrok to expose your local server
ngrok http ${PORT}

Once ngrok is running, it will provide you with a public URL that the server will use for WebSocket connections in local development mode.

API Improvements

The API has been completely redesigned for simplicity and reliability:

  • Direct integration with the MeetingBaas API without subprocess management
  • Strongly typed Pydantic models with proper validation
  • Cleaner WebSocket handling with better error management
  • Improved logging with better visibility into the system
  • Enhanced JSON message processing for debugging
  • Intelligent WebSocket URL resolution with multiple fallback methods
  • Support for explicit BASE_URL configuration for production environments

With the direct API integration, creating a bot is a single function call:

# Direct API call to MeetingBaas
meetingbaas_bot_id = create_meeting_bot(
    meeting_url=request.meeting_url,
    websocket_url=websocket_url,  # Determined by the server via multiple methods
    bot_id=bot_client_id,
    persona_name=persona_name,
    api_key=request.meeting_baas_api_key,
    # Additional parameters
    bot_image=request.bot_image,
    entry_message=request.entry_message,
    extra=request.extra,
)

This approach eliminates the complexity of subprocess management, provides immediate feedback on bot creation, and returns both the MeetingBaas bot ID and client ID for WebSocket connections.

Production Deployment

For production deployment, always set the BASE_URL environment variable:

# Set the BASE_URL for WebSocket connections
export BASE_URL=https://your-server-domain.com

# Run the API server in production mode
poetry run uvicorn app:app --host 0.0.0.0 --port ${PORT}

API Documentation

Once the server is running, you can access:

  • Interactive API docs: http://localhost:${PORT}/docs
  • OpenAPI specification: http://localhost:${PORT}/openapi.json
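The OpenAPI document can also be inspected programmatically, for example to confirm which HTTP routes a deployment exposes. The helper below is a hypothetical sketch that works on the decoded JSON of /openapi.json. WebSocket routes are generally not part of FastAPI's generated schema, so /ws/{client_id} and /pipecat/{client_id} should not be expected in the output.

```python
from typing import List, Tuple

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec: dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs found in an OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            if method in HTTP_METHODS:  # skip keys such as "parameters"
                operations.append((method.upper(), path))
    # Sort by path first, then by method, for stable output.
    return sorted(operations, key=lambda op: (op[1], op[0]))
```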

Future Development

The API-first approach enables several planned features:

  1. Parent API Integration:

    • Authentication and authorization
    • Rate limiting
    • User management
    • Billing integration
  2. Enhanced Bot Management:

    • Real-time bot status monitoring
    • Dynamic persona loading
    • Bot lifecycle management
    • Meeting recording and transcription
  3. WebSocket Features:

    • Real-time bot control
    • Live transcription streaming
    • Meeting analytics
    • Multi-bot coordination
  4. Persona Management:

    • Dynamic persona creation via API
    • Persona validation and testing
    • Knowledge base expansion
    • Voice characteristic customization

About

Fully autonomous speaking bots built using the MeetingBaas API and Pipecat.
