Conductor

Have your locally-running LLM be the DJ for your live gaming session! Conductor is a real-time music director for tabletop gaming sessions (e.g. TTRPGs) that listens to your game, understands the narrative, and automatically selects appropriate music.

Conductor Web Interface Screenshot

Features

  • Real-time transcription: Uses Whisper to transcribe game conversation
  • Intelligent silence detection: Filters out background noise and silence
  • Context-aware recommendations: LLM analyzes recent gameplay to suggest music and explains its reasons
  • Manual override: Click any track to play immediately in the web UI
  • Smooth transitions: Configurable crossfades between tracks
  • Configurable timing: Adjust frequency for querying the LLM and music-changing "debounce" time
  • Game context: Provide optional setting/game info to improve recommendations from the AI
  • Multiple audio formats: Supports MP3 and FLAC files; just list them in the config file (tracks.yaml)
  • Cross-platform: Tested on macOS & Win11

Installation

Prerequisites

  • Python 3.10+
  • uv installed
  • Ollama or OpenAI compatible LLM server running anywhere
  • Microphone and speaker

Initial Setup

  1. Create required directories:

    mkdir tracks  # Your MP3 & FLAC files go here
    mkdir static  # Required by the app
  2. Install deps with uv:

    uv venv
    uv run pip install -r requirements.txt
  3. Configure LLM endpoint: Create a .env file (adjust the endpoint as needed; see also .env.example)

  4. Add your music:

    • Copy MP3 and/or FLAC files to the tracks directory
    • Create a tracks.yaml at the top level with your track info (see example below)
  5. Run the application:

    uv run python main.py
  6. Open browser to http://localhost:6969 and start using it

Web Interface

  1. Start Recording - Click to begin listening and transcribing audio
  2. Game Context - Fill in setting and notes (expand for more space)
  3. Configuration - Adjust sliders:
    • Update Frequency: How often to check for music changes (5s default)
    • Debounce Time: Minimum seconds between track changes (30s default)
    • Crossfade Duration: Transition time between tracks (2s default)
    • Silence Threshold: Audio level below which chunks are considered silence (0.01 default, see details below)
    • Min Speech Duration: Minimum seconds of speech required before transcribing (1.0s default)
  4. Manual Override - Click any track in the library to play immediately
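
The interaction between Debounce Time and Manual Override can be sketched as follows. This is a minimal illustration, not the app's actual code: the class name `TrackDebouncer`, its method, and the choice to let manual clicks bypass the debounce window are all assumptions based on the descriptions above.

```python
import time


class TrackDebouncer:
    """Hypothetical helper: accept a track change only after the debounce window."""

    def __init__(self, debounce_seconds=30.0, clock=time.monotonic):
        self.debounce_seconds = debounce_seconds
        self._clock = clock        # injectable so the logic can be tested
        self._last_change = None   # None until the first track change
        self.current_track = None

    def request_change(self, track_id, manual=False):
        """Return True if the change is accepted, False if it is suppressed."""
        if track_id == self.current_track:
            return False  # already playing; nothing to do
        now = self._clock()
        too_soon = (
            self._last_change is not None
            and now - self._last_change < self.debounce_seconds
        )
        # Assumption: a manual click always wins, matching "play immediately".
        if too_soon and not manual:
            return False
        self.current_track = track_id
        self._last_change = now
        return True
```

With the 30s default, an LLM recommendation arriving 10 seconds after the previous change would be ignored, while a click in the track library would still switch tracks.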

Track Configuration

Edit tracks.yaml to define your music library. Here's an example:

tracks:
  - id: "combat-intense"
    title: "Plasma Fire"
    filename: "tracks/plasma_fire.mp3"
    bpm: 140
    duration_seconds: 150
    description: "High-energy combat music with driving beats and sweet sax solo."

  - id: "ambient-space"
    title: "Cosmic Drift"
    filename: "tracks/cosmic_drift.flac"
    bpm: 85
    duration_seconds: 180
    description: "Ambient track for space exploration, very slow development."

The id field is what the LLM uses to recommend a specific track based on the recently transcribed audio, so it can be anything you like as long as it is unique across your tracks. The title is shown in the UI, and the filename is a path relative to the directory you launched the server from.
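
Because ids must be unique and filenames are resolved relative to the launch directory, it can help to sanity-check the library before starting a session. The sketch below is hypothetical and not part of Conductor: it assumes the track entries have already been loaded from tracks.yaml into a list of dicts, and it assumes every field in the example above is required, which the app may not enforce.

```python
from pathlib import Path

# Assumption: all six fields from the example are treated as required.
REQUIRED_FIELDS = ("id", "title", "filename", "bpm", "duration_seconds", "description")


def validate_tracks(tracks, base_dir=".", check_files=True):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    seen_ids = set()
    for index, track in enumerate(tracks):
        label = track.get("id", f"entry #{index}")
        for field in REQUIRED_FIELDS:
            if field not in track:
                problems.append(f"{label}: missing field '{field}'")
        track_id = track.get("id")
        if track_id is not None:
            if track_id in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(track_id)
        # Filenames are checked relative to base_dir (the launch directory).
        filename = track.get("filename")
        if check_files and filename and not (Path(base_dir) / filename).is_file():
            problems.append(f"{label}: file not found: {filename}")
    return problems
```

Calling `validate_tracks(entries)` from the directory where you run the server reports duplicate ids and missing audio files in one pass.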

All music is looped indefinitely, but the duration in seconds is provided to the LLM so it can tell if it's a short transition or longer piece and recommend accordingly. If you want really surprising, fun results, try giving evocative descriptions and multifarious music options (think: at least an album's worth of playlist appropriate to the game).
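
To make the role of the description and duration concrete, here is one way track metadata could be condensed into text for an LLM prompt. The function name and the output format are assumptions for illustration only; Conductor's actual prompt is not shown in this README.

```python
def format_track_catalog(tracks):
    """Hypothetical helper: render one line per track for inclusion in a prompt."""
    lines = []
    for track in tracks:
        lines.append(
            f"- {track['id']}: {track['description']} "
            f"({track['bpm']} BPM, {track['duration_seconds']}s loop)"
        )
    return "\n".join(lines)
```

The more evocative each description is, the more the model has to work with when matching a track to the current scene.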

Silence Detection

The app includes intelligent silence detection to prevent transcribing background noise and silence, which tends to give repeated nonsense that muddies the resulting transcript.

  • Silence Threshold: Audio chunks below this RMS energy level are considered silence

    • Lower values (0.001-0.005): Very sensitive, picks up quiet sounds
    • Medium values (0.01-0.02): Good for typical indoor environments, especially if you have speech-filtering turned on in the system tray for macOS
    • Higher values (0.03-0.05): Only loud/clear speech triggers transcription
  • Minimum Speech Duration: How long speech must be detected before transcribing

    • Prevents short noise bursts from being transcribed
    • 1.0s default works well for most conversations
    • Increase for noisy environments, decrease for quiet settings
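
The two settings above can be illustrated with a short sketch. It assumes audio samples normalized to the range [-1.0, 1.0] and fixed-length chunks, and it counts total (not necessarily continuous) speech time; the function names are hypothetical and the app's real pipeline may differ.

```python
import math


def rms_energy(samples):
    """Root-mean-square level of samples normalized to [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transcribe(chunks, chunk_seconds, silence_threshold=0.01,
                      min_speech_seconds=1.0):
    """Return True if the non-silent chunks add up to enough speech."""
    # Chunks whose RMS falls below the threshold are treated as silence.
    speech_seconds = sum(
        chunk_seconds for chunk in chunks
        if rms_energy(chunk) >= silence_threshold
    )
    return speech_seconds >= min_speech_seconds
```

With the defaults, a single half-second burst above the threshold would not trigger transcription, but two such chunks would.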
