
LiveKit Plugins – Turn Detector

livekit-plugins-external-turn-detector provides end-of-turn detection for LiveKit Agents using custom models to determine when a user has finished speaking.

This plugin enables accurate conversation flow management by leveraging language models trained specifically for turn detection, offering superior performance compared to traditional VAD-based approaches.

✨ Features

  • 🎯 Built-in Models – English and multilingual models that run locally
  • 🔌 LiveKit plugin integration – plug-and-play support for LiveKit workflows
  • 🤖 Compatible with livekit-agents – seamless integration with the agent framework
  • 🚀 External Server Support – use custom models via OpenAI-compatible APIs, vLLM, or NVIDIA Triton
  • ⚡ Low-latency inference – ~10ms (English) / ~25ms (multilingual) per inference
  • 🌍 Multilingual support – 13+ languages in the multilingual model
  • 🔧 Flexible backends – choose between local inference or remote servers

🔧 Installation

# from PyPI
pip install -U livekit-plugins-external-turn-detector

# from source
pip install git+https://github.com/dangvansam/livekit-plugins-turn-detector.git

🔌 Usage

Built-in Models

English model

The English model is the smaller of the two. It requires ~200MB of RAM and completes inference in ~10ms.

from livekit.agents import AgentSession
from livekit.plugins.turn_detector.english import EnglishModel

session = AgentSession(
    ...
    turn_detection=EnglishModel(),
)

Multilingual model

We've trained a separate multilingual model that supports the following languages: English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Indonesian, Russian, and Turkish.

The multilingual model requires ~400MB of RAM and completes inference in ~25ms.

from livekit.agents import AgentSession
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    ...
    turn_detection=MultilingualModel(),
)

External Server Models

For custom models or when you need to offload inference to a dedicated server, you can use external backends. The plugin supports flexible model configuration, allowing you to use any compatible language model for turn detection.

Supported Backends:

  • vLLM: High-performance inference with any HuggingFace-compatible model
  • OpenAI API: Direct integration with OpenAI models
  • Triton: Enterprise-grade inference server with custom model support
  • Custom APIs: Any OpenAI-compatible API endpoint

Using vLLM Backend

For high-performance inference with custom models using vLLM:

from livekit.agents import AgentSession
from livekit.plugins.turn_detector.external import ExternalModel

# Using vLLM with OpenAI-compatible API
turn_detector = ExternalModel(
    provider="openai",  # vLLM uses OpenAI-compatible API
    base_url="http://localhost:8000",  # Your vLLM server endpoint
    model_name="Qwen/Qwen3-0.6B",  # Model name in vLLM (or your custom model)
    api_key="EMPTY",  # Usually "EMPTY" for vLLM or your custom key
    temperature=0.1,
    max_tokens=20,
    timeout=1.0,
    system_prompt="You are a speaking turn-ending identifier. Your task is to identify whether the user's speaking turn is complete or not. Respond with 'end' if the user's turn is complete, or 'continue' if it is not."
)

session = AgentSession(
    ...
    turn_detection=turn_detector,
)

Using NVIDIA Triton Inference Server

For enterprise-grade inference with custom models served by NVIDIA Triton:

from livekit.agents import AgentSession
from livekit.plugins.turn_detector.external import ExternalModel

turn_detector = ExternalModel(
    provider="triton",
    url="localhost:7001",  # Your Triton server gRPC endpoint
    model_name="ensemble",      # Your model name in Triton
    tokenizer="Qwen/Qwen3-0.6B",
    temperature=0.1,
    max_tokens=20,
    timeout=1.0,
)

session = AgentSession(
    ...
    turn_detection=turn_detector,
)

Configuration via Environment Variables

Environment variables are shared across all providers (OpenAI, vLLM, and Triton):

See .env.example for a complete configuration template with examples for different use cases.

Core Configuration:

export TURN_DETECTION_PROVIDER="openai"  # Provider: "openai" or "triton"
export TURN_DETECTION_BASE_URL="http://localhost:8000"  # Server URL
export TURN_DETECTION_MODEL="Qwen/Qwen3-0.6B"  # Any compatible model
export TURN_DETECTION_API_KEY="EMPTY"  # API key (EMPTY for vLLM, required for OpenAI)

Optional Tuning Parameters:

export TURN_DETECTION_TEMPERATURE="0.1"  # Lower = more deterministic
export TURN_DETECTION_MAX_TOKENS="20"  # Response length limit
export TURN_DETECTION_SUPPORT_LANGUAGES="en,zh"  # Target languages
export TURN_DETECTION_SYSTEM_PROMPT="Custom instructions..."  # Model behavior
export TURN_DETECTION_TOKENIZER="Qwen/Qwen3-0.6B"  # Triton only: preprocessing

Flexible Model Options:

  • Use any HuggingFace model ID: "microsoft/DialoGPT-medium", "Qwen/Qwen2.5-7B-Instruct"
  • Deploy custom fine-tuned models: "your-org/custom-turn-detector"
  • Point to local model paths with Triton or vLLM
  • Configure multi-language support for your specific use case

You can then use the turn detector with just environment variables:

from livekit.agents import AgentSession
from livekit.plugins.turn_detector.external import ExternalModel

# Using environment variables only (provider auto-detected from TURN_DETECTION_PROVIDER)
turn_detector = ExternalModel()

session = AgentSession(
    ...
    turn_detection=turn_detector,
)
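
If you prefer to keep configuration in code, the same variables can be set in-process before constructing the model. A minimal sketch, using the variable names from the exports above:

import os

# Same variables as the shell exports above, set before ExternalModel() reads them
os.environ["TURN_DETECTION_PROVIDER"] = "openai"
os.environ["TURN_DETECTION_BASE_URL"] = "http://localhost:8000"
os.environ["TURN_DETECTION_MODEL"] = "Qwen/Qwen3-0.6B"
os.environ["TURN_DETECTION_API_KEY"] = "EMPTY"

from livekit.plugins.turn_detector.external import ExternalModel

turn_detector = ExternalModel()  # picks up the environment configuration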

Easy Provider Switching: With unified environment variables, you can easily switch between providers:

# For vLLM/OpenAI
export TURN_DETECTION_PROVIDER="openai"
export TURN_DETECTION_BASE_URL="http://localhost:8000"
export TURN_DETECTION_MODEL="Qwen/Qwen3-0.6B"

# For Triton (same variables, different values)
export TURN_DETECTION_PROVIDER="triton"
export TURN_DETECTION_BASE_URL="localhost:7001"
export TURN_DETECTION_MODEL="ensemble"
export TURN_DETECTION_TOKENIZER="Qwen/Qwen3-0.6B"

Setting Up vLLM Server

For flexible model deployment with vLLM:

# Install vLLM
pip install vllm

# Option 1: Use Qwen models (recommended for turn detection)
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-0.6B \
    --host 0.0.0.0 \
    --port 8000

# Option 2: Use your custom fine-tuned model
python -m vllm.entrypoints.openai.api_server \
    --model your-username/your-turn-detection-model \
    --host 0.0.0.0 \
    --port 8000

# Option 3: Local model path
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/your/local/model \
    --host 0.0.0.0 \
    --port 8000
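
Once the server is running, you can sanity-check the endpoint directly. Below is a minimal sketch using the openai Python client; the prompt mirrors the system_prompt shown earlier, and the model name and port are assumed to match your deployment:

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API under /v1; "EMPTY" is the usual vLLM key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B",
    temperature=0.1,
    max_tokens=20,
    messages=[
        {
            "role": "system",
            "content": "You are a speaking turn-ending identifier. Respond with 'end' if the user's turn is complete, or 'continue' if it is not.",
        },
        {"role": "user", "content": "so what I wanted to ask you was"},
    ],
)
print(resp.choices[0].message.content)  # a trailing-off utterance should yield "continue"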

Model Flexibility:

  • Any HuggingFace Model: Use any compatible model for turn detection
  • Custom Fine-tuned Models: Deploy your domain-specific turn detection models
  • Multi-language Support: Configure TURN_DETECTION_SUPPORT_LANGUAGES for your target languages
  • Performance Tuning: Adjust temperature and max_tokens based on your model's characteristics

Triton Server Configuration

When using the Triton backend (see the example above), your Triton server should serve models that accept:

Inputs:

  • text_input (BYTES): Input prompt
  • max_tokens (INT32): Max tokens to generate
  • temperature (FP32): Sampling temperature
  • Additional generation parameters as needed

Outputs:

  • text_output (BYTES): Generated text ("end" or "continue")
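
To make this contract concrete, here is a minimal sketch of calling such a model directly with the tritonclient gRPC client. The [1, 1] tensor shapes and the prompt placeholder are assumptions; match them to your actual model configuration:

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:7001")

# Build the inputs described above (shapes are an assumption)
text = np.array([["<your formatted prompt>"]], dtype=np.object_)
max_tokens = np.array([[20]], dtype=np.int32)
temperature = np.array([[0.1]], dtype=np.float32)

inputs = [
    grpcclient.InferInput("text_input", list(text.shape), "BYTES"),
    grpcclient.InferInput("max_tokens", list(max_tokens.shape), "INT32"),
    grpcclient.InferInput("temperature", list(temperature.shape), "FP32"),
]
inputs[0].set_data_from_numpy(text)
inputs[1].set_data_from_numpy(max_tokens)
inputs[2].set_data_from_numpy(temperature)

result = client.infer(
    model_name="ensemble",
    inputs=inputs,
    outputs=[grpcclient.InferRequestedOutput("text_output")],
)
# text_output carries the generated label, e.g. b"end" or b"continue"
print(result.as_numpy("text_output")[0])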

Usage with RealtimeModel

The turn detector can be used even with speech-to-speech models such as OpenAI's Realtime API. You'll need to provide a separate STT to ensure our model has access to the text content.

from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    ...
    stt=deepgram.STT(model="nova-3", language="multi"),
    llm=openai.realtime.RealtimeModel(),
    turn_detection=MultilingualModel(),
)

🚀 Running your agent

This plugin requires model files. Before starting your agent for the first time, or when building Docker images for deployment, download them with:

python my_agent.py download-files

📊 Model system requirements

Built-in Models

The built-in end-of-turn models are optimized to run on CPUs with modest system requirements. They are designed to run on the same server hosting your agents.

  • English model: ~200MB RAM, ~10ms inference time
  • Multilingual model: ~400MB RAM, ~25ms inference time
  • Both models run within a shared inference server, supporting multiple concurrent sessions

External Models

When using external backends, system requirements depend on your chosen configuration:

vLLM Backend

  • Highly optimized for transformer models with GPU acceleration
  • Supports continuous batching for improved throughput
  • Memory-efficient PagedAttention for handling multiple concurrent requests
  • Recommended for production deployments requiring high performance
  • Compatible with most Hugging Face models

Triton Inference Server

  • Server requirements depend on your model size and configuration
  • Supports GPU acceleration for faster inference
  • Can handle high-throughput scenarios with proper scaling
  • Recommended for production deployments with custom models

📚 Documentation

For more information, see the official documentation.

📄 License

The plugin source code is licensed under the Apache-2.0 license.

The end-of-turn model is licensed under the LiveKit Model License.

πŸ™ Acknowledgments

This plugin leverages language models specifically trained for turn detection, providing more accurate conversation flow management compared to traditional VAD-based approaches.
