Closed
35 commits
c7c0d3e
Enhance FILTER_README.md with interruption filter info
Mishikasardana Jan 16, 2026
ff02a3f
Add proof of functionality for interruption filter
Mishikasardana Jan 16, 2026
6d441e3
Change test script from direct_test.py to test.py
Mishikasardana Jan 16, 2026
f349883
Add tests for InterruptionFilter functionality
Mishikasardana Jan 16, 2026
ed7f070
Enable interruption filter in basic_agent.py
Mishikasardana Jan 16, 2026
fde4cc3
Add InterruptionFilter to module exports
Mishikasardana Jan 16, 2026
6ed46c5
Integrate interruption filter for speech handling
Mishikasardana Jan 16, 2026
9c72b39
Add interruption filter options to agent session
Mishikasardana Jan 16, 2026
fcdf851
Add InterruptionFilter class for managing interruptions
Mishikasardana Jan 16, 2026
c03d079
Implement tests for InterruptionFilter functionality
Mishikasardana Jan 16, 2026
da062db
Convert tests to use pytest for InterruptionFilter
Mishikasardana Jan 16, 2026
ebb1267
Refactor ignore words to use set type
Mishikasardana Jan 16, 2026
c1efebb
Refactor exports in voice module for clarity
Mishikasardana Jan 16, 2026
88d0bfa
Refactor InterruptionFilter and update tests
Mishikasardana Jan 16, 2026
c2e876f
Ignore backchanneling phrases in voice filter
Mishikasardana Jan 17, 2026
8846bf5
Refactor text normalization by using translate
Mishikasardana Jan 17, 2026
1bd1f01
Change import path for InterruptionFilter
Mishikasardana Jan 17, 2026
3c4e3ad
Update file reference in FILTER_README.md
Mishikasardana Jan 17, 2026
4be62cd
Add interruption filter options to agent session
Mishikasardana Jan 17, 2026
41e2373
Remove unused import of 'os' module
Mishikasardana Jan 17, 2026
b77a1e6
Enhance InterruptionFilter with new ignore word handling
Mishikasardana Jan 17, 2026
28682fb
Remove duplicate import of InterruptionFilter
Mishikasardana Jan 17, 2026
5da2a56
Refactor 'got it' into 'got' and 'it'
Mishikasardana Jan 17, 2026
0077b40
Refactor text normalization in InterruptionFilter
Mishikasardana Jan 17, 2026
c1b1b50
Refactor InterruptionFilter for improved word handling
Mishikasardana Jan 17, 2026
50ec861
Change import path for InterruptionFilter
Mishikasardana Jan 17, 2026
130bbbd
Update filter.py
Mishikasardana Jan 17, 2026
f39ae6c
fix: interruption filter normalization and ruff formatting
Mishikasardana Jan 17, 2026
b63ab5f
Update pyproject.toml
Mishikasardana Jan 17, 2026
e812304
Update pyproject.toml
Mishikasardana Jan 17, 2026
2d62056
Refactor print statement for instruction updates
Mishikasardana Jan 17, 2026
6506ab8
Refactor text formatting in session close callback
Mishikasardana Jan 17, 2026
c743e4a
Fix Python 3.9 f-string incompatibilities and clean lint
Mishikasardana Jan 17, 2026
bb031c6
Update pyproject.toml
Mishikasardana Jan 17, 2026
008bde1
Update Ruff config and cleanup
Mishikasardana Jan 17, 2026
111 changes: 111 additions & 0 deletions FILTER_README.md
@@ -0,0 +1,111 @@
# Interruption Filter for LiveKit Agents

## Problem

When the agent is speaking, it stops as soon as the user says words like “yeah”, “ok”, or “hmm”.
This feels unnatural because users often say these words just to show they are listening, not to interrupt.

## Solution

We added an interruption filter.

The filter checks:

- What the user said
- Whether the agent is currently speaking

If the agent is speaking and the user only says listening words (like “yeah” or “ok”), the agent keeps talking instead of stopping.

## How to Use

### Basic Usage

```python
from livekit.agents import AgentSession
from livekit.plugins import silero

session = AgentSession(
    stt="deepgram/nova-3",
    llm="openai/gpt-4.1-mini",
    tts="cartesia/sonic-2",
    vad=silero.VAD.load(),
    interruption_filter_enabled=True,  # enabled by default
)
```

### Custom Ignore Words

```python
session = AgentSession(
    # ... other params ...
    interruption_ignore_words=['yeah', 'ok', 'sure', 'gotcha'],
)
```

### Environment Variable

```bash
export LIVEKIT_INTERRUPTION_IGNORE_WORDS="yeah,ok,hmm,right"
```
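
The README does not show how this variable is parsed. A minimal sketch, assuming a comma-separated value that replaces the defaults when set; `load_ignore_words` is a hypothetical helper, not part of the LiveKit API:

```python
import os

ENV_VAR = "LIVEKIT_INTERRUPTION_IGNORE_WORDS"  # name taken from the README above


def load_ignore_words(default: set[str]) -> set[str]:
    """Hypothetical helper: read a comma-separated override from the environment."""
    raw = os.environ.get(ENV_VAR, "")
    # Trim whitespace, lowercase, and drop empty entries such as a trailing comma.
    words = {w.strip().lower() for w in raw.split(",") if w.strip()}
    # Assumption: fall back to the defaults when the variable is unset or empty.
    return words or set(default)


os.environ[ENV_VAR] = "yeah, ok,hmm,right,"
print(sorted(load_ignore_words({"yeah"})))  # ['hmm', 'ok', 'right', 'yeah']
```

Whether the environment variable or the `interruption_ignore_words` argument takes precedence is not stated here, so check `filter.py` before relying on either.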

### Disable the Filter

```python
session = AgentSession(
    # ... other params ...
    interruption_filter_enabled=False,
)
```

## Default Ignore Words

yeah, ok, okay, hmm, mhm, mm-hmm, uh-huh, right, aha, ah, oh, sure, yep, yup, gotcha, got it, alright, cool
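
Several default entries are hyphenated or span two words (`mm-hmm`, `uh-huh`, `got it`), so how the transcript is normalized matters. The commit history mentions `translate`-based normalization, but the actual code is not shown here, so the following is only a sketch of that idea. `normalize` is a hypothetical name:

```python
import string

# Map every ASCII punctuation character (hyphen included) to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text: str) -> list[str]:
    """Lowercase the text, replace punctuation with spaces, and split into words."""
    return text.lower().translate(_PUNCT_TO_SPACE).split()


print(normalize("Mm-hmm, got it!"))  # ['mm', 'hmm', 'got', 'it']
```

With this kind of word-level split, `got it` can only match if `got` and `it` are both in the ignore set. `string.punctuation` covers ASCII only, so typographic quotes such as ’ would need separate handling.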

## How It Works

1. The agent is speaking
2. The user talks
3. VAD detects the user’s voice
4. Speech is converted to text (STT)
5. The filter checks:
   - Is the agent speaking?
   - Are all the spoken words in the ignore list?
6. Decision:
   - Both yes → Ignore it, agent continues speaking
   - Otherwise → Agent stops speaking
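
The decision in steps 5 and 6 can be sketched as a small pure function. This is an illustration, not the code in `filter.py`; `is_real_interruption` and its parameters are hypothetical, and punctuation handling is left out for brevity:

```python
def is_real_interruption(transcript: str, agent_speaking: bool, ignore_words: set[str]) -> bool:
    """Return True when the utterance should pass through the filter."""
    if not agent_speaking:
        # Nothing to protect: treat the utterance as normal user input.
        return True
    words = transcript.lower().split()
    if not words:
        # Assumption: an empty transcript carries no intent, so keep talking.
        return False
    # Pass through unless every word is a listening word.
    return not all(w in ignore_words for w in words)


ignore = {"yeah", "ok", "hmm"}
print(is_real_interruption("yeah ok", True, ignore))    # False
print(is_real_interruption("yeah wait", True, ignore))  # True
print(is_real_interruption("yeah", False, ignore))      # True
```

Requiring all words to be in the ignore list, rather than any, is what lets a mixed utterance such as "yeah wait" still stop the agent.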


## Examples

**Agent speaking, user says "yeah":**
- Filter ignores it, agent continues

**Agent speaking, user says "wait":**
- Filter allows it, agent stops

**Agent silent, user says "yeah":**
- Filter allows it, agent responds

**Agent speaking, user says "yeah wait":**
- Filter allows it (contains "wait"), agent stops

## Testing

Run the test:
```bash
python test.py
```

All 4 test scenarios should pass.

## Implementation Details

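The filter lives in `filter.py` in the `voice` package (`livekit-agents/livekit/agents/voice/`) and is exported as `InterruptionFilter`. It's integrated into `agent_activity.py` in the `_interrupt_by_audio_activity()` method.

When a possible interruption is detected, the filter checks the transcript and the agent state before deciding whether to actually interrupt.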

## Configuration

You can customize the ignore words list or disable the filter entirely. The filter is enabled by default because it improves the conversation flow.

## Performance

The filter adds less than 1ms of latency. It just does simple string matching on the transcript.
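
To check the cost on your own machine, a rough timing sketch like the one below can help. `only_ignored` is a hypothetical stand-in for the matching step. It measures the string matching only, not any time spent waiting for the STT transcript:

```python
import timeit

IGNORE = {"yeah", "ok", "hmm", "right"}


def only_ignored(transcript: str) -> bool:
    """Return True when the transcript is non-empty and every word is in IGNORE."""
    words = transcript.lower().split()
    return bool(words) and all(w in IGNORE for w in words)


runs = 10_000
# Total wall-clock seconds for `runs` calls on a short backchannel phrase.
total = timeit.timeit(lambda: only_ignored("yeah ok right hmm"), number=runs)
print(f"{total / runs * 1e6:.2f} microseconds per call")
```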
7 changes: 3 additions & 4 deletions examples/bank-ivr/ivr_system_agent.py
@@ -4,7 +4,6 @@
import os
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

from dotenv import load_dotenv
from mock_bank_service import (
@@ -53,9 +52,9 @@ class TaskOutcome(str, Enum):

@dataclass
class SessionState:
customer_id: Optional[str] = None # noqa: UP007
customer_name: Optional[str] = None # noqa: UP007
branch_name: Optional[str] = None # noqa: UP007
customer_id: str | None = None # noqa: UP007
customer_name: str | None = None # noqa: UP007
branch_name: str | None = None # noqa: UP007
deposit_cache: dict[str, tuple[DepositAccount, ...]] = field(default_factory=dict)
card_cache: dict[str, tuple[CreditCard, ...]] = field(default_factory=dict)
loan_cache: dict[str, tuple[LoanAccount, ...]] = field(default_factory=dict)
6 changes: 2 additions & 4 deletions examples/bank-ivr/mock_bank_service.py
@@ -7,7 +7,7 @@
from dataclasses import dataclass
from pathlib import Path
from types import MappingProxyType
from typing import Any, Optional
from typing import Any


@dataclass(frozen=True)
@@ -188,9 +188,7 @@ def get_profile(self, customer_id: str) -> CustomerProfile:
def list_deposit_accounts(self, customer_id: str) -> tuple[DepositAccount, ...]:
return self.get_profile(customer_id).deposit_accounts

def find_deposit_account(
self, customer_id: str, account_number: str
) -> Optional[DepositAccount]: # noqa: UP007
def find_deposit_account(self, customer_id: str, account_number: str) -> DepositAccount | None: # noqa: UP007
for acct in self.list_deposit_accounts(customer_id):
if acct.account_number == account_number:
return acct
1 change: 1 addition & 0 deletions examples/voice_agents/basic_agent.py
@@ -102,6 +102,7 @@ async def entrypoint(ctx: JobContext):
# when it's detected, you may resume the agent's speech
resume_false_interruption=True,
false_interruption_timeout=1.0,
interruption_filter_enabled=True,
)

# log metrics as they are emitted, and total usage after session is over
4 changes: 3 additions & 1 deletion examples/voice_agents/llamaindex-rag/retrieval.py
@@ -80,7 +80,9 @@ async def llm_node(
system_msg.content.append(instructions)
else:
chat_ctx.items.insert(0, llm.ChatMessage(role="system", content=[instructions]))
print(f"update instructions: {instructions[:100].replace('\n', '\\n')}...")
# Debug: log truncated instructions
debug_text = instructions[:100].replace("\n", "\\n")
print(f"update instructions: {debug_text}...")

# update the instructions for agent
# await self.update_instructions(instructions)
5 changes: 4 additions & 1 deletion examples/voice_agents/session_close_callback.py
@@ -77,7 +77,10 @@ def on_close(ev: CloseEvent):
print("Chat History:")
for item in session.history.items:
if item.type == "message":
text = f"{item.role}: {item.text_content.replace('\n', '\\n')}"
# New code
content = item.text_content.replace("\n", "\\n")
text = f"{item.role}: {content}"

if item.interrupted:
text += " (interrupted)"

26 changes: 13 additions & 13 deletions livekit-agents/livekit/agents/cli/cli.py
@@ -1489,13 +1489,13 @@ def _build_cli(server: AgentServer) -> typer.Typer:
def console(
*,
input_device: Annotated[
Optional[str], # noqa: UP007, required for python 3.9
str | None, # noqa: UP007, required for python 3.9
typer.Option(
help="Numeric input device ID or input device name substring(s)",
),
] = None,
output_device: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="Numeric output device ID or output device name substring(s)",
),
@@ -1541,28 +1541,28 @@ def start(
typer.Option(help="Set the log level", case_sensitive=False),
] = LogLevel.info,
url: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="The WebSocket URL of your LiveKit server or Cloud project.",
envvar="LIVEKIT_URL",
),
] = None,
api_key: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="API key for authenticating with your LiveKit server or Cloud project.",
envvar="LIVEKIT_API_KEY",
),
] = None,
api_secret: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="API secret for authenticating with your LiveKit server or Cloud project.",
envvar="LIVEKIT_API_SECRET",
),
] = None,
drain_timeout: Annotated[
Optional[int], # noqa: UP007
int | None, # noqa: UP007
typer.Option(
help="Time in seconds to wait for jobs to finish before shutting down.",
),
@@ -1593,21 +1593,21 @@ def dev(
typer.Option(help="Enable auto-reload of the server when (code) files change."),
] = True,
url: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="The WebSocket URL of your LiveKit server or Cloud project.",
envvar="LIVEKIT_URL",
),
] = None,
api_key: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="API key for authenticating with your LiveKit server or Cloud project.",
envvar="LIVEKIT_API_KEY",
),
] = None,
api_secret: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="API secret for authenticating with your LiveKit server or Cloud project.",
envvar="LIVEKIT_API_SECRET",
@@ -1673,21 +1673,21 @@ def connect(
typer.Option(help="Set the log level", case_sensitive=False),
] = LogLevel.debug,
url: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="The WebSocket URL of your LiveKit server or Cloud project.",
envvar="LIVEKIT_URL",
),
] = None,
api_key: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="API key for authenticating with your LiveKit server or Cloud project.",
envvar="LIVEKIT_API_KEY",
),
] = None,
api_secret: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(
help="API secret for authenticating with your LiveKit server or Cloud project.",
envvar="LIVEKIT_API_SECRET",
@@ -1698,7 +1698,7 @@ def connect(
typer.Option(help="Room name to connect to"),
],
participant_identity: Annotated[
Optional[str], # noqa: UP007
str | None, # noqa: UP007
typer.Option(help="Participant identity"),
] = None,
) -> None:
37 changes: 22 additions & 15 deletions livekit-agents/livekit/agents/voice/__init__.py
@@ -16,6 +16,7 @@
UserInputTranscribedEvent,
UserStateChangedEvent,
)
from .filter import InterruptionFilter
from .room_io import (
_ParticipantAudioOutput,
_ParticipantStreamTranscriptionOutput,
@@ -25,29 +26,35 @@
from .transcription import TranscriptSynchronizer

__all__ = [
"AgentSession",
"VoiceActivityVideoSampler",
# Core agent
"Agent",
"ModelSettings",
"AgentTask",
"ModelSettings",
"AgentSession",
"VoiceActivityVideoSampler",
# Speech / transcription
"SpeechHandle",
"RunContext",
"UserInputTranscribedEvent",
"TranscriptSynchronizer",
# Interruption handling
"InterruptionFilter",
# Events
"AgentEvent",
"MetricsCollectedEvent",
"AgentFalseInterruptionEvent",
"AgentStateChangedEvent",
"ConversationItemAddedEvent",
"SpeechCreatedEvent",
"UserInputTranscribedEvent",
"UserStateChangedEvent",
"FunctionToolsExecutedEvent",
"MetricsCollectedEvent",
"ErrorEvent",
"CloseEvent",
"CloseReason",
"UserStateChangedEvent",
"AgentStateChangedEvent",
"FunctionToolsExecutedEvent",
"AgentFalseInterruptionEvent",
"TranscriptSynchronizer",
# IO / results
"RunContext",
"io",
"room_io",
"run_result",
# Internal outputs (intentionally exported)
"_ParticipantAudioOutput",
"_ParticipantTranscriptionOutput",
"_ParticipantStreamTranscriptionOutput",
@@ -57,7 +64,7 @@
_module = dir()
NOT_IN_ALL = [m for m in _module if m not in __all__]

__pdoc__ = {}
__pdoc__: dict[str, bool] = {}

for n in NOT_IN_ALL:
__pdoc__[n] = False
for name in NOT_IN_ALL:
__pdoc__[name] = False