24 changes: 14 additions & 10 deletions livekit-plugins/livekit-plugins-speechmatics/README.md
@@ -21,48 +21,52 @@ You should adjust your system instructions to inform the LLM of this format for

## Usage (Speechmatics end of utterance detection and speaker ID)

To use the Speechmatics end of utterance detection and speaker ID, you can use the following configuration:
To use the Speechmatics end of utterance detection and speaker ID, you can use the following configuration.

Note: The `turn_detection_mode` parameter determines whether the plugin controls end-of-turn detection. The default, `FIXED`, means the plugin does not control end-of-turn detection and instead relies on an external trigger. This example uses `ADAPTIVE` mode, in which the plugin controls end-of-turn detection using its own voice activity detection (VAD) and the pace of speech. The `turn_detection="stt"` parameter tells the `AgentSession` to use the STT engine's end-of-turn detection.

```python
from livekit.agents import AgentSession
from livekit.plugins import speechmatics

agent = AgentSession(
    stt=speechmatics.STT(
        end_of_utterance_silence_trigger=0.5,
        enable_diarization=True,
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
        turn_detection_mode=speechmatics.TurnDetectionMode.ADAPTIVE,
        speaker_active_format="[Speaker {speaker_id}] {text}",
        speaker_passive_format="[Speaker {speaker_id} *PASSIVE*] {text}",
        additional_vocab=[
            speechmatics.AdditionalVocabEntry(
                content="LiveKit",
                sounds_like=["live kit"],
            ),
        ],
    ),
    turn_detection="stt",
    ...
)
```

Note: The `end_of_utterance_silence_trigger` parameter sets how long, in seconds, the STT engine waits after the last detected speech before emitting the full utterance to LiveKit. This may conflict with LiveKit's own end-of-turn detection, so you may need to adjust the `min_endpointing_delay` and `max_endpointing_delay` parameters accordingly.
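
As a rough illustration of that tuning, the sketch below checks the three timing values against one heuristic: the STT silence trigger is shorter than `min_endpointing_delay`, and `min_endpointing_delay` does not exceed `max_endpointing_delay`. This heuristic is an assumption, not documented plugin or LiveKit behavior, and `EndpointingConfig` and `endpointing_warnings` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EndpointingConfig:
    """Hypothetical bundle of the timing values discussed above, in seconds."""

    end_of_utterance_silence_trigger: float
    min_endpointing_delay: float
    max_endpointing_delay: float


def endpointing_warnings(cfg: EndpointingConfig) -> list[str]:
    """Return warnings for timing values that may conflict.

    Heuristic only (an assumption, not documented plugin or LiveKit behavior).
    """
    warnings: list[str] = []
    # Assumed rule: the STT should emit the utterance before LiveKit's
    # minimum endpointing delay has elapsed.
    if cfg.end_of_utterance_silence_trigger >= cfg.min_endpointing_delay:
        warnings.append(
            "end_of_utterance_silence_trigger is not shorter than "
            "min_endpointing_delay; the two end-of-turn timers may conflict"
        )
    # The minimum delay should never exceed the maximum delay.
    if cfg.min_endpointing_delay > cfg.max_endpointing_delay:
        warnings.append("min_endpointing_delay exceeds max_endpointing_delay")
    return warnings


# Timing values from the LiveKit turn detection example in this README.
print(endpointing_warnings(EndpointingConfig(0.35, 0.5, 5.0)))  # []
```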

## Usage (LiveKit Turn Detection)

To use LiveKit's end-of-turn detection, the output text format must not add any extra content after the utterance text. A `speaker_active_format` of `[Speaker {speaker_id}] {text}`, which produces output such as `[Speaker S1] ...`, should work well. You may need to adjust your system instructions to inform the LLM of this format for speaker identification.
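
To see why trailing content matters, here is a minimal sketch that renders the speaker formats used in this README, assuming the `{speaker_id}` and `{text}` placeholders behave like Python `str.format` fields (the plugin's actual substitution code is not shown here). The older `<{speaker_id}>{text}</{speaker_id}>` format leaves a closing tag after the utterance, while `[Speaker {speaker_id}] {text}` ends with the spoken text. `render` and `ends_with_utterance` are hypothetical helpers.

```python
# Format templates copied from this README. Treating {speaker_id} and {text}
# as Python str.format fields is an assumption made for this sketch.
OLD_ACTIVE_FORMAT = "<{speaker_id}>{text}</{speaker_id}>"
NEW_ACTIVE_FORMAT = "[Speaker {speaker_id}] {text}"
PASSIVE_FORMAT = "[Speaker {speaker_id} *PASSIVE*] {text}"


def render(fmt: str, speaker_id: str, text: str) -> str:
    """Substitute a speaker ID and transcript text into a format template."""
    return fmt.format(speaker_id=speaker_id, text=text)


def ends_with_utterance(fmt: str, speaker_id: str, text: str) -> bool:
    """Return True if the rendered output adds nothing after the utterance."""
    return render(fmt, speaker_id, text).endswith(text)


utterance = "Can you book a table for two?"
print(render(NEW_ACTIVE_FORMAT, "S1", utterance))
# [Speaker S1] Can you book a table for two?
print(render(PASSIVE_FORMAT, "S2", "Okay."))
# [Speaker S2 *PASSIVE*] Okay.
print(ends_with_utterance(OLD_ACTIVE_FORMAT, "S1", utterance))  # False: trailing "</S1>"
print(ends_with_utterance(NEW_ACTIVE_FORMAT, "S1", utterance))  # True
```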

The `end_of_utterance_silence_trigger` parameter controls the amount of silence before the end of turn detection is triggered. The default is `0.5` seconds.
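
The following is a simplified, offline model of a silence-based trigger, not the Speechmatics engine's implementation: given hypothetical word timestamps, it closes an utterance whenever the gap between one word's end and the next word's start reaches `silence_trigger`. In this model, lowering the value (for example to the `0.35` used in the following usage example) splits speech into more, shorter utterances, and raising it merges them. `Word` and `split_on_silence` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical recognized word with start/end times in seconds."""

    text: str
    start: float
    end: float


def split_on_silence(words: list[Word], silence_trigger: float = 0.5) -> list[str]:
    """Group words into utterances, closing one whenever the silence between
    consecutive words reaches silence_trigger (simplified model only)."""
    utterances: list[str] = []
    current: list[str] = []
    prev_end = None
    for word in words:
        # Enough silence since the previous word: emit the utterance so far.
        if prev_end is not None and word.start - prev_end >= silence_trigger:
            utterances.append(" ".join(current))
            current = []
        current.append(word.text)
        prev_end = word.end
    # Emit whatever remains once the input ends.
    if current:
        utterances.append(" ".join(current))
    return utterances


# Gaps between words: 0.1 s, 0.7 s, 0.1 s, 0.4 s.
words = [
    Word("hello", 0.0, 0.4),
    Word("there", 0.5, 0.9),
    Word("how", 1.6, 1.8),
    Word("are", 1.9, 2.0),
    Word("you", 2.4, 2.6),
]
print(split_on_silence(words, 0.5))   # ['hello there', 'how are you']
print(split_on_silence(words, 0.35))  # ['hello there', 'how are', 'you']
```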

Usage:

```python
from livekit.agents import AgentSession
from livekit.plugins.turn_detector.english import EnglishModel
from livekit.plugins import speechmatics
from livekit.plugins import speechmatics, silero

agent = AgentSession(
    stt=speechmatics.STT(
        enable_diarization=True,
        end_of_utterance_mode=speechmatics.EndOfUtteranceMode.NONE,
        end_of_utterance_silence_trigger=0.35,
        speaker_active_format="[Speaker {speaker_id}] {text}",
        speaker_passive_format="[Speaker {speaker_id} *PASSIVE*] {text}",
    ),
    turn_detector=EnglishModel(),
    vad=silero.VAD.load(),
    turn_detection=EnglishModel(),
    min_endpointing_delay=0.5,
    max_endpointing_delay=5.0,
    ...
@@ -15,28 +15,28 @@
See https://docs.livekit.io/agents/integrations/stt/speechmatics/ for more information.
"""

from .stt import STT, SpeechStream
from .tts import TTS
from .types import (
from speechmatics.voice import (
    AdditionalVocabEntry,
    AudioSettings,
    DiarizationFocusMode,
    DiarizationKnownSpeaker,
    EndOfUtteranceMode,
    TranscriptionConfig,
    AudioEncoding,
    OperatingPoint,
    SpeakerFocusMode,
    SpeakerIdentifier,
)

from .stt import STT, SpeechStream, TurnDetectionMode
from .tts import TTS
from .version import __version__

__all__ = [
    "STT",
    "TTS",
    "TurnDetectionMode",
    "SpeechStream",
    "AdditionalVocabEntry",
    "AudioSettings",
    "DiarizationFocusMode",
    "DiarizationKnownSpeaker",
    "EndOfUtteranceMode",
    "TranscriptionConfig",
    "AudioEncoding",
    "OperatingPoint",
    "SpeakerFocusMode",
    "SpeakerIdentifier",
    "logger",
    "__version__",
]