Home Assistant custom component that allows you to turn almost any camera and almost any speaker into a local voice assistant.
The component uses:
- Stream integration to receive audio from a camera (RTSP/HTTP/RTMP) and automatically transcode the audio codec into a format suitable for Speech-to-Text (STT)
- Assist pipeline integration to run: Speech-to-Text (STT) => Natural Language Processing (NLP) => Text-to-Speech (TTS)
- Almost any Media player to play the audio response from Text-to-Speech (TTS)
Assist pipeline can use:
- openWakeWord core Add-on for wake word detection
- Whisper core Add-on for local STT
- Piper core Add-on for local TTS
- Faster Whisper custom integration for local STT
- Google Translate core integration for cloud TTS
Video instruction from fixtSE
HACS > Integrations > 3 dots (upper top corner) > Custom repositories > URL: AlexxIT/StreamAssist, Category: Integration > Add > wait > Stream Assist > Install
Or manually copy the stream_assist folder from the latest release to the /config/custom_components folder.
- Add the wake word detection Add-on: Settings > Add-ons > Add-on Store > openWakeWord > Install
- Configure the WAKE Add-on: openWakeWord > Configuration
- Add the WAKE Integration: Settings > Integrations > openWakeWord > Configure
- Add the local Speech-to-Text Add-on: Settings > Add-ons > Add-on Store > Whisper > Install
- Configure the STT Add-on: Whisper > Configuration
- Add the STT Integration: Settings > Integrations > Whisper > Configure
- Add the local Text-to-Speech Add-on: Settings > Add-ons > Add-on Store > Piper > Install
- Configure the TTS Add-on: Piper > Configuration
- Add the TTS Integration: Settings > Integrations > Piper > Configure
- Configure the Voice assistant: Settings > Voice assistants > Home Assistant > Select: STT, TTS and WAKE
- Add the Stream Assist Integration: Settings > Integrations > Add Integration > Stream Assist
- Configure the Stream Assist Integration: Settings > Integrations > Stream Assist > Configure
You can select either a camera entity_id or a stream URL as the audio (MIC) source.
You can select the Voice Assistant Pipeline for the recognition process: WAKE => STT => NLP => TTS. By default, the component uses the default pipeline. You can create several Pipelines with different settings, and several Stream Assist components with different settings.
You can select one or multiple Media players (SND) to output the audio response. If your camera supports two-way audio, you can use the WebRTC Camera custom integration to add it as a Media player.
You can set STT start media to play a "beep" after WAKE detection (e.g. media-source://media_source/local/beep.mp3).
The component has a MIC switch and multiple sensors: WAKE, STT, INTENT, TTS. There may be fewer sensors, depending on the Pipeline settings.
The sensor attributes contain detailed information about the results of each step of the assistant.
You can also view the pipeline run history in the Home Assistant interface:
- Settings > Voice assistants > Pipeline > 3 dots > Debug
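As a rough illustration of how these sensors could be used, the automation below posts a notification whenever one of them changes, including its attributes. This is only a sketch: the entity ID `sensor.stream_assist_stt` is a hypothetical placeholder (check the actual entity names on your integration's device page), and the attribute contents depend on your pipeline.

```yaml
# Sketch only: sensor.stream_assist_stt is a hypothetical entity ID.
automation:
  - alias: "Show Stream Assist STT result"
    trigger:
      # Without "to:", a state trigger fires on state and attribute changes
      - platform: state
        entity_id: sensor.stream_assist_stt  # assumed name, replace with yours
    action:
      - service: persistent_notification.create
        data:
          title: "Stream Assist STT"
          message: "State: {{ trigger.to_state.state }}, attributes: {{ trigger.to_state.attributes }}"
```

A notification is used here only because it makes the attributes easy to inspect; any other action could take its place.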
You can run the pipeline as a service. Almost all settings are optional, but they allow customisations that are not possible in Hass by default.
    service: stream_assist.run
    data:
      stream_source: rtsp://...
      camera_entity_id: camera.xxx
      player_entity_id: media_player.xxx
      stt_start_media: media-source://media_source/local/beep.mp3
      pipeline_id: abcdefg...
      assist:
        start_stage: wake_word  # wake_word, stt, intent, tts
        end_stage: tts
        pipeline:
          conversation_language: en
          conversation_engine: homeassistant
          language: en
          name: Home Assistant
          stt_engine: stt.faster_whisper
          stt_language: en
          tts_engine: tts.google_en_com
          tts_language: en
          tts_voice: None
          wake_word_entity: wake_word.openwakeword
          wake_word_id: None
        wake_word_settings: { timeout: 5 }
        audio_settings:
          noise_suppression_level: None
          auto_gain_dbfs: None
          volume_multiplier: None
        conversation_id: None
        device_id: None
        intent_input: None
        tts_audio_output: None  # None, wav, mp3
        tts_input: None
      stream:
        file: ...
        options: {}
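For instance, the service could be called from an automation that skips wake word detection and starts listening as soon as a button is pressed. This is a minimal sketch, not a tested configuration: `input_button.ask_assistant` is a hypothetical helper, the camera and media player IDs are the placeholders from the example above, and it assumes the omitted settings fall back to their defaults.

```yaml
# Sketch only: input_button.ask_assistant is a hypothetical helper entity.
automation:
  - alias: "Ask assistant without wake word"
    trigger:
      # An input_button changes state on every press
      - platform: state
        entity_id: input_button.ask_assistant
    action:
      - service: stream_assist.run
        data:
          camera_entity_id: camera.xxx        # audio (MIC) source
          player_entity_id: media_player.xxx  # audio (SND) output
          assist:
            start_stage: stt  # skip the wake_word stage
            end_stage: tts
```

Starting at the `stt` stage is one example of a customisation the settings allow; the other keys shown in the full example can be added in the same way.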
- Recommended settings for Whisper:
  - Model: small-int8 or medium-int8
  - Beam size: 5
- You can add a remote Whisper/Piper installation from another server:
  - First server: Settings > Add-ons > Whisper/Piper > Configuration > Network > Select port
  - Second server: Settings > Integrations > Add integration > Wyoming Protocol > Select: first server IP, add-on port
- You can use the Google Translate integration instead of Piper; it supports many languages for TTS.
- If your environment does not allow you to install add-ons, you can install the Faster Whisper custom integration for local STT.