Closed
Description
Problem
Currently, the iOS app requires TTS to be configured in the pipeline for Assist to work, even if the user only wants text responses. This forces users to:
- Configure a TTS provider they don't need
- Listen to audio responses they don't want
- Deal with TTS playback in quiet/public environments
The error when TTS is not configured:
`PipelineRunValidationError: the pipeline does not support text-to-speech`
Justification: Reading is Faster Than Listening
Silent reading is typically much faster than listening to synthesized speech at its default rate:
| Metric | Value | Source |
|---|---|---|
| Average silent reading | 238-260 wpm | ScienceDirect meta-analysis |
| Average TTS speech | ~150 wpm | Standard TTS output |
| Reading speed advantage | ~60-70% | 238-260 wpm vs. ~150 wpm |
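For reference, the percentage in the last row can be reproduced from the two speeds in the table. This is a minimal sketch of that arithmetic only; the function name is illustrative, and the inputs are the issue's own figures rather than new measurements. Unrounded, the result comes out slightly wider than the rounded range quoted in the table.

```python
# Speeds in words per minute, taken from the table above.
READING_WPM = (238, 260)  # low and high ends of the silent-reading range
TTS_WPM = 150             # approximate default TTS speaking rate


def speedup_percent(reading_wpm: float, listening_wpm: float) -> float:
    """Percent more words per minute when reading versus listening."""
    return (reading_wpm / listening_wpm - 1) * 100


low, high = (speedup_percent(wpm, TTS_WPM) for wpm in READING_WPM)
print(f"Reading is about {low:.0f}% to {high:.0f}% faster than listening")
# prints "Reading is about 59% to 73% faster than listening"
```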
Additionally:
- Text can be skimmed/scanned; audio must be consumed linearly
- TTS is disruptive in quiet environments (office, bedroom, public transit)
- Some users simply prefer silent interactions
Precedent: ChatGPT App
The ChatGPT iOS app provides flexible voice interaction modes that Home Assistant should emulate:
| Mode | Input | Output | Use Case |
|---|---|---|---|
| Text only | Keyboard | Text | Default, quiet environments |
| Transcription | Voice (STT) | Text | Hands-free input, silent output |
| Full audio | Voice (STT) | Voice (TTS) | Fully hands-free, driving |
ChatGPT allows users to choose their interaction style without requiring all pipeline components. Home Assistant currently forces "Full audio" mode even when users only want "Text only" or "Transcription" modes.
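One way to think about the three modes is as sets of optional pipeline components, where a mode is available whenever everything it needs is configured. The sketch below illustrates that idea only; `Component`, `Mode`, and `supported_modes` are hypothetical names, not part of Home Assistant or the iOS app, and it assumes a conversation agent is always present.

```python
from enum import Enum, Flag, auto


class Component(Flag):
    """Optional pipeline components (hypothetical model)."""
    NONE = 0
    STT = auto()
    TTS = auto()


class Mode(Enum):
    """Each mode's value is the set of components it requires."""
    TEXT_ONLY = Component.NONE
    TRANSCRIPTION = Component.STT
    FULL_AUDIO = Component.STT | Component.TTS


def supported_modes(available: Component) -> list[Mode]:
    """Return every mode whose required components are all available."""
    return [mode for mode in Mode if (mode.value & available) == mode.value]


# A pipeline with STT but no TTS still supports two of the three modes.
print([mode.name for mode in supported_modes(Component.STT)])
# prints "['TEXT_ONLY', 'TRANSCRIPTION']"
```

Under this model, a missing TTS provider only removes "Full audio" from the list instead of blocking Assist entirely.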
Proposed Changes
- Remove TTS requirement for Assist: allow pipelines with only STT + Conversation (no TTS) to work on mobile
- Add "Mute responses" toggle: for pipelines that have TTS configured, allow users to disable audio playback in app settings
- Text in/text out always works: typing a query should never require TTS to be configured
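The three changes above could be combined into a single decision about where a pipeline run starts and ends. The following is a rough sketch under stated assumptions, not the app's actual logic: `Pipeline`, `plan_run`, and `mute_responses` are hypothetical names, the stage strings (`"stt"`, `"intent"`, `"tts"`) are assumed to mirror the stage names used by Home Assistant's pipeline runs, and limiting spoken output to voice input is a design choice made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical view of a pipeline: None means the engine is not configured."""
    stt_engine: Optional[str]
    tts_engine: Optional[str]


def plan_run(pipeline: Pipeline, voice_input: bool, mute_responses: bool) -> tuple[str, str]:
    """Pick (start_stage, end_stage) without ever requiring TTS."""
    if voice_input and pipeline.stt_engine is None:
        raise ValueError("voice input needs an STT engine")
    start_stage = "stt" if voice_input else "intent"
    # Assumption: only speak the response for voice input, when TTS exists
    # and the user has not muted responses.
    wants_audio = voice_input and pipeline.tts_engine is not None and not mute_responses
    end_stage = "tts" if wants_audio else "intent"
    return start_stage, end_stage


# STT-only pipeline: voice in, text out, no validation error.
print(plan_run(Pipeline("stt.example", None), voice_input=True, mute_responses=False))
# prints "('stt', 'intent')"
```

Ending the run at the intent stage when audio is not wanted means the TTS check is never reached, which is the behavior the first and third proposals ask for.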
Use Cases Enabled
- ✅ Voice in → text out (STT only, no TTS needed)
- ✅ Text in → text out (no STT, no TTS needed)
- ✅ Voice in → voice out (current behavior, opt-in)