AI Error Handling and User-Facing Messages Documentation
Executive Summary
This document outlines the comprehensive error handling strategy implemented in the SignalWire AI system, focusing on user-facing error messages delivered via `ais_say()` and the conditions that trigger session termination. The system takes a graceful degradation approach, using polite, human-like error messages to maintain the user experience even during failures.
Error Message Categories
1. Fatal System Errors (Session Terminating)
1.1 "I'm sorry I am having a migrane. I have to hangup now."
Context: Fatal error in AI text generation/streaming
Trigger Conditions:
- Fatal error flag is set during AI response processing
- Occurs in `ai_send_text()` during streaming operations
- Typically indicates severe API communication failures or malformed responses
Technical Details:
- Sets `ais->offhook = 0` to terminate the call
- Fires a `calling.ai.error` event with the fatal flag set
- Logs error details to system logs
- Used when the AI system cannot recover from the error
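The fatal path can be sketched as follows. This is a minimal illustration using the names the document gives (`ais->offhook`, `ais_say()`, `calling.ai.error`); the struct layout and the event/logging helpers are simplifying assumptions, not the system's actual definitions.

```c
#include <stdio.h>

/* Minimal stand-in for the AI session structure. */
typedef struct {
    int offhook;            /* 1 while the call is up; 0 requests hangup */
} ai_session_t;

/* Stand-in for ais_say(): the real system speaks this over TTS. */
static void ais_say(ai_session_t *ais, const char *msg) {
    (void)ais;
    printf("TTS: %s\n", msg);
}

/* Stand-in for the event system: fires a calling.ai.error event. */
static void fire_error_event(const char *type, int fatal) {
    printf("event=%s fatal=%d\n", type, fatal);
}

/* The fatal path described above: speak the apology, fire the event
 * with the fatal flag set, then terminate by clearing offhook. */
static void handle_fatal_error(ai_session_t *ais) {
    ais_say(ais, "I'm sorry I am having a migrane. I have to hangup now.");
    fire_error_event("calling.ai.error", 1);
    ais->offhook = 0;       /* terminates the call */
}
```

Clearing `offhook` rather than tearing the session down inline lets the call loop notice the flag and clean up resources on its own path.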
1.2 "I'm sorry I am having a problem. I have to hangup now."
Context: Maximum retry attempts exceeded
Trigger Conditions:
- Error count exceeds `settings->max_tries` (default: 3, post-processing: 10)
- Only triggered during streaming operations (`if (stream)`)
- Indicates persistent communication or processing failures
Technical Details:
- Only spoken during streaming mode
- Retries with a linearly increasing backoff before this message is spoken
- Sets `ais->offhook = 0` to terminate the call
- Default `max_tries` is 3 for normal operations, 10 for post-processing
2. Recoverable Errors (Session Continuing)
2.1 "I'm sorry, can you hold on for a second?"
Context: First retry attempt
Trigger Conditions:
- `errors == 1` (first error encountered)
- System will retry the operation after this message
- Waits `errors` seconds before retrying (linear backoff)
Technical Details:
- Polite way to indicate temporary processing delay
- Implements linear backoff (1 second, then 2 seconds, etc.)
- Session continues after retry delay
- Does not terminate the call
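The retry behavior from sections 1.2 and 2.1 can be sketched together. The names `ais->offhook`, `max_tries`, and `ais_say()` come from the document; the control flow, the struct layout, and `attempt_fn` (standing in for one streaming request) are illustrative assumptions.

```c
#include <stdio.h>
#include <unistd.h>

typedef struct {
    int offhook;            /* 1 while the call is up; 0 requests hangup */
    int max_tries;          /* 3 by default, 10 for post-processing */
} ai_session_t;

/* Stand-in for ais_say(); the real system speaks this over TTS. */
static void ais_say(ai_session_t *ais, const char *msg) {
    (void)ais;
    printf("TTS: %s\n", msg);
}

/* Sketch of the retry policy: a polite "hold on" on the first error,
 * a linearly growing wait of `errors` seconds, and a terminating
 * apology once max_tries is reached. attempt_fn is a hypothetical
 * stand-in for one streaming request (returns 0 on success). */
static int run_with_retries(ai_session_t *ais, int (*attempt_fn)(void)) {
    int errors = 0;
    while (ais->offhook) {
        if (attempt_fn() == 0)
            return 0;                               /* success */
        errors++;
        if (errors == 1)
            ais_say(ais, "I'm sorry, can you hold on for a second?");
        if (errors >= ais->max_tries) {
            ais_say(ais, "I'm sorry I am having a problem. I have to hangup now.");
            ais->offhook = 0;                       /* give up: hang up */
            return -1;
        }
        sleep((unsigned)errors);                    /* back off 1s, 2s, 3s... */
    }
    return -1;
}

/* Demo attempt that always fails, for exercising the failure path. */
static int always_fail(void) { return -1; }
```

Note that the "hold on" message is only spoken once, on the first error, while the backoff wait applies to every retry.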
3. Voice/TTS Configuration Errors (Non-Terminating)
3.1 "There was an error with your voice selection, please check your syntax. using a default voice instead."
Context: Voice configuration failure with fallback
Trigger Conditions:
- TTS engine initialization fails with specified voice parameters
- System falls back to the default Google Cloud voice (`gcloud`, `en-US-Neural2-J`)
- Occurs during session initialization and runtime voice changes
Technical Details:
- Graceful degradation to default voice
- Session continues with fallback configuration
- Provides user feedback about configuration issue
- Suggests syntax checking for voice parameters
3.2 "There was an error with the current voice, sorry for the trouble."
Context: Runtime voice switching failure
Trigger Conditions:
- Voice switching fails during active conversation
- Occurs in the output thread during TTS processing
- System attempts to recover by switching to default voice
Technical Details:
- Attempts fallback to default voice
- If fallback fails, sets `ais->running = 0`
- More apologetic tone for runtime interruption
- Indicates temporary service disruption
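The fallback ladder from sections 3.1 and 3.2 can be sketched as follows. The default voice pair and `ais->running` are from the document; `tts_init()`, `select_voice()`, and the struct layout are hypothetical stand-ins for the real TTS plumbing.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    int running;            /* output thread keeps going while non-zero */
    char engine[32];
    char voice[64];
} ai_session_t;

/* Hypothetical TTS initializer: here it only accepts the default
 * gcloud voice, standing in for a real engine that can fail on bad
 * parameters. Returns 0 on success. */
static int tts_init(const char *engine, const char *voice) {
    return (strcmp(engine, "gcloud") == 0 &&
            strcmp(voice, "en-US-Neural2-J") == 0) ? 0 : -1;
}

static void ais_say(ai_session_t *ais, const char *msg) {
    (void)ais;
    printf("TTS: %s\n", msg);
}

/* Sketch of the fallback ladder: try the requested voice, fall back
 * to gcloud / en-US-Neural2-J, and only stop the session
 * (running = 0) if even the default cannot be started. */
static void select_voice(ai_session_t *ais, const char *engine, const char *voice) {
    if (tts_init(engine, voice) == 0) {
        snprintf(ais->engine, sizeof(ais->engine), "%s", engine);
        snprintf(ais->voice, sizeof(ais->voice), "%s", voice);
        return;
    }
    ais_say(ais, "There was an error with your voice selection, "
                 "please check your syntax. using a default voice instead.");
    if (tts_init("gcloud", "en-US-Neural2-J") == 0) {
        snprintf(ais->engine, sizeof(ais->engine), "%s", "gcloud");
        snprintf(ais->voice, sizeof(ais->voice), "%s", "en-US-Neural2-J");
        return;
    }
    ais->running = 0;       /* fallback failed: stop the output thread */
}
```

The two-level ladder keeps the session alive through any single bad configuration; only a failure of the known-good default is treated as unrecoverable.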
Session Termination Patterns
Termination Triggers (`ais->offhook = 0`)
The system terminates sessions by setting `ais->offhook = 0` in the following scenarios:
1. Fatal AI Processing Errors
   - Unrecoverable API communication failures
   - Malformed response data
   - Critical system errors
2. Maximum Retry Exceeded
   - Persistent communication failures
   - Repeated processing errors
   - System reliability threshold exceeded
3. Function-Triggered Hangup
   - Explicit hangup command from AI functions
   - Controlled session termination
   - User or system-initiated disconnect
4. Transfer Operations
   - Call transfer scenarios
   - Session handoff to other systems
   - Controlled termination for routing
Error Recovery Mechanisms
Retry Logic
- Linear Backoff: Wait time increases by one second with each retry (1s, 2s, 3s...)
- Maximum Attempts: Configurable via `max_tries` (default: 3, post-processing: 10)
- Graceful Messages: User-friendly notifications during retry attempts
Fallback Strategies
- Default Voice: Falls back to `gcloud` `en-US-Neural2-J` on voice failures
- Service Degradation: Continues with reduced functionality rather than terminating
- Error Logging: Comprehensive logging for debugging while maintaining user experience
Event System
- Error Events: Fires `calling.ai.error` events for external monitoring
- Fatal Flag: Distinguishes between recoverable and fatal errors
- Structured Data: JSON error objects with detailed information
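A structured error payload along these lines could be built as shown below. The event type `calling.ai.error` and the fatal flag come from the document; the specific field names (`type`, `fatal`, `message`) are illustrative assumptions, not the system's verified schema, and the message is assumed to need no JSON escaping.

```c
#include <stdio.h>

/* Sketch of a structured calling.ai.error payload. Field names are
 * assumptions; a real implementation would use a JSON library and
 * escape the message text. Returns the snprintf length. */
static int format_error_event(char *buf, size_t len,
                              int fatal, const char *message) {
    return snprintf(buf, len,
                    "{\"type\":\"calling.ai.error\","
                    "\"fatal\":%s,"
                    "\"message\":\"%s\"}",
                    fatal ? "true" : "false", message);
}
```

Carrying the fatal flag inside the payload lets external monitors separate hangup-causing failures from recoverable retries without parsing the spoken messages.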
Best Practices Implemented
User Experience
- Polite Language: All error messages use apologetic, human-like language
- Clear Communication: Messages explain the situation without technical jargon
- Expectation Setting: Users are informed about delays and recovery attempts
- Graceful Degradation: System continues operation when possible
Technical Robustness
- Comprehensive Logging: All errors logged with appropriate severity levels
- Event Notification: External systems notified of error conditions
- Resource Cleanup: Proper memory and resource management during errors
- State Management: Consistent session state during error conditions
Error Classification
- Fatal vs Recoverable: Clear distinction between error types
- Context Awareness: Different handling for different operational contexts
- Retry Strategies: Appropriate retry logic for different error types
- Fallback Mechanisms: Multiple levels of service degradation
Configuration Parameters
Retry Settings
- `max_tries`: Maximum retry attempts (default: 3, post-processing: 10)
- Linear backoff timing: `errors * 1 second`
Voice Fallback
- Default engine: `gcloud`
- Default voice: `en-US-Neural2-J`
- Automatic fallback on configuration errors
Error Reporting
- Event type: `calling.ai.error`
- Includes fatal flag and error details
- Structured JSON error objects
Monitoring and Debugging
Log Levels
- ERROR: Fatal conditions and session terminations
- WARNING: Retry attempts and recoverable errors
- INFO: General error recovery information
- DEBUG: Detailed technical information
Event Monitoring
- Monitor `calling.ai.error` events for system health
- Track fatal vs non-fatal error ratios
- Analyze retry patterns and success rates
Performance Metrics
- Track error rates by error type
- Monitor retry success rates
- Measure recovery times and user impact
Conclusion
The SignalWire AI system implements a sophisticated error handling strategy that prioritizes user experience while maintaining system reliability. The use of human-like error messages, graceful degradation, and comprehensive retry logic ensures that users receive a professional experience even during system difficulties. The clear separation between fatal and recoverable errors allows for appropriate response strategies while maintaining system stability.
The error handling patterns demonstrate a mature approach to production AI system reliability, with comprehensive logging, event notification, and fallback mechanisms that ensure both user satisfaction and system maintainability.