Conversation
Greptile Overview
Summary
This PR implements comprehensive voice streaming functionality for the Cedar OS product roadmap example. The implementation enables end-to-end voice interaction: audio recording → transcription → LLM processing → text-to-speech synthesis → streaming audio response.
Key Changes:
- Backend Integration: Added `@mastra/voice-openai` dependency and created voice stream handler with SSE-based real-time communication
- Frontend Voice State: Enhanced voice slice with streaming support, proper error handling, and audio playback capabilities
- Provider Updates: Extended Mastra provider with voice streaming endpoints and event parsing
- Workflow Enhancement: Modified chat workflow to accumulate text for voice synthesis instead of streaming individual chunks
- Configuration: Updated provider config to include voice routing and enabled streaming voice settings
Technical Implementation:
- Uses WebRTC for audio capture, OpenAI Whisper for transcription, and OpenAI TTS for synthesis (a minimal capture sketch follows this list)
- Implements proper stream handling with both Node.js Readable and Web ReadableStream compatibility
- Includes comprehensive error handling and resource cleanup
- Supports both streaming and non-streaming voice modes
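For context on the capture step referenced above, browser recording in this kind of setup is typically done with getUserMedia plus MediaRecorder, which produces WebM/Opus by default in Chromium-based browsers. A minimal, self-contained sketch (illustrative only, not the actual Cedar OS voice slice; `recordAudio` and the fixed duration are assumptions):

```typescript
// Illustrative browser-side capture: record the microphone for a fixed duration
// and return the resulting Blob (typically audio/webm) for upload to the voice endpoint.
async function recordAudio(durationMs: number): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream); // Chromium defaults to audio/webm;codecs=opus
  const chunks: BlobPart[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  const stopped = new Promise<void>((resolve) => (recorder.onstop = () => resolve()));
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;

  stream.getTracks().forEach((track) => track.stop()); // release the microphone
  return new Blob(chunks, { type: recorder.mimeType || 'audio/webm' });
}
```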
Confidence Score: 3/5
- This PR is moderately safe to merge with some implementation concerns that should be addressed
- The implementation is architecturally sound with proper separation of concerns, but it has several technical issues: missing environment variable validation that could cause runtime errors, hardcoded audio format assumptions, and fragile stream type detection logic that could fail with certain stream implementations
- Pay special attention to voiceStreamHandler.ts for environment variable validation and streamUtils.ts for stream compatibility detection
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| examples-backend/product-roadmap-backend/src/mastra/voiceStreamHandler.ts | 3/5 | New voice streaming handler with transcription and LLM integration. Has some potential null handling issues and hardcoded transcription format. |
| examples-backend/product-roadmap-backend/src/utils/streamUtils.ts | 3/5 | Enhanced streaming utilities with voice support. Buffer handling logic may have edge cases with stream compatibility detection. |
| examples-backend/product-roadmap-backend/src/mastra/workflows/chatWorkflow.ts | 4/5 | Updated workflow to support voice mode with text accumulation for TTS synthesis. Clean integration of voice handling logic. |
| packages/cedar-os/src/store/voice/voiceSlice.ts | 4/5 | Comprehensive voice state management with streaming support. Well-structured with proper error handling and resource cleanup. |
| packages/cedar-os/src/store/agentConnection/providers/mastra.ts | 4/5 | Enhanced Mastra provider with voice streaming capabilities. Robust event parsing and proper stream handling. |
Sequence Diagram
sequenceDiagram
participant User
participant CedarOS as Cedar OS (Frontend)
participant MastraProvider as Mastra Provider
participant VoiceHandler as Voice Stream Handler
participant OpenAIVoice as @mastra/voice-openai
participant ChatWorkflow as Chat Workflow
participant LLM as OpenAI LLM
User->>CedarOS: Record audio and submit
CedarOS->>MastraProvider: voiceStreamLLM(audioData, settings)
MastraProvider->>VoiceHandler: POST /voice/stream
VoiceHandler->>OpenAIVoice: listen(audioBuffer, {filetype: 'webm'})
OpenAIVoice->>VoiceHandler: transcription text
VoiceHandler->>CedarOS: SSE: {type: 'transcription', transcription}
VoiceHandler->>ChatWorkflow: start workflow with transcription
ChatWorkflow->>LLM: streamVNext(transcription + context)
loop Text streaming chunks
LLM->>ChatWorkflow: text-delta chunks
ChatWorkflow->>ChatWorkflow: accumulate pendingText (for voice mode)
end
ChatWorkflow->>OpenAIVoice: speak(pendingText)
OpenAIVoice->>ChatWorkflow: audio stream
ChatWorkflow->>VoiceHandler: audio data
VoiceHandler->>CedarOS: SSE: {type: 'audio', audioData, content}
VoiceHandler->>CedarOS: SSE: {type: 'done'}
CedarOS->>User: Play audio response & show text
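Read together, the SSE payloads in the diagram suggest a small discriminated union on the frontend. A minimal sketch of those shapes (field names are taken from the events above; the exact types in mastra.ts and voiceSlice.ts may differ):

```typescript
// Event shapes as they appear in the diagram; payload types are assumptions.
type VoiceStreamEvent =
  | { type: 'transcription'; transcription: string }
  | { type: 'audio'; audioData: string; content: string } // audioData: encoded audio (encoding assumed)
  | { type: 'done' };

// Illustrative dispatch for one parsed SSE `data:` payload.
function handleVoiceEvent(event: VoiceStreamEvent): void {
  switch (event.type) {
    case 'transcription':
      // Show the recognized user utterance in the chat UI.
      break;
    case 'audio':
      // Decode audioData for playback and render `content` as the assistant text.
      break;
    case 'done':
      // Finalize the message and close the stream.
      break;
  }
}
```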
12 files reviewed, 3 comments
const transcription = await voiceProvider.listen(Readable.from(buf), {
  filetype: 'webm',
});
style: hardcoded `filetype: 'webm'` assumes input format - consider making dynamic
Suggested change:
  const transcription = await voiceProvider.listen(Readable.from(buf), {
-   filetype: 'webm',
+   filetype: audioFile.type.includes('webm') ? 'webm' : 'wav',
  });
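If inputs other than WebM are expected, the mapping could also be made explicit. The following is a sketch only, assuming `audioFile` (the uploaded File/Blob), its bytes in `buf`, `voiceProvider`, and the existing `Readable` import are already in scope as in the snippet above; the set of `filetype` values accepted by `@mastra/voice-openai` should be confirmed against its docs:

```typescript
// Sketch: derive the transcription filetype hint from the upload's MIME type
// instead of hardcoding 'webm'. The mapping entries are assumptions, not verified constants.
const mimeToFiletype: Record<string, string> = {
  'audio/webm': 'webm',
  'audio/wav': 'wav',
  'audio/x-wav': 'wav',
  'audio/mpeg': 'mp3',
  'audio/mp4': 'mp4',
  'audio/ogg': 'ogg',
};
const mime = (audioFile.type || '').split(';')[0];
const filetype = mimeToFiletype[mime] ?? 'webm'; // fall back to the current default

const transcription = await voiceProvider.listen(Readable.from(buf), { filetype });
```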
Prompt To Fix With AI
This is a comment left during a code review.
Path: examples-backend/product-roadmap-backend/src/mastra/voiceStreamHandler.ts
Line: 85:87
Comment:
style: hardcoded `filetype: 'webm'` assumes input format - consider making dynamic
```suggestion
const transcription = await voiceProvider.listen(Readable.from(buf), {
filetype: audioFile.type.includes('webm') ? 'webm' : 'wav',
});
```
How can I resolve this? If you propose a fix, please make it concise.
if (typeof (speechStream as ReadableStream).getReader === 'function') {
  // Web ReadableStream
logic: stream type detection is fragile - checking for `getReader` method may miss other ReadableStream-like objects
Suggested change:
- if (typeof (speechStream as ReadableStream).getReader === 'function') {
-   // Web ReadableStream
+ if ('getReader' in speechStream && typeof speechStream.getReader === 'function') {
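Beyond tightening the check, one way to make the branch less dependent on duck-typing is to normalize both stream kinds behind async iteration. A sketch only; the function name and types are illustrative and not the actual streamUtils.ts API:

```typescript
import { Readable } from 'node:stream';

// Sketch: yield audio chunks from either a Web ReadableStream or a Node Readable.
async function* iterateSpeech(
  speechStream: ReadableStream<Uint8Array> | Readable
): AsyncGenerator<Uint8Array> {
  if (typeof (speechStream as ReadableStream<Uint8Array>).getReader === 'function') {
    // Web ReadableStream: drain it through its reader.
    const reader = (speechStream as ReadableStream<Uint8Array>).getReader();
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        if (value) yield value;
      }
    } finally {
      reader.releaseLock();
    }
  } else {
    // Node Readable (or any async iterable of Buffers): rely on async iteration
    // rather than probing for further methods.
    for await (const chunk of speechStream as Readable) {
      yield typeof chunk === 'string' ? Buffer.from(chunk) : (chunk as Uint8Array);
    }
  }
}
```

The SSE writer can then consume `iterateSpeech(...)` with a single `for await` loop regardless of which stream type `speak()` returned.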
Prompt To Fix With AI
This is a comment left during a code review.
Path: examples-backend/product-roadmap-backend/src/utils/streamUtils.ts
Line: 118:119
Comment:
logic: stream type detection is fragile - checking for `getReader` method may miss other ReadableStream-like objects
```suggestion
if ('getReader' in speechStream && typeof speechStream.getReader === 'function') {
```
How can I resolve this? If you propose a fix, please make it concise.
speechModel: { apiKey: process.env.OPENAI_API_KEY!, name: 'tts-1' },
listeningModel: {
  apiKey: process.env.OPENAI_API_KEY!,
  name: 'whisper-1',
},
logic: missing environment variable validation will cause runtime errors if API key is not set
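One minimal way to address this is to validate the key before constructing the provider so a missing OPENAI_API_KEY fails at startup with a clear message. A sketch, assuming the provider class is the OpenAIVoice named in the sequence diagram and the constructor options are the ones quoted above:

```typescript
import { OpenAIVoice } from '@mastra/voice-openai';

// Fail fast at module load rather than at request time.
const apiKey = process.env.OPENAI_API_KEY;
if (!apiKey) {
  throw new Error(
    'OPENAI_API_KEY is not set; it is required for transcription (whisper-1) and TTS (tts-1).'
  );
}

const voiceProvider = new OpenAIVoice({
  speechModel: { apiKey, name: 'tts-1' },
  listeningModel: { apiKey, name: 'whisper-1' },
});
```

This keeps the non-null assertion out of the config and turns a silent misconfiguration into an immediate, descriptive error.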
Prompt To Fix With AI
This is a comment left during a code review.
Path: examples-backend/product-roadmap-backend/src/mastra/voiceStreamHandler.ts
Line: 8:12
Comment:
logic: missing environment variable validation will cause runtime errors if API key is not set
How can I resolve this? If you propose a fix, please make it concise.