Record outbound phone calls + 2-channel socket #4416
Code Review
This pull request introduces a new multi-channel audio transcription WebSocket endpoint, along with supporting utilities for conversation management, Pusher integration, and translation. The changes include creating new files for streaming utilities and modifying the backend router to handle the new endpoint. A critical issue related to error handling and fallback strategy has been identified and needs to be addressed.
/gemini summary
Summary of Changes
This pull request adds phone call functionality to the application, leveraging the Twilio Voice SDK for call management and real-time transcription. It incorporates native platform features like CallKit on iOS for a seamless user experience and introduces a new multi-channel audio processing pipeline. The changes include backend endpoints for token generation and phone number verification, as well as Flutter plugins for handling call initiation, control, and audio streaming.
Highlights
Changelog
Activity
@mdmohsin7 The TwiML webhook in Reviewed by @kenji
/gemini summary
Summary of Changes
This pull request introduces phone call functionality to the application, leveraging the Twilio Voice SDK for call management and real-time transcription. It incorporates native platform features like CallKit on iOS for a seamless user experience and introduces a new multi-channel audio processing pipeline. The changes include backend endpoints for token generation and phone number verification, as well as Flutter plugins for handling call initiation, control, and audio streaming.
Highlights
Changelog
Activity
Done, pls check now @beastoin
PR Review: Record outbound phone calls + 2-channel socket
Substantial feature adding Twilio-based phone call recording with dual-channel audio streaming. Good test coverage included.
Issues to Address
1. Twilio signature validation bypass when auth_token missing (
    if not auth_token:
        return True  # <-- Security risk: bypasses validation
This silently allows unauthenticated requests when
2. Inconsistent package path (
3. Missing rate limiting on verification endpoints
4. Phone number validation could be stricter (
Minor Suggestions
Positive Notes
Verdict: Address item #1 (security issue) before merging. Items #2-4 can be follow-up tasks.
by AI for @beastoin
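For item #1, one way to close the bypass is to fail closed when the auth token is missing. The sketch below is not the PR's code: the function names are hypothetical, and it reimplements Twilio's request-signing scheme (HMAC-SHA1 over the full URL followed by the sorted POST fields, base64-encoded) with the standard library purely for illustration. In a real backend the `RequestValidator` class from the Twilio Python helper library would normally perform this check.

```python
import base64
import hashlib
import hmac
from typing import Mapping, Optional


def compute_twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Build the value Twilio would send in X-Twilio-Signature (illustrative helper)."""
    # Twilio signs the full URL followed by every POST field name and value,
    # with the fields sorted by name.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: Optional[str],
    url: str,
    params: Mapping[str, str],
    signature: Optional[str],
) -> bool:
    """Return True only when the signature matches; a missing token is rejected."""
    if not auth_token or not signature:
        return False  # fail closed instead of `return True`
    expected = compute_twilio_signature(auth_token, url, params)
    # Constant-time comparison on bytes avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

With this shape, a deployment that forgets to set `TWILIO_AUTH_TOKEN` rejects every webhook call rather than silently accepting all of them, which makes the misconfiguration visible immediately.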
It's com/friend/ios coz the package name is com.friend.ios. Rest all fixed, pls check now @beastoin
@mdmohsin7 In Can you update the verification flow and add the missing test? by AI for @beastoin
final fixes done, pls check again @beastoin's AI 🤖
conflicts @mdmohsin7
@mdmohsin7 Before review, quick questions:
Thanks! by AI for @beastoin
@mdmohsin7 let's roll this out soon pls
Hi @mdmohsin7, this PR is 10 days old with a minimal description. Could you add:
This will help move it through review. Thanks!
Do not refactor the transcribe.py file; please keep it as is.
End-to-End Flow
1. Phone Number Verification
2. Call Initiation
3. Call Routing (TwiML Webhook)
4. Recording & Transcription
Audio Capture
Streaming
Backend Processing
Storage
5. Call Termination
Yes, no changes in the transcribe.py file. I duplicated logic from that file into multiple utilities so that I can use them in the new multichannel route. In future we could use the same utilities in the transcribe.py file as well (of course, once you review it and confirm there's nothing breaking). So right now nothing is changed in the transcribe file
bump @beastoin, can you pls check this
Hey, closing this for now. Evidence of testing is really important for PRs to move forward. Screenshots, videos, test results, or a demo showing the feature/fix working goes a long way in helping reviewers feel confident about merging. Feel free to reopen once you've added thorough testing evidence to the PR description. Thanks for contributing!
Hey @mdmohsin7 👋 Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request. After careful review, we've decided not to merge this particular PR. Please don't take this personally; we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:
Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out. Thank you for being part of the Omi community! 💜
There already is a video evidence in the pr description, and also tests. This is the top requested feature on our feature board |
Adds multi-channel STT, per-channel demuxing, resampling, and phone call conversation lifecycle directly into the existing stream handler. Uses standard UUID for conversation IDs instead of call_id. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multi-channel support is now handled directly in /v4/listen. These modules are no longer needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update transcript parsing to handle standard segment array format. Reset call state to idle on setup failures so retries work. Move mic permission request before SDK initialization. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Required for permission_handler to show the mic permission prompt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mix both channels sample-by-sample before sending to pusher so the stored audio is a proper mono stream. Connect pusher at TARGET_SAMPLE_RATE (16kHz) to match the resampled audio. Use standard UUID for conversation IDs instead of call_id.
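The per-channel demuxing and sample-by-sample mixing mentioned in these commit messages can be sketched roughly as follows. This is an illustration, not the code from the PR: it assumes the `[0x01|0x02][audio_data]` framing from the PR description, 16-bit little-endian PCM, that prefix `0x01` carries the user's side (the description does not say which prefix is which), and that mixing means averaging the two samples. The function names are hypothetical.

```python
import struct
from typing import Tuple

# Assumed mapping from channel prefix byte to (speaker label, is_user).
# The labels come from the PR description; which prefix maps to which
# speaker is an assumption made for this sketch.
CHANNEL_SPEAKERS = {
    0x01: ("SPEAKER_00", True),
    0x02: ("SPEAKER_01", False),
}


def split_frame(frame: bytes) -> Tuple[int, bytes]:
    """Split one [channel_byte][audio_data] frame into (channel, pcm_bytes)."""
    if not frame or frame[0] not in CHANNEL_SPEAKERS:
        raise ValueError("frame must start with channel prefix 0x01 or 0x02")
    return frame[0], frame[1:]


def mix_to_mono(first: bytes, second: bytes) -> bytes:
    """Average two 16-bit little-endian PCM buffers sample by sample.

    The shorter buffer is padded with silence so no audio is dropped.
    Averaging keeps every result inside the int16 range, so no clipping
    step is needed.
    """
    total = max(len(first), len(second)) // 2

    def samples(buf: bytes) -> Tuple[int, ...]:
        count = len(buf) // 2
        values = struct.unpack(f"<{count}h", buf[: count * 2])
        return values + (0,) * (total - count)

    mixed = [(a + b) // 2 for a, b in zip(samples(first), samples(second))]
    return struct.pack(f"<{total}h", *mixed)
```

Averaging halves the level of a channel whenever the other side is silent; summing with clipping is the usual alternative if loudness matters more than headroom.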
End-to-End Flow
1. Phone Number Verification
users/{uid}/phone_numbers/)
2. Call Initiation
callId
3. Call Routing (TwiML Webhook)
POST /v1/phone/twiml with request signature
<Dial callerId="+15551234567">+15559876543</Dial>
4. Recording & Transcription
Audio Capture
[0x01|0x02][audio_data]
Streaming
wss://api/v4/listen?source=phone_call&channels=2
/v4/listen route
get_current_user_uid dependency
Backend Processing
channels >= 2, activates multi-channel mode within the existing stream handler
SPEAKER_00 (is_user: true)
SPEAKER_01 (is_user: false)
Storage
5. Call Termination
source: phone_call, standard UUID conversation ID
Required Environment Variables
The following env vars must be set on the backend for phone call functionality:
TWILIO_ACCOUNT_SID
TWILIO_AUTH_TOKEN
TWILIO_API_KEY_SID
TWILIO_API_KEY_SECRET
TWILIO_TWIML_APP_SID /v1/phone/twiml webhook)
ENCRYPTION_SECRET
External.Device-trimmed.mp4
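A startup check along these lines could catch a missing variable before the first call is placed. It is only a sketch: the variable names are the ones listed above, while the helper name `missing_phone_env_vars` and the idea of checking at startup are assumptions, not something the PR is known to do.

```python
import os
from typing import List, Mapping

# Names taken from the list above.
REQUIRED_PHONE_ENV_VARS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_API_KEY_SID",
    "TWILIO_API_KEY_SECRET",
    "TWILIO_TWIML_APP_SID",
    "ENCRYPTION_SECRET",
)


def missing_phone_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty, in list order."""
    return [name for name in REQUIRED_PHONE_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_phone_env_vars()
    if missing:
        raise SystemExit("Phone call support is not configured; missing: " + ", ".join(missing))
```

Treating an empty string as missing is deliberate here, since an empty `TWILIO_AUTH_TOKEN` is exactly the case that led to the signature-validation bypass raised in review.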