Gemini Realtime: Transcribe model audio via gemini api #1446
🦋 Changeset detected. Latest commit: a6eac32. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
@@ -382,7 +382,7 @@ def _on_input_speech_done(self, content: TranscriptionContent) -> None:
        # TODO: implement sync mechanism to make sure the transcribed user speech is inside the chat_ctx and always before the generated agent speech

    def _on_agent_speech_done(self, content: TranscriptionContent) -> None:
When the agent is interrupted, are we only transcribing up to the moment of interruption?
No, the current implementation transcribes the entire text (all the frames received before the interruption). It's hard to determine the exact point of interruption, since we receive frames faster than they are actually played back.
Got it. I think that's fine. In the v1 branch, the synchronization/truncation logic will be downstream from the model; the model should just produce the entire thing.
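For illustration only, here is a minimal sketch of what downstream truncation could look like. It is not the v1 implementation: `truncate_transcript` and its parameters are hypothetical names, and it assumes the downstream component knows both the total duration of the audio the model produced and how much of it was actually played before the interruption. The heuristic (keep the same share of words as the share of audio played) is an assumption, not something this PR specifies.

```python
def truncate_transcript(text: str, total_audio_s: float, played_s: float) -> str:
    """Hypothetical helper: keep roughly the share of words that matches
    the share of the generated audio that was actually played back."""
    if total_audio_s <= 0:
        # No audio was generated, so nothing can have been heard.
        return ""
    # Clamp to [0, 1] in case playback accounting overshoots or goes negative.
    fraction = min(max(played_s / total_audio_s, 0.0), 1.0)
    words = text.split()
    keep = int(len(words) * fraction)  # round down: never claim unheard words
    return " ".join(words[:keep])


# The model produced 4 s of audio, but only 2 s played before the interruption.
print(truncate_transcript("one two three four", total_audio_s=4.0, played_s=2.0))
```

Because frames arrive faster than they are played, `played_s` has to come from the playback side rather than from the count of frames received, which is why this logic fits downstream of the model.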