Gemini Realtime: Transcribe model audio via gemini api #1446

jayeshp19 · 2025-02-04T19:59:42Z

Add the latest models from Google.
Update the test case for array arguments.

changeset-bot · 2025-02-04T19:59:48Z

🦋 Changeset detected

Latest commit: a6eac32

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
livekit-plugins-google	Patch

davidzhao · 2025-02-06T05:39:41Z

livekit-plugins/livekit-plugins-google/livekit/plugins/google/beta/realtime/realtime_api.py

@@ -382,7 +382,7 @@ def _on_input_speech_done(self, content: TranscriptionContent) -> None:
        # TODO: implement sync mechanism to make sure the transcribed user speech is inside the chat_ctx and always before the generated agent speech

    def _on_agent_speech_done(self, content: TranscriptionContent) -> None:


When interrupted, are we only transcribing until the moment of interruption?

No. The current implementation transcribes the entire text, that is, all frames received before the interruption. The exact point of interruption is hard to determine because we receive frames faster than they are played back.

Got it, I think that's fine. In the v1 branch, the synchronization/truncation logic will be downstream from the model; the model should just produce the entire thing.
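
To make the exchange above concrete, here is a minimal sketch of what downstream truncation could look like. It is not the code from this PR or from the v1 branch: `AgentSpeech`, its fields, and `truncate_transcript` are hypothetical names, and the proportional word cut is only an assumed heuristic for estimating how much of the full transcript was actually heard.

```python
from dataclasses import dataclass


@dataclass
class AgentSpeech:
    """Hypothetical container, not a livekit-agents type."""

    text: str                # full transcript produced by the model
    received_seconds: float  # audio duration received from the model
    played_seconds: float    # audio duration played before the interruption


def truncate_transcript(speech: AgentSpeech) -> str:
    """Estimate the spoken part of the transcript from the played fraction.

    Assumption: words are spread evenly over the audio, so keeping the
    same fraction of words approximates what the user actually heard.
    """
    if speech.received_seconds <= 0:
        return ""
    # Clamp to [0, 1] because playback can never exceed what was received.
    fraction = min(1.0, max(0.0, speech.played_seconds / speech.received_seconds))
    words = speech.text.split()
    keep = int(len(words) * fraction)
    return " ".join(words[:keep])


# The model delivered 4 s of audio, but only 2 s were played before the interruption.
speech = AgentSpeech(text="one two three four", received_seconds=4.0, played_seconds=2.0)
print(truncate_transcript(speech))
```

Keeping this step outside the model matches the division of labor described in the reply: the model emits the entire transcript, and whichever component tracks playback decides where to cut it.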

jayeshp19 added 5 commits February 4, 2025 18:40

debug

81a5af6

updates

bee58d5

updates

1d7488a

updates

b24fecf

updates

b70507f

jayeshp19 added 2 commits February 5, 2025 22:21

update google's latest models

37e8101

changeset

825c91c

jayeshp19 requested a review from a team February 5, 2025 16:53

jayeshp19 marked this pull request as ready for review February 5, 2025 16:54

jayeshp19 added 6 commits February 5, 2025 22:27

mend

2d12d97

update model ids

mend

9099598

chore

mend

90ae1c0

chore

update model id

658dcbd

update model id

d16f42e

update test case

a6eac32

davidzhao approved these changes Feb 6, 2025

View reviewed changes

davidzhao mentioned this pull request Feb 6, 2025

Gemini 2.0 Flash Realtime Agent return inaccuracy transcriptions #1443

Closed

jayeshp19 merged commit 5f977fa into main Feb 6, 2025
14 checks passed

jayeshp19 deleted the gemini-realtime-agent-audio branch February 6, 2025 19:40

This was referenced Feb 6, 2025

Version Packages #1438

Merged

Version Packages bashimr/agents#1

Open

Version Packages Toubat/agents#1

Open

Version Packages martin-purplefish/agents#3

Closed

Version Packages az-impaq/agents#1

Open

jayesh-mivi pushed a commit to mivi-dev-org/custom-livekit-agents that referenced this pull request Jun 4, 2025

Gemini Realtime: Transcribe model audio via gemini api (livekit#1446)

09dc788

3 participants
