feat(gemini): Add bidirectional gemini model #14

mkmeral · 2025-10-29T14:54:56Z

Add Gemini Live API Support for Bidirectional Streaming

Description

This PR adds support for Google's Gemini Live API as a bidirectional streaming model provider, enabling real-time audio conversations with native audio input/output, image/video input, and automatic transcription.

Key Features

Gemini Live Model Provider (gemini_live.py)

Uses the official google-genai SDK for robust WebSocket communication
Native audio streaming with 16kHz input and 24kHz output
Real-time audio transcription (both input and output)
Image/video frame input support for multimodal conversations
Automatic VAD-based interruption handling
Tool calling integration
Message history support
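The input and output sample rates differ, so a buffer covering the same duration has a different size in each direction. A minimal sketch of the arithmetic, assuming 16-bit mono PCM (the sample width and channel count are assumptions; this PR only states the rates), with `pcm_chunk_bytes` as a hypothetical helper name:

```python
# Assumption: 16-bit (2-byte) mono PCM. Only the sample rates come from the PR.
INPUT_RATE_HZ = 16_000   # microphone audio sent to the model
OUTPUT_RATE_HZ = 24_000  # audio received from the model


def pcm_chunk_bytes(
    sample_rate_hz: int,
    duration_ms: int,
    sample_width_bytes: int = 2,
    channels: int = 1,
) -> int:
    """Return the size in bytes of a PCM chunk lasting `duration_ms`."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * sample_width_bytes * channels


input_chunk = pcm_chunk_bytes(INPUT_RATE_HZ, 100)    # 100 ms of input audio
output_chunk = pcm_chunk_bytes(OUTPUT_RATE_HZ, 100)  # 100 ms of output audio
```

Under these assumptions, playback buffers need to be 1.5 times the size of capture buffers for the same duration.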

Enhanced Bidirectional Streaming

Added ImageInputEvent type for sending images/video frames
Added TranscriptEvent type for audio transcriptions (separate from text output)
Extended BidirectionalAgent.send() to accept text, audio, and image inputs
Updated abstract BidirectionalModelSession interface with send_image_content()
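The new event types and the abstract method could look roughly like the sketch below. The field names (`image_data`, `mime_type`, `text`, `source`) and both concrete session classes are assumptions for illustration, not the definitions in this PR; the class here is only a stand-in for the real `BidirectionalModelSession`.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List, Literal, TypedDict


class ImageInputEvent(TypedDict):
    """Hypothetical shape: one encoded image or video frame."""
    image_data: bytes
    mime_type: str


class TranscriptEvent(TypedDict):
    """Hypothetical shape: transcribed speech, kept separate from text output."""
    text: str
    source: Literal["user", "assistant"]


class BidirectionalModelSession(ABC):
    """Stand-in for the abstract session interface."""

    @abstractmethod
    async def send_image_content(self, image_input: ImageInputEvent) -> None:
        """Send one image or video frame to the model."""


class RecordingSession(BidirectionalModelSession):
    """Illustrative provider that accepts images and records them."""

    def __init__(self) -> None:
        self.sent: List[ImageInputEvent] = []

    async def send_image_content(self, image_input: ImageInputEvent) -> None:
        self.sent.append(image_input)


class AudioOnlySession(BidirectionalModelSession):
    """Illustrative provider without image support (one possible stub behavior)."""

    async def send_image_content(self, image_input: ImageInputEvent) -> None:
        raise NotImplementedError("image input is not supported by this provider")


frame: ImageInputEvent = {"image_data": b"\xff\xd8", "mime_type": "image/jpeg"}
session = RecordingSession()
asyncio.run(session.send_image_content(frame))
```

Making the method abstract forces every provider to state explicitly whether it handles images, which is why a provider without image support still needs a stub.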

Test Suite Enhancements

Updated test to support both Gemini Live and Nova Sonic
Added camera capture for real-time video frame streaming (1 FPS)
Demonstrates audio + video multimodal interaction
Falls back to Nova Sonic if no Gemini API key is provided
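Streaming camera frames at 1 FPS means most captured frames are dropped before sending. A minimal sketch of that throttling, assuming a capture loop that calls `should_send()` once per captured frame; `FrameThrottle` is a hypothetical helper, not code from this PR, and the camera capture itself is omitted:

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Allow at most `fps` frames per second through to the model."""

    def __init__(self, fps: float = 1.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = 1.0 / fps
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._next_due: Optional[float] = None

    def should_send(self) -> bool:
        """Return True if enough time has passed since the last sent frame."""
        now = self._clock()
        if self._next_due is None or now >= self._next_due:
            self._next_due = now + self._interval
            return True
        return False


# Simulated capture timestamps (seconds) instead of a real camera loop.
timestamps = iter([0.0, 0.5, 1.0, 1.2, 2.1])
throttle = FrameThrottle(fps=1.0, clock=lambda: next(timestamps))
decisions = [throttle.should_send() for _ in range(5)]
```

In a real loop, each frame for which `should_send()` returns True would be encoded and passed to the agent as an image input.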

Implementation Details

The implementation follows the same architectural patterns as Nova Sonic:

Provider-agnostic event conversion
Clean separation between session management and model interface
Simplified configuration - all Gemini Live API parameters pass through directly
Proper async/await patterns with context manager for connection lifecycle
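The connection lifecycle pattern in the last point can be sketched with an async context manager that guarantees cleanup. `FakeConnection` and `live_session` are hypothetical names standing in for the SDK connection and the provider's session wrapper; this shows the general pattern, not the code in `gemini_live.py`.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable, List


class FakeConnection:
    """Stand-in for a WebSocket connection to the model."""

    def __init__(self) -> None:
        self.is_open = True
        self.sent: List[str] = []

    async def send(self, payload: str) -> None:
        if not self.is_open:
            raise RuntimeError("connection is closed")
        self.sent.append(payload)

    async def close(self) -> None:
        self.is_open = False


@asynccontextmanager
async def live_session(
    connect: Callable[[], Awaitable[FakeConnection]],
) -> AsyncIterator[FakeConnection]:
    """Open a connection and close it on exit, even if the body raises."""
    connection = await connect()
    try:
        yield connection
    finally:
        await connection.close()


async def demo() -> FakeConnection:
    async def connect() -> FakeConnection:
        return FakeConnection()

    async with live_session(connect) as connection:
        await connection.send("hello")
    return connection  # already closed at this point


finished = asyncio.run(demo())
```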

Configuration Example

from strands.experimental.bidirectional_streaming.models.gemini_live import GeminiLiveBidirectionalModel

model = GeminiLiveBidirectionalModel(
    model_id="gemini-2.5-flash-native-audio-preview-09-2025",
    api_key="your-api-key",
    params={
        "response_modalities": ["AUDIO"],
        "input_audio_transcription": {},   # Enable input transcription
        "output_audio_transcription": {},  # Enable output transcription
    }
)
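Rather than hard-coding the key, the test setup described in this PR reads `GOOGLE_AI_API_KEY` from the environment and falls back to Nova Sonic when it is absent. A minimal sketch of that selection logic; `select_provider` and the returned labels are hypothetical, and the actual model construction is left out:

```python
import os
from typing import Mapping, Optional


def select_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a provider label based on whether a Gemini API key is set."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_AI_API_KEY", "").strip()
    # An empty or whitespace-only key is treated the same as a missing key.
    return "gemini_live" if api_key else "nova_sonic"


with_key = select_provider({"GOOGLE_AI_API_KEY": "example-key"})
without_key = select_provider({})
```

Passing the environment in as a mapping keeps the function testable without modifying `os.environ`.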

Related Issues

Documentation PR

Type of Change

New feature

Testing

How have you tested the change?

Tested real-time audio conversations with Gemini Live API
Verified audio transcription (input and output) works correctly
Tested image/video frame streaming from camera
Verified tool calling integration
Tested message history support
Confirmed interruption handling via VAD
Verified fallback to Nova Sonic when no API key is provided
Ran hatch fmt for code formatting

Test Environment

Python 3.12+
Dependencies: google-genai, pyaudio, opencv-python, pillow
Tested with GOOGLE_AI_API_KEY environment variable

Files Changed

New: src/strands/experimental/bidirectional_streaming/models/gemini_live.py (501 lines)
Modified: src/strands/experimental/bidirectional_streaming/agent/agent.py - Added image input support
Modified: src/strands/experimental/bidirectional_streaming/models/bidirectional_model.py - Added abstract send_image_content() method
Modified: src/strands/experimental/bidirectional_streaming/models/novasonic.py - Added stub for image input (not supported)
Modified: src/strands/experimental/bidirectional_streaming/types/bidirectional_streaming.py - Added ImageInputEvent and TranscriptEvent types
Modified: src/strands/experimental/bidirectional_streaming/tests/test_bidirectional_streaming.py - Enhanced test with Gemini Live and camera support

Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

feat(gemini): Add bidirectional gemini model

4648327

github-actions bot added the size/l label Oct 29, 2025

mkmeral had a problem deploying to manual-approval October 29, 2025 14:55 — with GitHub Actions Failure

mehtarac approved these changes Oct 29, 2025

View reviewed changes

mehtarac merged commit f8abbaa into mehtarac:main Oct 29, 2025
2 of 13 checks passed
