Description
Overview
Bidirectional streaming enables real-time, continuous communication between clients and AI models in both directions at once. In the traditional request-response pattern, a client sends a complete request and waits for a complete reply. With bidirectional streaming, both the client and the model can send and receive data incrementally. Content is processed as it arrives rather than after a full message has been assembled, which makes the interaction feel more natural and lets a conversation adapt dynamically to ongoing inputs and outputs.
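The difference from request-response can be illustrated with a small asyncio sketch. It is not tied to any real provider: fake_model is a hypothetical stand-in that answers each input chunk as soon as it arrives, and two asyncio.Queue objects play the role of the two directions of the stream. The client's sending and receiving run as concurrent tasks instead of one after the other; a real integration would replace the queues with a network connection.

```python
import asyncio


async def fake_model(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Hypothetical model: emits a partial output for each input chunk as it arrives."""
    while True:
        chunk = await inbox.get()
        if chunk is None:  # sentinel: the client has finished sending
            await outbox.put(None)
            return
        await outbox.put(f"ack:{chunk}")


async def client_send(inbox: asyncio.Queue, chunks: list[str]) -> None:
    """Client-to-model direction: push chunks one at a time."""
    for chunk in chunks:
        await inbox.put(chunk)
        await asyncio.sleep(0)  # yield control so the model can respond mid-stream
    await inbox.put(None)


async def client_receive(outbox: asyncio.Queue) -> list[str]:
    """Model-to-client direction: consume outputs while sending is still in progress."""
    received = []
    while (event := await outbox.get()) is not None:
        received.append(event)
    return received


async def main() -> list[str]:
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    # All three coroutines run concurrently; neither side waits for the other to finish.
    _, _, received = await asyncio.gather(
        fake_model(inbox, outbox),
        client_send(inbox, ["hel", "lo ", "there"]),
        client_receive(outbox),
    )
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Each chunk produces its own response, so the client sees output before it has finished sending its input.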
Model Support
Several providers are building models with bidirectional streaming capabilities. Some examples include:
- Amazon: Amazon's Nova Sonic model offers real-time speech processing, interruption handling, context-aware responses, and low-latency interactions, making it particularly effective for voice assistants (docs).
- OpenAI: OpenAI also provides models with bidirectional streaming capabilities, allowing for dynamic conversation flows where the model can receive new information while generating a response (announcement).
Request
Support a bidirectional streaming interface in Strands.
Prototype
To facilitate discussion, we have implemented a prototype for bidirectional streaming at https://github.com/pgrayy/strands-sdk-python-async (see the README for testing instructions). The prototype implements a flexible architecture designed to handle real-time, two-way communication between clients and AI models. The implementation focuses on audio-based interactions with Nova Sonic while establishing patterns that could extend to other models and modalities. The key components are:
- Bidirectional Agent: The Agent class in the bidirectional module provides an async context manager for sending data (send), an async generator for receiving data (receive), and a method to initialize bidirectional streaming (bistream). For example usage, please see https://github.com/pgrayy/sdk-python-async/blob/main/scripts/agents/bidirectional.py.
- Model Sender/Receiver: The abstract Sender and Receiver interfaces define the contract for model providers. The Sender handles outgoing events to the model, with context managers for different content types (text, audio, tools), while the Receiver processes incoming events from the model and constructs the message history.
- Event System: Events are structured as typed objects representing different kinds of streaming content, including session events (start/end), prompt events (start/end), and content events (text, audio, system, tool).
- Nova Implementation: The Nova implementation demonstrates how to adapt a specific model to the bidirectional interface, providing a concrete example of the architecture in action.
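The component list above can be made concrete with a minimal, self-contained sketch. It is not the prototype's code: the event classes (SessionStart, TextContent, SessionEnd), the EchoModel provider, and the method names on Sender, Receiver, and Agent are hypothetical stand-ins chosen for illustration. EchoModel simply upper-cases text in memory; a real provider such as Nova Sonic would sit behind the same two interfaces but talk to a network stream.

```python
import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import Union


# Hypothetical typed events, loosely mirroring the session/content split.
@dataclass
class SessionStart:
    pass


@dataclass
class TextContent:
    text: str


@dataclass
class SessionEnd:
    pass


Event = Union[SessionStart, TextContent, SessionEnd]


class Sender(ABC):
    """Hypothetical contract for pushing events to a model provider."""

    @abstractmethod
    async def send(self, event: Event) -> None: ...


class Receiver(ABC):
    """Hypothetical contract for pulling events from a model provider."""

    @abstractmethod
    def events(self) -> AsyncIterator[Event]: ...


class EchoModel(Sender, Receiver):
    """Toy in-memory provider (hypothetical): echoes text content back upper-cased."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def send(self, event: Event) -> None:
        if isinstance(event, TextContent):
            await self._queue.put(TextContent(event.text.upper()))
        elif isinstance(event, SessionEnd):
            await self._queue.put(event)  # forward the end marker to the receive side

    async def events(self) -> AsyncIterator[Event]:
        while True:
            event = await self._queue.get()
            yield event
            if isinstance(event, SessionEnd):
                return


class Agent:
    """Hypothetical agent: send() is an async context manager, receive() an async generator."""

    def __init__(self, model: EchoModel) -> None:
        self.model = model
        self.history: list[str] = []

    @asynccontextmanager
    async def send(self) -> AsyncIterator[Sender]:
        await self.model.send(SessionStart())  # open the session on entry
        try:
            yield self.model
        finally:
            await self.model.send(SessionEnd())  # always close the session on exit

    async def receive(self) -> AsyncIterator[str]:
        async for event in self.model.events():
            if isinstance(event, TextContent):
                self.history.append(event.text)  # build message history as events arrive
                yield event.text


async def main() -> list[str]:
    agent = Agent(EchoModel())
    received: list[str] = []

    async def produce() -> None:
        async with agent.send() as sender:
            for chunk in ["hello", "world"]:
                await sender.send(TextContent(chunk))

    async def consume() -> None:
        async for text in agent.receive():
            received.append(text)

    await asyncio.gather(produce(), consume())  # send and receive run concurrently
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping the provider behind separate Sender and Receiver interfaces lets the agent drive both directions independently, which is the property the list above describes for the Nova implementation.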