Description
Please read this first
- Have you read the docs? Agents SDK docs -> Yes
- Have you searched for related issues? Others may have had similar requests -> Yes
Question
I'm implementing the parallel translation pattern from the examples, where multiple agents generate translations simultaneously and a selection agent chooses the best one. While this approach provides quality benefits, it introduces significant latency in the user experience, especially when streaming responses.
Current Implementation
The current implementation follows this pattern:
```python
import asyncio

from agents import ItemHelpers, Runner, trace

# spanish_agent and translation_picker are Agent instances defined earlier in the example.


async def main():
    msg = input("Enter a message for translation: ")

    with trace("Parallel translation"):
        # Run 3 translation agents in parallel (takes ~10s total)
        res_1, res_2, res_3 = await asyncio.gather(
            Runner.run(spanish_agent, msg),
            Runner.run(spanish_agent, msg),
            Runner.run(spanish_agent, msg),
        )

        # Collect the text output of each run and combine
        outputs = [
            ItemHelpers.text_message_outputs(res_1.new_items),
            ItemHelpers.text_message_outputs(res_2.new_items),
            ItemHelpers.text_message_outputs(res_3.new_items),
        ]
        translations = "\n\n".join(outputs)

        # Run the selection agent to pick the best translation (adds more latency)
        best_translation = await Runner.run(
            translation_picker,
            f"Input: {msg}\n\nTranslations:\n{translations}",
        )

    print(best_translation.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```
Problem Statement
The current workflow creates a significant latency issue in streaming scenarios:
- All translation agents must complete execution (taking ~10s in parallel)
- Only after all translations are complete can the selection agent begin processing
- The UI remains without any output until the selection agent starts streaming
- This latency compounds when this step is part of a longer agent chain
This leads to a poor user experience due to long waiting periods without feedback, especially in complex agent workflows where subsequent agents depend on the translation output. Even streaming only the final selection step doesn't help much, as the sketch below illustrates.
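For concreteness, here is roughly what I end up with if I only stream the final step. This is a minimal sketch, assuming the same spanish_agent and translation_picker as above and the SDK's Runner.run_streamed / stream_events streaming API; the point is that the first visible token still arrives only after the slowest of the three translations has finished.

```python
import asyncio

from openai.types.responses import ResponseTextDeltaEvent

from agents import ItemHelpers, Runner, trace


async def main_streamed_picker(msg: str) -> None:
    with trace("Parallel translation (streamed picker)"):
        # Still have to wait ~10s here: nothing can be shown to the user yet.
        res_1, res_2, res_3 = await asyncio.gather(
            Runner.run(spanish_agent, msg),
            Runner.run(spanish_agent, msg),
            Runner.run(spanish_agent, msg),
        )
        translations = "\n\n".join(
            ItemHelpers.text_message_outputs(r.new_items) for r in (res_1, res_2, res_3)
        )

        # Only this final step streams, so the first visible token arrives
        # only after the slowest translation has completed.
        result = Runner.run_streamed(
            translation_picker,
            f"Input: {msg}\n\nTranslations:\n{translations}",
        )
        async for event in result.stream_events():
            if event.type == "raw_response_event" and isinstance(
                event.data, ResponseTextDeltaEvent
            ):
                print(event.data.delta, end="", flush=True)
```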
What's the recommended approach to optimize this pattern for streaming scenarios while maintaining the quality benefits of parallel execution and selection?
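For example, one direction I've considered is showing whichever translation finishes first as provisional output via asyncio.as_completed, then replacing it with the picker's choice once everything is done. This is only a rough sketch of that idea (again assuming the spanish_agent / translation_picker definitions from above; the provisional-then-replace behavior is my own idea, not something I found in the SDK):

```python
import asyncio

from agents import ItemHelpers, Runner


async def main_provisional(msg: str) -> None:
    tasks = [asyncio.create_task(Runner.run(spanish_agent, msg)) for _ in range(3)]

    # Show whichever translation finishes first as provisional output,
    # so the user sees something well before all three runs complete.
    first_done = await next(iter(asyncio.as_completed(tasks)))
    print(f"[provisional]\n{ItemHelpers.text_message_outputs(first_done.new_items)}\n")

    # Wait for the remaining runs, then pick the best as before.
    results = await asyncio.gather(*tasks)
    translations = "\n\n".join(
        ItemHelpers.text_message_outputs(r.new_items) for r in results
    )
    best = await Runner.run(
        translation_picker,
        f"Input: {msg}\n\nTranslations:\n{translations}",
    )

    # In a real UI this would replace the provisional text with the final choice.
    print(f"\n[final]\n{best.final_output}")
```

Is something along these lines reasonable, or is there a better-supported pattern in the SDK for this kind of fan-out-then-select workflow?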