Optimize Latency for Parallel Agent Runs with Streaming #498

Closed as not planned
@adhishthite

Please read this first

  • Have you read the docs? Agents SDK docs -> Yes
  • Have you searched for related issues? Others may have had similar requests -> Yes

Question

I'm implementing the parallel translation pattern from the examples, where multiple agents generate translations simultaneously and a selection agent chooses the best one. This approach improves quality, but it adds significant latency to the user experience, especially when streaming responses.

Current Implementation

The current implementation follows this pattern:

import asyncio

from agents import ItemHelpers, Runner, trace

# spanish_agent and translation_picker are Agent instances defined elsewhere.


async def main():
    msg = input("Enter message for translation: ")

    with trace("Parallel translation"):
        # Run 3 translation agents in parallel (takes ~10s total)
        res_1, res_2, res_3 = await asyncio.gather(
            Runner.run(spanish_agent, msg),
            Runner.run(spanish_agent, msg),
            Runner.run(spanish_agent, msg),
        )

        # Collect the text output of each run and combine them
        outputs = [
            ItemHelpers.text_message_outputs(res_1.new_items),
            ItemHelpers.text_message_outputs(res_2.new_items),
            ItemHelpers.text_message_outputs(res_3.new_items),
        ]
        translations = "\n\n".join(outputs)

        # Run selection agent to pick best (adds more latency)
        best_translation = await Runner.run(
            translation_picker,
            f"Input: {msg}\n\nTranslations:\n{translations}",
        )


if __name__ == "__main__":
    asyncio.run(main())

Problem Statement

The current workflow creates a significant latency issue in streaming scenarios:

  1. All translation agents must complete execution (taking ~10s in parallel)
  2. Only after all translations are complete can the selection agent begin processing
  3. The UI remains without any output until the selection agent starts streaming
  4. This latency compounds when this is part of a longer agent chain
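
To make the timeline above concrete, here is a minimal sketch that reproduces it without the SDK. `fake_agent`, `time_to_first_output`, and the durations are hypothetical stand-ins (plain `asyncio.sleep` calls), not real agent runs; the point is only that `asyncio.gather` resolves when the slowest task finishes, so the wait before any output is roughly the slowest translation plus the picker's own delay.

```python
import asyncio
import time


async def fake_agent(name: str, seconds: float) -> str:
    """Hypothetical stand-in for an agent run that takes `seconds` to finish."""
    await asyncio.sleep(seconds)
    return f"{name} output"


async def time_to_first_output(durations: list[float], picker_delay: float) -> float:
    """Return the seconds elapsed before the picker could emit anything."""
    start = time.perf_counter()
    # gather() resolves only once the slowest translation has finished.
    await asyncio.gather(
        *(fake_agent(f"translator_{i}", d) for i, d in enumerate(durations))
    )
    # The picker cannot start until every candidate exists.
    await fake_agent("picker", picker_delay)
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(time_to_first_output([0.1, 0.2, 0.3], picker_delay=0.1))
    print(f"first output after {elapsed:.2f}s")
```

Running the translators in parallel already avoids paying the sum of their durations; what remains is the slowest one, followed by the picker.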

This implementation leads to a poor user experience because of long waits without feedback, especially in complex agent workflows where subsequent agents depend on this translation output.

What's the recommended approach to optimize this pattern for streaming scenarios while maintaining the quality benefits of parallel execution and selection?
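
One direction that might apply here, sketched under assumptions rather than taken from the SDK docs: consume the parallel runs with `asyncio.as_completed` so that a progress event can be shown as each candidate finishes, instead of staying silent until all of them are done. `fake_translator` and `run_with_progress` are hypothetical names, and the sleeps stand in for real agent runs. This does not shorten the time until the selection agent can start; it only gives the UI something to display in the meantime.

```python
import asyncio


async def fake_translator(name: str, seconds: float) -> str:
    """Hypothetical stand-in for one translation agent run."""
    await asyncio.sleep(seconds)
    return f"{name}: hola"


async def run_with_progress(durations: dict[str, float]) -> tuple[list[str], list[str]]:
    """Collect candidates in completion order, recording a progress event for each."""
    events: list[str] = []
    candidates: list[str] = []
    tasks = [
        asyncio.create_task(fake_translator(name, seconds))
        for name, seconds in durations.items()
    ]
    # as_completed yields awaitables in the order the tasks finish.
    for finished in asyncio.as_completed(tasks):
        candidates.append(await finished)
        # A real UI would push this to the user instead of storing it.
        events.append(f"candidate {len(candidates)}/{len(tasks)} ready")
    return candidates, events


if __name__ == "__main__":
    found, progress = asyncio.run(run_with_progress({"a": 0.15, "b": 0.05, "c": 0.10}))
    print(progress)
    print(found)
```

In the real workflow the selection step itself could presumably be streamed with the SDK's `Runner.run_streamed`, so that its tokens appear as soon as it starts.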

Labels: question, stale
