Description
Please read this first
- Have you read the docs?Agents SDK docs yes
- Have you searched for related issues? Yes
Question
Hi together,
I created a mulit-agent system with an orchestrator (gpt 4.5) which has 7 agents as tools assigned (4o-mini) where each can do similiarity search in vectorstorages or has smaller functions like writing an e-mail or something like that.
Usually, with a request one or maximum of two agents get called by the orchestrator with their respective tools. However, we experience very long waiting times - we use the streaming method and sometimes calling an agent takes around 90 seconds (Tracing). We see that the v1/responses are the one taking long time, so similarity searches are very fast.
We tried with shortening prompts - however, the latency is still not very much better. Also tried swapping the 4.5 model out for 4o-mini, which resulted in better performance but still >10 seconds.
We also use 3 input guardrails for our orchestrator, which we think of combining to reduce API calls.
Does anyone else has ideas on how to tweak performance in general for such setups? We expose it as a chatbot at the moment where these waiting times are quite too long for the end-users.
Happy for any ideas or directions!