feat(multiagent): introduce Swarm multi-agent orchestrator #416

awsarron · 2025-07-11T14:41:14Z

Description

introduce Swarm multi-agent orchestrator
add stop_event_loop boolean to Agent class

Related Issues

#214

Documentation PR

TODO

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

zastrowm · 2025-07-11T18:16:28Z

src/strands/agent/agent.py

@@ -317,6 +317,9 @@ def __init__(
                self.hooks.add_hook(hook)
        self.hooks.invoke_callbacks(AgentInitializedEvent(agent=self))

+        # When True, force stops the agent's event loop
+        self.stop_event_loop = False


Not a big fan of this existing on the agent - it's sort of a per-request state living off of the agent that complicates restarting the agent.

Can we use exceptions for this case instead? StopAgentException?

good call, will do in a following commit
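
For reference, a minimal sketch of the exception-based approach discussed above. `StopAgentException` is the name proposed in this thread; `run_event_loop` and its `steps` argument are hypothetical stand-ins for illustration, not the actual strands event loop.

```python
from typing import Callable, Iterable


class StopAgentException(Exception):
    """Raised by a tool or hook to end the current invocation (name proposed in review)."""


def run_event_loop(steps: Iterable[Callable[[], str]]) -> list[str]:
    """Hypothetical loop: run steps in order until one raises StopAgentException."""
    outputs: list[str] = []
    for step in steps:
        try:
            outputs.append(step())
        except StopAgentException:
            # The stop signal travels with the exception rather than living on
            # the agent, so the next invocation has no flag to reset.
            break
    return outputs


def handoff() -> str:
    """Hypothetical tool that ends the loop, e.g. when handing off to another agent."""
    raise StopAgentException("handing off to another agent")


first = run_event_loop([lambda: "a", handoff, lambda: "never runs"])
second = run_event_loop([lambda: "b"])  # a later invocation is unaffected
```

Because nothing is stored on the agent, restarting it needs no cleanup, which is the concern raised about `self.stop_event_loop`.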

zastrowm · 2025-07-11T18:18:06Z

src/strands/multiagent/swarm.py

+
+    from_node: SwarmNode
+    to_node: SwarmNode
+    content: str


Content for human-readable text, context for extensibility / structured data?

Can content also be a ContentBlock?

will remove SwarmMessage in next commits

zastrowm · 2025-07-11T18:21:03Z

src/strands/multiagent/swarm.py

+    max_iterations: int = 20
+    execution_timeout: float = 900.0  # Total execution timeout (seconds)
+    node_timeout: float = 300.0  # Individual node timeout (seconds)
+    ping_pong_check_nodes: int = 8  # Number of recent nodes to check for ping-pong


Could just be a knowledge gap on my end, but not sure what ping-pong here is referencing, for both fields
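
One plausible reading of "ping-pong", sketched under assumptions: the swarm looks at the last `ping_pong_check_nodes` entries of the node history and flags the run if too few distinct nodes appear in that window. The `min_unique` threshold and the function name are hypothetical; the diff shown here does not confirm how the check is implemented.

```python
def is_ping_pong(node_history: list[str], window: int = 8, min_unique: int = 3) -> bool:
    """Return True if the last `window` handoffs bounce among fewer than `min_unique` nodes.

    `window` mirrors `ping_pong_check_nodes` from the diff; `min_unique` is an
    assumed second setting, not taken from the PR.
    """
    if len(node_history) < window:
        # Not enough history yet to judge.
        return False
    recent = node_history[-window:]
    return len(set(recent)) < min_unique


bouncing = ["planner", "coder"] * 4                           # two nodes passing back and forth
healthy = ["planner", "coder", "reviewer", "tester"] * 2      # four distinct nodes in the window
```

Here `is_ping_pong(bouncing)` is true and `is_ping_pong(healthy)` is false.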

zastrowm · 2025-07-11T18:22:34Z

src/strands/multiagent/swarm.py

+        """Set list of available agents."""
+        self.available_nodes = nodes
+
+    def add_context(self, node: SwarmNode, key: str, value: Any) -> None:


I think for serialization purposes, @Unshure added some checks for values to ensure that all state was serializable. Should that be done here as well? I presume context is something that would be preserved if we were persisting swarms?

Why would the context not live on the node itself instead of via this shared context? Esp. given that you have to pass in the node

I think for serialization purposes, @Unshure added some checks for values to ensure that all state was serializable.

Trying to understand how SharedContext is different from AgentState? I can more easily see how context that is generic to the swarm can be useful, but this implementation makes it seem like you can only set context about specific nodes.

Read through the Swarm class, and I understand now that this is populated by the execution of tools from the agents, not by the user.
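
A rough sketch of what a serializability check in `add_context` could look like, assuming JSON is the target format. This `SharedContext` is a simplified stand-in keyed by a node id string rather than a `SwarmNode`, and the error message is illustrative only.

```python
import json
from typing import Any


class SharedContext:
    """Simplified stand-in for the swarm's shared context (not the PR's class)."""

    def __init__(self) -> None:
        self.context: dict[str, dict[str, Any]] = {}

    def add_context(self, node_id: str, key: str, value: Any) -> None:
        """Store a value for a node, rejecting anything that cannot be JSON-encoded."""
        try:
            json.dumps(value)
        except (TypeError, ValueError) as e:
            # TypeError: unsupported type; ValueError: e.g. circular reference.
            raise ValueError(f"value for key {key!r} is not JSON serializable") from e
        self.context.setdefault(node_id, {})[key] = value


ctx = SharedContext()
ctx.add_context("researcher", "findings", {"sources": 3})
```

Validating on write keeps a persisted swarm from failing later at save time, at the cost of one extra encode per call.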

zastrowm · 2025-07-11T18:25:14Z

src/strands/multiagent/swarm.py

+            self.context[node.node_id] = {}
+        self.context[node.node_id][key] = value
+
+    def get_relevant_context(self, target_node: SwarmNode) -> dict[str, Any]:


I'm presuming this will come from docs, but I'm not really understanding what "relevant context" is.

zastrowm · 2025-07-11T18:36:24Z

src/strands/multiagent/swarm.py

+        """Initialize swarm configuration."""
+        # Validate agents have names and create SwarmNode objects
+        for i, node in enumerate(nodes):
+            if not hasattr(node, "name") or not node.name:


Here we should/can use node.name directly, no? No need to be extra safe with hasattr?

zastrowm · 2025-07-11T18:36:36Z

src/strands/multiagent/swarm.py

+
+        return self._build_result()
+
+    def _setup_swarm(self, nodes: list[Agent]) -> None:


Should we explicitly name this as agents?

we could because at the moment it is indeed just agents. I was thinking that we might expand the valid node executor types in Swarms as well, so kept it as nodes for now

zastrowm · 2025-07-11T18:37:10Z

src/strands/multiagent/swarm.py

+                logger.info("node_id=<%d> | agent has no name, dynamically generating one", node_id)
+
+            node_id = str(node.name)
+            self.nodes[node_id] = SwarmNode(node_id=node_id, executor=node)


Should we have a uniqueness check here?
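
A minimal sketch of such a check, assuming node ids are derived from agent names as in the diff above. `validate_unique_node_ids` is a hypothetical helper, and raising `ValueError` is one option among several (auto-suffixing names would be another).

```python
def validate_unique_node_ids(names: list[str]) -> list[str]:
    """Return the names unchanged, or raise if two agents would share a node id."""
    seen: set[str] = set()
    for name in names:
        if name in seen:
            # Without this, the later agent would silently replace the earlier
            # one in a dict keyed by node id.
            raise ValueError(f"node_id=<{name}> | duplicate node name in swarm")
        seen.add(name)
    return list(names)


ok = validate_unique_node_ids(["planner", "coder"])
```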

zastrowm · 2025-07-11T18:38:45Z

src/strands/multiagent/swarm.py

+        swarm_ref = self  # Capture swarm reference
+
+        @tool
+        def get_swarm_context() -> dict[str, Any]:


And agents use context for more than just passing it into other tools, right?

zastrowm · 2025-07-11T18:39:49Z

src/strands/multiagent/swarm.py

+
+        for node in self.nodes.values():
+            # Use the agent's tool registry to process and register the tools
+            node.executor.tool_registry.process_tools(swarm_tools)


Should we be looking for duplicate names or protect against it somehow?
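
One way to guard against this, sketched with plain name sets rather than the real tool registry: compare the injected swarm tool names against what the agent already has before registering. `find_tool_name_conflicts` is hypothetical, and how the registry exposes existing names is an assumption left out of the sketch.

```python
def find_tool_name_conflicts(existing_tools: set[str], swarm_tools: list[str]) -> list[str]:
    """Return swarm tool names that collide with tools the agent already has."""
    return sorted(name for name in set(swarm_tools) if name in existing_tools)


# Tool names taken from this PR; the agent's existing tools are made up.
swarm_tool_names = ["handoff_to_agent", "get_swarm_context"]
conflicts = find_tool_name_conflicts({"calculator", "handoff_to_agent"}, swarm_tool_names)
no_conflicts = find_tool_name_conflicts({"calculator"}, swarm_tool_names)
```

A non-empty result could be turned into an error or a warning before the tools are registered.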

Unshure · 2025-07-12T15:13:19Z

Probably out of scope for this pr, but it would be nice to be able to stop the event loop both after a tool is called (current behavior) and after the model is called (new behavior).

Unshure · 2025-07-12T15:10:39Z

src/strands/event_loop/event_loop.py

@@ -462,7 +462,7 @@ def tool_handler(tool_use: ToolUse) -> ToolGenerator:
        tracer = get_tracer()
        tracer.end_event_loop_cycle_span(span=cycle_span, message=message, tool_result_message=tool_result_message)

-    if invocation_state["request_state"].get("stop_event_loop", False):
+    if agent.stop_event_loop or invocation_state["request_state"].get("stop_event_loop", False):


Can we remove the request_state stop_event_loop logic here? I don't like that there is a special, undocumented key to enable this behavior.

yeah I will replace this with a StopAgentException

Unshure · 2025-07-12T15:23:17Z

src/strands/multiagent/swarm.py

+    context: dict[str, dict[str, Any]] = field(default_factory=dict)
+    node_history: list[SwarmNode] = field(default_factory=list)
+    current_task: str | list[ContentBlock] | None = None
+    available_nodes: list[SwarmNode] = field(default_factory=list)


nit: Should this be a list or a set?

Unshure · 2025-07-12T15:24:47Z

src/strands/multiagent/swarm.py

+        return {
+            "task": self.current_task,
+            "node_history": [node.node_id for node in self.node_history],
+            "shared_context": {k: v for k, v in self.context.items() if v},


When would the values in the context be None?

Unshure · 2025-07-12T15:27:25Z

src/strands/multiagent/swarm.py

+        """Set list of available agents."""
+        self.available_nodes = nodes
+
+    def add_context(self, node: SwarmNode, key: str, value: Any) -> None:


I think for serialization purposes, @Unshure added some checks for values to ensure that all state was serializable.

Trying to understand how SharedContext is different from AgentState? I can more easily see how context that is generic to the swarm can be useful, but this implementation makes it seem like you can only set context about specific nodes.

Read through the Swarm class, and I understand now that this is populated by the execution of tools from the agents, not by the user.

Unshure · 2025-07-12T15:33:30Z

src/strands/multiagent/swarm.py

+
+    from_node: SwarmNode
+    to_node: SwarmNode
+    content: str


Can content also be a ContentBlock?

Unshure · 2025-07-12T15:38:22Z

src/strands/multiagent/swarm.py

+            self.completion_status = Status.FAILED
+            return False, f"execution_timeout_{config.execution_timeout}s"
+
+        # 5. Check for node ping-pong (nodes passing back and forth)


Suggested change

-        # 5. Check for node ping-pong (nodes passing back and forth)
+        # 5. Check for node ping-pong 🏓 (nodes passing back and forth)

Unshure · 2025-07-12T15:41:48Z

src/strands/multiagent/swarm.py

+    message_history: list[SwarmMessage] = field(default_factory=list)
+    iteration_count: int = 0
+    start_time: float = field(default_factory=time.time)
+    last_node_sequence: list[SwarmNode] = field(default_factory=list)


Some documentation on what these attributes represent would be helpful. I'm having a hard time wrapping my head around what this is, and how it is used with ping_pong_check_nodes

Unshure · 2025-07-12T15:44:11Z

src/strands/multiagent/swarm.py

+class SwarmResult(MultiAgentResult):
+    """Result from swarm execution - extends MultiAgentResult with swarm-specific details."""
+
+    status: Status = Status.PENDING


nit: status is already in MultiAgentResult

good spot, thank you

Unshure · 2025-07-12T15:44:43Z

src/strands/multiagent/swarm.py

+    node_history: list[SwarmNode] = field(default_factory=list)
+    message_history: list[SwarmMessage] = field(default_factory=list)
+    iteration_count: int = 0
+    final_result: str | None = None


Can a result also be a ContentBlock or list[ContentBlock]?

will remove final_result in following commits

Unshure

Can we have unit tests for this code?

Unshure · 2025-07-12T16:01:09Z

src/strands/multiagent/swarm.py

+        if not target_node:
+            return {"status": "error", "reason": f"agent_{target_agent_name}_not_found"}


This is a duplicate check, handoff_to_agent already checks for this.

Unshure · 2025-07-12T16:04:25Z

src/strands/multiagent/swarm.py

+            previous_agent.node_id,
+            target_node.node_id,
+        )
+        return {"status": "success", "target_agent": target_agent_name}


Do we need to return a tool result here? The handoff_to_agent already does that.

Unshure · 2025-07-12T16:17:21Z

src/strands/multiagent/swarm.py

+
+        logger.info("swarm task completed")
+
+    def _format_context(self, context_info: dict[str, Any]) -> str:


nit: It would be helpful to include an example of what this formatted context is supposed to look like in the docstring here.

Unshure · 2025-07-12T16:20:11Z

src/strands/multiagent/swarm.py

+
+            await self._execute_swarm()
+
+            if self.state.completion_status == Status.EXECUTING:


When is it expected that after self._execute_swarm() the self.state.completion_status is not Status.COMPLETED? Is this an error case? If so, can we at least log here?

Unshure · 2025-07-12T16:26:04Z

src/strands/multiagent/swarm.py

+
+            if not isinstance(task, str):
+                # Include additional ContentBlocks in node input
+                node_input = node_input + task


Can we convert a task that is an instance of str to a Text ContentBlock, and append it here?
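
A sketch of that normalization, assuming a text content block is a dict of the form `{"text": ...}`. `ContentBlock` below is a loose stand-in alias, and both helper names are hypothetical.

```python
from typing import Any, Union

# Stand-in for the SDK's ContentBlock type; assumed to be dict-shaped.
ContentBlock = dict[str, Any]


def to_content_blocks(task: Union[str, list[ContentBlock]]) -> list[ContentBlock]:
    """Normalize a task so callers always work with a list of content blocks."""
    if isinstance(task, str):
        return [{"text": task}]
    return list(task)


def build_node_input(context_text: str, task: Union[str, list[ContentBlock]]) -> list[ContentBlock]:
    """Prepend the formatted swarm context, then append the task blocks."""
    return [{"text": context_text}, *to_content_blocks(task)]


from_str = build_node_input("context", "summarize the report")
from_blocks = build_node_input("context", [{"text": "describe"}, {"image": {"format": "png"}}])
```

With the task normalized up front, the `isinstance` branch at the call site goes away and both input types follow the same path.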

Unshure · 2025-07-12T16:32:00Z

src/strands/multiagent/swarm.py

+            )
+
+            # Store result in state
+            self.state.results[node_name] = node_result


If one agent in a swarm is executed multiple times, does its result get overwritten? Is this node_result history stored somewhere?
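
If overwriting turns out to be a problem, one option is to keep a list of results per node and expose the latest separately. This is a sketch only; `NodeResultLog` and its methods are hypothetical and not part of the PR.

```python
from collections import defaultdict
from typing import Any, Optional


class NodeResultLog:
    """Keeps every result a node produced, in execution order."""

    def __init__(self) -> None:
        self._history: defaultdict[str, list[Any]] = defaultdict(list)

    def record(self, node_name: str, result: Any) -> None:
        """Append instead of assigning, so earlier runs are preserved."""
        self._history[node_name].append(result)

    def latest(self, node_name: str) -> Optional[Any]:
        """Return the most recent result, matching what a plain dict would hold."""
        runs = self._history.get(node_name)
        return runs[-1] if runs else None

    def history(self, node_name: str) -> list[Any]:
        """Return a copy of all results for the node."""
        return list(self._history.get(node_name, []))


log = NodeResultLog()
log.record("coder", "draft v1")
log.record("coder", "draft v2")
```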

awsarron had a problem deploying to auto-approve July 11, 2025 14:41 — with GitHub Actions Failure

feat(multiagent): introduce Swarm multi-agent orchestrator

1397e6e

awsarron force-pushed the feat-multiagent-swarm branch from cdb4dee to 1397e6e Compare July 11, 2025 18:13

awsarron had a problem deploying to auto-approve July 11, 2025 18:13 — with GitHub Actions Failure

zastrowm reviewed Jul 11, 2025


feat(multiagent): Swarm - support multi-modal inputs

2566014

awsarron temporarily deployed to auto-approve July 12, 2025 15:13 — with GitHub Actions Inactive

feat(multiagent): Swarm - add Agent.description to shared Swarm context

f006afc

awsarron temporarily deployed to auto-approve July 12, 2025 15:36 — with GitHub Actions Inactive

Unshure reviewed Jul 12, 2025



