AWorld Framework Issues Report: Swarm & Parallel Execution
This report documents several critical issues identified and resolved within the aworld framework during the debugging of the RDR Swarm Proof of Concept (PoC). These issues primarily affect the orchestration of multiple agents and state management in parallel execution environments.
1. Context Loss in Parallel Sub-Agents
- File Path: `aworld/agents/parallel_llm_agent.py`
- Issue: The `async_policy` method in `ParallelizableAgent` was not correctly passing the current task's context to the parallel workers it spawned.
- Root Cause: It relied on `self.context`, which might be uninitialized in the runner's lifecycle, instead of extracting the live context from the incoming `Message` object.
- Impact: Sub-agents encountered a `NoneType` error when accessing `context.agent_info`. This prevented any parallel execution from succeeding.
- Fix: Modified `async_policy` to prioritize `message.context` from `kwargs`.
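
The lookup order described in the fix can be sketched as follows. This is a minimal illustration, not the framework's code: `Context`, `Message`, and `resolve_context` are hypothetical stand-ins, and the assumption that the message arrives under a `message` keyword is made only for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Context:
    """Hypothetical stand-in for the framework's context object."""
    agent_info: dict = field(default_factory=dict)


@dataclass
class Message:
    """Hypothetical stand-in for the framework's Message."""
    payload: Any = None
    context: Optional[Context] = None


def resolve_context(self_context: Optional[Context], **kwargs: Any) -> Context:
    """Prefer the live context on the incoming message over the agent's own."""
    message = kwargs.get("message")  # assumed keyword name
    if message is not None and message.context is not None:
        return message.context
    if self_context is not None:
        return self_context
    # Fail early with a clear error instead of passing None to the workers.
    raise ValueError("no context available for parallel workers")
```

Resolving the context once, before workers are spawned, means a missing context surfaces as a single explicit error rather than as a `NoneType` failure inside each sub-agent.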
2. Invalid 'finished' State Logic
- File Path: `aworld/agents/parallel_llm_agent.py`
- Issue: The `finished()` method was defined as a standard method but accessed as a property by the task runner.
- Root Cause: Missing `@property` decorator.
- Impact: Boolean checks like `if agent.finished:` were evaluating the method object itself (which is always truthy) rather than its return value. This caused agents to be incorrectly flagged as finished immediately.
- Fix: Applied the `@property` decorator to the `finished` method.
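
The truthiness problem can be reproduced in isolation. The two classes below are hypothetical and exist only to contrast the behaviours; they are not the framework's agent classes.

```python
class AgentWithoutProperty:
    """finished is a plain method, so attribute access yields a bound method."""

    def __init__(self) -> None:
        self._finished = False

    def finished(self) -> bool:
        return self._finished


class AgentWithProperty:
    """finished is a property, so attribute access yields the stored bool."""

    def __init__(self) -> None:
        self._finished = False

    @property
    def finished(self) -> bool:
        return self._finished


broken = AgentWithoutProperty()
fixed = AgentWithProperty()

# A bound method object is always truthy, whatever it would return.
broken_check = bool(broken.finished)
# The property evaluates to the underlying value.
fixed_check = bool(fixed.finished)
```

Here `broken_check` is `True` although the agent has not finished, while `fixed_check` is `False`.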
3. Strict Handoff Validation (is_agent)
- File Path: `aworld/core/agent/base.py`
- Issue: The `is_agent()` utility function, used to determine if a message represents a transition between agents, had logic that failed for "completion" signals.
- Root Cause: The function didn't correctly handle cases where `tool_name` was absent but the action represented an agent-to-agent handoff or a final step.
- Impact: The runner would fail to route messages to successor agents in a Swarm graph if the transition didn't involve an explicit tool call.
- Fix: Permitted transitions where `tool_name` is absent but the context implies an agent handoff.
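
The report does not show the actual predicate, so the following is only one plausible shape of the relaxation. The `Action` class, both function names, and the idea of checking `tool_name` against a set of registered agent names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Action:
    """Hypothetical stand-in for an action emitted by an agent."""
    agent_name: Optional[str] = None
    tool_name: Optional[str] = None


def is_agent_strict(action: Action, agent_names: Set[str]) -> bool:
    """Assumed stricter check: only an explicit call naming an agent counts."""
    return bool(action.tool_name) and action.tool_name in agent_names


def is_agent_relaxed(action: Action, agent_names: Set[str]) -> bool:
    """Assumed relaxed check: a missing tool_name is a handoff or final step."""
    if not action.tool_name:
        return True
    return action.tool_name in agent_names


agents = {"planner", "writer"}
completion = Action(agent_name="planner")  # no tool call at all
tool_call = Action(agent_name="planner", tool_name="search")
```

Under these assumptions the completion signal is rejected by the strict check and accepted by the relaxed one, while an ordinary tool call is rejected by both.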
4. Inconsistent Agent Result Serialization
- File Path: `aworld/agents/parallel_llm_agent.py`
- Issue: The `_agent_result()` method had a mismatch in its signature and async/sync handling compared to the base `LLMAgent`.
- Root Cause: Architectural drift between the base agent class and the parallelizable extension.
- Impact: Errors during the construction of the final `Message` payload after parallel workers finished their tasks.
- Fix: Standardized `_agent_result` to be synchronous and to wrap the parallel results into a single `ActionModel` for the next agent in the chain.
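
A rough sketch of the pattern: workers run concurrently, and a plain synchronous function then folds their outputs into one action. `ActionModelStub`, `merge_parallel_results`, and the worker functions are hypothetical; the real `ActionModel` fields and the `_agent_result` signature may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ActionModelStub:
    """Hypothetical stand-in for the framework's ActionModel."""
    agent_name: str
    policy_info: Any = None


def merge_parallel_results(results: List[Any], next_agent: str) -> ActionModelStub:
    """Synchronous: wrap all worker outputs into a single action."""
    return ActionModelStub(agent_name=next_agent, policy_info=list(results))


async def run_workers(inputs: List[str]) -> List[str]:
    """Run one coroutine per input; gather keeps results in input order."""

    async def worker(item: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real worker would
        return f"result:{item}"

    return list(await asyncio.gather(*(worker(item) for item in inputs)))


def demo() -> ActionModelStub:
    results = asyncio.run(run_workers(["a", "b"]))
    # No await here: the merge step is an ordinary function call.
    return merge_parallel_results(results, next_agent="summarizer")
```

Keeping the merge step synchronous means callers written against a synchronous base-class method never receive an un-awaited coroutine in place of a result.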