
Fix multi-agent record_transition to check all agents for episode completion #443

Closed
darshmenon wants to merge 1 commit into Toni-SM:main from darshmenon:fix/multi-agent-finished-episode-all-agents

@darshmenon

Fixes #289.

Problem

MultiAgent.record_transition uses next(iter(terminated.values())) and next(iter(truncated.values())) to determine which environments finished an episode. This inspects only the first agent in the dictionary's iteration order. If another agent terminates or truncates while the first agent does not, that environment's cumulative rewards and timesteps are not reset, so they keep accumulating into the next episode and the tracked statistics are wrong.
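To illustrate the failure mode, here is a small NumPy sketch (not skrl code) that stands in for the per-agent tensors. The agent names, the three-environment layout, and the (num_envs, 1) flag shape are assumptions chosen for the example; `first_agent_only` mimics the original check and `any_agent` mimics the proposed one.

```python
import numpy as np

# Hypothetical flags for two agents in three parallel environments,
# each array shaped (num_envs, 1).
terminated = {
    "agent_0": np.array([[False], [False], [False]]),
    "agent_1": np.array([[False], [True], [False]]),  # only agent_1 ends env 1
}
truncated = {
    "agent_0": np.array([[False], [False], [True]]),  # agent_0 truncates env 2
    "agent_1": np.array([[False], [False], [False]]),
}


def first_agent_only(terminated, truncated):
    """Mimics the old check: only the first dict entry is inspected."""
    mask = next(iter(terminated.values())) | next(iter(truncated.values()))
    return np.nonzero(mask[:, 0])[0]


def any_agent(terminated, truncated):
    """Mimics the fix: OR the flags across every agent before searching."""
    mask = (np.stack(list(terminated.values())).any(axis=0)
            | np.stack(list(truncated.values())).any(axis=0))
    return np.nonzero(mask[:, 0])[0]


print(first_agent_only(terminated, truncated))  # [2]
print(any_agent(terminated, truncated))         # [1 2]
```

Environment 1 is missed by the first-agent check because only agent_1 ended it there.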

Fix

Replace the single-agent check with a logical OR across all agents:

PyTorch backend (multi_agents/torch/base.py):

# Before (only first agent checked):
finished_episodes = (next(iter(terminated.values())) + next(iter(truncated.values()))).nonzero(...)

# After (all agents checked via OR):
finished_episodes = (
    torch.stack(list(terminated.values())).any(dim=0)
    | torch.stack(list(truncated.values())).any(dim=0)
).nonzero(as_tuple=True)[0]

JAX backend (multi_agents/jax/base.py):

# Before:
_terminated = next(iter(terminated.values()))
_truncated = next(iter(truncated.values()))

# After:
_terminated = np.stack([jax.device_get(v) for v in terminated.values()]).any(axis=0)
_truncated = np.stack([jax.device_get(v) for v in truncated.values()]).any(axis=0)
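A rough sketch of the bookkeeping that the finished-environment indices drive, assuming per-environment cumulative rewards and timesteps of shape (num_envs, 1) and plain Python lists for the tracked statistics. It uses NumPy, mirroring the host-side arrays in the JAX backend; `np.nonzero(done)[0]` plays the role of `nonzero(as_tuple=True)[0]` in the PyTorch change. The helper and variable names are hypothetical, not skrl's internals.

```python
import numpy as np


def record_finished(terminated, truncated, cumulative_rewards, cumulative_timesteps,
                    track_rewards, track_timesteps):
    """Hypothetical helper: log and reset stats for every env where any agent is done."""
    # Joint episode boundary: OR the flags over the agent axis -> shape (num_envs, 1)
    done = (np.stack(list(terminated.values())).any(axis=0)
            | np.stack(list(truncated.values())).any(axis=0))
    # Row indices of the True entries are the finished environment indices
    finished = np.nonzero(done)[0]
    if finished.size:
        # Store the completed episodes' totals before clearing them
        track_rewards.extend(cumulative_rewards[finished, 0].tolist())
        track_timesteps.extend(cumulative_timesteps[finished, 0].tolist())
        # Reset so the next episode in these environments starts from zero
        cumulative_rewards[finished] = 0
        cumulative_timesteps[finished] = 0
    return finished


# Example: two agents, three environments; agent_1 ends env 1, agent_0 truncates env 2
terminated = {"agent_0": np.array([[False], [False], [False]]),
              "agent_1": np.array([[False], [True], [False]])}
truncated = {"agent_0": np.array([[False], [False], [True]]),
             "agent_1": np.array([[False], [False], [False]])}
cumulative_rewards = np.array([[1.5], [2.0], [3.0]], dtype=np.float32)
cumulative_timesteps = np.array([[4], [5], [6]], dtype=np.int32)
track_rewards, track_timesteps = [], []

finished = record_finished(terminated, truncated, cumulative_rewards, cumulative_timesteps,
                           track_rewards, track_timesteps)
print(finished.tolist())  # [1, 2]
print(track_rewards)      # [2.0, 3.0]
print(track_timesteps)    # [5, 6]
```

Environment 0 keeps its running totals, while environments 1 and 2 are logged and reset even though only one agent ended each of them.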

An environment's episode is considered finished as soon as any agent signals termination or truncation. This is consistent with cooperative multi-agent environments, where all agents share a joint episode boundary.

darshmenon closed this by deleting the head repository on May 16, 2026.