MAPPO adapts the popular PPO algorithm for multi-agent environments. It relies on a specific paradigm called Centralized Training, Decentralized Execution (CTDE).
- Decentralized Execution (The Actor): When the agents are actually playing (or "executing"), they only see their own local observations. They don't know what the others see.
- Centralized Training (The Critic): During training, we allow the "Critic" (the network that estimates how good a situation is) to see everything: the global state, or the observations of all agents combined. This helps the agents learn cooperative strategies faster.
- Actor ($\pi_{\theta}$): Input is the local observation ($o_i$). Output is the action probability distribution.
- Critic ($V_{\phi}$): Input is the global state ($s$). In this implementation, the global state is simply all agents' observations concatenated together. Output is a single value score.
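A minimal sketch of the two networks, assuming a PyTorch implementation with discrete actions; the layer sizes and class names here are illustrative, not the exact architecture used in this repo:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class Actor(nn.Module):
    """Maps a single agent's local observation to an action distribution."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return Categorical(logits=self.net(obs))

class Critic(nn.Module):
    """Maps the global state (all observations concatenated) to one value."""
    def __init__(self, obs_dim, n_agents, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * n_agents, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, global_state):
        return self.net(global_state).squeeze(-1)
```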
- Observations
- Global States (all obs combined)
- Actions taken
- Log probabilities of those actions
- Rewards received
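For illustration, a rollout buffer holding these quantities might look like the following sketch (the field and method names are assumptions, not necessarily the ones used in this repo):

```python
from dataclasses import dataclass, field

@dataclass
class RolloutBuffer:
    """Collects one batch of experience before a MAPPO update."""
    obs: list = field(default_factory=list)            # local observations per agent
    global_states: list = field(default_factory=list)  # all observations concatenated
    actions: list = field(default_factory=list)
    log_probs: list = field(default_factory=list)
    rewards: list = field(default_factory=list)

    def add(self, obs, global_state, action, log_prob, reward):
        self.obs.append(obs)
        self.global_states.append(global_state)
        self.actions.append(action)
        self.log_probs.append(log_prob)
        self.rewards.append(reward)
```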
We calculate how much "better" an action was compared to the average expectation. We use Generalized Advantage Estimation (GAE), which balances bias and variance to stabilize training.
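A sketch of GAE with the usual discount $\gamma$ and smoothing parameter $\lambda$, assuming the Critic's value for each step (plus a bootstrap value for the final state) is already available and ignoring episode-termination masking for brevity:

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory (no done-masking)."""
    values = np.append(values, last_value)  # bootstrap value for the final step
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    returns = advantages + values[:-1]  # targets for the value loss
    return advantages, returns
```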
We update the networks using the collected data. The update combines three ingredients (a sketch of the full loss follows this list):
- Clipping: Don't change the policy too drastically in one step. It clips the ratio between the new and old policy probabilities.
- Value Loss: We minimize the error between the Critic's prediction and the actual returns.
- Entropy: We add a bonus for randomness to prevent the agents from getting stuck doing the same thing too early.
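A sketch of how the three terms combine, assuming the Actor/Critic classes above and a batch dictionary built from the buffer contents as tensors; the coefficient values are common defaults, not necessarily the ones used in this repo:

```python
import torch

def mappo_loss(actor, critic, batch, clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """One PPO-style update: clipped policy loss + value loss - entropy bonus."""
    dist = actor(batch["obs"])
    new_log_probs = dist.log_prob(batch["actions"])

    # Clipping: keep the new policy close to the one that collected the data
    ratio = torch.exp(new_log_probs - batch["old_log_probs"])
    surr1 = ratio * batch["advantages"]
    surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * batch["advantages"]
    policy_loss = -torch.min(surr1, surr2).mean()

    # Value loss: Critic prediction vs. the computed returns
    values = critic(batch["global_states"])
    value_loss = (batch["returns"] - values).pow(2).mean()

    # Entropy bonus: encourage exploration early in training
    entropy = dist.entropy().mean()

    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```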
Contributions are welcome. If you find a bug or have a feature request, please open an issue or submit a pull request.
