Official implementation of the AAMAS 2024 paper Measuring Policy Distance for Multi-Agent Reinforcement Learning.
To run this repository you need:
i) Install the code's requirements. A Conda-based virtual environment is recommended (a Docker setup will be added soon). Install with:
conda create --name madps --file requirements.txt
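Once the environment has been created, activate it before installing the environments and running the code (standard Conda usage; the name madps comes from the command above):
conda activate madps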
ii) Install the supported MARL environments, for example:
- Multi-Agent Spread (PettingZoo MPE version).
(Careful! In our research we upgraded the task's difficulty level. Copy madps/large_spread_example.py over pettingzoo/mpe/scenarios/large_spread.py in your PettingZoo installation to access the updated task; see the sketch below.)
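A quick way to perform that copy from the repository root (a sketch assuming PettingZoo is installed in the active environment and the repository layout described above; adjust the paths if yours differs):

```python
# Sketch: copy the repository's upgraded scenario over the installed PettingZoo one.
# The paths below are assumptions based on the instructions above; adjust if needed.
import shutil
from pathlib import Path

import pettingzoo

target = Path(pettingzoo.__file__).parent / "mpe" / "scenarios" / "large_spread.py"
source = Path("madps") / "large_spread_example.py"
shutil.copyfile(source, target)
print(f"copied {source} -> {target}")
```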
You can then simply run ac_NF.py using:
python ac_NF.py with env_name='pettingzoo:pz-mpe-large-spread-v1' time_limit=50

This command runs our multi-agent actor-critic training framework, with pettingzoo:pz-mpe-large-spread-v1 as the training scenario and a maximum episode length of 50 steps. The scenario versions large-spread-v1 through v6 correspond to the 15a_3c, 30a_3c, 30a_5c, 30a_5c_super, 15a_3c_shuffle, and 30a_3c_shuffle scenarios in our paper.
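For example, to train on the 30a_3c scenario instead, substitute the v2 environment name in the same command (v1 through v6 all follow this pattern):
python ac_NF.py with env_name='pettingzoo:pz-mpe-large-spread-v2' time_limit=50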
The MADPS code is structured as follows:
ac_NF.py includes:
- Multi-agent Environment Construction and Maintenance.
- Environment Sampling and Sample Pool Construction.
- Neural Network Evaluation and Training.
More details can be found in the comments of the main function in ac_NF.py.
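For orientation, here is a heavily simplified sketch of that loop structure. All names (DummyMultiAgentEnv, random_policy, buffer) are illustrative stand-ins and do not appear in ac_NF.py:

```python
# Hypothetical outline of the sampling/training structure described above.
import random

class DummyMultiAgentEnv:
    """Stand-in multi-agent environment with a reset/step interface."""
    def __init__(self, n_agents=3):
        self.n_agents = n_agents

    def reset(self):
        return [0.0] * self.n_agents                        # one observation per agent

    def step(self, actions):
        obs = [random.random() for _ in range(self.n_agents)]
        rewards = [1.0] * self.n_agents
        done = random.random() < 0.05
        return obs, rewards, done

def random_policy(obs):
    return [random.choice([0, 1]) for _ in obs]             # placeholder for the learned policies

env = DummyMultiAgentEnv()                                   # 1) environment construction and maintenance
buffer = []                                                  # 2) sample pool
for episode in range(10):
    obs, done, t = env.reset(), False, 0
    while not done and t < 50:                               # time_limit=50, as in the run command above
        actions = random_policy(obs)
        next_obs, rewards, done = env.step(actions)
        buffer.append((obs, actions, rewards, next_obs))     # environment sampling
        obs, t = next_obs, t + 1
    # 3) neural-network evaluation and training would consume `buffer` here
```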
MADPS_NF.py includes:
compute_fusions (including MAPD and MADPS):
- Train the conditional VAE (learn the conditional representations of the agents' decisions).
- Use the VAE and the agents' models to calculate d_ij (compute the multi-agent policy distance matrix from the learned conditional representations).
- Use d_ij to automatically adjust parameter sharing (optional: use the multi-agent policy distance matrix to adjust the agents' parameter-sharing scheme).
- calculate_N_Gaussians_BD: parallel calculation of the Bhattacharyya distance between multiple Gaussian distributions using PyTorch.
- calculate_N_Gaussians_Hellinger_through_BD: parallel calculation of the Hellinger distance between multiple Gaussian distributions using PyTorch. It requires the results of calculate_N_Gaussians_BD.
- calculate_N_Gaussians_WD: parallel calculation of the Wasserstein distance between multiple Gaussian distributions using PyTorch.
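For reference, a minimal sketch of how such pairwise distances can be computed in PyTorch, assuming diagonal-covariance Gaussians parameterized by means mu (N, D) and standard deviations sigma (N, D). The function names mirror the ones above, but the actual implementations in MADPS_NF.py may handle covariances and batching differently:

```python
# Sketch: pairwise Bhattacharyya / Hellinger distances between N diagonal
# Gaussians, vectorized with PyTorch broadcasting.
# Inputs: mu (N, D) means, sigma (N, D) standard deviations.
import torch

def pairwise_bhattacharyya(mu, sigma):
    var = sigma ** 2                                         # (N, D) variances
    var_mean = 0.5 * (var.unsqueeze(0) + var.unsqueeze(1))   # (N, N, D) averaged covariances
    diff = mu.unsqueeze(0) - mu.unsqueeze(1)                 # (N, N, D) mean differences
    maha = 0.125 * (diff ** 2 / var_mean).sum(-1)            # Mahalanobis-like term
    log_var_sum = torch.log(var).sum(-1)                     # (N,) log-determinants
    log_det = torch.log(var_mean).sum(-1) - 0.5 * (log_var_sum.unsqueeze(0) + log_var_sum.unsqueeze(1))
    return maha + 0.5 * log_det                              # (N, N) Bhattacharyya distances

def pairwise_hellinger_from_bd(bd):
    # Hellinger distance from the Bhattacharyya distance: H = sqrt(1 - exp(-D_B)).
    return torch.sqrt(torch.clamp(1.0 - torch.exp(-bd), min=0.0))

mu = torch.randn(4, 8)             # 4 Gaussians in an 8-dimensional latent space
sigma = torch.rand(4, 8) + 0.1
bd = pairwise_bhattacharyya(mu, sigma)
hd = pairwise_hellinger_from_bd(bd)
print(bd.shape, hd.shape)          # torch.Size([4, 4]) torch.Size([4, 4])
```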
model_NF.py includes:
- MADPSNet: multi-agent neural network models that support dynamic and hierarchical adjustment of parameter sharing.
- MultiAgentFCNetwork: multi-agent neural network models that support adjustment of parameter sharing (replication of the SePS algorithm).
- Policy: multi-agent policy models.
- ConditionalVAE: conditional VAE model.
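To illustrate the idea behind the ConditionalVAE component (encode an agent's decision conditioned on an input, then decode it back), here is a minimal conditional-VAE sketch in PyTorch. The class name, layer sizes, and interface are illustrative only and are not those of model_NF.py:

```python
# Minimal conditional-VAE sketch (not the repository's architecture).
import torch
import torch.nn as nn

class TinyConditionalVAE(nn.Module):
    def __init__(self, x_dim, cond_dim, latent_dim=8, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim + cond_dim, hidden), nn.ReLU())
        self.mu_head = nn.Linear(hidden, latent_dim)
        self.logvar_head = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(), nn.Linear(hidden, x_dim)
        )

    def forward(self, x, cond):
        h = self.encoder(torch.cat([x, cond], dim=-1))
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.decoder(torch.cat([z, cond], dim=-1))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the standard-normal prior.
    recon_term = ((recon - x) ** 2).sum(-1).mean()
    kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1)).mean()
    return recon_term + kl

vae = TinyConditionalVAE(x_dim=5, cond_dim=12)
x, cond = torch.randn(32, 5), torch.randn(32, 12)
recon, mu, logvar = vae(x, cond)
vae_loss(recon, x, mu, logvar).backward()
```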
The paper can be quickly accessed via the arXiv link. After the AAMAS 2024 conference, please cite it as follows:
@inproceedings{hu2024MAPD,
title={Measuring Policy Distance for Multi-Agent Reinforcement Learning},
author={Hu, Tianyi and Pu, Zhiqiang and Ai, Xiaolin and Qiu, Tenghai and Yi, Jianqiang},
booktitle={Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems},
pages={834--842},
year={2024}
}
Note: Since the official AAMAS version does not include the appendix, we provide the paper's appendix in this repository.
Note: We are currently focusing on measuring more types of heterogeneity, not just policy distance (policy heterogeneity). The new work also builds on this codebase and will be released in a few months. Stay tuned!