diff --git a/README.md b/README.md index 3e5803523..1fa18f70a 100644 --- a/README.md +++ b/README.md @@ -1,36 +1,58 @@ -# MultiAgent and QualityDiversity ReinforcementLearning -This repository contains all the main informations about the university and thesis project on Multi Agent Reinforcemnt Learning and Quality Diversity. +# Multi-Agent Reinforcement Learning with Quality Diversity +The aim of the project is to develop a multi-agent reinforcement learning algorithm that uses quality diversity to build the sets of agents that solve a given multi-agent task. The project is part of the master thesis developed by [Nielsen Erik](github.com/NielsenErik) at the University of Trento. +This repo contains the code and relevant sources used to develop the thesis project. The project is supervised by Giovanni Iacca and Andrea Ferigo from the University of Trento and follows their current research. -## Papers and References -In [references](/references) there is a comprehensive list of references of the studied papers to complete the project. - -## Source codes -In [src](/src) are stored all the scripts developed during the project. The produced scripts are based and continue the work developed in the following papers by Giovanni Iacca, Marco Crespi, Andrea Ferigo, Leonardo Lucio Custode: +## Introduction +The [src](/src) folder stores all the scripts developed during the project. The produced scripts and code are based on the following papers: - [A Population-Based Approach for Multi-Agent Interpretable Reinforcement Learning](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4467882) - [Quality Diversity Evolutionary Learning of Decision Trees](https://arxiv.org/abs/2208.12758) -It is possible to run the aforementioned code by following the instructions in the README.md file in the [src/base](/src/base/) folder. -Otherwise by changing the working direcotry `cd src` and running the following command: +## Installation +The project is developed in Python 3.11. Here are the steps to install it: ```bash -chmod +x script.sh -source script.sh +git clone https://github.com/NielsenErik/MultiAgent_and_QualityDiversity_ReinforcementLearning +cd MultiAgent_and_QualityDiversity_ReinforcementLearning +pip install -r requirements.txt ``` -It will appear the following menu: +### Note +1. The project is developed in Python 3.11. It is recommended to use a virtual environment to install the project and its dependencies. + +2. MAgent2 is the test environment of the project. To install it, first clone its repository and then install it from the downloaded sources: ```bash -On the terminal output will appear the following menu: -Hello! Here you can set environment and run codes -Please enter an integer to select an option: -[1]. Activate environment -[2]. Deactivate environment -[3]. Run code dts4marl -[4]. Run code marldts -[5]. Run code qd_marl -[6]. Run code qd_marl with debug mode -[7]. Run test environment -[8]. Exit +git clone https://github.com/Farama-Foundation/MAgent2 +cd MAgent2 +pip install -e . ``` -Press 1 to activate the python venv. -Then run `./script.sh` again and select one of the possible experiment. -If 3 or 4 is selected it will run the projects developed by Giovanni Iacca, Marco Crespi, Andrea Ferigo, Leonardo Lucio Custode. -if 5 or 6 (for debug and serialized mode) is selected it will run the project developed in this repository which apply a Quality Diversity approach to a Multi Agent Reinforcement Learning task
\ No newline at end of file +This solution was proposed in [Issue #19](https://github.com/Farama-Foundation/MAgent2/issues/19) of the MAgent2 repository. + +## Running the project +If the installation was done inside a virtual environment, first activate it: +```bash +source venv/bin/activate +``` +Then execute the following commands: +```bash +chmod +x script.sh +./script.sh +``` +The script shows a menu of run options; choose the desired option and the project will start running. + +## Structure +The project is structured as follows: +1. [src](/src): Contains the source code of the project + 1. [agents](/src/agents): Contains the agent classes and algorithms used in the project + 2. [algorithm](/src/algorithm): Contains the Map-Elites and Quality Diversity algorithms, developed using PyRibs, and the classes for the Genetic Algorithm and Genetic Programming + 3. [config](/src/config): Contains the configuration files used in the project, such as the configuration of the environment, the algorithm, the agents and, most importantly, the Map-Elites archive + 4. [decisiontrees](/src/decisiontrees): Contains the classes to create and manage the Decision Trees, RL-Decision Trees, Leaves and the Conditions on the tree nodes + 5. [utils](/src/utils): Contains the utility functions used in the project +2. [logs](/logs): Contains the log files generated during the execution of the project +3. [hpc_scripts](/hpc_scripts): Contains the scripts used to run the project on the High-Performance Computing (HPC) cluster + +## Papers and References +[references](/references) contains a comprehensive list of the papers studied to complete the project. + +## Source codes +All the scripts developed during the project are stored in [src](/src).
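As shown in `script.sh`, each launcher in [src](/src) is invoked with a JSON configuration file plus an integer argument. The following is a minimal, hypothetical sketch of inspecting such a configuration before launching; the path is one of those referenced by `script.sh`, and the configuration keys are treated as opaque since their schema is defined by the launchers:
```python
import json

# Config path taken from option [5] of script.sh; adjust to the experiment you want to run.
config_path = "src/QD_MARL/configs/local/battlefield.json"

with open(config_path) as f:
    config = json.load(f)

# The top-level sections (environment, algorithm / Map-Elites archive, agents)
# are defined by the launcher; listing the keys is a quick sanity check
# before submitting a long run.
print(sorted(config.keys()))
```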
The produced scripts work only if linked to the code developed in the following papers by Giovanni Iacca, Marco Crespi, Andrea Ferigo, Leonardo Lucio Custode: +- [A Population-Based Approach for Multi-Agent Interpretable Reinforcement Learning](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4467882) +- [Quality Diversity Evolutionary Learning of Decision Trees](https://arxiv.org/abs/2208.12758) \ No newline at end of file diff --git a/hpc_run_all.sh b/hpc_run_all.sh new file mode 100755 index 000000000..b51f9081f --- /dev/null +++ b/hpc_run_all.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +cd ${PBS_O_WORKDIR}/MARL-QD/Marl-QD_Private/ +for file in $(ls $PWD/hpc_scripts/with_set/); do + qsub $PWD/hpc_scripts/with_set/$file +done \ No newline at end of file diff --git a/hpc_scripts/no_set/hpc_script_pyribsCMA_best.sh b/hpc_scripts/no_set/hpc_script_pyribsCMA_best.sh new file mode 100644 index 000000000..bfccdb261 --- /dev/null +++ b/hpc_scripts/no_set/hpc_script_pyribsCMA_best.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=6:00:00 + +#execution queue configs +#PBS -q short_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher_no_sets.py $PWD/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_best.json 4 \ No newline at end of file diff --git a/hpc_scripts/no_set/hpc_script_pyribsCMA_coach.sh b/hpc_scripts/no_set/hpc_script_pyribsCMA_coach.sh new file mode 100644 index 000000000..4277236fd --- /dev/null +++ b/hpc_scripts/no_set/hpc_script_pyribsCMA_coach.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=6:00:00 + +#execution queue configs +#PBS -q short_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher_no_sets.py $PWD/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_coach.json 4 \ No newline at end of file diff --git a/hpc_scripts/no_set/hpc_script_pyribsCMA_random.sh b/hpc_scripts/no_set/hpc_script_pyribsCMA_random.sh new file mode 100644 index 000000000..dc7fecdff --- /dev/null +++ b/hpc_scripts/no_set/hpc_script_pyribsCMA_random.sh @@ -0,0 +1,21 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=6:00:00 + +#execution queue configs +#PBS -q short_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher_no_sets.py $PWD/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_random.json 4 \ No newline at end of file diff --git a/hpc_scripts/no_set/hpc_script_pyribs_best.sh b/hpc_scripts/no_set/hpc_script_pyribs_best.sh new file mode 100644 index 000000000..6e0b6d1fb --- /dev/null +++ b/hpc_scripts/no_set/hpc_script_pyribs_best.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l 
walltime=6:00:00 + +#execution queue configs +#PBS -q short_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher_no_sets.py $PWD/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_best.json 4 \ No newline at end of file diff --git a/hpc_scripts/no_set/hpc_script_pyribs_coach.sh b/hpc_scripts/no_set/hpc_script_pyribs_coach.sh new file mode 100644 index 000000000..95c1fdb2e --- /dev/null +++ b/hpc_scripts/no_set/hpc_script_pyribs_coach.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=6:00:00 + +#execution queue configs +#PBS -q short_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher_no_sets.py $PWD/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_coach.json 4 \ No newline at end of file diff --git a/hpc_scripts/no_set/hpc_script_pyribs_random.sh b/hpc_scripts/no_set/hpc_script_pyribs_random.sh new file mode 100644 index 000000000..e1caf5a35 --- /dev/null +++ b/hpc_scripts/no_set/hpc_script_pyribs_random.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=6:00:00 + +#execution queue configs +#PBS -q short_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher_no_sets.py $PWD/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_random.json 4 \ No newline at end of file diff --git a/hpc_scripts/with_set/hpc_script_pyribsCMA_best.sh b/hpc_scripts/with_set/hpc_script_pyribsCMA_best.sh new file mode 100644 index 000000000..6c2ab8762 --- /dev/null +++ b/hpc_scripts/with_set/hpc_script_pyribsCMA_best.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=30:00:00 + +#execution queue configs +#PBS -q common_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher.py $PWD/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_best.json 4 \ No newline at end of file diff --git a/hpc_scripts/with_set/hpc_script_pyribsCMA_coach.sh b/hpc_scripts/with_set/hpc_script_pyribsCMA_coach.sh new file mode 100644 index 000000000..133f364c1 --- /dev/null +++ b/hpc_scripts/with_set/hpc_script_pyribsCMA_coach.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=30:00:00 + +#execution queue configs +#PBS -q common_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher.py 
$PWD/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_coach.json 4 \ No newline at end of file diff --git a/hpc_scripts/with_set/hpc_script_pyribsCMA_random.sh b/hpc_scripts/with_set/hpc_script_pyribsCMA_random.sh new file mode 100644 index 000000000..52c7ad7bc --- /dev/null +++ b/hpc_scripts/with_set/hpc_script_pyribsCMA_random.sh @@ -0,0 +1,21 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=30:00:00 + +#execution queue configs +#PBS -q common_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher.py $PWD/src/QD_MARL/configs/with_sets/hpc/battlefield_hpc_pyribsCMA_random.json 4 \ No newline at end of file diff --git a/hpc_scripts/with_set/hpc_script_pyribs_best.sh b/hpc_scripts/with_set/hpc_script_pyribs_best.sh new file mode 100644 index 000000000..2eab95d0d --- /dev/null +++ b/hpc_scripts/with_set/hpc_script_pyribs_best.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=30:00:00 + +#execution queue configs +#PBS -q common_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher.py $PWD/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_best.json 4 \ No newline at end of file diff --git a/hpc_scripts/with_set/hpc_script_pyribs_coach.sh b/hpc_scripts/with_set/hpc_script_pyribs_coach.sh new file mode 100644 index 000000000..92ccfbee6 --- /dev/null +++ b/hpc_scripts/with_set/hpc_script_pyribs_coach.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=30:00:00 + +#execution queue configs +#PBS -q common_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher.py $PWD/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_coach.json 4 \ No newline at end of file diff --git a/hpc_scripts/with_set/hpc_script_pyribs_random.sh b/hpc_scripts/with_set/hpc_script_pyribs_random.sh new file mode 100644 index 000000000..842593a41 --- /dev/null +++ b/hpc_scripts/with_set/hpc_script_pyribs_random.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +#resources allcoation +#PBS -l select=1:ncpus=12:mem=100gb -l place=pack:excl + +#set max execution time +#PBS -l walltime=30:00:00 + +#execution queue configs +#PBS -q common_cpuQ + +#execution outpust name +#PBS -N marl_battlefield_hpc + +#set mail notification +#PBS -M erik.nielsen@studenti.unitn.it + +cd ${PBS_O_WORKDIR} + +module load python-3.8.13 +source $PWD/pyenv_hpc/bin/activate +python $PWD/src/QD_MARL/marl_qd_launcher.py $PWD/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_random.json 4 \ No newline at end of file diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 000000000..7c04d9257 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,70 @@ +box2d-py==2.3.5 +cffi==1.15.1 +chess==1.7.0 +click==8.1.7 +cloudpickle==2.2.1 
+cmaes==0.10.0 +cmake==3.27.0 +contourpy==1.1.1 +cycler==0.11.0 +Farama-Notifications==0.0.4 +filelock==3.12.4 +fonttools==4.42.1 +gym==0.26.2 +gym-notices==0.0.8 +gymnasium==0.29.0 +hanabi-learning-environment==0.0.4 +inspyred==1.0.2 +Jinja2==3.1.2 +joblib==1.3.1 +kiwisolver==1.4.5 +lit==16.0.6 +llvmlite==0.40.1 +magent2 @ file:///home/jawa17/Documents/Unitn/Master/ProjectCourse/Source/MAgent2 +MarkupSafe==2.1.3 +matplotlib==3.8.0 +mpmath==1.3.0 +multi-agent-ale-py==0.1.11 +networkx==3.1 +numba==0.57.1 +numpy==1.24.4 +numpy-groupies==0.9.22 +nvidia-cublas-cu11==11.10.3.66 +nvidia-cuda-cupti-cu11==11.7.101 +nvidia-cuda-nvrtc-cu11==11.7.99 +nvidia-cuda-runtime-cu11==11.7.99 +nvidia-cudnn-cu11==8.5.0.96 +nvidia-cufft-cu11==10.9.0.58 +nvidia-curand-cu11==10.2.10.91 +nvidia-cusolver-cu11==11.4.0.1 +nvidia-cusparse-cu11==11.7.4.91 +nvidia-nccl-cu11==2.14.3 +nvidia-nvtx-cu11==11.7.91 +opencv-python==4.8.0.76 +packaging==23.1 +pandas==2.1.0 +pettingzoo==1.22.4 +Pillow==10.0.0 +pycparser==2.21 +pygame==2.3.0 +pymunk==6.2.0 +pyparsing==3.1.1 +python-dateutil==2.8.2 +pytz==2023.3.post1 +PyYAML==6.0.1 +ribs==0.7.0 +rlcard==1.0.5 +scikit-learn==1.3.0 +scipy==1.11.1 +seaborn==0.13.0 +shapely==2.0.1 +six==1.16.0 +sortedcontainers==2.4.0 +sympy==1.12 +termcolor==2.3.0 +threadpoolctl==3.2.0 +torch==2.0.1 +tqdm==4.66.1 +triton==2.0.0 +typing_extensions==4.7.1 +tzdata==2023.3 diff --git a/script.sh b/script.sh new file mode 100755 index 000000000..f57948b01 --- /dev/null +++ b/script.sh @@ -0,0 +1,62 @@ +#!/usr/bin/bash +echo "Hello! Here you can set environment and run codes" +echo "Please enter an integer to select an option:" +echo "[1]. Activate environment" +echo "[2]. Deactivate environment" +echo "[3]. Run code dts4marl" +echo "[4]. Run code marldts" +echo "[5]. Run code qd_marl" +echo "[6]. Run code qd_marl in debug mode" +echo "[7]. Run code qd_marl without sets" +echo "[8]. Run code qd_marl with reduced sizes" +echo "[9]. Run the qd_marl with a team per individual, test size" +echo "[10]. Exit" +read option +if [ $option -eq 1 ] +then + echo "Activating environment..." + source pyenv-marl-qd/bin/activate +elif [ $option -eq 2 ] +then + echo "Deactivating environment..." + deactivate +elif [ $option -eq 3 ] +then + echo "Running code..." + python3 src/base/dts4marl/launcher.py src/base/dts4marl/battlefield.json 4 + +elif [ $option -eq 4 ] +then + echo "Running code..." + python3 src/base/marl_dts/src/experiment_launchers/pz_advpursuit_reduced_obs_shared_launcher.py src/base/marl_dts/src/configs/magent_advpursuit_single_team.json 1 +elif [ $option -eq 5 ] +then + echo "Running code..." + python3 src/QD_MARL/marl_qd_launcher.py src/QD_MARL/configs/local/battlefield.json 4 +elif [ $option -eq 6 ] +then + echo "Running code in DEBUG MODE..." + python3 src/QD_MARL/marl_qd_launcher.py src/QD_MARL/configs/local/battlefield_test.json 4 --debug +elif [ $option -eq 7 ] +then + echo "Running test environment..." + python3 src/QD_MARL/marl_qd_launcher_no_sets.py src/QD_MARL/configs/local/battlefield_test.json 4 +elif [ $option -eq 8 ] +then + echo "Running test environment..." + python3 src/QD_MARL/marl_qd_launcher.py src/QD_MARL/configs/local/battlefield_test.json 4 +elif [ $option -eq 9 ] +then + echo "Running test for team per individual..." + python3 src/QD_MARL/marl_qd_per_individual.py src/QD_MARL/configs/local/battlefield_test_per_ind.json 4 +elif [ $option -eq 10 ] +then + echo "Exiting..." + exit +else + echo "Invalid option" + echo "Exiting..." 
+ exit +fi +``` +``` \ No newline at end of file diff --git a/src/QD_MARL/__init__.py b/src/QD_MARL/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/src/QD_MARL/agents/__init__.py b/src/QD_MARL/agents/__init__.py new file mode 100644 index 000000000..fbab8d33f --- /dev/null +++ b/src/QD_MARL/agents/__init__.py @@ -0,0 +1 @@ +from agents.agents import Agent, CoachAgent diff --git a/src/QD_MARL/agents/__pycache__/__init__.cpython-311.pyc b/src/QD_MARL/agents/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 000000000..6cf791b1a Binary files /dev/null and b/src/QD_MARL/agents/__pycache__/__init__.cpython-311.pyc differ diff --git a/src/QD_MARL/agents/__pycache__/__init__.cpython-38.pyc b/src/QD_MARL/agents/__pycache__/__init__.cpython-38.pyc new file mode 100644 index 000000000..9dde9e4d1 Binary files /dev/null and b/src/QD_MARL/agents/__pycache__/__init__.cpython-38.pyc differ diff --git a/src/QD_MARL/agents/__pycache__/agents.cpython-311.pyc b/src/QD_MARL/agents/__pycache__/agents.cpython-311.pyc new file mode 100644 index 000000000..77e23e739 Binary files /dev/null and b/src/QD_MARL/agents/__pycache__/agents.cpython-311.pyc differ diff --git a/src/QD_MARL/agents/__pycache__/agents.cpython-38.pyc b/src/QD_MARL/agents/__pycache__/agents.cpython-38.pyc new file mode 100644 index 000000000..82ee48143 Binary files /dev/null and b/src/QD_MARL/agents/__pycache__/agents.cpython-38.pyc differ diff --git a/src/QD_MARL/agents/agents.py b/src/QD_MARL/agents/agents.py new file mode 100644 index 000000000..3bf1f4e21 --- /dev/null +++ b/src/QD_MARL/agents/agents.py @@ -0,0 +1,157 @@ +import os +import sys +from utils.print_outputs import * + +sys.path.append(".") +import random +import time +from copy import deepcopy +from math import sqrt +import pettingzoo + +from inspyred import ec +import numpy as np + + +class Agent: + def __init__(self, name, squad, set_, tree, manual_policy, to_optimize): + self._name = name + self._squad = squad + self._set = set_ + self._tree = tree.deep_copy() if tree is not None else None + self._manual_policy = manual_policy + self._to_optimize = to_optimize + self._score = [] + + def get_name(self): + return self._name + + def get_squad(self): + return self._squad + + def get_set(self): + return self._set + + def to_optimize(self): + return self._to_optimize + + def get_tree(self): + return self._tree.deep_copy() + + def get_output(self, observation): + if self._to_optimize: + return self._tree.get_output(observation) + else: + return self._manual_policy.get_output(observation) + + def set_reward(self, reward): + self._tree.set_reward(reward) + self._score[-1] += reward + + def get_score_statistics(self, params): + return getattr(np, f"{params['type']}")(a=self._score, **params["params"]) + + def new_episode(self): + self._score.append(0) + + def has_policy(self): + return not self._manual_policy is None + + def __str__(self): + return f"Name: {self._name}; Squad: {self._squad}; Set: {self._set}; Optimize: {str(self._to_optimize)}" + + +class CoachAgent: + + # Coach agent which select the team of agent from the pool produced by the initial population + # of map elite, it is based on hte Genetic Algorithm from inspyred library + # https://pythonhosted.org/inspyred/reference.html#module-inspyred.ec + + # Class arguments: + # team_fitnesses: list of fitnesses of the agents in the pool + # initial_pop: initial population of the map elite + # config: configuration of the algorithm + + # Class methods: + # init_algorithm: initialize the 
algorithm parameters + # set_generator: set the generator of the algorithm, generates candidates for the function to optimize + # set_evaluator: set the evaluator of the algorithm, evaluates the candidates + # get_final_pop: get the final population of the algorithm + + def __init__(self, config, me = None): + self._config = config + self._me = me + self.random = random.Random() + self.random.seed(self._config["seed"]) + self._pop_size = self._config["pop_size"] + self._batch_size = self._config["batch_size"] + self._algorithm = self.set_algorithm() + self._pop_fitnesses = None + self._pop_desc = None + + def set_algorithm(self): + # Type of avilable algorithms: + # ec.GA, ec.EvolutionaryComputation + name = self._config["algorithm"] + + return getattr(ec, name)(self.random) + + def init_algogrithm(self): + args = { + "setdefault" + } + self._algorithm.terminator = ec.terminators.evaluation_termination + self._algorithm.replacer = ec.replacers.generational_replacement + self._algorithm.variator = [ + ec.variators.uniform_crossover, + ec.variators.gaussian_mutation, + ] + self._algorithm.selector = ec.selectors.tournament_selection + + def set_generator(self, random, args): + # generate candidates + # return list of lists of indices in population + return [random.randint(0, len(self._pop_desc)-1) for _ in range(self._batch_size)] + + def set_evaluator(self, candidates, args): + # evaluate the candidates + # return list of tuples (index in population, fitness) + res = [] + for cs in candidates: + team = [] + index = [] + for c in cs: + team.append(self._pop_fitnesses[c]) + res.append(np.mean(team)) + return res + + def get_descriptions(self, index): + descriptors = [] + for i in index: + descriptors.append(self._pop_desc[i]) + return descriptors + + def get_squad(self, n_squad): + solutions = [] + me_pop = self._me._archive.data() + self._pop_desc = me_pop["solution"] + self._pop_fitnesses = me_pop["objective"] + + final_pop = self._algorithm.evolve( + generator=self.set_generator, + evaluator=self.set_evaluator, + maximaze=True, + initial_pop_storage=self._pop_fitnesses, + ) + final_pop_fitnesses = np.asarray([ind.fitness for ind in final_pop]) + final_pop_candidates = np.asarray([ind.candidate for ind in final_pop]) + + sort_indexes = sorted(range(len(final_pop_fitnesses)), key=final_pop_fitnesses.__getitem__, reverse=True) + final_pop_fitnesses = final_pop_fitnesses[sort_indexes] + final_pop_candidates = final_pop_candidates[sort_indexes] + for i in range(n_squad): + solutions.append(self.get_descriptions(final_pop_candidates[i])) + return solutions + + def __str__(self): + return f"Coach config: {self._config}" diff --git a/src/QD_MARL/algorithms/Depracated/map_elites_Pyribs.py b/src/QD_MARL/algorithms/Depracated/map_elites_Pyribs.py new file mode 100644 index 000000000..7ca2485ce --- /dev/null +++ b/src/QD_MARL/algorithms/Depracated/map_elites_Pyribs.py @@ -0,0 +1,496 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +import abc +from cmath import inf +from tkinter import Grid +import numpy as np +from copy import deepcopy +from .common import OptMetaClass +from decisiontrees import Leaf, Condition +from operator import gt, lt, add, sub, mul +from processing_element import ProcessingElementFactory, PEFMetaClass +from ribs.archives._cvt_archive import CVTArchive +from ribs.archives._grid_archive import GridArchive +from ribs.archives._sliding_boundaries_archive import SlidingBoundariesArchive +from ribs.archives import EliteBatch, Elite +from ribs.archives._archive_data_frame import 
ArchiveDataFrame +from ribs.visualize import cvt_archive_heatmap +from ribs.visualize import grid_archive_heatmap +from ribs.visualize import sliding_boundaries_archive_heatmap +import matplotlib.pyplot as plt +from .individuals import * + + + +class MapElites_Pyribs(ProcessingElementFactory, metaclass=OptMetaClass): + def __init__(self, **kwargs): + + """ + Initializes the algorithm + + :map_size: The size of the map + :map_bounds: List of bounds + :init_pop_size: number of initial solutions + :maximize: Boolean indicating if is a maximization problem + :batch_pop: Number of population generated for iteration + :c_factory: The factory for the conditions + :l_factory: The factory for the leaves + :bounds: dictionary containing the bounds for the two factories. + It should contain two keys: "condition" and "leaf". + The values must contain the bounds + (a dict with keys (type, min, max)) + for all the parameters returned + by "get_trainable_parameters" + :max_depth: Maximum depth for the trees + + """ + self._log_path = kwargs["log_path"] + self._map_size = kwargs["map_size"] + self._map_bound = kwargs["map_bounds"] + self._cx_prob = kwargs["cx_prob"] if "cx_prob" in kwargs else 0 + self._init_pop_size = kwargs["init_pop_size"] + self._batch_pop = kwargs["batch_pop"] + self._maximize = kwargs["maximize"] + if not len(self._map_bound) == len(self._map_size): + raise Exception("number of bound must match number of dimension") + + self._c_factory = kwargs["c_factory"] + self._l_factory = kwargs["l_factory"] + self._bounds = kwargs["bounds"] + self._max_depth = kwargs["max_depth"] + self._cond_depth = kwargs.get("cond_depth", 2) + self._pop = [] + self._archive_type = kwargs["archive"] + self._bins = kwargs["bins"] + self._bins_sliding = kwargs["sliding_bins"] + self._solution_dim = kwargs["solution_dim"] + if self._archive_type == "CVT": + self._archive = CVTArchive(self._bins,self._map_bound) + elif self._archive_type == "Grid": + self._archive = GridArchive(solution_dim=self._solution_dim, dims=self._map_size, ranges=self._map_bound) + elif self._archive_type == "SlidingBoundaries": + self._archive = SlidingBoundariesArchive(self._bins_sliding,self._map_bound) + else: + raise Exception("archive not valid") + self._counter = 1 # number inserted in sol field of the archive + self._gen_number = 1 + self._max_fitness = -inf + self._selection_type = self.set_selection_type(kwargs["selection_type"]) + + def _random_var(self): + index = np.random.randint(0, self._bounds["input_index"]["max"]) + return GPVar(index) + + def _random_const(self): + index = np.random.uniform(self._bounds["float"]["min"], self._bounds["float"]["max"]) + return GPConst(index) + + def _random_expr(self, depth=0): + if depth < self._cond_depth - 1: + type_ = np.random.randint(0, 3) + else: + type_ = np.random.randint(0, 2) + + if type_ == 0: + return self._random_var() + elif type_ == 1: + return self._random_const() + else: + l = self._random_expr(depth + 1) + r = self._random_expr(depth + 1) + op = np.random.choice([add, sub, mul, safediv]) + return GPArithNode(op, l, r) + + def _random_condition(self): + left = self._random_expr() + right = self._random_expr() + while isinstance(left, GPConst) and isinstance(right, GPConst): + left = self._random_expr() + right = self._random_expr() + + op = np.random.choice([gt, lt]) + + return GPNodeIf(GPNodeCondition(op, left, right), None, None) + + def _random_leaf(self): + tp = self._l_factory.get_trainable_parameters() + + if len(tp) == 0: + return self._l_factory.create() + else: 
+ params = [] + + for param in tp: + min_ = self._bounds[param]["min"] + max_ = self._bounds[param]["max"] + if self._bounds[param]["type"] == "int": + params.append(np.random.randint(min_, max_)) + elif self._bounds[param]["type"] == "float": + params.append(np.random.uniform(min_, max_)) + else: + raise ValueError("Unknown type") + + return self._l_factory.create(*params) + + def _get_random_leaf_or_condition(self): + if np.random.uniform() < 0.5: + return self._random_leaf() + return self._random_condition() + + def _get_depth(self, node): + """BFS search""" + fringe = [(0, node)] + max_ = 0 + while len(fringe) > 0: + d, n = fringe.pop(0) + if isinstance(node, Leaf) or \ + isinstance(node, GPNodeCondition) or \ + isinstance(node, GPExpr) or \ + n is None: + continue + + if d > max_: + max_ = d + + if not isinstance(n, Leaf): + fringe.append((d + 1, n._then)) + fringe.append((d + 1, n._else)) + return max_ + + def _reduce_expr_len(self, expr): + fringe = [(0, expr)] + + max_ = 0 + while len(fringe) > 0: + d, cur = fringe.pop(0) + if isinstance(cur, GPArithNode): + if d + 1 > self._cond_depth: + cur.set_left(self._random_expr(d + 1)) + cur.set_right(self._random_expr(d + 1)) + else: + fringe.append((d + 1, cur.get_left())) + fringe.append((d + 1, cur.get_right())) + #print(d) + return expr + + def _count_expr_len(self, expr): + fringe = [(0, expr)] + + max_ = 0 + while len(fringe) > 0: + d, cur = fringe.pop(0) + if isinstance(cur, GPArithNode): + fringe.append((d + 1, cur.get_left())) + fringe.append((d + 1, cur.get_right())) + if d > max_: + max_=d + return max_ + + + def _get_cond_depth(self, root): + """BFS search""" + + fringe = [root] + max_ = 0 + cc = 1 + while len(fringe) > 0: + cur = fringe.pop(0) + cc += 1 + if isinstance(cur, GPNodeIf): + cond = cur._condition + a = self._count_expr_len(cond.get_left()) + b = self._count_expr_len(cond.get_right()) + d = max(a,b ) + max_ = max(d, max_) + fringe.append(cur.get_then()) + fringe.append(cur.get_else()) + return max_ + + def _limit_cond_depth(self, root): + """ + Limits the depth of the tree + """ + fringe = [root] + while len(fringe) > 0: + cur = fringe.pop(0) + + if isinstance(cur, GPNodeIf): + cond = cur._condition + + cond.set_left(self._reduce_expr_len(cond.get_left())) + cond.set_right(self._reduce_expr_len(cond.get_right())) + + fringe.append(cur.get_then()) + fringe.append(cur.get_else()) + return root + + def _limit_depth(self, root): + """ + Limits the depth of the tree + """ + fringe = [(0, root)] + + while len(fringe) > 0: + d, cur = fringe.pop(0) + + if isinstance(cur, GPNodeIf): + if d + 1 == self._max_depth: + cur.set_then(self._random_leaf()) + cur.set_else(self._random_leaf()) + fringe.append((d + 1, cur.get_left())) + fringe.append((d + 1, cur.get_right())) + return root + + def _get_descriptor(self, ind): + return self._get_depth(ind), self._get_cond_depth(ind) + + def get_all_pop(self): + df = self._archive.as_pandas(include_metadata=True) + dict_to_return = dict() + for elite in df.iterelites(): + dict_to_return[(int(elite[2][0]),int(elite[2][1]))] = (elite[4]._genes,elite[1]) + return dict_to_return.items() + + def _init_pop(self): + pop = [] + grow = self._init_pop_size + + for i in range(grow): + root = self._get_random_leaf_or_condition() + fringe = [root] + + while len(fringe) > 0: + node = fringe.pop(0) + + if isinstance(node, Leaf): + continue + + if self._get_depth(root) < self._max_depth - 1: + left = self._get_random_leaf_or_condition() + right = self._get_random_leaf_or_condition() + else: + left = 
self._random_leaf() + right = self._random_leaf() + + node.set_then(left) + node.set_else(right) + + fringe.append(left) + fringe.append(right) + + pop.append(IndividualGP(root)) + self._pop = pop + return pop + + def _mutation(self, p): + p1 = p.deep_copy()._genes + #print(type(p1)) + cp1 = None + + p1nodes = [(None, None, p1)] + + fringe = [IndividualGP(p1)] + while len(fringe) > 0: + node = fringe.pop(0) + if not isinstance(node, Leaf) and not isinstance(node, IndividualGP): + fringe.append(node.get_left()) + fringe.append(node.get_right()) + + p1nodes.append((node, True, node.get_left())) + p1nodes.append((node, False, node.get_right())) + + cp1 = np.random.randint(0, len(p1nodes)) + + parent = IndividualGP(p1nodes[cp1][0]) + old_node = IndividualGP(p1nodes[cp1][2]) + if not isinstance(old_node, GPNodeCondition) or \ + not isinstance(old_node, GPExpr): + new_node = self._get_random_leaf_or_condition() + else: + new_node = self._random_expr() + + if not isinstance(new_node, Leaf) and \ + not isinstance(new_node, GPExpr) and \ + not isinstance(new_node, IndividualGP): + if not isinstance(old_node, Leaf) and \ + not isinstance(old_node, IndividualGP): + new_node.set_then(old_node.get_left()) + new_node.set_else(old_node.get_right()) + else: + new_node.set_then(self._random_leaf()) + new_node.set_else(self._random_leaf()) + + if p1nodes[cp1][1] is not None: + if p1nodes[cp1][1]: + parent.set_then(new_node) + else: + parent.set_else(new_node) + else: + p1 = new_node + p1 = self._limit_depth(p1) + p1 = self._limit_cond_depth(p1) + return IndividualGP(p1) + + + def _crossover(self, par1, par2): + p1, p2 = par1.copy()._genes, par2.copy()._genes + cp1 = None + cp2 = None + p1, p2 = IndividualGP(p1), IndividualGP(p2) + p1nodes = [(None, None, p1)] + + fringe = [p1] + while len(fringe) > 0: + node = fringe.pop(0) + + if not isinstance(node, Leaf) and not isinstance(node, IndividualGP) and not isinstance(node, Elite): + fringe.append(node.get_left()) + fringe.append(node.get_right()) + + p1nodes.append((node, True, node.get_left())) + p1nodes.append((node, False, node.get_right())) + + cp1 = np.random.randint(0, len(p1nodes)) + st1 = p1nodes[cp1][2] + + p2nodes = [(None, None, p2)] + + fringe = [p2] + while len(fringe) > 0: + node = fringe.pop(0) + if not isinstance(node, Leaf) and \ + not isinstance(node, GPVar) and \ + not isinstance(node, GPConst) and \ + not isinstance(node, IndividualGP) and \ + not isinstance(node, EliteBatch): + fringe.append(node.get_left()) + fringe.append(node.get_right()) + + if type(node.get_left()) == type(st1): + p2nodes.append((node, True, node.get_left())) + if type(node.get_right()) == type(st1): + p2nodes.append((node, False, node.get_right())) + + cp2 = np.random.randint(0, len(p2nodes)) + + st2 = p2nodes[cp2][2] + + if cp1 != 0: + if p1nodes[cp1][1]: + p1nodes[cp1][0].set_then(st2) + else: + p1nodes[cp1][0].set_else(st2) + else: + p1 = st2 + + if cp2 != 0: + if p2nodes[cp2][1]: + p2nodes[cp2][0].set_then(st1) + else: + p2nodes[cp2][0].set_else(st1) + else: + p2 = st1 + + return IndividualGP(p1), IndividualGP(p2) + + def set_selection_type(self, selection_type = 'random'): + return selection_type + + def set_pop_selection(self, coach_index = None): + if self._selection_type == "random": + self._pop = [ + IndividualGP(self._archive.sample_elites(1)) #metadata + for _ in range(self._batch_pop) + ] + elif self._selection_type == "best": + self._pop = [ + IndividualGP(self._archive.best_elite) #metadata + for _ in range(self._batch_pop) + ] + elif 
self._selection_type == "coach": + if coach_index is None: + raise Exception("coach index not valid") + team_index = self._archive.int_to_grid_index(coach_index) + self._pop = [ + IndividualGP(self._archive.retrieve_single(individual)) #metadata + for individual in team_index + ] + else: + raise Exception("selection type not valid") + return [p._genes for p in self._pop] + + def ask(self, coach_index = None): + self._pop = [] + if self._archive.empty: + self._pop = self._init_pop() + else: + temp = list() + self._pop = self.set_pop_selection(coach_index = coach_index) + for i in range(0, len(self._pop), 2): + p1 = IndividualGP(self._pop[i]) + + if i + 1 < len(self._pop): + p2 = IndividualGP(self._pop[i + 1]) + else: + p2 = None + o1, o2 = None, None + + # Crossover + if p2 is not None: + if np.random.uniform() < self._cx_prob: + o1, o2 = self._crossover(p1, p2) + temp.append(o1) + temp.append(o2) + else: + temp.append(p1) + temp.append(p2) + else: + temp.append(p1) + self._pop = [self._mutation(p) for p in temp] + return [p._genes for p in self._pop] + + def tell(self,fitnesses,data=None): + + for p in zip(self._pop,fitnesses): + desc = self._get_descriptor(p[0]._genes) + p[0]._fitness = p[1] + thr = [abs((max(self._map_bound[i]) - min(self._map_bound[i])) / self._map_size[i]) for i in + range(len(self._map_size))] + desc = [int((desc[i] - min(self._map_bound[i])) / thr[i]) for i in range(len(self._map_size))] + for i in range(len(self._map_size)): + if desc[i] < 0: + desc[i] = 0 + elif desc[i] >= self._map_size[i]: + desc[i] = self._map_size[i] - 1 + desc = tuple(desc) + status, value = self._archive.add_single(desc, p[1], desc, self._counter) + #print(status, value) + self._counter += 1 + + #Visualize archives + if max(fitnesses) > self._max_fitness: + self._max_fitness = max(fitnesses) + print_info("New best at generation: ",self._gen_number-1, " fitness: ",max(fitnesses)) + if self._gen_number%5 == 0 : + plt.figure(figsize=(8, 6)) + if self._archive_type == "CVT": + cvt_archive_heatmap(self._archive,vmin=-200,vmax=-100) + elif self._archive_type == "Grid": + grid_archive_heatmap(self._archive,vmin=-200,vmax=-100) + elif self._archive_type == "SlidingBoundaries": + sliding_boundaries_archive_heatmap(self._archive,vmin=0,vmax=500) + else: + raise Exception("archive not valid") + + if self._log_path is not None: + plt.ylabel("Condition Depth") + plt.xlabel("Depth") + plt.title("Map Elites Archive Depth at Generation: "+str(self._gen_number)) + os.makedirs(os.path.join(self._log_path, "archives_depth"), exist_ok=True) + saving_path = os.path.join(self._log_path, "archives_depth/archive_depth_at_gen_"+str(self._gen_number)+".png") + plt.savefig(saving_path) + plt.close() + self._gen_number += 1 + + \ No newline at end of file diff --git a/src/QD_MARL/algorithms/__pycache__/__init__.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/__init__.cpython-38.pyc new file mode 100644 index 000000000..577b3f068 Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/__init__.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/common.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/common.cpython-38.pyc new file mode 100644 index 000000000..638a81683 Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/common.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/continuous_optimization.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/continuous_optimization.cpython-38.pyc new file mode 100644 index 000000000..dbf19b775 Binary files /dev/null 
and b/src/QD_MARL/algorithms/__pycache__/continuous_optimization.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/genetic_algorithm.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/genetic_algorithm.cpython-38.pyc new file mode 100644 index 000000000..fa872a5c7 Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/genetic_algorithm.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/genetic_programming.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/genetic_programming.cpython-38.pyc new file mode 100644 index 000000000..b01a0fa5c Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/genetic_programming.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/grammatical_evolution.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/grammatical_evolution.cpython-38.pyc new file mode 100644 index 000000000..818087e88 Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/grammatical_evolution.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/individuals.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/individuals.cpython-38.pyc new file mode 100644 index 000000000..6c877ca4a Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/individuals.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/mapElitesCMA.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/mapElitesCMA.cpython-38.pyc new file mode 100644 index 000000000..2667a345e Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/mapElitesCMA.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/mapElitesCMA_pyRibs.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/mapElitesCMA_pyRibs.cpython-38.pyc new file mode 100644 index 000000000..5faa4910d Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/mapElitesCMA_pyRibs.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/mapElitesCMA_pyRibs_GE.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/mapElitesCMA_pyRibs_GE.cpython-38.pyc new file mode 100644 index 000000000..ab1c031fe Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/mapElitesCMA_pyRibs_GE.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/map_elites.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/map_elites.cpython-38.pyc new file mode 100644 index 000000000..d881ec721 Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/map_elites.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/map_elites_Pyribs.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/map_elites_Pyribs.cpython-38.pyc new file mode 100644 index 000000000..c33ca10c2 Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/map_elites_Pyribs.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/__pycache__/map_elites_ge.cpython-38.pyc b/src/QD_MARL/algorithms/__pycache__/map_elites_ge.cpython-38.pyc new file mode 100644 index 000000000..72c3150fc Binary files /dev/null and b/src/QD_MARL/algorithms/__pycache__/map_elites_ge.cpython-38.pyc differ diff --git a/src/QD_MARL/algorithms/individuals.py b/src/QD_MARL/algorithms/individuals.py index 744f82313..cdacad819 100644 --- a/src/QD_MARL/algorithms/individuals.py +++ b/src/QD_MARL/algorithms/individuals.py @@ -7,12 +7,20 @@ import re import os import string +<<<<<<< HEAD from utils.print_outputs import * from copy import deepcopy from .common import OptMetaClass from decisiontrees import Leaf, Condition from operator import gt, lt, add, sub, mul from 
processing_element import ProcessingElementFactory, PEFMetaClass +======= +from copy import deepcopy +from .common import OptMetaClass +from decisiontrees import Leaf, Condition, OrthogonalCondition +from operator import gt, lt, add, sub, mul +from util_processing_elements.processing_element import ProcessingElementFactory, PEFMetaClass +>>>>>>> aca3e01 (merged from private repo) @@ -37,11 +45,15 @@ def __init__(self, index): self._index = index def get_output(self, input_): +<<<<<<< HEAD if type(input_) == dict: output = list(input_.values())[self._index] else: output = input_[self._index] return output +======= + return input_[self._index] +>>>>>>> aca3e01 (merged from private repo) def __repr__(self): return f"input_[{self._index}]" @@ -104,8 +116,12 @@ def __repr__(self): def __str__(self): return repr(self) +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) class GPNodeCondition: """ A condition @@ -163,7 +179,19 @@ def empty_buffers(self): self._then.empty_buffers() self._else.empty_buffers() +<<<<<<< HEAD +======= +class GPNodeOrthogonalCondition(OrthogonalCondition, GPNodeCondition): + def __init__(self, feature_idx, split_value, left=None, right=None): + super().__init__(feature_idx, split_value, left, right) + + def get_output(self, input_): + return super().get_output(input_) + + def create_from_params(self, params): + return OrthogonalCondition(params[0], params[1]) +>>>>>>> aca3e01 (merged from private repo) class GPNodeIf(Condition): def __init__(self, condition, then, else_): self._condition = condition @@ -203,9 +231,12 @@ def get_output(self, input_): def empty_buffers(self): self._then.empty_buffers() self._else.empty_buffers() +<<<<<<< HEAD def type(self): pass +======= +>>>>>>> aca3e01 (merged from private repo) def copy(self): """ @@ -300,9 +331,13 @@ def __init__(self, genes, padding=0, fitness=None, parents=None, const=None, con def copy(self): +<<<<<<< HEAD return IndividualGP(self._genes, self._padding, self._fitness, self._parents,np.copy(self._const),self._const_len) def deep_copy(self): return IndividualGP(deepcopy(self._genes), self._padding, self._fitness, self._parents,deepcopy(self._const),self._const_len) +======= + return IndividualGP(self._genes.copy(), self._padding, self._fitness, self._parents,np.copy(self._const),self._const_len) +>>>>>>> aca3e01 (merged from private repo) def get_genes_const_nested(self,expr,const_temp): fringe = [expr] @@ -362,7 +397,10 @@ def genes_to_const(self): i = self.genes_to_const_nested(cond.get_right(),i) fringe.append(cur.get_then()) fringe.append(cur.get_else()) +<<<<<<< HEAD def get_output(self, _input): return self._genes.get_output(_input) +======= +>>>>>>> aca3e01 (merged from private repo) diff --git a/src/QD_MARL/algorithms/mapElitesCMA.py b/src/QD_MARL/algorithms/mapElitesCMA.py index 55e2379fa..62eabca3b 100644 --- a/src/QD_MARL/algorithms/mapElitesCMA.py +++ b/src/QD_MARL/algorithms/mapElitesCMA.py @@ -8,7 +8,11 @@ from .common import OptMetaClass from decisiontrees import Leaf, Condition from operator import gt, lt, add, sub, mul +<<<<<<< HEAD from processing_element import ProcessingElementFactory, PEFMetaClass +======= +from util_processing_elements.processing_element import ProcessingElementFactory, PEFMetaClass +>>>>>>> aca3e01 (merged from private repo) from ribs.archives._cvt_archive import CVTArchive from ribs.archives._grid_archive import GridArchive from ribs.archives._sliding_boundaries_archive import SlidingBoundariesArchive diff --git a/src/QD_MARL/algorithms/mapElitesCMA_pyRibs.py 
b/src/QD_MARL/algorithms/mapElitesCMA_pyRibs.py index 43e9ea536..797a450ef 100644 --- a/src/QD_MARL/algorithms/mapElitesCMA_pyRibs.py +++ b/src/QD_MARL/algorithms/mapElitesCMA_pyRibs.py @@ -2,6 +2,7 @@ # -*- coding: utf-8 -*- import abc +<<<<<<< HEAD from tkinter import Grid import numpy as np from copy import deepcopy @@ -26,6 +27,50 @@ class EmitterCMA: def __init__(self,archive,sigma0,padding,selection_rule="filter",restart_rule="no_improvement",weight_rule="truncation",bounds=None,batch_size=None,seed=None): +======= +import os +import time +from copy import deepcopy +from operator import add, gt, lt, mul, sub +from tkinter import Grid + +import matplotlib.pyplot as plt +import numpy as np +from decisiontrees import * +from ribs.archives import AddStatus, GridArchive +from ribs.archives._archive_data_frame import ArchiveDataFrame +from ribs.archives._cvt_archive import CVTArchive +from ribs.archives._sliding_boundaries_archive import SlidingBoundariesArchive +from ribs.emitters.opt import CMAEvolutionStrategy +from ribs.emitters.rankers import ObjectiveRanker +from ribs.visualize import ( + cvt_archive_heatmap, + grid_archive_heatmap, + sliding_boundaries_archive_heatmap, +) +from util_processing_elements.processing_element import ( + PEFMetaClass, + ProcessingElementFactory, +) + +from .common import OptMetaClass +from .individuals import * +from utils.print_outputs import * + +class EmitterCMA: + def __init__( + self, + archive, + sigma0, + padding, + selection_rule="filter", + restart_rule="no_improvement", + weight_rule="truncation", + bounds=None, + batch_size=None, + seed=None, + ): +>>>>>>> aca3e01 (merged from private repo) self._rng = np.random.default_rng(seed) self._id = "".join(np.random.choice([*string.ascii_lowercase], 10)) self._batch_size = batch_size @@ -39,6 +84,7 @@ def __init__(self,archive,sigma0,padding,selection_rule="filter",restart_rule="n if restart_rule not in ["basic", "no_improvement"]: raise ValueError(f"Invalid restart_rule {restart_rule}") self._restart_rule = restart_rule +<<<<<<< HEAD self._bounds = bounds self._solution_dim = padding @@ -56,6 +102,35 @@ def initialize(self): self._restarts = False # Currently not exposed publicly. self._lower_bounds = np.full(self._solution_dim, self._bounds[0], dtype=self._archive.dtype) self._upper_bounds = np.full(self._solution_dim, self._bounds[1], dtype=self._archive.dtype) +======= + self._bounds = bounds + self._solution_dim = padding + self._ranker = ObjectiveRanker(seed=self._opt_seed) + + def initialize(self): + self.x0 = self._archive.sample_elites(1) + self._lower_bounds = np.full( + self._solution_dim, self._bounds[0], dtype=self._archive.dtype + ) + self._upper_bounds = np.full( + self._solution_dim, self._bounds[1], dtype=self._archive.dtype + ) + self.opt = CMAEvolutionStrategy( + sigma0=self._sigma0, + solution_dim=self._solution_dim, + batch_size=self._batch_size, + seed=self._opt_seed, + dtype=self._archive.dtype, + ) + self.x0 = IndividualGP(self.x0['tree'][0], self._solution_dim, fitness=self.x0['objective']) + self.x0.get_genes_const() + self.opt.reset(self.x0._const) + self._num_parents = ( + self.opt.batch_size // 2 if self._selection_rule == "mu" else None + ) + self._batch_size = self.opt.batch_size + self._restarts = False # Currently not exposed publicly. 
+>>>>>>> aca3e01 (merged from private repo) def sigma0(self): return self._sigma0 @@ -66,9 +141,16 @@ def batch_size(self): def ask(self): if self._restarts: self._restarts = False +<<<<<<< HEAD self.x0 = self._archive.sample_elites(4) self.opt.reset(self.x0._const) evolved = self.opt.ask(self._lower_bounds, self._upper_bounds, self._batch_size) +======= + self.x0 = self._archive.sample_elites(1) + self.x0 = IndividualGP(self.x0['tree'][0], self._solution_dim, fitness=self.x0['objective']) + self.opt.reset(self.x0._const) + evolved = self.opt.ask() +>>>>>>> aca3e01 (merged from private repo) tree_out = [] for i in evolved: temp = self.x0.copy() @@ -83,6 +165,7 @@ def _check_restart(self, num_parents): return False def tell(self, solutions, objective_values, behavior_values, metadata): +<<<<<<< HEAD ranking_data = [] new_sols = 0 for i, (sol, obj, beh, meta) in enumerate( @@ -102,13 +185,44 @@ def tell(self, solutions, objective_values, behavior_values, metadata): # Check for reset. if (self.opt.check_stop(np.array([value for status, value, i in ranking_data])) or self._check_restart(new_sols)): +======= + new_sols = 0 + tree = {'tree': [meta for meta in metadata]} + add_info = self._archive.add(behavior_values, objective_values, behavior_values, **tree) + for i, (beh, obj) in enumerate( + zip(behavior_values, objective_values) + ): + if add_info['status'][i] in (AddStatus.NEW, AddStatus.IMPROVE_EXISTING): + new_sols += 1 + # New solutions sort ahead of improved ones, which sort ahead of ones + # that were not added. + + indices = np.argsort(objective_values) + values = np.array(objective_values)[indices] + + num_parents = ( + new_sols if self._selection_rule == "filter" else self._num_parents + ) + self.opt.tell(indices, values, num_parents) + self._ranker.reset(self.opt, self._archive) + # Check for reset. 
+ if self.opt.check_stop( + np.array(values) + ) or self._check_restart(new_sols): +>>>>>>> aca3e01 (merged from private repo) self._restarts = True return True return False +<<<<<<< HEAD class MapElitesCMA_pyRibs(ProcessingElementFactory, metaclass=OptMetaClass): def __init__(self, **kwargs): +======= + +class MapElitesCMA_pyRibs(ProcessingElementFactory, metaclass=OptMetaClass): + def __init__(self, **kwargs): +>>>>>>> aca3e01 (merged from private repo) """ Initializes the algorithm @@ -128,6 +242,10 @@ def __init__(self, **kwargs): :max_depth: Maximum depth for the trees """ +<<<<<<< HEAD +======= + self._log_path = kwargs["log_path"] +>>>>>>> aca3e01 (merged from private repo) self._seed = kwargs["seed"] self._map_size = kwargs["map_size"] self._map_bound = kwargs["map_bounds"] @@ -150,6 +268,7 @@ def __init__(self, **kwargs): self._generations = kwargs["generations"] self._logdir = kwargs["logdir"] self._sigma0 = kwargs["sigma0"] +<<<<<<< HEAD self._padding = pow(2,self._max_depth)*self._cond_depth self._restart = [False for _ in range(self._num_emitters)] self._solution_dim = kwargs["solution_dim"] @@ -181,6 +300,63 @@ def _random_var(self): def _random_const(self): index = np.random.uniform(self._bounds["float"]["min"], self._bounds["float"]["max"]) return GPConst(index) +======= + self._padding = pow(2, self._max_depth) * self._cond_depth + self._restart = [False for _ in range(self._num_emitters)] + self._solution_dim = kwargs["solution_dim"] + self._extra_fields = {'tree': ((), object)} + if self._archive_type == "CVT": + self._archive = CVTArchive(self._bins, self._map_bound) + elif self._archive_type == "Grid": + self._archive = GridArchive( + solution_dim=self._solution_dim, + dims=self._map_size, + ranges=self._map_bound, + extra_fields=self._extra_fields, + ) + elif self._archive_type == "SlidingBoundaries": + self._archive = SlidingBoundariesArchive( + self._bins_sliding, self._map_bound + ) + else: + raise Exception("archive not valid") + # self._archive.initialize(self._solution_dim) # pyribs v.0.6.3 don't have inizialization method anymore + self._vmin = None + self._vmax = None + self._counter = 1 # number inserted in sol field of the archive + self._gen_number = 1 + self._max_fitness = -250 + self._emitters = [ + EmitterCMA( + self._archive, + self._sigma0, + self._padding, + bounds=( + self._bounds["float"]["min"] * 10, + self._bounds["float"]["max"] * 10, + ), + batch_size=self._batch_pop, + seed=self._seed, + ) + for _ in range(self._num_emitters) + ] + self._id = "".join(np.random.choice([*string.ascii_lowercase], 10)) + self._selection_type = self.set_selection_type(kwargs["selection_type"]) + # sigma la metto come parametro passato, o 1 o 0.5 + # moltiplico per 10 parametri passati a CMA + # le soluzioni venivano scartate e abbiamo deciso di fare così + # noi decidiamo un intervallo x -x, CMA-ES cercherà nell'intervallo più grande e un sigma più piccolo + + def _random_var(self): + index = np.random.randint(0, self._bounds["input_index"]["max"]) + return index + + def _random_const(self): + index = np.random.uniform( + self._bounds["float"]["min"], self._bounds["float"]["max"] + ) + return index +>>>>>>> aca3e01 (merged from private repo) def _random_expr(self, depth=0): if depth < self._cond_depth - 1: @@ -188,6 +364,7 @@ def _random_expr(self, depth=0): else: type_ = np.random.randint(0, 2) +<<<<<<< HEAD if type_ == 0: return self._random_var() elif type_ == 1: @@ -197,10 +374,21 @@ def _random_expr(self, depth=0): r = self._random_expr(depth + 1) op = 
np.random.choice([add, sub, mul, safediv]) return GPArithNode(op, l, r) +======= + if type_ == 0 or type_ == 1: + params = [self._random_var(), self._random_const()] + return self._c_factory.create(params) + else: + l = self._random_expr(depth + 1) + r = self._random_expr(depth + 1) + params = [self._random_var(), self._random_const()] + return GPNodeOrthogonalCondition(params[0], params[1], l, r) +>>>>>>> aca3e01 (merged from private repo) def _random_condition(self): left = self._random_expr() right = self._random_expr() +<<<<<<< HEAD while isinstance(left, GPConst) and isinstance(right, GPConst): left = self._random_expr() right = self._random_expr() @@ -208,6 +396,10 @@ def _random_condition(self): op = np.random.choice([gt, lt]) return GPNodeIf(GPNodeCondition(op, left, right), None, None) +======= + params = [self._random_var(), self._random_const()] + return GPNodeIf(GPNodeOrthogonalCondition(params[0], params[1],left, right), None, None) +>>>>>>> aca3e01 (merged from private repo) def _random_leaf(self): tp = self._l_factory.get_trainable_parameters() @@ -240,15 +432,28 @@ def _get_depth(self, node): max_ = 0 while len(fringe) > 0: d, n = fringe.pop(0) +<<<<<<< HEAD if isinstance(node, Leaf) or \ isinstance(node, GPNodeCondition) or \ isinstance(node, GPExpr) or \ n is None: +======= + if ( + isinstance(node, Leaf) + or isinstance(node, GPNodeCondition) + or isinstance(node, GPExpr) + or n is None + ): +>>>>>>> aca3e01 (merged from private repo) continue if d > max_: max_ = d +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) if not isinstance(n, Leaf): fringe.append((d + 1, n._then)) fringe.append((d + 1, n._else)) @@ -259,14 +464,22 @@ def _reduce_expr_len(self, expr): max_ = 0 while len(fringe) > 0: d, cur = fringe.pop(0) +<<<<<<< HEAD if isinstance(cur, GPArithNode): +======= + if isinstance(cur, GPNodeCondition): +>>>>>>> aca3e01 (merged from private repo) if d + 1 > self._cond_depth: cur.set_left(self._random_expr(d + 1)) cur.set_right(self._random_expr(d + 1)) else: fringe.append((d + 1, cur.get_left())) fringe.append((d + 1, cur.get_right())) +<<<<<<< HEAD #print(d) +======= + # print(d) +>>>>>>> aca3e01 (merged from private repo) return expr def _count_expr_len(self, expr): @@ -275,6 +488,7 @@ def _count_expr_len(self, expr): max_ = 0 while len(fringe) > 0: d, cur = fringe.pop(0) +<<<<<<< HEAD if isinstance(cur, GPArithNode): fringe.append((d + 1, cur.get_left())) fringe.append((d + 1, cur.get_right())) @@ -283,6 +497,15 @@ def _count_expr_len(self, expr): return max_ +======= + if isinstance(cur, GPNodeCondition): + fringe.append((d + 1, cur.get_left())) + fringe.append((d + 1, cur.get_right())) + if d > max_: + max_ = d + return max_ + +>>>>>>> aca3e01 (merged from private repo) def _get_cond_depth(self, root): """BFS search""" @@ -296,7 +519,11 @@ def _get_cond_depth(self, root): cond = cur._condition a = self._count_expr_len(cond.get_left()) b = self._count_expr_len(cond.get_right()) +<<<<<<< HEAD d = max(a,b ) +======= + d = max(a, b) +>>>>>>> aca3e01 (merged from private repo) max_ = max(d, max_) fringe.append(cur.get_then()) fringe.append(cur.get_else()) @@ -339,12 +566,23 @@ def _limit_depth(self, root): def _get_descriptor(self, ind): return self._get_depth(ind), self._get_cond_depth(ind) +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) def get_all_pop(self): df = self._archive.as_pandas(include_metadata=True) dict_to_return = dict() for elite in df.iterelites(): +<<<<<<< HEAD 
dict_to_return[(int(elite[2][0]),int(elite[2][1]))] = (elite[4]._genes,elite[1]) +======= + dict_to_return[(int(elite[2][0]), int(elite[2][1]))] = ( + elite[4]._genes, + elite[1], + ) +>>>>>>> aca3e01 (merged from private repo) return dict_to_return.items() def _init_pop(self): @@ -370,14 +608,21 @@ def _init_pop(self): fringe.append(left) fringe.append(right) +<<<<<<< HEAD # appemding directly the individual without calling the constructor pop.append(root) +======= + pop.append(IndividualGP(root, self._padding)) +>>>>>>> aca3e01 (merged from private repo) self._pop = pop return pop def _mutation(self, p): p1 = p.copy()._genes +<<<<<<< HEAD print(type(p1)) +======= +>>>>>>> aca3e01 (merged from private repo) cp1 = None p1nodes = [(None, None, p1)] @@ -397,14 +642,28 @@ def _mutation(self, p): parent = p1nodes[cp1][0] old_node = p1nodes[cp1][2] +<<<<<<< HEAD if not isinstance(old_node, GPNodeCondition) or \ not isinstance(old_node, GPExpr): +======= + if not isinstance(old_node, GPNodeCondition) or not isinstance( + old_node, GPExpr + ): +>>>>>>> aca3e01 (merged from private repo) new_node = self._get_random_leaf_or_condition() else: new_node = self._random_expr() +<<<<<<< HEAD if not isinstance(new_node, Leaf) and \ not isinstance(new_node, GPExpr): +======= + if ( + not isinstance(new_node, Leaf) + and not isinstance(new_node, GPExpr) + and not isinstance(node, IndividualGP) + ): +>>>>>>> aca3e01 (merged from private repo) if not isinstance(old_node, Leaf): new_node.set_then(old_node.get_left()) new_node.set_else(old_node.get_right()) @@ -421,8 +680,12 @@ def _mutation(self, p): p1 = new_node p1 = self._limit_depth(p1) p1 = self._limit_cond_depth(p1) +<<<<<<< HEAD return IndividualGP(p1,self._padding) +======= + return IndividualGP(p1, self._padding) +>>>>>>> aca3e01 (merged from private repo) def _crossover(self, par1, par2): p1, p2 = par1.copy()._genes, par2.copy()._genes @@ -451,9 +714,17 @@ def _crossover(self, par1, par2): while len(fringe) > 0: node = fringe.pop(0) +<<<<<<< HEAD if not isinstance(node, Leaf) and \ not isinstance(node, GPVar) and \ not isinstance(node, GPConst): +======= + if ( + not isinstance(node, Leaf) + and not isinstance(node, GPVar) + and not isinstance(node, GPConst) + ): +>>>>>>> aca3e01 (merged from private repo) fringe.append(node.get_left()) fringe.append(node.get_right()) @@ -481,6 +752,7 @@ def _crossover(self, par1, par2): p2nodes[cp2][0].set_else(st1) else: p2 = st1 +<<<<<<< HEAD return IndividualGP(p1,self._padding), IndividualGP(p2,self._padding) def ask(self): start = time.time() @@ -499,6 +771,50 @@ def ask(self): for _ in range(self._batch_pop) ] for i in range(0, len(pop_temp), 2): +======= + return IndividualGP(p1, self._padding), IndividualGP(p2, self._padding) + + def set_selection_type(self, selection_type="random"): + return selection_type + + def set_pop_selection(self, coach_index=None): + selected_pop = [] + if self._selection_type == "random": + for _ in range(self._batch_pop): + elites = self._archive.sample_elites(1) + selected_pop += [IndividualGP(elites["tree"][0])] + elif self._selection_type == "best": + data = self._archive.data() + objective = np.array(data["objective"]) + rank = np.argsort(objective) + for i in range(self._batch_pop): + elite_tree = data["tree"][rank[i]] + selected_pop += [IndividualGP(genes=elite_tree)] + elif self._selection_type == "coach": + if coach_index is None or len(coach_index) != self._batch_pop: + raise Exception("coach index not valid") + for ind in coach_index: + elites = 
self._archive.retrieve_single(ind) + selected_pop += [IndividualGP(genes= elites[1]['tree'])] + else: + raise Exception("selection type not valid") + return selected_pop + + def ask(self, coach_index=None): + start = time.time() + ask_pop = [] + if self._archive.empty: + ask_pop = self._init_pop() + self._pop = ask_pop + else: + for i, (e) in enumerate(self._emitters): + if not self._restart[i]: + ask_pop = e.ask() + else: + temp = list() + pop_temp = self.set_pop_selection(coach_index) + for i in range(0, len(pop_temp)): +>>>>>>> aca3e01 (merged from private repo) p1 = pop_temp[i] if i + 1 < len(pop_temp): p2 = pop_temp[i + 1] @@ -508,7 +824,11 @@ def ask(self): o1, o2 = None, None # Crossover if p2 is not None: +<<<<<<< HEAD if np.random.uniform() < self._cx_prob: +======= + if np.random.uniform() < self._cx_prob: +>>>>>>> aca3e01 (merged from private repo) o1, o2 = self._crossover(p1, p2) temp.append(o1) temp.append(o2) @@ -520,6 +840,7 @@ def ask(self): pop_temp = [self._mutation(p) for p in temp] for e in pop_temp: e.get_genes_const() +<<<<<<< HEAD self._pop += pop_temp end = time.time() @@ -535,12 +856,46 @@ def tell(self,fitnesses, data=None): thr = [abs((max(self._map_bound[i]) - min(self._map_bound[i])) / self._map_size[i]) for i in range(len(self._map_size))] desc = [int((desc[i] - min(self._map_bound[i])) / thr[i]) for i in range(len(self._map_size))] +======= + ask_pop = pop_temp + end = time.time() + self._pop += ask_pop + descs = [self._get_descriptor(p._genes) for p in ask_pop] + return [p._genes for p in ask_pop] + + def tell(self, fitnesses, data=None): + + archive_flag = self._archive.empty + sols, objs, behavs, meta = [], [], [], [] + if data is None: + data = [None for _ in range(len(fitnesses))] + for p in zip(self._pop, fitnesses, data): + if p[2] is None: + tree = p[0]._genes + else: + tree = p[2].get_root() + desc = self._get_descriptor(p[0]._genes) + + p[0]._fitness = p[1] + thr = [ + abs( + (max(self._map_bound[i]) - min(self._map_bound[i])) + / self._map_size[i] + ) + for i in range(len(self._map_size)) + ] + desc = [ + int((desc[i] - min(self._map_bound[i])) / thr[i]) + for i in range(len(self._map_size)) + ] +>>>>>>> aca3e01 (merged from private repo) for i in range(len(self._map_size)): if desc[i] < 0: desc[i] = 0 elif desc[i] >= self._map_size[i]: desc[i] = self._map_size[i] - 1 desc = tuple(desc) +<<<<<<< HEAD if archive_flag: self._archive.add_single(desc, p[1], desc, self._counter) @@ -584,3 +939,69 @@ def tell(self,fitnesses, data=None): plt.xlabel("Depth") plt.savefig(self._logdir+"/heatmap.png") self._gen_number += 1 +======= + if archive_flag: + tree = {'tree': tree} + status, value = self._archive.add_single(desc, p[0]._fitness, desc, **tree) + else: + sols.append(self._counter) + objs.append(p[1]) + behavs.append(desc) + meta.append(tree) + self._counter += 1 + self._pop = [] + + for i, (e) in enumerate(self._emitters): + if archive_flag: + e.initialize() + else: + start = i*self._batch_pop % len(sols) + end = start + self._batch_pop + + if self._restart[i]: + tree = {'tree': meta[start:end]} + self._archive.add(behavs[start:end], objs[start:end], behavs[start:end], **tree) #TODO: set it as line 639 + self._restart[i] = False + else: + self._restart[i] = e.tell( + behavs[start:end], objs[start:end], behavs[start:end], meta[start:end] + ) + + # Visualize archives + if max(fitnesses) > self._max_fitness: + self._max_fitness = max(fitnesses) + print( + "New best at generation: ", + self._gen_number - 1, + " fitness: ", + max(fitnesses), + ) + 
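# Summary of tell() above: each evaluated individual is mapped to an archive cell by
# binning its (tree depth, condition depth) descriptor with the per-dimension
# thresholds thr; the decision tree itself is stored in the archive's extra 'tree'
# field so that set_pop_selection() can recover it later via sample_elites()
# ("random"), data() ("best") or retrieve_single() ("coach"). On the first call the
# batch bootstraps the archive; afterwards each emitter receives its slice through
# e.tell(...), or the slice is re-added directly when that emitter is restarting.
# Worked example of the binning (a sketch, assuming the battlefield config values
# that appear later in this patch: map_bounds=[[0, 5], [0, 5]], map_size=[5, 5]):
#   thr  = [abs((5 - 0) / 5), abs((5 - 0) / 5)]         # -> [1.0, 1.0]
#   desc = [int((3 - 0) / 1.0), int((2 - 0) / 1.0)]     # depth 3, cond. depth 2 -> cell (3, 2)
# Descriptors outside the bounds are clamped to the first/last cell by the loop above.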
self._gen_number += 1 + + def plot_archive(self, gen, vmin=None, vmax=None): + if vmin is not None or vmax is not None: + self._vmin = vmin + self._vmax = vmax + plt.figure(figsize=(8, 6)) + if self._archive_type == "CVT": + cvt_archive_heatmap(self._archive, vmin=self._vmin, vmax=self._vmax) + elif self._archive_type == "Grid": + grid_archive_heatmap(self._archive, vmin=self._vmin, vmax=self._vmin) + elif self._archive_type == "SlidingBoundaries": + sliding_boundaries_archive_heatmap(self._archive, vmin=self._vmin, vmax=self._vmax) + else: + raise Exception("archive not valid") + if self._log_path is not None: + plt.ylabel("Condition Depth") + plt.xlabel("Depth") + plt.title( + "Map Elites CMA Archive Depth at Generation: " + str(gen) + ) + os.makedirs(os.path.join(self._log_path, "archives_depth"), exist_ok=True) + saving_path = os.path.join( + self._log_path, + "archives_depth/archive_depth_at_gen_" + str(gen) + ".png", + ) + plt.savefig(saving_path) + plt.close() +>>>>>>> aca3e01 (merged from private repo) diff --git a/src/QD_MARL/algorithms/mapElitesCMA_pyRibs_GE.py b/src/QD_MARL/algorithms/mapElitesCMA_pyRibs_GE.py index 3b5c69d10..a524acd76 100644 --- a/src/QD_MARL/algorithms/mapElitesCMA_pyRibs_GE.py +++ b/src/QD_MARL/algorithms/mapElitesCMA_pyRibs_GE.py @@ -8,7 +8,11 @@ from .common import OptMetaClass from decisiontrees import Leaf, Condition from operator import gt, lt, add, sub, mul +<<<<<<< HEAD from processing_element import ProcessingElementFactory, PEFMetaClass +======= +from util_processing_elements.processing_element import ProcessingElementFactory, PEFMetaClass +>>>>>>> aca3e01 (merged from private repo) from ribs.archives._cvt_archive import CVTArchive from ribs.archives._grid_archive import GridArchive from ribs.archives._sliding_boundaries_archive import SlidingBoundariesArchive @@ -166,6 +170,10 @@ def __init__(self, **kwargs): ] self._id = "".join(np.random.choice([*string.ascii_lowercase], 10)) self._ge = GrammaticalEvolution(**kwargs["ge_kwargs"]) +<<<<<<< HEAD +======= + self._mutation = UniformMutator(0.1, self._map_bound) +>>>>>>> aca3e01 (merged from private repo) #sigma la metto come parametro passato, o 1 o 0.5 #moltiplico per 10 parametri passati a CMA #le soluzioni venivano scartate e abbiamo deciso di fare così @@ -263,7 +271,10 @@ def tell(self,fitnesses, data=None): for i, (e) in enumerate (self._emitters): if archive_flag: +<<<<<<< HEAD print("INITIALIZING EMITTER",e._id) +======= +>>>>>>> aca3e01 (merged from private repo) e.initialize() else: start = i*self._batch_pop diff --git a/src/QD_MARL/algorithms/map_elites.py b/src/QD_MARL/algorithms/map_elites.py index 780c7cb8c..8fbfe7778 100644 --- a/src/QD_MARL/algorithms/map_elites.py +++ b/src/QD_MARL/algorithms/map_elites.py @@ -15,7 +15,14 @@ from .common import OptMetaClass from decisiontrees import Leaf, Condition from operator import gt, lt, add, sub, mul +<<<<<<< HEAD from processing_element import ProcessingElementFactory, PEFMetaClass +======= +from util_processing_elements.processing_element import ( + ProcessingElementFactory, + PEFMetaClass, +) +>>>>>>> aca3e01 (merged from private repo) from utils.print_outputs import * @@ -232,7 +239,10 @@ def set_right(self, value): class MapElites(ProcessingElementFactory, metaclass=OptMetaClass): def __init__(self, **kwargs): +<<<<<<< HEAD +======= +>>>>>>> aca3e01 (merged from private repo) """ Initializes the algorithm @@ -274,7 +284,13 @@ def _random_var(self): return GPVar(index) def _random_const(self): +<<<<<<< HEAD index = 
np.random.uniform(self._bounds["float"]["min"], self._bounds["float"]["max"]) +======= + index = np.random.uniform( + self._bounds["float"]["min"], self._bounds["float"]["max"] + ) +>>>>>>> aca3e01 (merged from private repo) return GPConst(index) def _random_expr(self, depth=0): @@ -335,10 +351,19 @@ def _get_depth(self, node): max_ = 0 while len(fringe) > 0: d, n = fringe.pop(0) +<<<<<<< HEAD if isinstance(node, Leaf) or \ isinstance(node, GPNodeCondition) or \ isinstance(node, GPExpr) or \ n is None: +======= + if ( + isinstance(node, Leaf) + or isinstance(node, GPNodeCondition) + or isinstance(node, GPExpr) + or n is None + ): +>>>>>>> aca3e01 (merged from private repo) continue if d > max_: @@ -362,7 +387,11 @@ def _reduce_expr_len(self, expr): else: fringe.append((d + 1, cur.get_left())) fringe.append((d + 1, cur.get_right())) +<<<<<<< HEAD #print(d) +======= + # print(d) +>>>>>>> aca3e01 (merged from private repo) return expr def _count_expr_len(self, expr): @@ -375,10 +404,16 @@ def _count_expr_len(self, expr): fringe.append((d + 1, cur.get_left())) fringe.append((d + 1, cur.get_right())) if d > max_: +<<<<<<< HEAD max_=d return max_ +======= + max_ = d + return max_ + +>>>>>>> aca3e01 (merged from private repo) def _get_cond_depth(self, root): """BFS search""" @@ -392,7 +427,11 @@ def _get_cond_depth(self, root): cond = cur._condition a = self._count_expr_len(cond.get_left()) b = self._count_expr_len(cond.get_right()) +<<<<<<< HEAD d = max(a,b ) +======= + d = max(a, b) +>>>>>>> aca3e01 (merged from private repo) max_ = max(d, max_) fringe.append(cur.get_then()) fringe.append(cur.get_else()) @@ -491,14 +530,24 @@ def _mutation(self, p): parent = p1nodes[cp1][0] old_node = p1nodes[cp1][2] +<<<<<<< HEAD if not isinstance(old_node, GPNodeCondition) or \ not isinstance(old_node, GPExpr): +======= + if not isinstance(old_node, GPNodeCondition) or not isinstance( + old_node, GPExpr + ): +>>>>>>> aca3e01 (merged from private repo) new_node = self._get_random_leaf_or_condition() else: new_node = self._random_expr() +<<<<<<< HEAD if not isinstance(new_node, Leaf) and \ not isinstance(new_node, GPExpr): +======= + if not isinstance(new_node, Leaf) and not isinstance(new_node, GPExpr): +>>>>>>> aca3e01 (merged from private repo) if not isinstance(old_node, Leaf): new_node.set_then(old_node.get_left()) new_node.set_else(old_node.get_right()) @@ -517,7 +566,10 @@ def _mutation(self, p): p1 = self._limit_cond_depth(p1) return p1 +<<<<<<< HEAD +======= +>>>>>>> aca3e01 (merged from private repo) def _crossover(self, par1, par2): p1, p2 = par1.copy(), par2.copy() cp1 = None @@ -545,9 +597,17 @@ def _crossover(self, par1, par2): while len(fringe) > 0: node = fringe.pop(0) +<<<<<<< HEAD if not isinstance(node, Leaf) and \ not isinstance(node, GPVar) and \ not isinstance(node, GPConst): +======= + if ( + not isinstance(node, Leaf) + and not isinstance(node, GPVar) + and not isinstance(node, GPConst) + ): +>>>>>>> aca3e01 (merged from private repo) fringe.append(node.get_left()) fringe.append(node.get_right()) @@ -577,6 +637,7 @@ def _crossover(self, par1, par2): p2 = st1 return p1, p2 +<<<<<<< HEAD def _add_to_map(self, ind, fitness, data=None): @@ -586,6 +647,20 @@ def _add_to_map(self, ind, fitness, data=None): print_info(desc) print_info(thr) desc = [int(desc[i] - min(self._map_bound[i]) / thr[i]) for i in range(len(self._map_size))] +======= + def _add_to_map(self, ind, fitness, data=None): + desc = self._get_descriptor(ind) + thr = [ + abs(max(self._map_bound[i]) - min(self._map_bound[i])) / 
self._map_size[i] + for i in range(len(self._map_size)) + ] + print_info(desc) + print_info(thr) + desc = [ + int(desc[i] - min(self._map_bound[i]) / thr[i]) + for i in range(len(self._map_size)) + ] +>>>>>>> aca3e01 (merged from private repo) print_info(desc) print("-----------------") for i in range(len(self._map_size)): @@ -623,7 +698,11 @@ def ask(self): # Crossover if p2 is not None: +<<<<<<< HEAD if np.random.uniform() < self._cx_prob: +======= + if np.random.uniform() < self._cx_prob: +>>>>>>> aca3e01 (merged from private repo) o1, o2 = self._crossover(p1, p2) temp.append(o1) temp.append(o2) @@ -645,3 +724,12 @@ def tell(self, fitnesses, data=None): else: for p in zip(self._pop, fitnesses, data): self._add_to_map(p[0], p[1], p[2]) +<<<<<<< HEAD +======= + + def plot_archive(self): + """ + Plots the archive + """ + pass +>>>>>>> aca3e01 (merged from private repo) diff --git a/src/QD_MARL/algorithms/map_elites_Pyribs.py b/src/QD_MARL/algorithms/map_elites_Pyribs.py index f4e389fb4..22df3f72f 100644 --- a/src/QD_MARL/algorithms/map_elites_Pyribs.py +++ b/src/QD_MARL/algorithms/map_elites_Pyribs.py @@ -3,6 +3,7 @@ import abc from cmath import inf +<<<<<<< HEAD from tkinter import Grid import numpy as np from copy import deepcopy @@ -26,6 +27,35 @@ class MapElites_Pyribs(ProcessingElementFactory, metaclass=OptMetaClass): def __init__(self, **kwargs): +======= +from copy import deepcopy +from operator import add, gt, lt, mul, sub +from tkinter import Grid + +import matplotlib.pyplot as plt +import numpy as np +from decisiontrees import Node, Leaf +from ribs.archives._archive_data_frame import ArchiveDataFrame +from ribs.archives._cvt_archive import CVTArchive +from ribs.archives._grid_archive import GridArchive +from ribs.archives._sliding_boundaries_archive import SlidingBoundariesArchive +from ribs.visualize import ( + cvt_archive_heatmap, + grid_archive_heatmap, + sliding_boundaries_archive_heatmap, +) +from util_processing_elements.processing_element import ( + PEFMetaClass, + ProcessingElementFactory, +) + +from .common import OptMetaClass +from .individuals import * +from utils.print_outputs import print_info, print_debugging + +class MapElites_Pyribs(ProcessingElementFactory, metaclass=OptMetaClass): + def __init__(self, **kwargs): +>>>>>>> aca3e01 (merged from private repo) """ Initializes the algorithm @@ -45,6 +75,10 @@ def __init__(self, **kwargs): :max_depth: Maximum depth for the trees """ +<<<<<<< HEAD +======= + self._log_path = kwargs["log_path"] +>>>>>>> aca3e01 (merged from private repo) self._map_size = kwargs["map_size"] self._map_bound = kwargs["map_bounds"] self._cx_prob = kwargs["cx_prob"] if "cx_prob" in kwargs else 0 @@ -64,6 +98,7 @@ def __init__(self, **kwargs): self._bins = kwargs["bins"] self._bins_sliding = kwargs["sliding_bins"] self._solution_dim = kwargs["solution_dim"] +<<<<<<< HEAD if self._archive_type == "CVT": self._archive = CVTArchive(self._bins,self._map_bound) elif self._archive_type == "Grid": @@ -83,6 +118,43 @@ def _random_var(self): def _random_const(self): index = np.random.uniform(self._bounds["float"]["min"], self._bounds["float"]["max"]) return GPConst(index) +======= + self._extra_fields = {'tree': ((), object)} + + if self._archive_type == "CVT": + self._archive = CVTArchive(self._bins, self._map_bound) + elif self._archive_type == "Grid": + self._archive = GridArchive( + solution_dim=self._solution_dim, + dims=self._map_size, + ranges=self._map_bound, + extra_fields=self._extra_fields, + ) + elif self._archive_type == 
"SlidingBoundaries": + self._archive = SlidingBoundariesArchive( + self._bins_sliding, self._map_bound + ) + else: + raise Exception("archive not valid") + + # self._archive.initialize(1) # one dimension (counter) + self._vmin = None + self._vmax = None + self._counter = 1 # number inserted in sol field of the archive + self._gen_number = 1 + self._max_fitness = -inf + self._selection_type = self.set_selection_type(kwargs["selection_type"]) + + def _random_var(self): + index = np.random.randint(0, self._bounds["input_index"]["max"]) + return index + + def _random_const(self): + index = np.random.uniform( + self._bounds["float"]["min"], self._bounds["float"]["max"] + ) + return index +>>>>>>> aca3e01 (merged from private repo) def _random_expr(self, depth=0): if depth < self._cond_depth - 1: @@ -90,6 +162,7 @@ def _random_expr(self, depth=0): else: type_ = np.random.randint(0, 2) +<<<<<<< HEAD if type_ == 0: return self._random_var() elif type_ == 1: @@ -99,10 +172,21 @@ def _random_expr(self, depth=0): r = self._random_expr(depth + 1) op = np.random.choice([add, sub, mul, safediv]) return GPArithNode(op, l, r) +======= + if type_ == 0 or type_ == 1: + params = [self._random_var(), self._random_const()] + return self._c_factory.create(params) + else: + l = self._random_expr(depth + 1) + r = self._random_expr(depth + 1) + params = [self._random_var(), self._random_const()] + return GPNodeOrthogonalCondition(params[0], params[1], l, r) +>>>>>>> aca3e01 (merged from private repo) def _random_condition(self): left = self._random_expr() right = self._random_expr() +<<<<<<< HEAD while isinstance(left, GPConst) and isinstance(right, GPConst): left = self._random_expr() right = self._random_expr() @@ -110,6 +194,11 @@ def _random_condition(self): op = np.random.choice([gt, lt]) return GPNodeIf(GPNodeCondition(op, left, right), None, None) +======= + params = [self._random_var(), self._random_const()] + return GPNodeIf(GPNodeOrthogonalCondition(params[0], params[1],left, right), None, None) + +>>>>>>> aca3e01 (merged from private repo) def _random_leaf(self): tp = self._l_factory.get_trainable_parameters() @@ -142,10 +231,19 @@ def _get_depth(self, node): max_ = 0 while len(fringe) > 0: d, n = fringe.pop(0) +<<<<<<< HEAD if isinstance(node, Leaf) or \ isinstance(node, GPNodeCondition) or \ isinstance(node, GPExpr) or \ n is None: +======= + if ( + isinstance(node, Leaf) + or isinstance(node, GPNodeCondition) + or isinstance(node, GPExpr) + or n is None + ): +>>>>>>> aca3e01 (merged from private repo) continue if d > max_: @@ -162,14 +260,22 @@ def _reduce_expr_len(self, expr): max_ = 0 while len(fringe) > 0: d, cur = fringe.pop(0) +<<<<<<< HEAD if isinstance(cur, GPArithNode): +======= + if isinstance(cur, GPNodeCondition): +>>>>>>> aca3e01 (merged from private repo) if d + 1 > self._cond_depth: cur.set_left(self._random_expr(d + 1)) cur.set_right(self._random_expr(d + 1)) else: fringe.append((d + 1, cur.get_left())) fringe.append((d + 1, cur.get_right())) +<<<<<<< HEAD #print(d) +======= + # print(d) +>>>>>>> aca3e01 (merged from private repo) return expr def _count_expr_len(self, expr): @@ -178,6 +284,7 @@ def _count_expr_len(self, expr): max_ = 0 while len(fringe) > 0: d, cur = fringe.pop(0) +<<<<<<< HEAD if isinstance(cur, GPArithNode): fringe.append((d + 1, cur.get_left())) fringe.append((d + 1, cur.get_right())) @@ -186,6 +293,15 @@ def _count_expr_len(self, expr): return max_ +======= + if isinstance(cur, GPNodeCondition): + fringe.append((d + 1, cur.get_left())) + fringe.append((d + 1, 
cur.get_right())) + if d > max_: + max_ = d + return max_ + +>>>>>>> aca3e01 (merged from private repo) def _get_cond_depth(self, root): """BFS search""" @@ -199,7 +315,11 @@ def _get_cond_depth(self, root): cond = cur._condition a = self._count_expr_len(cond.get_left()) b = self._count_expr_len(cond.get_right()) +<<<<<<< HEAD d = max(a,b ) +======= + d = max(a, b) +>>>>>>> aca3e01 (merged from private repo) max_ = max(d, max_) fringe.append(cur.get_then()) fringe.append(cur.get_else()) @@ -242,12 +362,23 @@ def _limit_depth(self, root): def _get_descriptor(self, ind): return self._get_depth(ind), self._get_cond_depth(ind) +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) def get_all_pop(self): df = self._archive.as_pandas(include_metadata=True) dict_to_return = dict() for elite in df.iterelites(): +<<<<<<< HEAD dict_to_return[(int(elite[2][0]),int(elite[2][1]))] = (elite[4]._genes,elite[1]) +======= + dict_to_return[(int(elite[2][0]), int(elite[2][1]))] = ( + elite[4]._genes, + elite[1], + ) +>>>>>>> aca3e01 (merged from private repo) return dict_to_return.items() def _init_pop(self): @@ -276,6 +407,7 @@ def _init_pop(self): fringe.append(left) fringe.append(right) +<<<<<<< HEAD pop.append(IndividualGP(root)) self._pop = pop @@ -284,13 +416,28 @@ def _init_pop(self): def _mutation(self, p): p1 = p.deep_copy()._genes #print(type(p1)) +======= + pop.append(IndividualGP(root)) + + return pop + + def _mutation(self, p): + p1 = p.copy()._genes +>>>>>>> aca3e01 (merged from private repo) cp1 = None p1nodes = [(None, None, p1)] +<<<<<<< HEAD fringe = [IndividualGP(p1)] while len(fringe) > 0: node = fringe.pop(0) +======= + fringe = [p1] + while len(fringe) > 0: + node = fringe.pop(0) + +>>>>>>> aca3e01 (merged from private repo) if not isinstance(node, Leaf) and not isinstance(node, IndividualGP): fringe.append(node.get_left()) fringe.append(node.get_right()) @@ -300,19 +447,37 @@ def _mutation(self, p): cp1 = np.random.randint(0, len(p1nodes)) +<<<<<<< HEAD parent = IndividualGP(p1nodes[cp1][0]) old_node = IndividualGP(p1nodes[cp1][2]) if not isinstance(old_node, GPNodeCondition) or \ not isinstance(old_node, GPExpr): +======= + parent = p1nodes[cp1][0] + old_node = p1nodes[cp1][2] + if not isinstance(old_node, GPNodeCondition) or not isinstance( + old_node, GPExpr + ): +>>>>>>> aca3e01 (merged from private repo) new_node = self._get_random_leaf_or_condition() else: new_node = self._random_expr() +<<<<<<< HEAD if not isinstance(new_node, Leaf) and \ not isinstance(new_node, GPExpr) and \ not isinstance(new_node, IndividualGP): if not isinstance(old_node, Leaf) and \ not isinstance(old_node, IndividualGP): +======= + if ( + not isinstance(new_node, Leaf) + and not isinstance(new_node, GPExpr) + ): + if not isinstance(old_node, Leaf) and not isinstance( + old_node, IndividualGP + ): +>>>>>>> aca3e01 (merged from private repo) new_node.set_then(old_node.get_left()) new_node.set_else(old_node.get_right()) else: @@ -330,6 +495,7 @@ def _mutation(self, p): p1 = self._limit_cond_depth(p1) return IndividualGP(p1) +<<<<<<< HEAD def _crossover(self, par1, par2): p1, p2 = par1.copy()._genes, par2.copy()._genes @@ -338,11 +504,23 @@ def _crossover(self, par1, par2): p1nodes = [(None, None, p1)] +======= + def _crossover(self, par1, par2): + p1, p2 = par1.copy()._genes, par2.copy()._genes + + cp1 = None + cp2 = None + p1nodes = [(None, None, p1)] +>>>>>>> aca3e01 (merged from private repo) fringe = [p1] while len(fringe) > 0: node = fringe.pop(0) +<<<<<<< HEAD if not isinstance(node, 
Leaf) and not isinstance(node, IndividualGP) and not isinstance(node, EliteBatch): +======= + if not isinstance(node, Leaf): +>>>>>>> aca3e01 (merged from private repo) fringe.append(node.get_left()) fringe.append(node.get_right()) @@ -357,11 +535,20 @@ def _crossover(self, par1, par2): fringe = [p2] while len(fringe) > 0: node = fringe.pop(0) +<<<<<<< HEAD if not isinstance(node, Leaf) and \ not isinstance(node, GPVar) and \ not isinstance(node, GPConst) and \ not isinstance(node, IndividualGP) and \ not isinstance(node, EliteBatch): +======= + + if ( + not isinstance(node, Leaf) + and not isinstance(node, GPVar) + and not isinstance(node, GPConst) + ): +>>>>>>> aca3e01 (merged from private repo) fringe.append(node.get_left()) fringe.append(node.get_right()) @@ -389,6 +576,7 @@ def _crossover(self, par1, par2): p2nodes[cp2][0].set_else(st1) else: p2 = st1 +<<<<<<< HEAD return IndividualGP(p1), IndividualGP(p2) @@ -428,6 +616,55 @@ def ask(self, random = False, best = False): # Crossover if p2 is not None: if np.random.uniform() < self._cx_prob: +======= + return IndividualGP(p1), IndividualGP(p2) + + def set_selection_type(self, selection_type="random"): + return selection_type + + def set_pop_selection(self, coach_index=None): + selected_pop = [] + if self._selection_type == "random": + for _ in range(self._batch_pop): + elites = self._archive.sample_elites(1) + selected_pop += [IndividualGP(elites["tree"][0])] + elif self._selection_type == "best": + data = self._archive.data() + objective = np.array(data["objective"]) + rank = np.argsort(objective) + for i in range(self._batch_pop): + elite_tree = data["tree"][rank[i]] + selected_pop += [IndividualGP(genes=elite_tree)] + elif self._selection_type == "coach": + if coach_index is None or len(coach_index) != self._batch_pop: + raise Exception("coach index not valid") + for ind in coach_index: + elites = self._archive.retrieve_single(ind) + selected_pop += [IndividualGP(genes= elites[1]['tree'])] + else: + raise Exception("selection type not valid") + return selected_pop + + def ask(self, coach_index=None): + + ask_pop = [] + if self._archive.empty: + ask_pop = self._init_pop() + self._pop = ask_pop + else: + temp = list() + ask_pop = self.set_pop_selection(coach_index) + for i in range(0, len(ask_pop), 2): + p1 = ask_pop[i] + if i + 1 < len(ask_pop): + p2 = ask_pop[i + 1] + else: + p2 = None + o1, o2 = None, None + # Crossover + if p2 is not None: + if np.random.uniform() < self._cx_prob: +>>>>>>> aca3e01 (merged from private repo) o1, o2 = self._crossover(p1, p2) temp.append(o1) temp.append(o2) @@ -436,6 +673,7 @@ def ask(self, random = False, best = False): temp.append(p2) else: temp.append(p1) +<<<<<<< HEAD self._pop = [self._mutation(p) for p in temp] return [p._genes for p in self._pop] @@ -447,12 +685,41 @@ def tell(self,fitnesses,data=None): thr = [abs((max(self._map_bound[i]) - min(self._map_bound[i])) / self._map_size[i]) for i in range(len(self._map_size))] desc = [int((desc[i] - min(self._map_bound[i])) / thr[i]) for i in range(len(self._map_size))] +======= + + ask_pop = [self._mutation(p) for p in temp] + self._pop += ask_pop + return [p._genes for p in ask_pop] + + def tell(self, fitnesses, data=None): + if data is None: + data = [None for _ in range(len(fitnesses))] + for p in zip(self._pop, fitnesses, data): + if p[2] is None: + tree = p[0]._genes + else: + tree = p[2].get_root() + desc = self._get_descriptor(p[0]._genes) + p[0]._fitness = p[1] + thr = [ + abs( + (max(self._map_bound[i]) - min(self._map_bound[i])) + / 
self._map_size[i] + ) + for i in range(len(self._map_size)) + ] + desc = [ + int((desc[i] - min(self._map_bound[i])) / thr[i]) + for i in range(len(self._map_size)) + ] +>>>>>>> aca3e01 (merged from private repo) for i in range(len(self._map_size)): if desc[i] < 0: desc[i] = 0 elif desc[i] >= self._map_size[i]: desc[i] = self._map_size[i] - 1 desc = tuple(desc) +<<<<<<< HEAD status, value = self._archive.add_single(desc, p[1], desc, self._counter) #print(status, value) self._counter += 1 @@ -476,4 +743,49 @@ def tell(self,fitnesses,data=None): plt.show() self._gen_number += 1 - \ No newline at end of file + +======= + tree = {'tree': tree} + status, value = self._archive.add_single(desc, p[0]._fitness, desc, **tree) + self._counter += 1 + self._pop = [] + + # Visualize archives + if max(fitnesses) > self._max_fitness: + self._max_fitness = max(fitnesses) + print( + "New best at generation: ", + self._gen_number - 1, + " fitness: ", + max(fitnesses), + ) + self._gen_number += 1 + + + def plot_archive(self, gen, vmin=None, vmax=None): + if vmin is not None or vmax is not None: + self._vmin = vmin + self._vmax = vmax + plt.figure(figsize=(8, 6)) + if self._archive_type == "CVT": + cvt_archive_heatmap(self._archive, vmin=self._vmin, vmax=self._vmax) + elif self._archive_type == "Grid": + grid_archive_heatmap(self._archive, vmin=self._vmin, vmax=self._vmin) + elif self._archive_type == "SlidingBoundaries": + sliding_boundaries_archive_heatmap(self._archive, vmin=self._vmin, vmax=self._vmax) + else: + raise Exception("archive not valid") + if self._log_path is not None: + plt.ylabel("Condition Depth") + plt.xlabel("Depth") + plt.title( + "Map Elites Archive Depth at Generation: " + str(gen) + ) + os.makedirs(os.path.join(self._log_path, "archives_depth"), exist_ok=True) + saving_path = os.path.join( + self._log_path, + "archives_depth/archive_depth_at_gen_" + str(gen) + ".png", + ) + plt.savefig(saving_path) + plt.close() +>>>>>>> aca3e01 (merged from private repo) diff --git a/src/QD_MARL/algorithms/map_elites_ge.py b/src/QD_MARL/algorithms/map_elites_ge.py index 6ca018e9c..fe36c8487 100644 --- a/src/QD_MARL/algorithms/map_elites_ge.py +++ b/src/QD_MARL/algorithms/map_elites_ge.py @@ -13,7 +13,11 @@ from typing import List from abc import abstractmethod from .common import OptMetaClass +<<<<<<< HEAD from processing_element import ProcessingElementFactory, PEFMetaClass +======= +from util_processing_elements.processing_element import ProcessingElementFactory, PEFMetaClass +>>>>>>> aca3e01 (merged from private repo) import utils from decisiontrees import * diff --git a/src/QD_MARL/configs/battlefield_template.json b/src/QD_MARL/configs/battlefield_template.json new file mode 100644 index 000000000..ab11ae68c --- /dev/null +++ b/src/QD_MARL/configs/battlefield_template.json @@ -0,0 +1,117 @@ + +{ "files_names": ["battlefield_hpc_pyribs_random.json", + "battlefield_hpc_pyribs_best.json", + "battlefield_hpc_pyribs_coach.json", + "battlefield_hpc_pyribsCMA_random.json", + "battlefield_hpc_pyribsCMA_best.json", + "battlefield_hpc_pyribsCMA_coach.json"], + "template":{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": {"q": 0.7, "method": 
"midpoint"} + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + + "n_agents": 12, + "n_sets": 1, + + "me_config":{ + "me":{ + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "random", + "seed": 5, + "map_size": [5, 5], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [[0, 5], [0, 5]], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [6, 4], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 5, + "cond_depth": 5, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0.9 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory":{ + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + ["RandomInit", {"low": -1, "high": 1}], + ["EpsilonGreedy", {"epsilon": 1, "decay": 0.99, "min_epsilon": 0.05}], + ["NoBuffers", {}] + ] + } + } + } + } +} diff --git a/src/QD_MARL/configs/hpc/battlefield_hpc_pyribsCMA_best.json b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribsCMA_best.json new file mode 100644 index 000000000..c96408520 --- /dev/null +++ b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribsCMA_best.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "best", + "seed": 5, + "map_size": [ + 5, + 5 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 5 + ], + [ + 0, + 5 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 5, + "cond_depth": 5, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0.9 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + 
"n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/battlefield_hpc_pyribsCMA_coach.json b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribsCMA_coach.json new file mode 100644 index 000000000..33069f4bf --- /dev/null +++ b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribsCMA_coach.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "coach", + "seed": 5, + "map_size": [ + 5, + 5 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 5 + ], + [ + 0, + 5 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 5, + "cond_depth": 5, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0.9 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/battlefield_hpc_pyribsCMA_random.json b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribsCMA_random.json new file mode 100644 index 000000000..e0350cdc3 --- /dev/null +++ b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribsCMA_random.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + 
"n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "random", + "seed": 5, + "map_size": [ + 5, + 5 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 5 + ], + [ + 0, + 5 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 5, + "cond_depth": 5, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0.9 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/battlefield_hpc_pyribs_best.json b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribs_best.json new file mode 100644 index 000000000..c1983ca0e --- /dev/null +++ b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribs_best.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElites_pyRibs", + "selection_type": "best", + "seed": 5, + "map_size": [ + 5, + 5 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 5 + ], + [ + 0, + 5 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 5, + "cond_depth": 5, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0.9 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + 
"high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/battlefield_hpc_pyribs_coach.json b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribs_coach.json new file mode 100644 index 000000000..a0da32109 --- /dev/null +++ b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribs_coach.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElites_pyRibs", + "selection_type": "coach", + "seed": 5, + "map_size": [ + 5, + 5 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 5 + ], + [ + 0, + 5 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 5, + "cond_depth": 5, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0.9 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/battlefield_hpc_pyribs_random.json b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribs_random.json new file mode 100644 index 000000000..5b71ca540 --- /dev/null +++ b/src/QD_MARL/configs/hpc/battlefield_hpc_pyribs_random.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": 
"MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElites_pyRibs", + "selection_type": "random", + "seed": 5, + "map_size": [ + 5, + 5 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 5 + ], + [ + 0, + 5 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 5, + "cond_depth": 5, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0.9 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_best.json b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_best.json new file mode 100644 index 000000000..326c75054 --- /dev/null +++ b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_best.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "best", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 
0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_coach.json b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_coach.json new file mode 100644 index 000000000..2dcf0ef67 --- /dev/null +++ b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_coach.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "coach", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_random.json b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_random.json new file mode 100644 index 000000000..c5124d000 --- /dev/null +++ b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribsCMA_random.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + 
"kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "random", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_best.json b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_best.json new file mode 100644 index 000000000..496aa0720 --- /dev/null +++ b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_best.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElites_pyRibs", + "selection_type": "best", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + 
{} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_coach.json b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_coach.json new file mode 100644 index 000000000..f5bc2dcd6 --- /dev/null +++ b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_coach.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElites_pyRibs", + "selection_type": "coach", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_random.json b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_random.json new file mode 100644 index 000000000..60de53ffd --- /dev/null +++ b/src/QD_MARL/configs/hpc/no_sets/battlefield_hpc_pyribs_random.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElites_pyRibs", + 
"selection_type": "random", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_best.json b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_best.json new file mode 100644 index 000000000..ea3a53d75 --- /dev/null +++ b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_best.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "best", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 36, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No 
newline at end of file diff --git a/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_coach.json b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_coach.json new file mode 100644 index 000000000..c2726d54c --- /dev/null +++ b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_coach.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "coach", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 36, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_random.json b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_random.json new file mode 100644 index 000000000..cccd938e6 --- /dev/null +++ b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribsCMA_random.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + 
"selection_type": "random", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 36, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_best.json b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_best.json new file mode 100644 index 000000000..70f762e60 --- /dev/null +++ b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_best.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElites_pyRibs", + "selection_type": "best", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 36, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end 
of file diff --git a/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_coach.json b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_coach.json new file mode 100644 index 000000000..be5af777e --- /dev/null +++ b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_coach.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElites_pyRibs", + "selection_type": "coach", + "seed": 5, + "map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 36, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_random.json b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_random.json new file mode 100644 index 000000000..93bf74074 --- /dev/null +++ b/src/QD_MARL/configs/hpc/with_sets/battlefield_hpc_pyribs_random.json @@ -0,0 +1,141 @@ +{ + "hpc": true, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 400, + "jobs": 12, + "generations": 60 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElites_pyRibs", + "selection_type": "random", + "seed": 5, + 
"map_size": [ + 7, + 7 + ], + "cx_prob": 0.4, + "init_pop_size": 36, + "map_bounds": [ + [ + 0, + 7 + ], + [ + 0, + 7 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 7, + "cond_depth": 7, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/local/battlefield.json b/src/QD_MARL/configs/local/battlefield.json new file mode 100644 index 000000000..4218f559d --- /dev/null +++ b/src/QD_MARL/configs/local/battlefield.json @@ -0,0 +1,196 @@ +{ + "hpc": false, + "grammar": { + "root": [ + "condition", + "leaf" + ], + "input_index": { + "start": 0, + "stop": 34, + "step": 1, + "dtype": "int" + }, + "float": { + "start": 0.1, + "stop": 1, + "step": 0.1, + "dtype": "float" + } + }, + "conditions": { + "type": "orthogonal" + }, + "leaves": { + "params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + }, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 200, + "jobs": 12, + "generations": 100 + }, + "coach": { + "name": "Coach Battlefield v5", + "pop_size": 24, + "agents": 12, + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "random", + "seed": 5, + "map_size": [ + 10, + 10 + ], + "cx_prob": 0.5, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 10 + ], + [ + 0, + 10 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 3, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 5, + "sigma0": 1, + "bounds": { + "float": { + "type": "float", + "min": -10, + "max": 10 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 10, + "cond_depth": 10, + "generations": 10, 
+ "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": 0.1 + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 0.05, + "decay": 0.999 + } + ], + [ + "NoBuffers", + {} + ], + [ + "QLambda", + { + "decay": 0.81 + } + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/local/battlefield_test.json b/src/QD_MARL/configs/local/battlefield_test.json new file mode 100644 index 000000000..3d0127698 --- /dev/null +++ b/src/QD_MARL/configs/local/battlefield_test.json @@ -0,0 +1,109 @@ +{ + "hpc": false, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 500, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 1, + "jobs": 12, + "generations": 40 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": {"q": 0.7, "method": "midpoint"} + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + + "n_agents": 12, + "n_sets": 1, + + "me_config":{ + "me":{ + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "coach", + "seed": 5, + "map_size": [5, 5], + "cx_prob": 0.3, + "init_pop_size": 24, + "map_bounds": [[0, 5], [0, 5]], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [6, 4], + "emitters": 5, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": 0.1, + "max": 1.0 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 5, + "cond_depth": 5, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory":{ + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + ["RandomInit", {"low": -1, "high": 1}], + ["EpsilonGreedy", {"epsilon": 1, "decay": 0.99, "min_epsilon": 0.05}], + ["NoBuffers", {}] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/configs/local/battlefield_test_per_ind.json b/src/QD_MARL/configs/local/battlefield_test_per_ind.json new file mode 100644 index 000000000..a7d97c833 --- /dev/null +++ b/src/QD_MARL/configs/local/battlefield_test_per_ind.json @@ -0,0 +1,156 @@ +{ + "hpc": false, + "grammar": { + "root": ["condition", "leaf"], + "input_index": { + "start": 0, + "stop": 34, + "step": 1, + "dtype": "int" + }, + "float": { + "start": 0.1, + "stop": 1, + "step": 0.1, + "dtype": "float" + } + }, + "conditions": { + "type": "orthogonal" + }, + "leaves": { + "params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + }, + 
"environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 10, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 4, + "jobs": 12, + "generations": 25 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": {"q": 0.99, "method": "midpoint"} + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + + "n_agents": 12, + "n_sets": 1, + + "me_config":{ + "me":{ + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + "selection_type": "random", + "seed": 5, + "map_size": [10, 10], + "cx_prob": 0.3, + "init_pop_size": 12, + "map_bounds": [[0, 10], [0, 10]], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [6, 4], + "emitters": 2, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": -10, + "max": 10 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 10, + "cond_depth": 10, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory":{ + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + ["RandomInit", {"low": -1, "high": 1}], + ["EpsilonGreedy", {"epsilon": 0.1, "decay": 0.999, "min_epsilon": 0.07}], + ["NoBuffers", {}] + ] + } + } + } +} + diff --git a/src/QD_MARL/configs/local/test_config.json b/src/QD_MARL/configs/local/test_config.json new file mode 100644 index 000000000..1f66dc177 --- /dev/null +++ b/src/QD_MARL/configs/local/test_config.json @@ -0,0 +1,189 @@ +{ + "hpc": false, + "grammar": { + "root": [ + "condition", + "leaf" + ], + "input_index": { + "start": 0, + "stop": 34, + "step": 1, + "dtype": "int" + }, + "float": { + "start": 0.1, + "stop": 1, + "step": 0.1, + "dtype": "float" + } + }, + "conditions": { + "type": "orthogonal" + }, + "leaves": { + "params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + }, + "environment": { + "map_size": 80, + "minimap_mode": true, + "step_reward": -0.005, + "dead_penalty": -0.1, + "attack_penalty": -0.1, + "attack_opponent_reward": 0.9, + "max_cycles": 1000, + "extra_features": false, + "render_mode": null + }, + "training": { + "gamma": 0.9, + "episodes": 10, + "jobs": 12, + "generations": 40 + }, + "statistics": { + "agent": { + "type": "quantile", + "params": { + "q": 0.7, + "method": "midpoint" + } + }, + "set": { + "type": "mean", + "params": {} + } + }, + "sets": "random", + "team_to_optimize": "blue", + "observation": "34_new", + "manual_policy": null, + "n_agents": 12, + "n_sets": 1, + "me_config": { + "me": { + "name": "map_elites.MapElites", + "__name_2__": "MapElites_Pyribs", + "__name_3__": "MapElitesCMA_pyRibs", + "kwargs": { + "me_type": "MapElitesCMA_pyRibs", + 
"selection_type": "coach", + "seed": 5, + "map_size": [ + 10, + 10 + ], + "cx_prob": 0.3, + "init_pop_size": 120, + "map_bounds": [ + [ + 0, + 10 + ], + [ + 0, + 10 + ] + ], + "batch_pop": 12, + "maximize": "True", + "restart_rule": 1, + "archive": "Grid", + "solution_dim": 2, + "bins": 50, + "sliding_bins": [ + 6, + 4 + ], + "emitters": 10, + "sigma0": 1, + "coach": { + "name": "Coach Battlefield v5", + "seed": 5, + "algorithm": "EvolutionaryComputation" + }, + "bounds": { + "float": { + "type": "float", + "min": -10, + "max": 10 + }, + "input_index": { + "type": "int", + "min": 0, + "max": 34 + }, + "action": { + "type": "int", + "min": 0, + "max": 21 + } + }, + "max_depth": 20, + "cond_depth": 20, + "generations": 10, + "logdir": "logs/" + } + }, + "DecisionTree": { + "gamma": 0 + }, + "ConditionFactory": { + "type": "orthogonal", + "n_inputs": 34 + }, + "QLearningLeafFactory": { + "kwargs": { + "leaf_params": { + "n_actions": 21, + "learning_rate": null + }, + "decorators": [ + [ + "RandomInit", + { + "low": -1, + "high": 1 + } + ], + [ + "EpsilonGreedy", + { + "epsilon": 1, + "decay": 0.99, + "min_epsilon": 0.05 + } + ], + [ + "NoBuffers", + {} + ] + ] + } + } + } +} \ No newline at end of file diff --git a/src/QD_MARL/decisiontrees/__init__.py b/src/QD_MARL/decisiontrees/__init__.py index a9f19ed68..61afb9771 100644 --- a/src/QD_MARL/decisiontrees/__init__.py +++ b/src/QD_MARL/decisiontrees/__init__.py @@ -1,6 +1,10 @@ #!/usr/bin/env python from .leaves import QLearningLeafFactory, ConstantLeafFactory, Leaf, DummyLeafFactory, PPOLeaf, PPOLeafFactory from .nodes import * +<<<<<<< HEAD from .conditions import ConditionFactory, OrthogonalCondition, Condition, DifferentiableOrthogonalCondition +======= +from .conditions import ConditionFactory, OrthogonalCondition, Condition +>>>>>>> aca3e01 (merged from private repo) from .trees import DecisionTree, RLDecisionTree, DifferentiableDecisionTree, FastDecisionTree from .factories import * diff --git a/src/QD_MARL/decisiontrees/__pycache__/__init__.cpython-38.pyc b/src/QD_MARL/decisiontrees/__pycache__/__init__.cpython-38.pyc new file mode 100644 index 000000000..10265cf8b Binary files /dev/null and b/src/QD_MARL/decisiontrees/__pycache__/__init__.cpython-38.pyc differ diff --git a/src/QD_MARL/decisiontrees/__pycache__/conditions.cpython-38.pyc b/src/QD_MARL/decisiontrees/__pycache__/conditions.cpython-38.pyc new file mode 100644 index 000000000..025639a8e Binary files /dev/null and b/src/QD_MARL/decisiontrees/__pycache__/conditions.cpython-38.pyc differ diff --git a/src/QD_MARL/decisiontrees/__pycache__/factories.cpython-38.pyc b/src/QD_MARL/decisiontrees/__pycache__/factories.cpython-38.pyc new file mode 100644 index 000000000..eeda3316a Binary files /dev/null and b/src/QD_MARL/decisiontrees/__pycache__/factories.cpython-38.pyc differ diff --git a/src/QD_MARL/decisiontrees/__pycache__/leaves.cpython-38.pyc b/src/QD_MARL/decisiontrees/__pycache__/leaves.cpython-38.pyc new file mode 100644 index 000000000..c2ce2f061 Binary files /dev/null and b/src/QD_MARL/decisiontrees/__pycache__/leaves.cpython-38.pyc differ diff --git a/src/QD_MARL/decisiontrees/__pycache__/nodes.cpython-38.pyc b/src/QD_MARL/decisiontrees/__pycache__/nodes.cpython-38.pyc new file mode 100644 index 000000000..d2bf35b39 Binary files /dev/null and b/src/QD_MARL/decisiontrees/__pycache__/nodes.cpython-38.pyc differ diff --git a/src/QD_MARL/decisiontrees/__pycache__/trees.cpython-38.pyc b/src/QD_MARL/decisiontrees/__pycache__/trees.cpython-38.pyc new file mode 100644 index 
000000000..773c2b0b5 Binary files /dev/null and b/src/QD_MARL/decisiontrees/__pycache__/trees.cpython-38.pyc differ diff --git a/src/QD_MARL/decisiontrees/conditions.py b/src/QD_MARL/decisiontrees/conditions.py index b2551f1cb..8c6a2c696 100644 --- a/src/QD_MARL/decisiontrees/conditions.py +++ b/src/QD_MARL/decisiontrees/conditions.py @@ -10,8 +10,11 @@ :license: MIT, see LICENSE for more details. """ import abc +<<<<<<< HEAD # import torch import numpy as np +======= +>>>>>>> aca3e01 (merged from private repo) from copy import deepcopy from .nodes import Node @@ -77,7 +80,11 @@ def get_output(self, input_): :input_: A 1D numpy array :returns: A 1D numpy array """ +<<<<<<< HEAD # assert len(input_.shape) == 1, "Only 1D arrays are currently supported" +======= + assert len(input_.shape) == 1, "Only 1D arrays are currently supported" +>>>>>>> aca3e01 (merged from private repo) if self.get_branch(input_) == Condition.BRANCH_LEFT: return self._left.get_output(input_) else: @@ -99,6 +106,7 @@ def empty_buffers(self): self._left.empty_buffers() self._right.empty_buffers() +<<<<<<< HEAD def copy(self): """ Returns a copy of itself @@ -106,13 +114,25 @@ def copy(self): new = Condition(self.get_left().copy(), self.get_right().copy()) return new +======= + def copy_structure(self): + """ + Returns a copy of the structure of itself + """ + new = Condition(self.get_left().copy_structure(), self.get_right().copy_structure()) + return new + +>>>>>>> aca3e01 (merged from private repo) def deep_copy(self): """ Returns a deep copy of itself """ return deepcopy(self) +<<<<<<< HEAD +======= +>>>>>>> aca3e01 (merged from private repo) class OrthogonalCondition(Condition): """ This class implements orthogonal conditions for the decision tree. @@ -131,6 +151,10 @@ def __init__(self, feature_idx, split_value, left=None, right=None): :right: The right node. Default: None. """ Condition.__init__(self, left, right) +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) self._feature_idx = feature_idx self._split_value = split_value @@ -147,9 +171,12 @@ def get_branch(self, inputs): return inputs[self._feature_idx] < self._split_value return inputs[:, self._feature_idx] < self._split_value +<<<<<<< HEAD def get_code(self): return f"if input_[{self._feature_idx}] < {self._split_value}:" +======= +>>>>>>> aca3e01 (merged from private repo) @staticmethod def get_trainable_parameters(): """ @@ -169,15 +196,25 @@ def check_params(params): assert len(params) >= 2, \ "This type of condition requires 2 parameters." 
+<<<<<<< HEAD @classmethod def create_from_params(cls, params): +======= + @staticmethod + def create_from_params(params): +>>>>>>> aca3e01 (merged from private repo) """ Creates a condition from its parameters :params: A list of params (int or float) """ +<<<<<<< HEAD cls.check_params(params) return cls(int(params[0]), float(params[params[0]])) +======= + OrthogonalCondition.check_params(params) + return OrthogonalCondition(int(params[0]), float(params[1])) +>>>>>>> aca3e01 (merged from private repo) def set_params_from_list(self, params): """ @@ -186,9 +223,15 @@ def set_params_from_list(self, params): :params: A list of params (int or float) """ +<<<<<<< HEAD self.check_params(params) self._feature_idx = int(params[0]) self._split_value = float(params[params[0]]) +======= + OrthogonalCondition.check_params(params) + self._feature_idx = int(params[0]) + self._split_value = float(params[1]) +>>>>>>> aca3e01 (merged from private repo) def get_feature_idx(self): return self._feature_idx @@ -203,6 +246,7 @@ def set_split_value(self, value): self._split_value = value def __str__(self): +<<<<<<< HEAD return f"x_{self._feature_idx} < {self._split_value}" def copy(self): @@ -621,6 +665,28 @@ def copy(self): ####################################################################### # Factory # ####################################################################### +======= + return f"x_{self._feature_idx} < {round(self._split_value, 1)}" + #return f"x_{self._feature_idx} < {self._split_value}" # Original + + def copy_structure(self): + """ + Returns a copy of the structure of itself + """ + new = OrthogonalCondition( + self.get_feature_idx(), + self.get_split_value(), + self.get_left().copy_structure(), + self.get_right().copy_structure() + ) + return new + + def deep_copy(self): + """ + Returns a deep copy of itself + """ + return deepcopy(self) +>>>>>>> aca3e01 (merged from private repo) class ConditionFactory: @@ -628,6 +694,7 @@ class ConditionFactory: A factory for conditions """ ORTHOGONAL = "orthogonal" +<<<<<<< HEAD DIFFERENTIABLE = "differentiable" OBLIQUE2 = "2vars" OBLIQUE2OFFSET = "2varswoffset" @@ -642,11 +709,20 @@ class ConditionFactory: } def __init__(self, condition_type="orthogonal", n_inputs=None): +======= + + CONDITIONS = { + ORTHOGONAL: OrthogonalCondition, + } + + def __init__(self, condition_type="orthogonal"): +>>>>>>> aca3e01 (merged from private repo) """ Initializes the factory of conditions :condition_type: strings supported: - orthogonal +<<<<<<< HEAD - differentiable - 2vars - 2varswoffset @@ -655,22 +731,35 @@ def __init__(self, condition_type="orthogonal", n_inputs=None): """ self._condition_type = condition_type self._n_inputs = n_inputs if condition_type == "oblique" else None +======= + """ + self._condition_type = condition_type +>>>>>>> aca3e01 (merged from private repo) def create(self, params): """ Creates a condition :returns: A Condition """ +<<<<<<< HEAD cond = self.CONDITIONS[self._condition_type].create_from_params(params) return cond +======= + return self.CONDITIONS[self._condition_type].create_from_params(params) +>>>>>>> aca3e01 (merged from private repo) def get_trainable_parameters(self): """ Returns a list of parameters with their type (int or float). 
""" +<<<<<<< HEAD param_types = self.CONDITIONS[self._condition_type].get_trainable_parameters() if self._condition_type == self.OBLIQUE: return (param_types * (self._n_inputs + 1))[:self._n_inputs + 1] return param_types +======= + return self.CONDITIONS[self._condition_type].get_trainable_parameters() + +>>>>>>> aca3e01 (merged from private repo) diff --git a/src/QD_MARL/decisiontrees/factories.py b/src/QD_MARL/decisiontrees/factories.py index b5efc8cca..a73c10fb6 100644 --- a/src/QD_MARL/decisiontrees/factories.py +++ b/src/QD_MARL/decisiontrees/factories.py @@ -11,7 +11,11 @@ :license: MIT, see LICENSE for more details. """ from algorithms import OptMetaClass +<<<<<<< HEAD from processing_element import ProcessingElementFactory, PEFMetaClass +======= +from util_processing_elements.processing_element import ProcessingElementFactory, PEFMetaClass +>>>>>>> aca3e01 (merged from private repo) from decisiontrees import ConditionFactory, QLearningLeafFactory, ConstantLeafFactory, \ RLDecisionTree # TODO: Create a MetaClass for leaves' factories <23-11-21, Leonardo Lucio Custode> # diff --git a/src/QD_MARL/decisiontrees/leaves.py b/src/QD_MARL/decisiontrees/leaves.py index bf23fe1a7..c042f97f7 100644 --- a/src/QD_MARL/decisiontrees/leaves.py +++ b/src/QD_MARL/decisiontrees/leaves.py @@ -14,7 +14,11 @@ import numpy as np from .nodes import Node from copy import deepcopy +<<<<<<< HEAD import torch +======= +# import torch +>>>>>>> aca3e01 (merged from private repo) from utils.print_outputs import print_debugging @@ -734,6 +738,7 @@ def get_trainable_parameters(self): class PPOLeaf(Leaf): +<<<<<<< HEAD """A Leaf that implements PPO for discrete action spaces""" def __init__(self, n_actions): @@ -762,6 +767,37 @@ def __repr__(self): def __str__(self): return repr(self) +======= + pass +# """A Leaf that implements PPO for discrete action spaces""" + +# def __init__(self, n_actions): +# """ +# Initializes the leaf + +# :n_actions: The number of actions that the leaf can perform +# """ +# Leaf.__init__(self) + +# self._n_actions = n_actions +# self.actions = torch.rand(n_actions, requires_grad=True) +# self.sm = torch.nn.Softmax() + +# def get_output(self, input_): +# return self.sm(self.actions), self + +# def get_params(self): +# return self.actions + +# def discretize(self): +# return ConstantLeaf(torch.argmax(self.actions).detach().numpy()) + +# def __repr__(self): +# return str(self.sm(self.actions)) + +# def __str__(self): +# return repr(self) +>>>>>>> aca3e01 (merged from private repo) class PPOLeafFactory(): diff --git a/src/QD_MARL/decisiontrees/trees.py b/src/QD_MARL/decisiontrees/trees.py index 7a10c9be0..19068492b 100644 --- a/src/QD_MARL/decisiontrees/trees.py +++ b/src/QD_MARL/decisiontrees/trees.py @@ -11,17 +11,28 @@ """ # import torch import numpy as np +<<<<<<< HEAD import torch +======= +# import torch +>>>>>>> aca3e01 (merged from private repo) from .nodes import Node from collections import deque from .conditions import Condition from .leaves import Leaf +<<<<<<< HEAD from processing_element import ProcessingElement from utils.print_outputs import * from algorithms.individuals import IndividualGP +======= +from util_processing_elements.processing_element import ProcessingElement +from utils.print_outputs import * +from algorithms.individuals import IndividualGP +from copy import deepcopy +>>>>>>> aca3e01 (merged from private repo) class DecisionTree: """ @@ -201,8 +212,15 @@ def set_reward(self, reward): #TODO fix set rewards and why is called in Decisio 
self._rewards.appendleft(reward) if len(self._last_leaves) == 2: leaf = self._last_leaves.pop() +<<<<<<< HEAD + leaf.set_reward(self._rewards.pop() + + self._gamma * self._last_leaves[0].get_value()) +======= + # print_debugging(type(leaf), type(self._last_leaves), type(self._last_leaves.pop()), self._last_leaves.pop()) leaf.set_reward(self._rewards.pop() + self._gamma * self._last_leaves[0].get_value()) + # print_debugging(leaf.set_reward(self._rewards.pop() + self._gamma * self._last_leaves[0].get_value())) +>>>>>>> aca3e01 (merged from private repo) def set_reward_end_of_episode(self): """ @@ -266,8 +284,12 @@ def new_episode(self): self._init_buffers() def deep_copy(self): +<<<<<<< HEAD dt = RLDecisionTree(self.get_root().deep_copy(), self._gamma) return dt +======= + return deepcopy(self) +>>>>>>> aca3e01 (merged from private repo) class FastDecisionTree(RLDecisionTree): @@ -330,6 +352,7 @@ def get_output(self, input_): def discretize(self): return RLDecisionTree(self._root.discretize(), 0) +<<<<<<< HEAD def get_splits_outputs(self, input_): outputs = [] fringe = [self._root] @@ -342,6 +365,20 @@ def get_splits_outputs(self, input_): fringe.append(cur.get_left()) fringe.append(cur.get_right()) return torch.Tensor(outputs) +======= + # def get_splits_outputs(self, input_): + # outputs = [] + # fringe = [self._root] + + # while len(fringe) > 0: + # cur = fringe.pop(0) + + # if not isinstance(cur, Leaf): + # outputs.append(cur.get_coefficient(input_)) + # fringe.append(cur.get_left()) + # fringe.append(cur.get_right()) + # return torch.Tensor(outputs) +>>>>>>> aca3e01 (merged from private repo) def get_params(self): params = [] diff --git a/src/QD_MARL/hpc_script.sh b/src/QD_MARL/hpc_script.sh new file mode 100644 index 000000000..498b37211 --- /dev/null +++ b/src/QD_MARL/hpc_script.sh @@ -0,0 +1,12 @@ +#!/bin/bash + +#PBS -l select=2:ncpus=10:mem=64gb + +#set max execution time +#PBS -l walltime=4:00:00 + +#imposta la coda di esecuzione +#PBS -q common_cpuQ + +source pyenv-marl-qd/bin/activate +python3 python3 src/QD_MARL/marl_qd_launcher.py src/QD_MARL/configs/battlefield.json 4 \ No newline at end of file diff --git a/src/QD_MARL/marl_qd_launcher.py b/src/QD_MARL/marl_qd_launcher.py index d1efe56c9..8635a70e0 100644 --- a/src/QD_MARL/marl_qd_launcher.py +++ b/src/QD_MARL/marl_qd_launcher.py @@ -1,11 +1,19 @@ +<<<<<<< HEAD import os import sys +======= +import importlib +import os +import sys +import gc +>>>>>>> aca3e01 (merged from private repo) sys.path.append(".") import random import time from copy import deepcopy from math import sqrt +<<<<<<< HEAD from test_environments import * import numpy as np import pettingzoo @@ -20,6 +28,31 @@ from evaluations import * from decisiontrees.leaves import * import get_interpretability +======= + +import numpy as np +import pettingzoo +from agents.agents import * +from algorithms import ( + grammatical_evolution, + individuals, + map_elites, + map_elites_Pyribs, + mapElitesCMA_pyRibs, +) +from decisiontrees import ( + ConditionFactory, + DecisionTree, + QLearningLeafFactory, + RLDecisionTree, +) +from decisiontrees.leaves import * +from magent2.environments import battlefield_v5 +from training.evaluations import * +from utils import * + +# from memory_profiler import profile +>>>>>>> aca3e01 (merged from private repo) def get_map_elite(config): @@ -29,11 +62,16 @@ def get_map_elite(config): :config: a dictionary containing all the parameters :log_path: a path to the log directory """ +<<<<<<< HEAD # Setup GE +======= + # Setup ME +>>>>>>> 
aca3e01 (merged from private repo) me_config = config["me"]["kwargs"] # Build classes of the operators from the config file me_config["c_factory"] = ConditionFactory() me_config["l_factory"] = QLearningLeafFactory( +<<<<<<< HEAD config["QLearningLeafFactory"]["kwargs"]["leaf_params"], config["QLearningLeafFactory"]["kwargs"]["decorators"] ) @@ -43,6 +81,28 @@ def get_map_elite(config): # me = mapElitesCMA_pyRibs.MapElitesCMA_pyRibs(**me_config) return me +======= + config["QLearningLeafFactory"]["kwargs"]["leaf_params"], + config["QLearningLeafFactory"]["kwargs"]["decorators"], + ) + if me_config["me_type"] == "Map_elites": + me = map_elites.MapElites(**me_config) + elif me_config["me_type"] == "MapElites_pyRibs": + me = map_elites_Pyribs.MapElites_Pyribs(**me_config) + elif me_config["me_type"] == "MapElitesCMA_pyRibs": + me = mapElitesCMA_pyRibs.MapElitesCMA_pyRibs(**me_config) + print_configs("ME type:", me_config["me_type"]) + if me_config["selection_type"] == "coach": + coach_config = me_config["coach"] + coach_config["pop_size"] = me_config["init_pop_size"] + coach_config["batch_size"] = me_config["batch_pop"] + coach = CoachAgent(coach_config, me) + else: + coach = None + print_configs("ME selection type:", me_config["selection_type"]) + return me, coach + +>>>>>>> aca3e01 (merged from private repo) def pretrain_tree(tree, rb): """ @@ -62,6 +122,7 @@ def pretrain_tree(tree, rb): tree.set_reward(r) tree.set_reward_end_of_episode() return tree +<<<<<<< HEAD def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_policy=False): # Setup GE @@ -75,11 +136,35 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol # ge[i] -> set_i ge = grammatical_evolution.GrammaticalEvolution(**ge_config, logdir=gram_dir) # setup log files +======= + + +def produce_tree( + config, log_path=None, extra_log=False, debug=False, manual_policy=False +): + + # Setup ME + me_config = config["me_config"] + me_config["me"]["kwargs"]["log_path"] = log_path + + me, coach = get_map_elite(me_config) + coach_index = None + selection_type = me_config["me"]["kwargs"]["selection_type"] + #setup job manager + map_ = utils.get_map(config["training"]["jobs"], debug) + number_of_agents = config["n_agents"] + number_of_sets = config["n_sets"] + population_size = me_config["me"]["kwargs"]["init_pop_size"] + number_of_teams = population_size // number_of_agents + + # setup log files +>>>>>>> aca3e01 (merged from private repo) evolution_dir = os.path.join(log_path, "Evolution_dir") os.makedirs(evolution_dir , exist_ok=False) trees_dir = os.path.join(log_path, "Trees_dir") os.makedirs(trees_dir , exist_ok=False) +<<<<<<< HEAD # Retrieve the map function from utils map_ = utils.get_map(config["training"]["jobs"], debug) @@ -177,13 +262,167 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol return trees +======= + # setup sets + if config["sets"] == "random": + sets = [[] for _ in range(number_of_sets)] + set_full = [False for _ in range(number_of_sets)] + agents = [i for i in range(number_of_agents)] + + agents_per_set = number_of_agents // number_of_sets + surplus = number_of_agents // number_of_sets + + while not all(set_full): + set_index = random.randint(0, number_of_sets -1) + if not set_full[set_index]: + random_index = random.randint(0, len(agents) -1) + sets[set_index].append(agents[random_index]) + del agents[random_index] + if len(sets[set_index]) == agents_per_set: + set_full[set_index] = True + + if surplus > 0: + while len(agents) != 0: 
+ set_index = random.randint(0, number_of_sets -1) + random_index = random.randint(0, len(agents) -1) + sets[set_index].append(agents[random_index]) + del agents[random_index] + + for set_ in sets: + set_.sort() + + config["sets"] = sets + print_info("Sets: ", config["sets"]) + + with open(os.path.join(log_path, "sets.log"), "w") as f: + f.write(f"{config['sets']}\n") + + # Initialize best individual for each agent + best = [None for _ in range(number_of_agents)] + best_fit = [-float("inf") for _ in range(number_of_agents)] + new_best = [False for _ in range(number_of_agents)] + + for i in range(number_of_agents): + #print(f"{gen: <10} agent_{i: <4} {agent_min[i]: <10.2f} {agent_mean[i]: <10.2f} {agent_max[i]: <10.2f} {agent_std[i]: <10.2f}") + with open(os.path.join(evolution_dir, f"agent_{i}.log"), "a") as f: + f.write(f"Generation,Min,Mean,Max,Std\n") + for i in range(number_of_sets): + with open(os.path.join(evolution_dir, f"set_{i}.txt"), "a") as f: + f.write(f"Generation,Min,Mean,Max,Std\n") + with open(os.path.join(evolution_dir, f"bests.txt"), "a") as f: + f.write(f"Generation,Min,Mean,Max,Std\n") + + print_info(f"{'Generation' : <10} {'Set': <10} {'Min': <10} {'Mean': <10} {'Max': <10} {'Std': <10}") + sets_fitness = [ [] for _ in range(number_of_sets)] + #start training process + for gen in range(-1,config["training"]["generations"]): + config["generation"] = gen + me_pop = [] + #Ask for a new population + if gen >=0: + if selection_type == "coach": + coach_index = coach.get_squad(number_of_teams) + for i in range(number_of_teams): + me_pop += me.ask(coach_index[0] if coach_index is not None else None) + else: + me_pop = me.ask(None) + + trees = [[RLDecisionTree(me_pop[i], config["training"]["gamma"]) for i in range(population_size)] for _ in range(number_of_sets)] + #Form the groups + squads = [[trees[j][i] for j in range(number_of_sets)] for i in range(population_size)] + return_values = map_(evaluate, squads, config) + + agents_fitness = [ [] for _ in range(number_of_agents)] + agents_tree = [ [] for _ in range(number_of_agents)] + + # Store trees and fitnesses + for values in return_values: + for i in range(number_of_agents): + agents_fitness[i].append(values[0][i]) + agents_tree[i].append(values[1][i]) + + # Check whether the best, for each agent, has to be updated + amax = [np.argmax(agents_fitness[i]) for i in range(number_of_agents)] + max_ = [agents_fitness[i][amax[i]] for i in range(number_of_agents)] + + for i in range(number_of_agents): + if max_[i] > best_fit[i]: + best_fit[i] = max_[i] + best[i] = agents_tree[i][amax[i]] + new_best[i] = True + + tree_text = f"{best[i]}" + utils.save_tree(best[i], trees_dir, f"best_agent_{i}") + with open(os.path.join(trees_dir, f"best_agent_{i}.log"), "w") as f: + f.write(tree_text) + # Calculate fitnesses for each set + sets_fitness = [ [] for _ in range(number_of_sets)] + for index, set_ in enumerate(config["sets"]): + + set_agents_fitnesses = [] + for agent in set_: + set_agents_fitnesses.append(agents_fitness[agent]) + + set_agents_fitnesses = np.array(set_agents_fitnesses) + + # Calculate fitness for each individual in the set + sets_fitness[index] = [getattr(np, config['statistics']['set']['type'])(a=set_agents_fitnesses[:, i], **config['statistics']['set']['params']) for i in range(set_agents_fitnesses.shape[1])] + + sets_fitness = np.array(sets_fitness).flatten() + me.tell(sets_fitness) + me.plot_archive(gen) + # Compute stats for each agent + agent_min = [np.min(agents_fitness[i]) for i in range(number_of_agents)] + 
agent_mean = [np.mean(agents_fitness[i]) for i in range(number_of_agents)] + agent_max = [np.max(agents_fitness[i]) for i in range(number_of_agents)] + agent_std = [np.std(agents_fitness[i]) for i in range(number_of_agents)] + + for i in range(number_of_agents): + #print(f"{gen: <10} agent_{i: <4} {agent_min[i]: <10.2f} {agent_mean[i]: <10.2f} {agent_max[i]: <10.2f} {agent_std[i]: <10.2f}") + with open(os.path.join(evolution_dir, f"agent_{i}.log"), "a") as f: + f.write(f"{gen} {agent_min[i]} {agent_mean[i]} {agent_max[i]} {agent_std[i]}\n") + + new_best[i] = False + + # Compute states for each set + set_min = [np.min(sets_fitness[i]) for i in range(number_of_sets)] + set_mean = [np.mean(sets_fitness[i]) for i in range(number_of_sets)] + set_max = [np.max(sets_fitness[i]) for i in range(number_of_sets)] + set_std = [np.std(sets_fitness[i]) for i in range(number_of_sets)] + + for i in range(number_of_sets): + print_info(f"{gen: <10} set_{i: <4} {set_min[i]: <10.2f} {set_mean[i]: <10.2f} {set_max[i]: <10.2f} {set_std[i]: <10.2f}") + with open(os.path.join(evolution_dir, f"set_{i}.txt"), "a") as f: + f.write(f"{gen},{set_min[i]},{set_mean[i]},{set_max[i]},{set_std[i]}\n") + + # Compute stats for the bests + best_min = np.min(best_fit) + best_mean = np.mean(best_fit) + best_max = np.max(best_fit) + best_std = np.std(best_fit) + + with open(os.path.join(evolution_dir, f"bests.txt"), "a") as f: + f.write(f"{gen},{best_min},{best_mean},{best_max},{best_std}\n") + + plot_log(evolution_dir, f"bests.txt", gen) + for i in range(number_of_sets): + plot_log(evolution_dir, f"set_{i}.txt", gen) + return best +>>>>>>> aca3e01 (merged from private repo) if __name__ == "__main__": import argparse import json import shutil +<<<<<<< HEAD import yaml import utils +======= + + import utils + import yaml + +>>>>>>> aca3e01 (merged from private repo) parser = argparse.ArgumentParser() parser.add_argument("config", help="Path of the config file to use") parser.add_argument("--debug", action="store_true", help="Debug flag") @@ -192,6 +431,7 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol args = parser.parse_args() print_info("Launching Quality Diversity MARL") print_configs("Environment configurations file: ", args.config) +<<<<<<< HEAD if args.debug: print_configs("DEBUG MODE") @@ -207,10 +447,33 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol log_path = f"logs/magent_battlefield/{logdir_name}" join = lambda x: os.path.join(log_path, x) +======= + + if args.debug: + print_configs("DEBUG MODE") + + # Load the config file + config = json.load(open(args.config)) + + # Set the random seed + random.seed(args.seed) + np.random.seed(args.seed) + # Setup logging + logdir_name = utils.get_logdir_name() + if config["hpc"]: + cwd = os.getcwd() + log_path = f"{cwd}/logs/qd-marl/hpc/with_sets/{config['me_config']['me']['kwargs']['me_type']}/{config['me_config']['me']['kwargs']['selection_type']}/magent_battlefield/{logdir_name}" + else: + log_path = f"logs/qd-marl/local/with_sets/{config['me_config']['me']['kwargs']['me_type']}/{config['me_config']['me']['kwargs']['selection_type']}/magent_battlefield/{logdir_name}" + print_configs("Logs path: ", log_path) + join = lambda x: os.path.join(log_path, x) + config["log_path"] = log_path +>>>>>>> aca3e01 (merged from private repo) os.makedirs(log_path, exist_ok=False) shutil.copy(args.config, join("config.json")) with open(join("seed.log"), "w") as f: f.write(str(args.seed)) +<<<<<<< HEAD squad = 
produce_tree(config, log_path, args.log, args.debug) @@ -225,4 +488,24 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol # return interpretability(best, config, log_path, index, logger) - \ No newline at end of file + +======= + + squad = produce_tree(config, log_path, args.log, args.debug) + + import logging + + logging.basicConfig( + filename=join("output.log"), + level=logging.INFO, + format="%(asctime)s %(message)s", + filemode="w", + ) + logger = logging.getLogger() + index = 0 + + for player in squad: + print_info("\n", player) + + # return interpretability(best, config, log_path, index, logger) +>>>>>>> aca3e01 (merged from private repo) diff --git a/src/QD_MARL/marl_qd_launcher_no_sets.py b/src/QD_MARL/marl_qd_launcher_no_sets.py new file mode 100644 index 000000000..81c94084a --- /dev/null +++ b/src/QD_MARL/marl_qd_launcher_no_sets.py @@ -0,0 +1,306 @@ +import importlib +import os +import sys +import gc + +sys.path.append(".") +import random +import time +from copy import deepcopy +from math import sqrt + +import numpy as np +import pettingzoo +from agents.agents import * +from algorithms import ( + grammatical_evolution, + individuals, + map_elites, + map_elites_Pyribs, + mapElitesCMA_pyRibs, +) +from decisiontrees import ( + ConditionFactory, + DecisionTree, + QLearningLeafFactory, + RLDecisionTree, +) +from decisiontrees.leaves import * +from magent2.environments import battlefield_v5 +from training.evaluations_no_sets import * +from utils import * + +# from memory_profiler import profile + + +def get_map_elite(config): + """ + Produces a tree for the selected problem by using the Grammatical Evolution + + :config: a dictionary containing all the parameters + :log_path: a path to the log directory + """ + # Setup ME + me_config = config["me"]["kwargs"] + # Build classes of the operators from the config file + me_config["c_factory"] = ConditionFactory(config["ConditionFactory"]["type"]) + me_config["l_factory"] = QLearningLeafFactory( + config["QLearningLeafFactory"]["kwargs"]["leaf_params"], + config["QLearningLeafFactory"]["kwargs"]["decorators"], + ) + if me_config["me_type"] == "Map_elites": + me = map_elites.MapElites(**me_config) + elif me_config["me_type"] == "MapElites_pyRibs": + me = map_elites_Pyribs.MapElites_Pyribs(**me_config) + elif me_config["me_type"] == "MapElitesCMA_pyRibs": + me = mapElitesCMA_pyRibs.MapElitesCMA_pyRibs(**me_config) + print_configs("ME type:", me_config["me_type"]) + if me_config["selection_type"] == "coach": + coach_config = me_config["coach"] + coach_config["pop_size"] = me_config["init_pop_size"] + coach_config["batch_size"] = me_config["batch_pop"] + coach = CoachAgent(coach_config, me) + else: + coach = None + print_configs("ME selection type:", me_config["selection_type"]) + return me, coach + + +def pretrain_tree(tree, rb): + """ + Pretrains a tree + + :t: A tree + :rb: The replay buffer + :returns: The pretrained tree + """ + if tree is None: + return None + for e in rb: + tree.empty_buffers() + if len(e) > 0: + for s, a, r, sp in e: + tree.force_action(s, a) + tree.set_reward(r) + tree.set_reward_end_of_episode() + return tree + + +def produce_tree( + config, log_path=None, extra_log=False, debug=False, manual_policy=False +): + + # Setup ME + me_config = config["me_config"] + me_config["me"]["kwargs"]["log_path"] = log_path + + me, coach = get_map_elite(me_config) + coach_index = None + selection_type = me_config["me"]["kwargs"]["selection_type"] + #setup job manager + number_of_agents = 
config["n_agents"] + number_of_sets = config["n_sets"] + population_size = me_config["me"]["kwargs"]["init_pop_size"] + number_of_teams = population_size // number_of_agents + map_ = utils.get_map(number_of_teams, debug) + + # setup log files + evolution_dir = os.path.join(log_path, "Evolution_dir") + os.makedirs(evolution_dir , exist_ok=False) + team_dir = os.path.join(evolution_dir, "Teams") + os.makedirs(team_dir , exist_ok=False) + pop_dir = os.path.join(evolution_dir, "Population") + os.makedirs(pop_dir , exist_ok=False) + trees_dir = os.path.join(log_path, "Trees_dir") + os.makedirs(trees_dir , exist_ok=False) + + + # Initialize best individual for each agent + best = [None for _ in range(number_of_agents)] + best_fit = [-float("inf") for _ in range(number_of_agents)] + new_best = [False for _ in range(number_of_agents)] + + for i in range(number_of_agents): + #print(f"{gen: <10} agent_{i: <4} {agent_min[i]: <10.2f} {agent_mean[i]: <10.2f} {agent_max[i]: <10.2f} {agent_std[i]: <10.2f}") + with open(os.path.join(evolution_dir, f"agent_{i}.log"), "a") as f: + f.write(f"Generation,Min,Mean,Max,Std\n") + for i in range(number_of_teams): + with open(os.path.join(team_dir, f"team_{i}.txt"), "a") as f: + f.write(f"Generation,Min,Mean,Max,Std\n") + with open(os.path.join(pop_dir, f"pop.txt"), "a") as f: + f.write(f"Generation,Min,Mean,Max,Std\n") + with open(os.path.join(evolution_dir, f"bests.txt"), "a") as f: + f.write(f"Generation,Min,Mean,Max,Std\n") + + print_info(f"{'Generation' : <10} {'Set': <10} {'Min': <10} {'Mean': <10} {'Max': <10} {'Std': <10}") + sets_fitness = [ [] for _ in range(number_of_sets)] + #start training process + for gen in range(-1,config["training"]["generations"]): + config["generation"] = gen + me_pop = [] + squads = [] + #Ask for a new population + if gen == -1: + me_pop = me.ask(None) + trees = [RLDecisionTree(me_pop[i], config["training"]["gamma"]) for i in range(len(me_pop))] + for i in range(number_of_teams): + squads.append(trees[i*number_of_agents:(i+1)*number_of_agents]) + else: + if selection_type == "coach": + coach_index = coach.get_squad(number_of_teams) + for i in range(number_of_teams): + me_pop = me.ask(coach_index[i] if selection_type == "coach" else None) + trees = [RLDecisionTree(me_pop[i], config["training"]["gamma"]) for i in range(len(me_pop))] + squads.append(trees) + return_values = map_(evaluate, squads, config) + + agents_fitness = [ [] for _ in range(number_of_agents)] + agents_tree = [ [] for _ in range(number_of_agents)] + + # Store trees and fitnesses + for values in return_values: + for i in range(number_of_agents): + agents_fitness[i].append(values[0][i]) + agents_tree[i].append(values[1][i]) + + # Check whether the best, for each agent, has to be updated + amax = [np.argmax(agents_fitness[i]) for i in range(number_of_agents)] + max_ = [agents_fitness[i][amax[i]] for i in range(number_of_agents)] + + for i in range(number_of_agents): + if max_[i] > best_fit[i]: + best_fit[i] = max_[i] + best[i] = agents_tree[i][amax[i]] + new_best[i] = True + + tree_text = f"{best[i]}" + utils.save_tree(best[i], trees_dir, f"best_agent_{i}") + with open(os.path.join(trees_dir, f"best_agent_{i}.log"), "w") as f: + f.write(tree_text) + teams_fitness = [] + for i in range(number_of_teams): + team = [] + for j in range(number_of_agents): + team.append(agents_fitness[j][i]) + teams_fitness.append((team)) + individual_fitness = np.array(teams_fitness).flatten() + me.tell(individual_fitness) + + # Compute stats for each agent + agent_min = 
[np.min(agents_fitness[i]) for i in range(number_of_agents)] + agent_mean = [np.mean(agents_fitness[i]) for i in range(number_of_agents)] + agent_max = [np.max(agents_fitness[i]) for i in range(number_of_agents)] + agent_std = [np.std(agents_fitness[i]) for i in range(number_of_agents)] + + for i in range(number_of_agents): + #print(f"{gen: <10} agent_{i: <4} {agent_min[i]: <10.2f} {agent_mean[i]: <10.2f} {agent_max[i]: <10.2f} {agent_std[i]: <10.2f}") + with open(os.path.join(evolution_dir, f"agent_{i}.log"), "a") as f: + f.write(f"{gen} {agent_min[i]} {agent_mean[i]} {agent_max[i]} {agent_std[i]}\n") + + new_best[i] = False + # Compute stats for the population + pop_min = np.min(individual_fitness) + pop_mean = np.mean(individual_fitness) + pop_max = np.max(individual_fitness) + pop_std = np.std(individual_fitness) + + if gen ==-1: + vmin = np.quantile(individual_fitness, 0.7) + vmax = None + else: + vmin = None + vmax = None + + with open(os.path.join(pop_dir, f"pop.txt"), "a") as f: + f.write(f"{gen},{pop_min},{pop_mean},{pop_max},{pop_std}\n") + + plot_log(pop_dir, f"pop.txt", gen) + + # Compute states for each team + + team_min = [np.min(teams_fitness[i]) for i in range(number_of_teams)] + team_mean = [np.mean(teams_fitness[i]) for i in range(number_of_teams)] + team_max = [np.max(teams_fitness[i]) for i in range(number_of_teams)] + team_std = [np.std(teams_fitness[i]) for i in range(number_of_teams)] + + for i in range(number_of_teams): + print_info(f"{gen: <10} team_{i: <4} {team_min[i]: <10.2f} {team_mean[i]: <10.2f} {team_max[i]: <10.2f} {team_std[i]: <10.2f}") + with open(os.path.join(team_dir, f"team_{i}.txt"), "a") as f: + f.write(f"{gen},{team_min[i]},{team_mean[i]},{team_max[i]},{team_std[i]}\n") + plot_log(team_dir, f"team_{i}.txt", gen) + + # Compute stats for the bests + best_min = np.min(best_fit) + best_mean = np.mean(best_fit) + best_max = np.max(best_fit) + best_std = np.std(best_fit) + + with open(os.path.join(evolution_dir, f"bests.txt"), "a") as f: + f.write(f"{gen},{best_min},{best_mean},{best_max},{best_std}\n") + + plot_log(evolution_dir, f"bests.txt", gen) + me.plot_archive(gen, vmin=vmin, vmax=vmax) + return best + +if __name__ == "__main__": + import argparse + import json + import shutil + + import utils + import yaml + + parser = argparse.ArgumentParser() + parser.add_argument("config", help="Path of the config file to use") + parser.add_argument("--debug", action="store_true", help="Debug flag") + parser.add_argument("--log", action="store_true", help="Log flag") + parser.add_argument("seed", type=int, help="Random seed to use") + args = parser.parse_args() + print_info("Launching Quality Diversity MARL") + print_configs("Environment configurations file: ", args.config) + + if args.debug: + print_configs("DEBUG MODE") + + # Load the config file + config = json.load(open(args.config)) + + # Set the random seed + random.seed(args.seed) + np.random.seed(args.seed) + # Setup logging + logdir_name = utils.get_logdir_name() + if config["hpc"]: + cwd = os.getcwd() + log_path = f"{cwd}/logs/qd-marl/hpc/no_sets/{config['me_config']['me']['kwargs']['me_type']}/{config['me_config']['me']['kwargs']['selection_type']}/magent_battlefield/{logdir_name}" + else: + log_path = f"logs/qd-marl/local/no_sets/{config['me_config']['me']['kwargs']['me_type']}/{config['me_config']['me']['kwargs']['selection_type']}/magent_battlefield/{logdir_name}" + print_configs("Logs path: ", log_path) + join = lambda x: os.path.join(log_path, x) + config["log_path"] = log_path + 
os.makedirs(log_path, exist_ok=False) + shutil.copy(args.config, join("config.json")) + with open(join("seed.log"), "w") as f: + f.write(str(args.seed)) + + squad = produce_tree(config, log_path, args.log, args.debug) + + import logging + + logging.basicConfig( + filename=join("output.log"), + level=logging.INFO, + format="%(asctime)s %(message)s", + filemode="w", + ) + logger = logging.getLogger() + index = 0 + + for player in squad: + print_info("\n", player) + + # config["environment"]["render_mode"] = "human" + + # results = evaluate(squad, config) + # print_info(results) + # return interpretability(best, config, log_path, index, logger) diff --git a/src/QD_MARL/training/__init__.py b/src/QD_MARL/training/__init__.py new file mode 100644 index 000000000..19abc4402 --- /dev/null +++ b/src/QD_MARL/training/__init__.py @@ -0,0 +1,4 @@ +from training.evaluations import * +from training.evaluations_no_sets import * +from training.evolve_tree_me import * +from training.differentObservations import * diff --git a/src/QD_MARL/training/__pycache__/__init__.cpython-311.pyc b/src/QD_MARL/training/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 000000000..8a0ff1ec9 Binary files /dev/null and b/src/QD_MARL/training/__pycache__/__init__.cpython-311.pyc differ diff --git a/src/QD_MARL/training/__pycache__/__init__.cpython-38.pyc b/src/QD_MARL/training/__pycache__/__init__.cpython-38.pyc new file mode 100644 index 000000000..e3c09c5c7 Binary files /dev/null and b/src/QD_MARL/training/__pycache__/__init__.cpython-38.pyc differ diff --git a/src/QD_MARL/training/__pycache__/differentObservations.cpython-311.pyc b/src/QD_MARL/training/__pycache__/differentObservations.cpython-311.pyc new file mode 100644 index 000000000..5708ad3ba Binary files /dev/null and b/src/QD_MARL/training/__pycache__/differentObservations.cpython-311.pyc differ diff --git a/src/QD_MARL/training/__pycache__/differentObservations.cpython-38.pyc b/src/QD_MARL/training/__pycache__/differentObservations.cpython-38.pyc new file mode 100644 index 000000000..61a159fd7 Binary files /dev/null and b/src/QD_MARL/training/__pycache__/differentObservations.cpython-38.pyc differ diff --git a/src/QD_MARL/training/__pycache__/evaluations.cpython-311.pyc b/src/QD_MARL/training/__pycache__/evaluations.cpython-311.pyc new file mode 100644 index 000000000..2ee319a50 Binary files /dev/null and b/src/QD_MARL/training/__pycache__/evaluations.cpython-311.pyc differ diff --git a/src/QD_MARL/training/__pycache__/evaluations.cpython-38.pyc b/src/QD_MARL/training/__pycache__/evaluations.cpython-38.pyc new file mode 100644 index 000000000..4ea0205ac Binary files /dev/null and b/src/QD_MARL/training/__pycache__/evaluations.cpython-38.pyc differ diff --git a/src/QD_MARL/training/__pycache__/evaluations_no_sets.cpython-311.pyc b/src/QD_MARL/training/__pycache__/evaluations_no_sets.cpython-311.pyc new file mode 100644 index 000000000..af490ef0b Binary files /dev/null and b/src/QD_MARL/training/__pycache__/evaluations_no_sets.cpython-311.pyc differ diff --git a/src/QD_MARL/training/__pycache__/evaluations_no_sets.cpython-38.pyc b/src/QD_MARL/training/__pycache__/evaluations_no_sets.cpython-38.pyc new file mode 100644 index 000000000..4c8cfbe3b Binary files /dev/null and b/src/QD_MARL/training/__pycache__/evaluations_no_sets.cpython-38.pyc differ diff --git a/src/QD_MARL/training/__pycache__/evaluations_single_tree.cpython-311.pyc b/src/QD_MARL/training/__pycache__/evaluations_single_tree.cpython-311.pyc new file mode 100644 index 
000000000..2532cb292 Binary files /dev/null and b/src/QD_MARL/training/__pycache__/evaluations_single_tree.cpython-311.pyc differ diff --git a/src/QD_MARL/training/__pycache__/evaluations_single_tree.cpython-38.pyc b/src/QD_MARL/training/__pycache__/evaluations_single_tree.cpython-38.pyc new file mode 100644 index 000000000..4f86638d8 Binary files /dev/null and b/src/QD_MARL/training/__pycache__/evaluations_single_tree.cpython-38.pyc differ diff --git a/src/QD_MARL/training/__pycache__/evolve_tree_me.cpython-311.pyc b/src/QD_MARL/training/__pycache__/evolve_tree_me.cpython-311.pyc new file mode 100644 index 000000000..83f066ebc Binary files /dev/null and b/src/QD_MARL/training/__pycache__/evolve_tree_me.cpython-311.pyc differ diff --git a/src/QD_MARL/training/__pycache__/evolve_tree_me.cpython-38.pyc b/src/QD_MARL/training/__pycache__/evolve_tree_me.cpython-38.pyc new file mode 100644 index 000000000..1858c45b1 Binary files /dev/null and b/src/QD_MARL/training/__pycache__/evolve_tree_me.cpython-38.pyc differ diff --git a/src/QD_MARL/training/differentObservations.py b/src/QD_MARL/training/differentObservations.py new file mode 100644 index 000000000..1ab8d9821 --- /dev/null +++ b/src/QD_MARL/training/differentObservations.py @@ -0,0 +1,513 @@ +import numpy as np + + +def compute_features_42(obs, n_allies, n_enemies): + + map_h = obs.shape[0] + map_w = obs.shape[1] + + coordinates = tuple(obs[0, 0, 7:9]) + gamma = round(0.0125 * 7, 4) # Indice ottenuto dalla coordinata per matrice 13 x 13, usata in dimensione 80 + epsilon = 0.0001 + ind_x = int(coordinates[0] / gamma + epsilon) + ind_y = int(coordinates[1] / gamma + epsilon) + + map_h_2 = map_h // 2 + map_w_2 = map_w // 2 + + new_features = [] + # Find nearby obstacles + nearby_obstacles = [0 for _ in range(4)] + for i in [1, 2]: + # left + if all(obs[map_h_2-1:map_h_2+2, map_w_2-i, 0]): nearby_obstacles[0] = 1 + # up + if all(obs[map_h_2-i, map_w_2-1:map_w_2+2, 0]): nearby_obstacles[1] = 1 + # right + if all(obs[map_h_2-1:map_h_2+2, map_w_2+i, 0]): nearby_obstacles[2] = 1 + # down + if all(obs[map_h_2+i, map_w_2-1:map_w_2+2, 0]): nearby_obstacles[3] = 1 + + for y in [-1, 0, 1]: + for x in [-1, 0, 1]: + if not(y == 0 and x == 0): + nearby_obstacles.append(obs[map_h_2 + y, map_w_2 + x, 0]) + new_features.extend(nearby_obstacles) + + # Compute the number of teammates to the left, right, up and down + # Compute the number of enemies to the left, right, up and down + allies = obs[: , :, 3] + enemis = obs[: , :, 6] + allies[ind_y, ind_x] -= 1 + (1 / n_allies) # Beacuse in the agent coordinates it contains 1 + itself density + if allies[ind_y, ind_x] < epsilon: allies[ind_y, ind_x] = 0 + enemis[ind_y, ind_x] -= 1 # Beacuse in the agent coordinates it contains 1 + + allies_density = np.zeros(9) + enemies_density = np.zeros(9) + + # above the agent + for i in range(ind_y): + # top left + for j in range(ind_x): + allies_density[0] += allies[i, j] + enemies_density[0] += enemis[i, j] + # top + allies_density[1] += allies[i, ind_x] + enemies_density[1] += enemis[i, ind_x] + # top right + for j in range(ind_x +1, map_w): + allies_density[2] += allies[i, j] + enemies_density[2] += enemis[i, j] + + # to the left of the agent + for j in range(ind_x): + allies_density[3] += allies[ind_y, j] + enemies_density[3] += enemis[ind_y, j] + + # center + allies_density[4] += allies[ind_y, ind_x] + enemies_density[4] += enemis[ind_y, ind_x] + + # to the right of the agent + for j in range(ind_x+1, map_w): + allies_density[5] += allies[ind_y, j] + enemies_density[5] 
+= enemis[ind_y, j] + + # under the agent + for i in range(ind_y +1, map_h): + # below left + for j in range(ind_x): + allies_density[6] += allies[i, j] + enemies_density[6] += enemis[i, j] + # below + allies_density[7] += allies[i, ind_x] + enemies_density[7] += enemis[i, ind_x] + # below right + for j in range(ind_x +1, map_w): + allies_density[8] += allies[i, j] + enemies_density[8] += enemis[i, j] + + new_features.extend(allies_density) + new_features.extend(enemies_density) + + nondead = (obs[:, :, 4] * obs[:, :, 5]) > 0 + n_enemies_left = np.sum(nondead[:, :map_w_2]) / n_enemies + n_enemies_up = np.sum(nondead[:map_h_2, :]) / n_enemies + n_enemies_right = np.sum(nondead[:, 1 + map_w_2:]) / n_enemies + n_enemies_down = np.sum(nondead[1 + map_h_2:, :]) / n_enemies + new_features.extend([ + n_enemies_left, + n_enemies_up, + n_enemies_right, + n_enemies_down + ]) + enemy_presence = [] + for y in [-1, 0, 1]: + for x in [-1, 0, 1]: + if not(y == 0 and x == 0): + enemy_presence.append(nondead[map_h_2 + y, map_w_2 + x]) + + new_features.extend(enemy_presence) + + return np.array(new_features) + +def compute_features_42_obs(obs, n_allies, n_enemies): + + map_h = obs.shape[0] + map_w = obs.shape[1] + + coordinates = tuple(obs[0, 0, 7:9]) + gamma = round(0.0125 * 7, 4) # Indice ottenuto dalla coordinata per matrice 13 x 13, usata in dimensione 80 + epsilon = 0.0001 + ind_x = int(coordinates[0] / gamma + epsilon) + ind_y = int(coordinates[1] / gamma + epsilon) + + map_h_2 = map_h // 2 + map_w_2 = map_w // 2 + + new_features = [] + # Find nearby obstacles + obstacles = (obs[:, :, 0] + ((obs[:, :, 1] * obs[:, :, 2]) > 0).astype('float32')) + + # two boxes up, left and two boxes down + new_features.extend([ + # up + obstacles[map_h_2 -2, map_w_2], + # left + obstacles[map_h_2, map_w_2 -2], + # right + obstacles[map_h_2, map_w_2 +2], + # down + obstacles[map_h_2 +2, map_w_2] + ]) + + nearby_obstacles = [] + for y in [-1, 0, 1]: + for x in [-1, 0, 1]: + if not(y == 0 and x == 0): + nearby_obstacles.append(obstacles[map_h_2 + y, map_w_2 + x]) + new_features.extend(nearby_obstacles) + + # Compute the number of teammates to the left, right, up and down + # Compute the number of enemies to the left, right, up and down + allies = obs[: , :, 3] + enemis = obs[: , :, 6] + allies[ind_y, ind_x] -= 1 + (1 / n_allies) # Beacuse in the agent coordinates it contains 1 + itself density + if allies[ind_y, ind_x] < epsilon: allies[ind_y, ind_x] = 0 + enemis[ind_y, ind_x] -= 1 # Beacuse in the agent coordinates it contains 1 + + allies_density = np.zeros(9) + enemies_density = np.zeros(9) + + # above the agent + for i in range(ind_y): + # top left + for j in range(ind_x): + allies_density[0] += allies[i, j] + enemies_density[0] += enemis[i, j] + # top + allies_density[1] += allies[i, ind_x] + enemies_density[1] += enemis[i, ind_x] + # top right + for j in range(ind_x +1, map_w): + allies_density[2] += allies[i, j] + enemies_density[2] += enemis[i, j] + + # to the left of the agent + for j in range(ind_x): + allies_density[3] += allies[ind_y, j] + enemies_density[3] += enemis[ind_y, j] + + # center + allies_density[4] += allies[ind_y, ind_x] + enemies_density[4] += enemis[ind_y, ind_x] + + # to the right of the agent + for j in range(ind_x+1, map_w): + allies_density[5] += allies[ind_y, j] + enemies_density[5] += enemis[ind_y, j] + + # under the agent + for i in range(ind_y +1, map_h): + # below left + for j in range(ind_x): + allies_density[6] += allies[i, j] + enemies_density[6] += enemis[i, j] + # below + 
allies_density[7] += allies[i, ind_x] + enemies_density[7] += enemis[i, ind_x] + # below right + for j in range(ind_x +1, map_w): + allies_density[8] += allies[i, j] + enemies_density[8] += enemis[i, j] + + new_features.extend(allies_density) + new_features.extend(enemies_density) + + nondead = (obs[:, :, 4] * obs[:, :, 5]) > 0 + n_enemies_left = np.sum(nondead[:, :map_w_2]) / n_enemies + n_enemies_up = np.sum(nondead[:map_h_2, :]) / n_enemies + n_enemies_right = np.sum(nondead[:, 1 + map_w_2:]) / n_enemies + n_enemies_down = np.sum(nondead[1 + map_h_2:, :]) / n_enemies + new_features.extend([ + n_enemies_left, + n_enemies_up, + n_enemies_right, + n_enemies_down + ]) + enemy_presence = [] + for y in [-1, 0, 1]: + for x in [-1, 0, 1]: + if not(y == 0 and x == 0): + enemy_presence.append(nondead[map_h_2 + y, map_w_2 + x]) + + new_features.extend(enemy_presence) + + return np.array(new_features) + +def compute_features_34_old(obs, n_allies, n_enemies): + + + map_h = obs.shape[0] + map_w = obs.shape[1] + + coordinates = tuple(obs[0, 0, 7:9]) + gamma = round(0.0125 * 7, 4) # Indice ottenuto dalla coordinata per matrice 13 x 13, usata in dimensione 80 + epsilon = 0.0001 + ind_x = int(coordinates[0] / gamma + epsilon) + ind_y = int(coordinates[1] / gamma + epsilon) + + map_h_2 = map_h // 2 + map_w_2 = map_w // 2 + + new_features = [] + # Find nearby obstacles + nearby_obstacles = [0 for _ in range(4)] + for i in [1, 2]: + # up + if all(obs[map_h_2-i, map_w_2-1:map_w_2+2, 0]): nearby_obstacles[0] = 1 + # left + if all(obs[map_h_2-1:map_h_2+2, map_w_2-i, 0]): nearby_obstacles[1] = 1 + # right + if all(obs[map_h_2-1:map_h_2+2, map_w_2+i, 0]): nearby_obstacles[2] = 1 + # down + if all(obs[map_h_2+i, map_w_2-1:map_w_2+2, 0]): nearby_obstacles[3] = 1 + + for y in [-1, 0, 1]: + for x in [-1, 0, 1]: + if not(y == 0 and x == 0): + nearby_obstacles.append(obs[map_h_2 + y, map_w_2 + x, 0]) + new_features.extend(nearby_obstacles) + + # Compute the global density of teammates to the left, right, up and down + # Compute the global density of enemies to the left, right, up and down + allies = obs[: , :, 3] + enemis = obs[: , :, 6] + allies[ind_y, ind_x] -= 1 + (1 / n_allies) # Beacuse in the agent coordinates it contains 1 + itself density + if allies[ind_y, ind_x] < epsilon: allies[ind_y, ind_x] = 0 + enemis[ind_y, ind_x] -= 1 # Beacuse in the agent coordinates it contains 1 + + allies_density = np.zeros(5) + enemies_density = np.zeros(5) + + # above the agent + for i in range(ind_y): + for j in range(map_w): + allies_density[0] += allies[i, j] + enemies_density[0] += enemis[i, j] + + # left of the agent + for i in range(ind_x): + for j in range(map_h): + allies_density[1] += allies[j, i] + enemies_density[1] += enemis[j, i] + + # center + allies_density[2] += allies[ind_y, ind_x] + enemies_density[2] += enemis[ind_y, ind_x] + + # right of the agent + for i in range(ind_x+1, map_w): + for j in range(map_h): + allies_density[3] += allies[j, i] + enemies_density[3] += enemis[j, i] + + # under the agent + for i in range(ind_y +1, map_h): + for j in range(map_w): + allies_density[4] += allies[i, j] + enemies_density[4] += enemis[i, j] + + new_features.extend(allies_density) + new_features.extend(enemies_density) + + # Compute the local density of enemies to the left, right, up and down + nondead = (obs[:, :, 4] * obs[:, :, 5]) > 0 + n_enemies_left = np.sum(nondead[:, :map_w_2]) / n_enemies + n_enemies_up = np.sum(nondead[:map_h_2, :]) / n_enemies + n_enemies_right = np.sum(nondead[:, 1 + map_w_2:]) / n_enemies + 
n_enemies_down = np.sum(nondead[1 + map_h_2:, :]) / n_enemies + new_features.extend([ + n_enemies_left, + n_enemies_up, + n_enemies_right, + n_enemies_down + ]) + enemy_presence = [] + for y in [-1, 0, 1]: + for x in [-1, 0, 1]: + if not(y == 0 and x == 0): + enemy_presence.append(nondead[map_h_2 + y, map_w_2 + x]) + + new_features.extend(enemy_presence) + + return np.array(new_features) + +def compute_features_34_new(obs, n_allies, n_enemies): + + map_h = obs.shape[1] + map_w = obs.shape[1] + + coordinates = tuple(obs[0, 0, 7:9]) + gamma = round(0.0125 * 7, 4) # Indice ottenuto dalla coordinata per matrice 13 x 13, usata in dimensione 80 + epsilon = 0.0001 + ind_x = int(coordinates[0] / gamma + epsilon) + ind_y = int(coordinates[1] / gamma + epsilon) + + map_h_2 = map_h // 2 + map_w_2 = map_w // 2 + + new_features = [] + # Find nearby obstacles + obstacles = (obs[:, :, 0] + ((obs[:, :, 1] * obs[:, :, 2]) > 0).astype('float32')) + + # two boxes up, left and two boxes down + new_features.extend([ + # up + obstacles[map_h_2 -2, map_w_2], + # left + obstacles[map_h_2, map_w_2 -2], + # right + obstacles[map_h_2, map_w_2 +2], + # down + obstacles[map_h_2 +2, map_w_2] + ]) + + nearby_obstacles = [] + for y in [-1, 0, 1]: + for x in [-1, 0, 1]: + if not(y == 0 and x == 0): + nearby_obstacles.append(obstacles[map_h_2 + y, map_w_2 + x]) + ''' + for y in range(-2, 3): + for x in range(-2, 3): + if not(y == 0 and x == 0) and not(abs(y) + abs(x) > 2): + nearby_obstacles.append(obstacles[map_h_2 + y, map_w_2 + x]) + ''' + new_features.extend(nearby_obstacles) + + # Compute the global density of teammates to the left, right, up and down + # Compute the global density of enemies to the left, right, up and down + allies = obs[: , :, 3] + enemis = obs[: , :, 6] + allies[ind_y, ind_x] -= 1 + (1 / n_allies) # Beacuse in the agent coordinates it contains 1 + itself density + if allies[ind_y, ind_x] < epsilon: allies[ind_y, ind_x] = 0 + enemis[ind_y, ind_x] -= 1 # Beacuse in the agent coordinates it contains 1 + + allies_density = np.zeros(5) + enemies_density = np.zeros(5) + + # above the agent + for i in range(ind_y): + for j in range(map_w): + allies_density[0] += allies[i, j] + enemies_density[0] += enemis[i, j] + + # to the left of the agent + for i in range(ind_x): + for j in range(map_h): + allies_density[1] += allies[j, i] + enemies_density[1] += enemis[j, i] + + # center + allies_density[2] += allies[ind_y, ind_x] + enemies_density[2] += enemis[ind_y, ind_x] + + # to the right of the agent + for i in range(ind_x+1, map_w): + for j in range(map_h): + allies_density[3] += allies[j, i] + enemies_density[3] += enemis[j, i] + + # under the agent + for i in range(ind_y +1, map_h): + for j in range(map_w): + allies_density[4] += allies[i, j] + enemies_density[4] += enemis[i, j] + + new_features.extend(allies_density) + new_features.extend(enemies_density) + + # Compute the local density of enemies to the left, right, up and down + nondead = (obs[:, :, 4] * obs[:, :, 5]) > 0 + n_enemies_up = np.sum(nondead[:map_h_2, :]) / n_enemies + n_enemies_left = np.sum(nondead[:, :map_w_2]) / n_enemies + n_enemies_right = np.sum(nondead[:, 1 + map_w_2:]) / n_enemies + n_enemies_down = np.sum(nondead[1 + map_h_2:, :]) / n_enemies + new_features.extend([ + n_enemies_up, + n_enemies_left, + n_enemies_right, + n_enemies_down + ]) + enemy_presence = [] + for y in [-1, 0, 1]: + for x in [-1, 0, 1]: + if not(y == 0 and x == 0): + enemy_presence.append(nondead[map_h_2 + y, map_w_2 + x]) + + new_features.extend(enemy_presence) + 
+ return np.array(new_features) + +if __name__ == "__main__": + + + import pettingzoo + from magent2.environments import battle_v4, battlefield_v5 + import json + import argparse + import random + import time + + class Agent: + def __init__(self, name, squad): + self._name = name + self._squad = squad + def get_name(self): + return self._name + def get_squad(self): + return self._squad + + parser = argparse.ArgumentParser() + parser.add_argument("config", help="Path of the config file to use") + args = parser.parse_args() + + config = json.load(open(args.config)) + + env = battlefield_v5.env(**config['environment']) + + env.reset() + + red_agents = 12 + blue_agents = 12 + agents = {} + + for agent_name in env.agents: + agent_squad = "_".join(agent_name.split("_")[:-1]) + agents[agent_name] = Agent(agent_name, agent_squad) + + for index, agent in enumerate(env.agent_iter()): + j = index % len(env.agents) + if j == 0: + env.render() + observation, reward, done, trunc, info = env.last() + action = 6 if not done else None + + #if not done: + if agent == 'blue_0' and not done: + features = compute_features_34_new(observation, blue_agents, red_agents) + print(len(features)) + print(features) + print('blue:', blue_agents) + print('red:', red_agents) + print() + #print("Comando {}:".format(agent)) + print("Comando:") + input_ = input() + try: + action = int(input_) + if action < 0 or action > 20: + print("L'azione deve essere compresa tra 0 e 20 inclusi") + action = 6 + except: + #print(input_) + if input_ == "break" or input_ == "exit": + print("Closed") + break + elif input_ == "vision": + print(features) + print('blue:', blue_agents) + print('red:', red_agents) + print() + else: + print("pass") + pass + #print(reward) + if done: + if agents[agent].get_squad() == 'blue': + blue_agents -= 1 + else: + red_agents -= 1 + + env.step(action) diff --git a/src/QD_MARL/training/evaluations.py b/src/QD_MARL/training/evaluations.py new file mode 100644 index 000000000..653e8eeb8 --- /dev/null +++ b/src/QD_MARL/training/evaluations.py @@ -0,0 +1,167 @@ +import os +import sys + +sys.path.append(".") +import random +import time +from copy import deepcopy +from math import sqrt + +import numpy as np +import pettingzoo + +# from memory_profiler import profile +import training.differentObservations as differentObservations +from agents.agents import * +from algorithms import ( + grammatical_evolution, + individuals, + map_elites, + map_elites_Pyribs, + mapElitesCMA_pyRibs, +) +from decisiontrees import ( + ConditionFactory, + DecisionTree, + QLearningLeafFactory, + RLDecisionTree, +) +from magent2.environments import battle_v4, battlefield_v5 +from pettingzoo.utils import aec_to_parallel, parallel_to_aec +from utils import * + + + +def evaluate(trees, config): + # FOR LOGGING + # Getting the pid as it is in a parallel process + # Create the log folder + pid = os.getpid() + eval_logs = os.path.join(config["log_path"], "eval_log", str(config['generation']), str(pid)) + os.makedirs(eval_logs, exist_ok=True) + + # Starting training and evaluation process + compute_features = getattr( + differentObservations, f"compute_features_{config['observation']}" + ) + policy = None + # Setup the agents + agents = {} + actions = {} + env = battlefield_v5.env(**config["environment"]) + env.reset() # This reset lead to the problem + for agent_name in env.agents: + agent_squad = "_".join(agent_name.split("_")[:-1]) + if agent_squad == config["team_to_optimize"]: + agent_number = int("_".join(agent_name.split("_")[1])) + + # Search 
the index of the set in which the agent belongs + set_ = 0 + for set_index in range(len(config["sets"])): + if agent_number in config["sets"][set_index]: + set_ = set_index + + # Initialize training agent + agents[agent_name] = Agent( + agent_name, agent_squad, set_, trees[set_], None, True + ) + else: + # Initialize random or policy agent + agents[agent_name] = Agent( + agent_name, agent_squad, None, None, policy, False + ) + + for agent_name in agents: + actions[agent_name] = [] + + rewards = [] + for i in range(config["training"]["episodes"]): + red_done = 0 + blue_done = 0 + # Seed the environment + env.reset(seed=i) + np.random.seed(i) + env.reset() + + # Set variabler for new episode + for agent_name in agents: + if agents[agent_name].to_optimize(): + agents[agent_name].new_episode() + red_agents = 12 + blue_agents = 12 + # tree.empty_buffers() # NO-BUFFER LEAFS + # Iterate over all the agents + for index, agent_name in enumerate(env.agent_iter()): + + obs, rew, done, trunc, _ = env.last() + + agent = agents[agent_name] + + if agent.to_optimize(): + # Register the reward + agent.set_reward(rew) + # print_configs(f"Agent {agent_name} reward: {rew}") + rewards.append(rew) + + action = None + if not done and not trunc: # if the agent is alive + if agent.to_optimize(): + # compute_features(observation, allies, enemies) + if agent.get_squad() == "blue": + action = agent.get_output( + compute_features(obs, blue_agents, red_agents) + ) + if action is None: + print("None action") + print_debugging(type(agent.get_tree())) + else: + action = agent.get_output( + compute_features(obs, red_agents, blue_agents) + ) + else: + if agent.has_policy(): + if agent.get_squad() == "blue": + action = agent.get_output( + compute_features(obs, blue_agents, red_agents) + ) + else: + action = agent.get_output( + compute_features(obs, red_agents, blue_agents) + ) + else: + # action = env.action_space(agent_name).sample() + action = np.random.randint(21) + else: # update the number of active agents + if agent.get_squad() == "red": + red_agents -= 1 + else: + blue_agents -= 1 + env.step(action) + # actions[agent_name].append(action) + + # Log the number of kill per in each episode + # with open(os.path.join(eval_logs,"log_n_kills.txt"), "a") as f: + # f.write("Episode: " + str(i)+ " red: " + str(red_done) + " blue: " + str(blue_done) + "\n") + # f.close() + env.close() + # plot_actions(actions, pid, config) + # rewards count + rewards = np.array(rewards) + unique, counts = np.unique(rewards, return_counts=True) + rewards_dict = dict(zip(unique, counts)) + + # Log the rewards in each episode + with open(os.path.join(eval_logs, "log_rewards.txt"), "a") as f: + f.write(str(rewards_dict) + "\n") + f.close() + + # Compute the statistics and scores for each agent(Decision Tree) + scores = [] + actual_trees = [] + for agent_name in agents: + if agents[agent_name].to_optimize(): + scores.append( + agents[agent_name].get_score_statistics(config["statistics"]["agent"]) + ) + actual_trees.append(agents[agent_name].get_tree()) + return scores, actual_trees diff --git a/src/QD_MARL/training/evaluations_no_sets.py b/src/QD_MARL/training/evaluations_no_sets.py new file mode 100644 index 000000000..f477724d7 --- /dev/null +++ b/src/QD_MARL/training/evaluations_no_sets.py @@ -0,0 +1,144 @@ +import os +import sys + +sys.path.append(".") +import random +import time +from copy import deepcopy +from math import sqrt + +import numpy as np +import pettingzoo + +# from memory_profiler import profile +import 
training.differentObservations as differentObservations
+from agents.agents import *
+from algorithms import (
+    grammatical_evolution,
+    individuals,
+    map_elites,
+    map_elites_Pyribs,
+    mapElitesCMA_pyRibs,
+)
+from decisiontrees import (
+    ConditionFactory,
+    DecisionTree,
+    QLearningLeafFactory,
+    RLDecisionTree,
+)
+from magent2.environments import battle_v4, battlefield_v5
+from pettingzoo.utils import aec_to_parallel, parallel_to_aec
+from utils import *
+
+
+
+def evaluate(trees, config):
+    # Check whether the phenotype is valid
+    for tree in trees:
+        if tree is None:
+            return -10**3, None
+    pid = os.getpid()
+    eval_logs = os.path.join(config["log_path"], "eval_log", str(config['generation']), str(pid))
+    os.makedirs(eval_logs, exist_ok=True)
+    # Re-import the environments here to avoid problems with parallelization
+    import training.differentObservations as differentObservations
+    #from manual_policies import Policies
+    import numpy as np
+    from magent2.environments import battlefield_v5
+
+    # Load the function used to compute the features from the observation
+    compute_features = getattr(differentObservations, f"compute_features_{config['observation']}")
+
+    # Load manual policy if present
+    policy = None
+    #if config['manual_policy']:
+    #    policy = Policies(config['manual_policy'])
+
+    # Load the environment
+    env = battlefield_v5.env(**config['environment'])
+    env.reset()
+
+    # Set tree and policy to agents
+    agents = {}
+    for agent_name in env.agents:
+        agent_squad = "_".join(agent_name.split("_")[:-1])
+        if agent_squad == config["team_to_optimize"]:
+            agent_number = int("_".join(agent_name.split("_")[1]))
+
+
+            # Initialize training agent
+            agents[agent_name] = Agent(agent_name, agent_squad, None, trees[agent_number-1], None, True)
+        else:
+            # Initialize random or policy agent
+            agents[agent_name] = Agent(agent_name, agent_squad, None, None, policy, False)
+
+    # Start the training
+    kills = []
+    for i in range(config["training"]["episodes"]):
+        kills.append(0)
+        # Seed the environment
+        env.reset(seed=i)
+        np.random.seed(i)
+        env.reset()
+
+        # Set variables for new episode
+        for agent_name in agents:
+            if agents[agent_name].to_optimize():
+                agents[agent_name].new_episode()
+        red_agents = 12
+        blue_agents = 12
+
+        # tree.empty_buffers() # NO-BUFFER LEAFS
+        # Iterate over all the agents
+        for index, agent_name in enumerate(env.agent_iter()):
+
+            obs, rew, done, trunc, _ = env.last()
+
+            agent = agents[agent_name]
+
+            if agent.to_optimize():
+                # Register the reward
+                agent.set_reward(rew)
+
+            action = None
+            if not done and not trunc: # if the agent is alive
+                if agent.to_optimize():
+                    # compute_features(observation, allies, enemies)
+                    if agent.get_squad() == 'blue':
+                        action = agent.get_output(compute_features(obs, blue_agents, red_agents))
+                    else:
+                        action = agent.get_output(compute_features(obs, red_agents, blue_agents))
+                else:
+                    if agent.has_policy():
+                        if agent.get_squad() == 'blue':
+                            action = agent.get_output(compute_features(obs, blue_agents, red_agents))
+                        else:
+                            action = agent.get_output(compute_features(obs, red_agents, blue_agents))
+                    else:
+                        #action = env.action_space(agent_name).sample()
+                        action = np.random.randint(21)
+            else: # update the number of active agents
+                if agent.get_squad() == 'red':
+                    red_agents -= 1
+                    if done:
+                        kills[-1] += 1
+                else:
+                    blue_agents -= 1
+
+            env.step(action)
+    env.close()
+    tot_kills = np.sum(kills)
+    kills.insert(0,tot_kills)
+    # Log the rewards in each episode
+    with open(os.path.join(eval_logs, "log_rewards.txt"), "a") as f:
+
f.write(str(kills) + "\n") + f.close() + + scores = [] + actual_trees = [] + for agent_name in agents: + if agents[agent_name].to_optimize(): + scores.append(agents[agent_name].get_score_statistics(config['statistics']['agent'])) + actual_trees.append(agents[agent_name].get_tree()) + + return scores, actual_trees diff --git a/src/QD_MARL/training/evolve_tree_me.py b/src/QD_MARL/training/evolve_tree_me.py new file mode 100644 index 000000000..4ebd3c382 --- /dev/null +++ b/src/QD_MARL/training/evolve_tree_me.py @@ -0,0 +1,254 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +""" + experiment_launchers.history_reuse_gym + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + This module allows to evolve diverse trees for a domain + by using Novelty search. + + :copyright: (c) 2021 by Leonardo Lucio Custode. + :license: MIT, see LICENSE for more details. +""" +import os +import sys +sys.path.append("../..") +import gym +from time import time, sleep +import utils +import random +import numpy as np +from tqdm import tqdm +from copy import deepcopy +from algorithms import grammatical_evolution, map_elites +from decisiontrees import QLearningLeafFactory, ConditionFactory, \ + RLDecisionTree +from joblib import Parallel, delayed + +def pretrain_tree(t, rb): + """ + Pretrains a tree + + :t: A tree + :rb: The replay buffer + :returns: The pretrained tree + """ + if t is None: + return None + for e in rb: + t.empty_buffers() + if len(e) > 0: + for s, a, r, sp in e: + t.force_action(s, a) + t.set_reward(r) + t.set_reward_end_of_episode() + return t + + +def evaluate_tree(tree, config): + """ + Evaluates the tree + + :tree: The tree to evaluate + :config: The config + :returns: A tuple (episodes, fitness) + """ + # Check if the tree is valid + if tree is None: + return ([], -10**5, None) + + env = gym.make(config["env"]["env_name"]) + episodes = [] + cum_rews = [] + + # Iterate over the episodes + for i in range(config["training"]["episodes"]): + tree.empty_buffers() + episodes.append([]) + env.seed(i) + obs = env.reset() + done = False + cum_rews.append(0) + step = 0 + while not done: + action = tree.get_output(obs) + obs, rew, done, _ = env.step(action) + tree.set_reward(rew) + cum_rews[-1] += rew + # episodes[-1].append([list(obs), action, rew]) + if step != 0: + episodes[-1][-1][-1] = obs + step += 1 + episodes[-1].append([obs, action, rew, None]) + tree.set_reward_end_of_episode() + + return episodes, np.mean(cum_rews), tree + + +def evaluate(trees, config, replay_buffer, map_): + """ + Evaluates the fitness of the population of trees + + :trees: A list of trees + :config: A dictionary with all the settings + :replay_buffer: a list of episodes (lists of (state, action, rew)) + :map_: a mapping function + :returns: A list of (float, tree) + """ + ti = time() + if len(replay_buffer) > 0: + trees = map_(pretrain_tree, trees, replay_buffer) + + print("Pretraining took", time() - ti, "\bs") + + + ti = time() + ti = time() + outputs = map_(evaluate_tree, [trees[i] for i in range(len(trees))], config) + print("Training took", time() - ti, "\bs") + + best_fitness = -float("inf") + best_episodes = None + ret_values = [None for _ in range(len(trees))] + for index, (episodes, fitness, tree) in zip(list(range(len(trees))), outputs): + trees[index] = tree + ret_values[index] = (fitness, trees[index]) + """ + replay_buffer.extend(episodes) + while len(replay_buffer) > config["training"]["max_buffer_size"]: + del replay_buffer[0] + """ + if fitness > best_fitness: + best_episodes = episodes + trees.append(tree) + + 
return ret_values + + +def produce_tree(config, log_path, debug=False): + """ + Produces a tree for the selected problem by using the Grammatical Evolution + + :config: a dictionary containing all the parameters + :log_path: a path to the log directory + """ + # Setup GE + me_config = config["me"] + + # Build classes of the operators from the config file + me_config["c_factory"] = ConditionFactory() + me_config["l_factory"] = QLearningLeafFactory( + config["leaves"]["params"], + config["leaves"]["decorators"] + ) + me = map_elites.MapElites(**me_config) + + # Init replay buffer + replay_buffer = [] + + # Retrieve the map function from utils + map_ = utils.get_map(config["training"]["jobs"], debug) + # Initialize best individual + best, best_fit, new_best = None, -float("inf"), False + + with open(os.path.join(log_path, "log.txt"), "a") as f: + f.write(f"Generation Min Mean Max Std\n") + print(f"{'Generation' : <10} {'Min': <10} {'Mean': <10} \ + {'Max': <10} {'Std': <10} {'Invalid': <10} {'Best': <10}") + + trees = me.init_pop() + print(trees) + trees = [RLDecisionTree(t, config["training"]["gamma"]) for t in trees] + # Compute the fitnesses + # We need to return the trees in order to retrieve the + # correct values for the leaves when using the + # parallelization + return_values = evaluate(trees, config, replay_buffer, map_) + fitnesses = [r[0] for r in return_values] + trees = [r[1] for r in return_values] + + # Check whether the best has to be updated + print(fitnesses) + amax = np.argmax(fitnesses) + max_ = fitnesses[amax] + me.init_tell(fitnesses, trees) + # Iterate over the generations + for i in range(config["training"]["generations"]): + # Retrieve the current population + trees = me.ask() + print(trees) + trees = [RLDecisionTree(t, config["training"]["gamma"]) for t in trees] + # Compute the fitnesses + # We need to return the trees in order to retrieve the + # correct values for the leaves when using the + # parallelization + return_values = evaluate(trees, config, replay_buffer, map_) + fitnesses = [r[0] for r in return_values] + trees = [r[1] for r in return_values] + + # Check whether the best has to be updated + amax = np.argmax(fitnesses) + max_ = fitnesses[amax] + + if max_ > best_fit: + best_fit = max_ + best = trees[amax] + new_best = True + + # Tell the fitnesses to the GE + me.tell(fitnesses) + + # Compute stats + fitnesses = np.array(fitnesses) + valid = fitnesses != -100000 + min_ = np.min(fitnesses[valid]) + mean = np.mean(fitnesses[valid]) + max_ = np.max(fitnesses[valid]) + std = np.std(fitnesses[valid]) + invalid = np.sum(fitnesses == -100000) + + print(f"{i: <10} {min_: <10.2f} {mean: <10.2f} \ + {max_: <10.2f} {std: <10.2f} {invalid: <10} {best_fit: <10.2f}") + + # Update the log file + with open(os.path.join(log_path, "log.txt"), "a") as f: + f.write(f"{i} {min_} {mean} {max_} {std} {invalid}\n") + if new_best: + f.write(f"New best: {best}; Fitness: {best_fit}\n") + with open(join("best_tree.mermaid"), "w") as f: + f.write(str(best)) + new_best = False + return best + + +if __name__ == "__main__": + import json + import utils + import shutil + import argparse + from joblib import parallel_backend + + parser = argparse.ArgumentParser() + parser.add_argument("config", help="Path of the config file to use") + parser.add_argument("--debug", action="store_true", help="Debug flag") + parser.add_argument("seed", type=int, help="Random seed to use") + args = parser.parse_args() + + # Load the config file + config = json.load(open(args.config)) + + # Set the random 
seed + random.seed(args.seed) + np.random.seed(args.seed) + + # Setup logging + logdir_name = utils.get_logdir_name() + log_path = f"logs/me/gym/{logdir_name}" + join = lambda x: os.path.join(log_path, x) + #exp = Experiment(args.seed, log_path, **args.config) + os.makedirs(log_path, exist_ok=False) + shutil.copy(args.config, join("config.json")) + with open(join("seed.log"), "w") as f: + f.write(str(args.seed)) + + best = produce_tree(config, log_path, args.debug) diff --git a/src/QD_MARL/util_processing_elements/__init__.py b/src/QD_MARL/util_processing_elements/__init__.py index 17237bd9b..6420b411e 100644 --- a/src/QD_MARL/util_processing_elements/__init__.py +++ b/src/QD_MARL/util_processing_elements/__init__.py @@ -13,7 +13,15 @@ import cv2 import pickle import numpy as np +<<<<<<< HEAD from processing_element import ProcessingElement, ProcessingElementFactory, PEFMetaClass +======= +from .processing_element import ( + ProcessingElement, + ProcessingElementFactory, + PEFMetaClass, +) +>>>>>>> aca3e01 (merged from private repo) class UtilMetaClass(type): @@ -74,7 +82,10 @@ def __repr__(self): class MoveDim(ProcessingElement, metaclass=UtilMetaClass): +<<<<<<< HEAD +======= +>>>>>>> aca3e01 (merged from private repo) """ This processing element moves the dimensions in an tensor """ @@ -163,7 +174,10 @@ def __unicode__(self): class MinMaxNormalizer(ProcessingElement, metaclass=UtilMetaClass): +<<<<<<< HEAD +======= +>>>>>>> aca3e01 (merged from private repo) """ This processing element normalizes all the elements by using min-max normalization. @@ -256,9 +270,13 @@ def __init__(self, predictors_path, window_size): files = list(map(lambda f: os.path.join(predictors_path, f), files)) for p in files: +<<<<<<< HEAD self._predictors.append( pickle.load(open(p, "rb")) ) +======= + self._predictors.append(pickle.load(open(p, "rb"))) +>>>>>>> aca3e01 (merged from private repo) self._window_size = window_size self._memory = [[] for _ in range(len(predictors_path))] diff --git a/src/QD_MARL/util_processing_elements/__pycache__/__init__.cpython-311.pyc b/src/QD_MARL/util_processing_elements/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 000000000..265432ee5 Binary files /dev/null and b/src/QD_MARL/util_processing_elements/__pycache__/__init__.cpython-311.pyc differ diff --git a/src/QD_MARL/util_processing_elements/__pycache__/__init__.cpython-38.pyc b/src/QD_MARL/util_processing_elements/__pycache__/__init__.cpython-38.pyc new file mode 100644 index 000000000..1c00d24aa Binary files /dev/null and b/src/QD_MARL/util_processing_elements/__pycache__/__init__.cpython-38.pyc differ diff --git a/src/QD_MARL/util_processing_elements/__pycache__/processing_element.cpython-311.pyc b/src/QD_MARL/util_processing_elements/__pycache__/processing_element.cpython-311.pyc new file mode 100644 index 000000000..7ec509a95 Binary files /dev/null and b/src/QD_MARL/util_processing_elements/__pycache__/processing_element.cpython-311.pyc differ diff --git a/src/QD_MARL/util_processing_elements/__pycache__/processing_element.cpython-38.pyc b/src/QD_MARL/util_processing_elements/__pycache__/processing_element.cpython-38.pyc new file mode 100644 index 000000000..3c541f89c Binary files /dev/null and b/src/QD_MARL/util_processing_elements/__pycache__/processing_element.cpython-38.pyc differ diff --git a/src/QD_MARL/util_processing_elements/processing_element.py b/src/QD_MARL/util_processing_elements/processing_element.py new file mode 100644 index 000000000..a3e68de1f --- /dev/null +++ 
b/src/QD_MARL/util_processing_elements/processing_element.py @@ -0,0 +1,86 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +""" + experiment_launchers.processing_element + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + This module implements the interface for processing modules + + :copyright: (c) 2021 by Leonardo Lucio Custode. + :license: MIT, see LICENSE for more details. +""" +from abc import abstractmethod + + +class ProcessingElement: + """ + This interface defines a ProcessingElement, i.e., an element that + takes an input and produces an output and composes a pipeline. + """ + + @abstractmethod + def get_output(self, input_): + """ + This method returns the output of the agent given the input + + :input_: The agent's input + :returns: The agent's output, which may be either a scalar, an ndarray + or a torch Tensor + """ + pass + + @abstractmethod + def set_reward(self, reward): + """ + Allows to give the reward to the agent + + :reward: A float representing the reward + """ + pass + + @abstractmethod + def new_episode(self): + """ + Tells the agent that a new episode has begun + """ + pass + + +class ProcessingElementFactory: + """ + This class defines the interface for factories of ProcessingElements. + """ + + @abstractmethod + def ask_pop(self): + """ + This method returns a whole population of solutions for the factory. + :returns: A population of solutions. + """ + pass + + @abstractmethod + def tell_pop(self, fitnesses, data=None): + """ + This methods assigns the computed fitness for each individual of the population. + """ + pass + + +class PEFMetaClass(type): + _registry = {} + + def __new__(meta, name, bases, class_dict): + cls = type.__new__(meta, name, bases, class_dict) + PEFMetaClass._registry[cls.__name__] = cls + return cls + + @staticmethod + def get(class_name): + """ + Retrieves the class associated to the string + + :class_name: The name of the class + :returns: A class + """ + return PEFMetaClass._registry[class_name] diff --git a/src/QD_MARL/utils/__pycache__/__init__.cpython-38.pyc b/src/QD_MARL/utils/__pycache__/__init__.cpython-38.pyc new file mode 100644 index 000000000..6be1e7ada Binary files /dev/null and b/src/QD_MARL/utils/__pycache__/__init__.cpython-38.pyc differ diff --git a/src/QD_MARL/utils/__pycache__/print_outputs.cpython-38.pyc b/src/QD_MARL/utils/__pycache__/print_outputs.cpython-38.pyc new file mode 100644 index 000000000..83200dba4 Binary files /dev/null and b/src/QD_MARL/utils/__pycache__/print_outputs.cpython-38.pyc differ diff --git a/src/QD_MARL/utils/__pycache__/utils.cpython-38.pyc b/src/QD_MARL/utils/__pycache__/utils.cpython-38.pyc new file mode 100644 index 000000000..5dbda6fb7 Binary files /dev/null and b/src/QD_MARL/utils/__pycache__/utils.cpython-38.pyc differ diff --git a/src/QD_MARL/utils/gif_creator.py b/src/QD_MARL/utils/gif_creator.py new file mode 100644 index 000000000..1b0037519 --- /dev/null +++ b/src/QD_MARL/utils/gif_creator.py @@ -0,0 +1,18 @@ +import argparse +import os +from PIL import Image + +def gif_creator(path): + filenames = [f for f in os.listdir(path) if f.endswith('.png')] + filenames = sorted(filenames, key=lambda x: int(x.split('_')[-1].split('.')[0])) + images = [Image.open(os.path.join(path, f)) for f in filenames] + frames = images[0] + output_file = os.path.join(path, 'output.gif') + frames.save(output_file, format='GIF', append_images=images, save_all=True, duration=400, loop=0) + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("imgs_folder", help="folder with 
imgs to create gif") + path = parser.parse_args().imgs_folder + gif_creator(path) + diff --git a/src/QD_MARL/utils/update_config.py b/src/QD_MARL/utils/update_config.py new file mode 100644 index 000000000..e09f7d45a --- /dev/null +++ b/src/QD_MARL/utils/update_config.py @@ -0,0 +1,33 @@ +import argparse +import json +import os + +parser = argparse.ArgumentParser() +parser.add_argument("config", help="Path of the config file to use") +args = parser.parse_args() + +# Load the config file +config = json.load(open(args.config)) + +# Load the template file +for file_name in config["files_names"]: + new_configs = config["template"] + + names= file_name.replace(".json", "").split("_") + print(names) + + new_configs["me_config"]["me"]["kwargs"]['selection_type'] = names[-1] + if names[-2] == "pyribsCMA": + new_configs["me_config"]["me"]["kwargs"]['me_type'] = "MapElitesCMA_pyRibs" + elif names[-2] == "pyribs": + new_configs["me_config"]["me"]["kwargs"]['me_type'] = "MapElites_pyRibs" + else: + raise ValueError("Unknown ME type") + # Save the new configs in a JSON file + if new_configs['hpc'] == True: + new_config_file = os.path.join("src/QD_MARL/configs/hpc",file_name) + else: + new_config_file = os.path.join("src/QD_MARL/configs/local","test_config.json") + with open(new_config_file, "w") as f: + json.dump(new_configs, f, indent=4) + print(f"New configs saved in {new_config_file}") \ No newline at end of file diff --git a/src/QD_MARL/utils/utils.py b/src/QD_MARL/utils/utils.py index 672ecdeb2..81ffd3366 100644 --- a/src/QD_MARL/utils/utils.py +++ b/src/QD_MARL/utils/utils.py @@ -9,6 +9,7 @@ :copyright: (c) 2021 by Leonardo Lucio Custode. :license: MIT, see LICENSE for more details. """ +<<<<<<< HEAD import string import numpy as np import os @@ -17,17 +18,42 @@ from joblib import Parallel, delayed from decisiontrees import RLDecisionTree from decisiontrees import ConditionFactory, QLearningLeafFactory +======= +import os +import pickle +import string + +# from memory_profiler import profile +from datetime import datetime + +import matplotlib as plt +import matplotlib.pyplot as plt +import numpy as np +import pandas as pd +import seaborn as sns +from decisiontrees import ConditionFactory, QLearningLeafFactory, RLDecisionTree +from joblib import Parallel, delayed +from .print_outputs import print_debugging + +>>>>>>> aca3e01 (merged from private repo) def get_logdir_name(): """ Returns a name for the dir :returns: a name in the format dd-mm-yyyy_:mm:ss_ """ +<<<<<<< HEAD time = datetime.now().strftime("%d-%m-%Y_%H-%M-%S") rand_str = "".join(np.random.choice([*string.ascii_lowercase], 8)) return f"{time}_{rand_str}" +======= + time = datetime.now().strftime("%d-%m-%Y_%H-%M-%S-%f") + rand_str = "".join(np.random.choice([*string.ascii_lowercase], 8)) + return f"{time}_{rand_str}" + +>>>>>>> aca3e01 (merged from private repo) def get_map(n_jobs, debug=False): """ Returns a function pointer that implements a parallel map function @@ -38,24 +64,45 @@ def get_map(n_jobs, debug=False): """ if debug: +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) def fcn(function, iterable, config): ret_vals = [] for i in iterable: ret_vals.append(function(i, config)) return ret_vals +<<<<<<< HEAD else: def fcn(function, iterable, config): with Parallel(n_jobs) as p: return p(delayed(function)(elem, config) for elem in iterable) +======= + + else: + + def fcn(function, iterable, config): + with Parallel(n_jobs) as p: + return p(delayed(function)(elem, config) for elem in iterable) + +>>>>>>> aca3e01 
(merged from private repo) return fcn class CircularList(list): +<<<<<<< HEAD +======= +>>>>>>> aca3e01 (merged from private repo) """ A list that, when indexed outside its bounds (index i), returns the element in position i % len(self) """ +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) def __init__(self, iterable): """ Initializes the list. @@ -88,9 +135,17 @@ def __init__(self, dictionary): circular_dict[k] = CircularList(v) dict.__init__(self, circular_dict) +<<<<<<< HEAD +# PER RIPARAZIONE +from decisiontrees import Condition + +======= + # PER RIPARAZIONE from decisiontrees import Condition + +>>>>>>> aca3e01 (merged from private repo) def genotype2phenotype(individual, config): """ Converts a genotype in a phenotype @@ -104,15 +159,23 @@ def genotype2phenotype(individual, config): grammar = Grammar(config["grammar"]) cfactory = ConditionFactory(config["conditions"]["type"]) lfactory = QLearningLeafFactory( +<<<<<<< HEAD config["leaves"]["params"], config["leaves"]["decorators"] +======= + config["leaves"]["params"], config["leaves"]["decorators"] +>>>>>>> aca3e01 (merged from private repo) ) if grammar["root"][next(gene)] == "condition": params = cfactory.get_trainable_parameters() +<<<<<<< HEAD root = cfactory.create( [grammar[p][next(gene)] for p in params] ) +======= + root = cfactory.create([grammar[p][next(gene)] for p in params]) +>>>>>>> aca3e01 (merged from private repo) else: root = lfactory.create() return RLDecisionTree(root, config["training"]["gamma"]) @@ -125,9 +188,13 @@ def genotype2phenotype(individual, config): for i, n in enumerate(["left", "right"]): if grammar["root"][next(gene)] == "condition": params = cfactory.get_trainable_parameters() +<<<<<<< HEAD newnode = cfactory.create( [grammar[p][next(gene)] for p in params] ) +======= + newnode = cfactory.create([grammar[p][next(gene)] for p in params]) +>>>>>>> aca3e01 (merged from private repo) getattr(node, f"set_{n}")(newnode) fringe.insert(i, newnode) else: @@ -144,7 +211,11 @@ def genotype2phenotype(individual, config): for i, n in enumerate(["left", "right"]): actual_node = getattr(node, f"get_{n}")() if actual_node is None: +<<<<<<< HEAD #print("INVALIDO") +======= + # print("INVALIDO") +>>>>>>> aca3e01 (merged from private repo) actual_node = lfactory.create() getattr(node, f"set_{n}")(actual_node) fringe.insert(i, actual_node) @@ -166,6 +237,10 @@ def genotype2str(genotype, config): """ pass +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) def save_tree(tree, log_dir, name): if log_dir is not None: assert isinstance(tree, RLDecisionTree), "Object passed is not a RLDecisionTree" @@ -173,9 +248,132 @@ def save_tree(tree, log_dir, name): with open(log_file, "wb") as f: pickle.dump(tree, f) +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) def get_tree(log_file): tree = None if log_file is not None: with open(log_file, "rb") as f: tree = pickle.load(f) return tree +<<<<<<< HEAD +======= + +def fitnesses_stats(all_fitnesses, team_fitnesses=None): + """ + Computes the statistics of the fitnesses + + :all_fitnesses: A list of fitnesses + :team_fitnesse: A dictionary with the fitnesses of the teams + :returns: A dictionary with the statistics + """ + stats = {} + all_fitnesses = np.array(all_fitnesses) + valid = all_fitnesses != -100000 + stats["min"] = np.min(all_fitnesses) + stats["max"] = np.max(all_fitnesses) + stats['max_index'] = np.argmax(all_fitnesses) + stats["mean"] = np.mean(all_fitnesses) + stats["std"] = np.std(all_fitnesses) + stats["valid"] 
= np.sum(valid)
+    stats["invalid"] = len(all_fitnesses) - stats["valid"]
+    if team_fitnesses is not None:
+        team_fitnesses = np.array(team_fitnesses)
+        valid = team_fitnesses != -100000
+        stats["teams"] = {}
+        stats["teams"]["min"] = np.min(team_fitnesses)
+        stats["teams"]["max"] = np.max(team_fitnesses)
+        stats["teams"]['max_index'] = np.argmax(team_fitnesses)
+        stats["teams"]["mean"] = np.mean(team_fitnesses)
+        stats["teams"]["std"] = np.std(team_fitnesses)
+        stats["teams"]["valid"] = np.sum(valid)
+        stats["teams"]["invalid"] = len(team_fitnesses) - stats["teams"]["valid"]
+    return stats
+
+
+
+def plot_log(log_path=None, file=None, gen=None):
+    # the log file is a csv file with the following format:
+    #
+
+    pop_file = os.path.join(log_path, file)
+    plot_name = pop_file.split("/")[-1].split(".")[0]
+    plot_dir = os.path.join(log_path, "stat_plots")
+    os.makedirs(plot_dir, exist_ok=True)
+    plot_path = os.path.join(plot_dir, f"{plot_name}_{gen}.png")
+    df = pd.read_csv(pop_file)
+    df = df.sort_values(by=["Generation"])
+    figure, ax = plt.subplots()
+    ax.plot(df["Generation"].to_list(), df["Min"].to_list(), label="Min")
+    ax.errorbar(
+        df["Generation"].to_list(),
+        df["Mean"].to_list(),
+        yerr=df["Std"].to_list(),
+        label="Mean",
+        marker="o",
+    )
+    ax.plot(df["Generation"].to_list(), df["Max"].to_list(), label="Max")
+
+    ax.set_xlabel("Generation")
+    ax.set_ylabel("Fitness")
+    ax.set_title("Fitness over generations")
+    ax.legend()
+    ax.grid(True)
+    plt.savefig(plot_path)
+    plt.close(fig=figure)
+    pass
+
+def plot_actions(actions, pid, config):
+    gen = config["generation"]
+    actions_path = os.path.join(config["log_path"], "actions_plt", str(gen), str(pid))
+    os.makedirs(
+        actions_path, exist_ok=True
+    )
+
+    for agent in actions:
+        # heatmap
+        if "blue" in agent:
+            action_matrix = np.zeros((len(actions[agent]), 21))
+            count = 0
+            for a in actions[agent]:
+                action_matrix[count, a] += 1
+                count += 1
+
+            action_matrix = action_matrix.T
+            plt.title(f"Heatmap of actions during generation {gen} for {agent} on pid {pid}")
+            sns.heatmap(action_matrix, cmap="YlOrRd", cbar=False)
+            plt.xlabel("Cycles")
+            plt.ylabel("Actions")
+            path = (
+                actions_path
+                + f"/heatmap_actions_{agent}.png"
+            )
+            plt.savefig(path)
+            plt.close()
+            x = np.arange(21)
+            y = np.sum(action_matrix, axis=1)
+
+            plt.title(f"Count of actions during generation {gen} for {agent}")
+
+            path = (
+                actions_path
+                + f"/log_actions_{agent}.png"
+            )
+            ax = sns.barplot(x=x, y=y, hue=x, dodge=False, palette='husl', legend=False)
+            ax.set(xlabel="Actions", ylabel="Count")
+            ax.bar_label(ax.containers[0])
+            plt.savefig(path)
+            plt.close()
+            del action_matrix
+
+
+if __name__ == "__main__":
+    log_path = "logs/qd-marl/magent_battlefield/THIS"
+    log_file = "logs/qd-marl/magent_battlefield/THIS/_all_sel_log.csv"
+
+    plot_log(log_path, log_file)
+
+
+>>>>>>> aca3e01 (merged from private repo)
diff --git a/src/base/dts4marl/__pycache__/differentObservations.cpython-38.pyc b/src/base/dts4marl/__pycache__/differentObservations.cpython-38.pyc
new file mode 100644
index 000000000..31530427f
Binary files /dev/null and b/src/base/dts4marl/__pycache__/differentObservations.cpython-38.pyc differ
diff --git a/src/base/dts4marl/__pycache__/manual_policies.cpython-310.pyc b/src/base/dts4marl/__pycache__/manual_policies.cpython-310.pyc
new file mode 100644
index 000000000..2f5474e8d
Binary files /dev/null and b/src/base/dts4marl/__pycache__/manual_policies.cpython-310.pyc differ
diff --git a/src/base/dts4marl/__pycache__/manual_policies.cpython-38.pyc
b/src/base/dts4marl/__pycache__/manual_policies.cpython-38.pyc new file mode 100644 index 000000000..82ed18b1d Binary files /dev/null and b/src/base/dts4marl/__pycache__/manual_policies.cpython-38.pyc differ diff --git a/src/base/dts4marl/__pycache__/observeTest.cpython-38.pyc b/src/base/dts4marl/__pycache__/observeTest.cpython-38.pyc new file mode 100644 index 000000000..da12b7ddd Binary files /dev/null and b/src/base/dts4marl/__pycache__/observeTest.cpython-38.pyc differ diff --git a/src/base/dts4marl/__pycache__/testVari.cpython-38.pyc b/src/base/dts4marl/__pycache__/testVari.cpython-38.pyc new file mode 100644 index 000000000..eba858af3 Binary files /dev/null and b/src/base/dts4marl/__pycache__/testVari.cpython-38.pyc differ diff --git a/src/base/dts4marl/algorithms/grammatical_evolution.py b/src/base/dts4marl/algorithms/grammatical_evolution.py index a6cb786b9..781ddbb52 100644 --- a/src/base/dts4marl/algorithms/grammatical_evolution.py +++ b/src/base/dts4marl/algorithms/grammatical_evolution.py @@ -6,6 +6,7 @@ Creation Date: 04-04-2020 Last modified: mer 6 mag 2020, 16:30:41 """ +<<<<<<< HEAD import os import re import string @@ -13,6 +14,15 @@ from typing import List import numpy as np +======= +import re +import os +import string +import numpy as np +from typing import List +from abc import abstractmethod + +>>>>>>> aca3e01 (merged from private repo) TAB = " " * 4 @@ -408,7 +418,11 @@ def __init__(self, pop_size, agents, sets, mutation, crossover, selection, repla self._logdir = logdir if logdir is not None else None self._init_pop() if self._individual_genes_injected is not None: +<<<<<<< HEAD self._inject_individual(self._individual_genes_injected, self._injection_rate) +======= + self._inject_individual(self._individual_genes_injected, self._injection_rate) +>>>>>>> aca3e01 (merged from private repo) self._old_individuals = [ [] for _ in range(self._sets)] self._updated = [False for _ in range(self._sets)] # To detect the first generation @@ -424,10 +438,17 @@ def _inject_individual(self, individual_genes, injection_rate): # Reshape the genes if necessary if (len(individual_genes) < self._genotype_length): +<<<<<<< HEAD # ones because one means leaf individual_genes = np.hstack([individual_genes, np.ones(self._genotype_length - len(individual_genes), dtype=int)]) elif (len(individual_genes) > self._genotype_length): individual_genes = individual_genes[:self._genotype_length] +======= + # ones because one means leaf + individual_genes = np.hstack([individual_genes, np.ones(self._genotype_length - len(individual_genes), dtype=int)]) + elif (len(individual_genes) > self._genotype_length): + individual_genes = individual_genes[:self._genotype_length] +>>>>>>> aca3e01 (merged from private repo) individue_to_inject = Individual(individual_genes, None, None) # Create the individual with the genes injeceted for set_ in range(self._sets): # Inject the individual in each set with the same rate diff --git a/src/base/dts4marl/launcher.py b/src/base/dts4marl/launcher.py index 1d460aeab..f7b4da94a 100644 --- a/src/base/dts4marl/launcher.py +++ b/src/base/dts4marl/launcher.py @@ -1,5 +1,6 @@ import os import sys +<<<<<<< HEAD sys.path.append(".") import random @@ -16,6 +17,20 @@ from magent2.environments import battlefield_v5 +======= +sys.path.append(".") +import time +import utils +import random +import pettingzoo +import numpy as np +from math import sqrt +from copy import deepcopy +from algorithms import grammatical_evolution +from decisiontreelibrary import QLearningLeafFactory, 
ConditionFactory, RLDecisionTree +from magent2.environments import battlefield_v5 + +>>>>>>> aca3e01 (merged from private repo) class Agent: def __init__(self, name, squad, set_, tree, manual_policy, to_optimize): self._name = name @@ -49,11 +64,18 @@ def get_output(self, observation): def set_reward(self, reward): self._tree.set_reward(reward) +<<<<<<< HEAD self._score[-1] = reward def get_score_statistics(self, params): score_values = [score_dict[key] for score_dict in self._score for key in score_dict] return getattr(np, f"{params['type']}")(a=score_values, **params['params'])#Can't compare dicts with > +======= + self._score[-1] += reward + + def get_score_statistics(self, params): + return getattr(np, f"{params['type']}")(a=self._score, **params['params']) +>>>>>>> aca3e01 (merged from private repo) def new_episode(self): self._score.append(0) @@ -69,11 +91,21 @@ def evaluate(trees, config): for tree in trees: if tree is None: return -10**3, None +<<<<<<< HEAD +======= + pid = os.getpid() + eval_logs = os.path.join(config["log_path"], "eval_log", str(config['generation']), str(pid)) + os.makedirs(eval_logs, exist_ok=True) +>>>>>>> aca3e01 (merged from private repo) # Re-import the environments here to avoid problems with parallelization import differentObservations #from manual_policies import Policies import numpy as np +<<<<<<< HEAD +======= + from magent2.environments import battlefield_v5 +>>>>>>> aca3e01 (merged from private repo) # Load the function used to computer the features from the observation compute_features = getattr(differentObservations, f"compute_features_{config['observation']}") @@ -84,8 +116,13 @@ def evaluate(trees, config): # policy = Policies(config['manual_policy']) # Load the environment +<<<<<<< HEAD env = battlefield_v5.env(**config['environment']).unwrapped env.reset()#This reset lead to the problem +======= + env = battlefield_v5.env(**config['environment']) + env.reset() +>>>>>>> aca3e01 (merged from private repo) # Set tree and policy to agents agents = {} @@ -107,6 +144,10 @@ def evaluate(trees, config): agents[agent_name] = Agent(agent_name, agent_squad, None, None, policy, False) # Start the training +<<<<<<< HEAD +======= + rewards = [] +>>>>>>> aca3e01 (merged from private repo) for i in range(config["training"]["episodes"]): # Seed the environment @@ -123,17 +164,29 @@ def evaluate(trees, config): # tree.empty_buffers() # NO-BUFFER LEAFS # Iterate over all the agents +<<<<<<< HEAD for index, agent_name in enumerate(env.agents): actions = {agent: env.action_space(agent).sample() for agent in env.agents} obs, rew, done, trunc, _ = env.step(actions) +======= + for index, agent_name in enumerate(env.agent_iter()): + + obs, rew, done, trunc, _ = env.last() +>>>>>>> aca3e01 (merged from private repo) agent = agents[agent_name] if agent.to_optimize(): # Register the reward agent.set_reward(rew) +<<<<<<< HEAD action = env.action_space(agent_name) +======= + rewards.append(rew) + + action = None +>>>>>>> aca3e01 (merged from private repo) if not done and not trunc: # if the agent is alive if agent.to_optimize(): # compute_features(observation, allies, enemies) @@ -148,8 +201,13 @@ def evaluate(trees, config): else: action = agent.get_output(compute_features(obs, red_agents, blue_agents)) else: +<<<<<<< HEAD action = env.action_space(agent_name).sample() #action = np.random.randint(21) +======= + #action = env.action_space(agent_name).sample() + action = np.random.randint(21) +>>>>>>> aca3e01 (merged from private repo) else: # update the number of active 
agents if agent.get_squad() == 'red': red_agents -= 1 @@ -157,15 +215,35 @@ def evaluate(trees, config): blue_agents -= 1 env.step(action) +<<<<<<< HEAD + + env.close() + +======= + env.close() + rewards = np.array(rewards) + unique, counts = np.unique(rewards, return_counts=True) + rewards_dict = dict(zip(unique, counts)) + + # Log the rewards in each episode + with open(os.path.join(eval_logs, "log_rewards.txt"), "a") as f: + f.write(str(rewards_dict) + "\n") + f.close() + +>>>>>>> aca3e01 (merged from private repo) scores = [] actual_trees = [] for agent_name in agents: if agents[agent_name].to_optimize(): scores.append(agents[agent_name].get_score_statistics(config['statistics']['agent'])) actual_trees.append(agents[agent_name].get_tree()) +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) return scores, actual_trees def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_policy=False): @@ -238,7 +316,11 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol print(f"{'Generation' : <10} {'Set': <10} {'Min': <10} {'Mean': <10} {'Max': <10} {'Std': <10}") # Iterate over the generations for gen in range(config["training"]["generations"]): +<<<<<<< HEAD +======= + config['generation'] = gen +>>>>>>> aca3e01 (merged from private repo) # Retrive the current population pop = ge.ask() @@ -246,10 +328,18 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol trees = [map_(utils.genotype2phenotype, pop[i], config) for i in range(number_of_sets)] # Form different groups of trees squads = [[trees[j][i] for j in range(number_of_sets)] for i in range(config['ge']['pop_size'])] +<<<<<<< HEAD # Compute the fitnesses # We need to return the trees in order to retrive the # correct values for the leaves when using the parallelization return_values = map_(evaluate, squads, config) #TODO: check if this is correct +======= + + # Compute the fitnesses + # We need to return the trees in order to retrive the + # correct values for the leaves when using the parallelization + return_values = map_(evaluate, squads, config) +>>>>>>> aca3e01 (merged from private repo) agents_fitness = [ [] for _ in range(number_of_agents)] agents_tree = [ [] for _ in range(number_of_agents)] @@ -318,11 +408,18 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol return best if __name__ == "__main__": +<<<<<<< HEAD import argparse import json import shutil import utils +======= + import json + import utils + import shutil + import argparse +>>>>>>> aca3e01 (merged from private repo) parser = argparse.ArgumentParser() parser.add_argument("config", help="Path of the config file to use") @@ -333,6 +430,10 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol # Load the config file config = json.load(open(args.config)) +<<<<<<< HEAD +======= + +>>>>>>> aca3e01 (merged from private repo) # Set the random seed random.seed(args.seed) np.random.seed(args.seed) @@ -340,12 +441,17 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol # Setup logging logdir_name = utils.get_logdir_name() log_path = f"logs/magent_battlefield/{logdir_name}" +<<<<<<< HEAD +======= + config["log_path"] = log_path +>>>>>>> aca3e01 (merged from private repo) join = lambda x: os.path.join(log_path, x) os.makedirs(log_path, exist_ok=False) shutil.copy(args.config, join("config.json")) with open(join("seed.log"), "w") as f: f.write(str(args.seed)) +<<<<<<< HEAD best = 
produce_tree(config, log_path, args.log, args.debug) @@ -356,3 +462,10 @@ def produce_tree(config, log_path=None, extra_log=False, debug=False, manual_pol for index, tree in enumerate(best): print(f"\nagent_{index}:\n{tree}") logger.info(f"\nagent_{index}:\n{tree}") +======= + + best = produce_tree(config, log_path, args.log, args.debug) + + for index, tree in enumerate(best): + print(f"\nagent_{index}:\n{tree}") +>>>>>>> aca3e01 (merged from private repo) diff --git a/src/test/get_interpretability.py b/src/test/get_interpretability.py new file mode 100644 index 000000000..cd82a9c55 --- /dev/null +++ b/src/test/get_interpretability.py @@ -0,0 +1,432 @@ +from importlib.resources import path +# from experiment_launchers.pipeline import * +import pickle +import re +import os +import sys +import gym +import random +import numpy as np +from tqdm import tqdm +import string + + +def random_string(): + return "{}{}{}{}{}{}{}{}".format(*np.random.choice(list(string.ascii_lowercase), 10)) + + +def count_spaces(current): + spaces = 0 + for c in current: + if c == " ": + spaces += 1 + else: + break + return spaces + + +def get_tree(code, parent=None, branch=None): + if len(code) == 0: + return "" + current = code[0] + + # Base case + if "out" in current: + return "{} -->|{}| {}[{}]\n".format(parent, branch, random_string(), current.split("=")[1]) + + if not "if" in current: + return get_tree(code[1:], parent, branch) + + # Recursion + node_id = random_string() + condition = current.replace("if ", "").replace(":", "").replace(" ", "") + # print(condition) + + indentation_level = count_spaces(current) + else_position = None + + for i, n in enumerate(code[1:]): + if count_spaces(n) == indentation_level: + else_position = i + 1 + break + + + left_branch = get_tree(code[1:else_position], node_id, "True") + right_branch = get_tree(code[else_position + 1:], node_id, "False") + + subtree = "" + if parent is None: + left_branch = "```mermaid\ngraph TD\n" + left_branch.replace(" -->", "[{}] -->".format(condition), 1) + else: + subtree = "{} -->|{}| {}[{}]\n".format(parent, branch, node_id, condition) + subtree += left_branch + right_branch + if parent is None: + subtree += "```" + return subtree + + +random.seed(0) +np.random.seed(0) +class Node: + def __init__(self, id_, value=None, left_branch=None, right_branch=None, parent=None): + self._id = id_ + self._value = value + self._left_branch = left_branch + self._right_branch = right_branch + self._parent = parent + self._visits = 0 + + def __repr__(self): + out = "{}[{}]\n".format(self._id, self._value, self._visits) + if self._left_branch is not None: + out += "{} -->|True| {}\n".format(self._id, self._left_branch._id) + out += repr(self._left_branch) + if self._right_branch is not None: + out += "{} -->|False| {}\n".format(self._id, self._right_branch._id) + out += repr(self._right_branch) + return out + + def __str__(self): + return repr(self) + + def get_output(self, input_): + self._visits += 1 + if "_in_" in self._value or "<" in self._value or ">" in self._value: + #print(self._value) + val = re.sub(r"_in_([0-9]*)", "input_[\\1]", self._value) + #print(val) + branch = eval(val) + if branch: + return self._left_branch.get_output(input_) + else: + return self._right_branch.get_output(input_) + else: + return int(self._value) + + +def tree2Node(string): + nodes = {} + + lines = string.split("\n") + + #print(string) + for line in lines[2:-1]: + if " -->" in line: + left, right = line.split(" -->") + true = "True" in right + right = right.replace("|True| ", 
"").replace("|False| ", "") + + if "[" in left: + left_id_, left_condition = left.split("[") + left_condition = left_condition.replace("]", "") + else: + left_id_ = left + left_condition = None + + if "[" in right: + right_id_, right_condition = right.split("[") + right_condition = right_condition.replace("]", "") + else: + right_id_ = right + right_condition = None + + for id_, value in zip([left_id_, right_id_], [left_condition, right_condition]): + if id_ not in nodes: + nodes[id_] = Node(id_) + if value is not None: + nodes[id_]._value = value + + if true: + nodes[left_id_]._left_branch = nodes[right_id_] + else: + nodes[left_id_]._right_branch = nodes[right_id_] + nodes[right_id_]._parent = nodes[left_id_] + else: + # print(line) + pass + + for n in nodes.values(): + if n._parent is None: + return n + + +def calc_complexity_from_string(code, env, tree2Node=tree2Node): + code = code.replace("/(1.0 - 0.0)", "").replace("- 0.0 ", "").replace("1.0 *", "").replace("+ 0.0 ", "").replace("- -","+ ") + #print(code) + tree = get_tree(code.split("\n")) + #print(tree) + root = tree2Node(tree) + #print(code) + mean_reward = [] + + for episode in (range(5)): + mean_reward.append(0) + e = env + e.seed(episode) + obs = e.reset() + done = False + i = 0 + current_ep_reward = 0 + while not done: + # e.render() + action = root.get_output(obs) if root is not None else 0 + # action = root.get_output([i, *obs]) + i += 1 + + obs, reward, done, _ = e.step(action) + current_ep_reward += reward + mean_reward[-1] += current_ep_reward + + e.close() + # if np.mean(mean_reward) != 500: + # return 0 + #print(f"Mean reward: {np.mean(mean_reward)}") + + + change = True + if root is None: + return 0,None,-200 + while change: + change = False + fringe = [root] + + while len(fringe) > 0: + node = fringe.pop(0) + if node._left_branch is not None and node._right_branch is not None: + if node._left_branch._visits == 0: + # print(node, "has 0 visits") + if node._parent is not None: + if node == node._parent._left_branch: + node._parent._left_branch = node._right_branch + else: + node._parent._right_branch = node._right_branch + else: + root = node._right_branch + node._right_branch._parent = node._parent + fringe.append(node._right_branch) + change = True + elif node._right_branch._visits == 0: + if node._parent is not None: + if node == node._parent._left_branch: + node._parent._left_branch = node._left_branch + else: + node._parent._right_branch = node._left_branch + else: + root = node._left_branch + node._left_branch._parent = node._parent + fringe.append(node._left_branch) + change = True + else: + # print(node._left_branch._visits) + fringe.append(node._left_branch) + fringe.append(node._right_branch) + else: + # print("Not entered in {}".format(node)) + pass + + change = True + while change: + change = False + fringe = [root] + + while len(fringe) > 0: + node = fringe.pop(0) + if node._left_branch is not None and node._right_branch is not None: + # print(node._left_branch._value, len(node._left_branch._value)) + if node._left_branch._value == node._right_branch._value and len(node._left_branch._value) == 1: + node._value = node._left_branch._value + node._left_branch = None + node._right_branch = None + change = True + else: + fringe.append(node._left_branch) + fringe.append(node._right_branch) + + l = 0 + no = 0 + nnao = 0 + ncnao = 0 + + fringe = [root] + nodes_seen = 0 + while len(fringe) > 0: + node = fringe.pop(0) + nodes_seen += 1 + + if "_in_" in node._value: + parts = node._value.replace("(", "").replace(")", 
"").replace("+-", "- ").replace(" ", " ").split(" ") + # TODO: <26-11-20, leonardo> # This works only for the type of trees used in the paper. Check it for different trees. + l += 1 + for p in parts: + l += 1 + if len(p) == 1 and p in ["+", "-", "*", "/", "<", ">"]: + no += 1 + if p == "<" or p == ">": + nnao += 1 + ncnao += 1 + """ + l += 4 + no += 2 + nnao += 2 + ncnao += 2 + """ + no += 1 + nnao += 1 + ncnao += 1 + else: + l += 1 + + if node._left_branch is not None: + fringe.append(node._left_branch) + if node._right_branch is not None: + fringe.append(node._right_branch) + # print(root) + return (-0.2 + 0.2 * l + 0.5 * no + 3.4 * nnao + 4.5 * ncnao), root,np.mean(mean_reward) + +class Node_m_to_p: + def __init__(self, condition): + self.condition = condition + self.left = None + self.right = None + + def print(self, ind=0): + if self.left is not None: + string = f"{' '*ind}if {self.condition}:\n" + self.left.print(ind + 4) + f"\n{' '*ind}else:\n" + self.right.print(ind + 4) + return string + else: + return " "*ind + f"out={str(self.condition)}" + + +def convert(string): + root = None + nodes = {} + + for l in string.split("\n"): + if len(l) == 0: + continue + if "-->" in l: + from_, branch, to = l.split(" ") + + if "True" in branch: + nodes[from_].left = nodes[to] + else: + nodes[from_].right = nodes[to] + else: + id_, value = l.split("[") + value = value[:-1] + nodes[id_] = Node_m_to_p(value) + if len(nodes) == 1: + root = nodes[id_] + return root + +def fit_tree(tree): + tree = str(tree) + print(tree) + tree = tree.replace(" [","[") + tree = tree.replace("[0]","0") + tree = tree.replace("[1]","1") + tree = tree.replace("[2]","2") + tree = tree.replace("[3]","3") + tree = tree.replace("[4]","4") + tree = tree.replace("[5]","5") + tree = tree.replace("input","_in") + tree = tree.replace("Node_m_to_p object at ","") + tree = re.sub("\(.*?\)","",tree) + #tree = re.sub("\<.*?\>","",tree) + tree = tree.replace("\nOneHotEncoder\n\n","") + tree = tree.replace("RLDecisionTree\n","") + tree = tree[:tree.rfind('\n')] + tree = tree[:tree.rfind('\n')] + tree = tree[:tree.rfind('\n')] + print(tree) + return tree + +if __name__ == "__main__": + #path_dir = "/home/matteo/marl_dts/src/logs/CartPole-ME_pyRibs_3/28-05-2022_18-38-45_xvkuotpf/" + path_dir = "/home/matteo/marl_dts/src/logs/MountainCar-ME_pyRibs_7/29-05-2022_18-27-08_jlnjexmc" + dir_list = os.listdir(path_dir) + #dir_list = [f for f in dir_list if ".pkl" in f and "best" not in f] + dir_list = [f for f in dir_list if ".pkl" in f] + inter_list = [] + outmax = (0,None,-250) + for f in dir_list: + fw = open(os.path.join(path_dir, f), "rb") + tree = pickle.load(fw) + tree = str(tree) + print(tree) + tree = tree.replace(" [","[") + tree = tree.replace("[0]","0") + tree = tree.replace("[1]","1") + tree = tree.replace("[2]","2") + tree = tree.replace("[3]","3") + tree = tree.replace("[4]","4") + tree = tree.replace("[5]","5") + tree = tree.replace("input","_in") + tree = tree.replace("Node_m_to_p object at ","") + tree = re.sub("\(.*?\)","",tree) + #tree = re.sub("\<.*?\>","",tree) + tree = tree.replace("\nOneHotEncoder\n\n","") + tree = tree.replace("RLDecisionTree\n","") + tree = tree[:tree.rfind('\n')] + tree = tree[:tree.rfind('\n')] + tree = tree[:tree.rfind('\n')] + print(tree) + root = convert(tree) + #print(root) + #print(f) + out = calc_complexity_from_string(root.print(),"MountainCar-v0") + if out[2] > outmax[2] or out[2] >= 475: + print(fw) + outmax=out + print(out) + inter_list.append(out[0]) + print("\n") + fw.close() + 
print(inter_list) + lines = [] + with open(path_dir+"/all.txt") as f: + lines = f.readlines() + #lines.pop(0) + lines = [l for l in lines if l != "\n" and "Gen Index Fitness" not in l] + for i,l in enumerate(lines): + lines[i] = lines[i].replace('(',"") + lines[i] = lines[i].replace(')',"") + lines[i] = lines[i].replace(',',"") + lines[i] = lines[i].replace('\n',"") + lines[i] = lines[i].split(" ") + lines[i] = [int(lines[i][0]),[int(lines[i][1]),int(lines[i][2])],float(lines[i][3])] + final_gen = max([item[0] for item in lines]) + lines = [item for item in lines if item[0]==final_gen] + counter = 0 + for l in lines: + if l[2]>= -110: + counter += 1 + print("NUMBER OF TREES THAT RESOLVE THE TASK: ", counter) + if len(inter_list)>0: + print("MAX = ",max(inter_list)) + print("MIN = ",min(inter_list)) + print("MEAN = ",np.mean(inter_list)) + print("VARIANCE = ",np.var(inter_list)) + # fl = open("/home/matteo/marl_dts/src/logs/MountainCar-GP_4/14-06-2022_18-35-47_jxoemoti/2123402292240.pkl","rb") + # tree = pickle.load(fl) + # tree = str(tree) + # #print(tree) + # print(tree) + # tree = tree.replace(" [","[") + # tree = tree.replace("[0]","0") + # tree = tree.replace("[1]","1") + # tree = tree.replace("[2]","2") + # tree = tree.replace("[3]","3") + # tree = tree.replace("[4]","4") + # tree = tree.replace("[5]","5") + # tree = tree.replace("input","_in") + # tree = re.sub("\(.*?\)","",tree) + # tree = re.sub("\<.*?\>","",tree) + # tree = tree.replace("\nOneHotEncoder\n\n","") + # tree = tree.replace("RLDecisionTree\n","") + # print(tree) + # root = convert(tree) + # #print(root.print()) + # # # print(calc_complexity_from_string(root.print(),"CartPole-v1")) + + diff --git a/src/test/plot_runs.py b/src/test/plot_runs.py new file mode 100644 index 000000000..0af87519a --- /dev/null +++ b/src/test/plot_runs.py @@ -0,0 +1,34 @@ +import os +import sys +import matplotlib.pyplot as plt +sys.path.append(".") +from math import sqrt +from test_environments import * +import numpy as np + + +if __name__ == "__main__": + + import argparse + import json + import shutil + import yaml + import utils + # the log file si a csv file with the following format: + # + + + df = pd.read_csv(log_file) + df = df.sort_values(by=['Generation']) + figure, ax= plt.subplots() + ax.plot(df['Generation'].to_list(), df['Min'].to_list(), label='Min') + ax.errorbar(df['Generation'].to_list(), df['Mean'].to_list(), yerr = df["Std"].to_list() , label='Mean') + ax.plot(df['Generation'].to_list(), df['Max'].to_list(), label='Max') + + ax.set_xlabel('Generation') + ax.set_ylabel('Fitness') + ax.set_title('Fitness over generations') + ax.legend() + ax.grid(True) + plt.savefig(os.path.join(os.path.dirname(log_file), "_fitness.png")) + plt.show() \ No newline at end of file diff --git a/src/test/test_environments.py b/src/test/test_environments.py new file mode 100644 index 000000000..1eda707e0 --- /dev/null +++ b/src/test/test_environments.py @@ -0,0 +1,94 @@ +from magent2.environments import battlefield_v5 +import pettingzoo +from pettingzoo.utils import BaseParallelWrapper +from pettingzoo.utils.conversions import parallel_to_aec, aec_to_parallel +import numpy as np +import random +import pandas as pd +import os +import time +from utils.print_outputs import * + +''' +['__annotations__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', +'__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', +'__module__', '__ne__', '__new__', 
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', +'__str__', '__subclasshook__', '__weakref__', '_all_handles', '_calc_obs_shapes', '_calc_state_shape', '_compute_observations', +'_compute_rewards', '_compute_terminates', '_ezpickle_args', '_ezpickle_kwargs', '_minimap_features', '_renderer', +'_zero_obs', 'action_space', 'action_spaces', 'agents', 'base_state', 'close', 'env', 'extra_features', 'frames', +'generate_map', 'handles', 'leftID', 'map_size', 'max_cycles', 'max_num_agents', 'metadata', 'minimap_mode', 'num_agents', +'observation_space', 'observation_spaces', 'possible_agents', 'render', 'render_mode', 'reset', 'rightID', 'seed', 'state', +'state_space', 'step', 'team_sizes', 'unwrapped'] +''' + +class Quick_Test(): + def __init__(self, config) -> None: + self._env = battlefield_v5.parallel_env(**config['environment']) + print_debugging(self._env.metadata) + self._episodes = 1000 + + def run_env_test(self): + total_reward = 0 + completed_episodes = 0 + render = True + actions = {agent: self._env.action_spaces[agent].sample() for agent in self._env.agents} + obs, reward, termination, truncation, _ = self._env.step(actions) + + + while completed_episodes < self._episodes: + obs = self._env.reset() + for agent in self._env.agents: + if render: + self._env.render() + obs_ = obs[agent] + rew = reward[agent] + term = termination[agent] + trunc = truncation[agent] + total_reward += rew + + if agent == "blue_0": + print_debugging(obs_) + print_debugging(rew) + print_debugging(term) + print_debugging(trunc) + + if term or trunc: + action = None + elif isinstance(obs, dict) and "action_mask" in obs: + action = random.choice(np.flatnonzero(obs["action_mask"]).tolist()) + else: + action = self._env.action_space(agent).sample() + actions[agent] = action + obs, reward, termination, truncation, _ =self._env.step(actions) + print_configs(obs['blue_0']) + print_debugging(self._env.num_agents) + completed_episodes += 1 + + if render: + self._env.close() + + print("Average total reward", total_reward / self._episodes) + return 0 + +class Battlefield(): + def __init__(self, config) -> None: + self._env = battlefield_v5.env(**config['environment']).unwrapped + + +if __name__ == "__main__": + import argparse + import json + import shutil + import yaml + import utils + + # Load the config file + file = "src/QD_MARL/configs/battlefield.json" + config = json.load(open(file)) + + # Set the random seed + random.seed(1) + np.random.seed(1) + + quick_test = Quick_Test(config) + quick_test.run_env_test() \ No newline at end of file