This repository contains the implementation accompanying the paper "I Know How: Combining Prior Policies to Solve New Tasks", accepted at the IEEE Conference on Games (CoG) 2024.
The repository presents a novel way of reusing knowledge that RL agents have learned on earlier tasks. RL agents solve simple tasks efficiently, but as new tasks become increasingly complex, training from scratch each time is neither viable nor sustainable.
In the IKH framework, we store a set of policies pre-trained on simple tasks as basic skills that agents can reuse. When a new task arrives, we train a higher-level policy to combine the actions drawn from these skills, and the resulting final action is used to interact with the environment.
Illustration of the implementation of the IKH framework used in this work. Given a set of policies pre-trained on auxiliary tasks, the Master Policy predicts the weights w to assign to each policy's action, which together define the agent's behavior.
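The combination step described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the repository's code: it assumes the weights are normalized with a softmax and that the final action is a weighted sum of the skill actions. The names `softmax`, `combine_actions`, `keep_lane`, `turn_left`, and `master_policy` are hypothetical and exist only for this example.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalize raw scores into non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_actions(skill_actions: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted sum of the skill actions, one weight per skill (assumed rule)."""
    if len(skill_actions) != len(weights):
        raise ValueError("need exactly one weight per skill")
    dim = len(skill_actions[0])
    return [sum(w * a[i] for w, a in zip(weights, skill_actions))
            for i in range(dim)]


# Hypothetical frozen skills returning a fixed [steering, acceleration] action.
def keep_lane(obs):
    return [0.0, 0.5]


def turn_left(obs):
    return [-1.0, 0.0]


# Stand-in for the trained higher-level policy: equal scores for both skills.
def master_policy(obs):
    return softmax([0.0, 0.0])


obs = [0.0, 0.0, 0.0]  # placeholder observation
actions = [skill(obs) for skill in (keep_lane, turn_left)]
weights = master_policy(obs)
final_action = combine_actions(actions, weights)
print(final_action)  # [-0.5, 0.25]
```

In this sketch the skills stay fixed and only the weights change with the observation, which mirrors the idea of training the higher-level policy alone on the new task.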
git clone --recursive https://github.com/xiaoli98/IKH-Combining-Prior-Policies-to-Solve-New-Tasks.git
cd IKH-Combining-Prior-Policies-to-Solve-New-Tasks
pip install -e ./HighwayEnv -e ./stable-baselines3
pip install -r requirements.txt
You can find some examples in the experiments*.sh files. Some pretrained skills are available in src/checkpoint and are ready to use with sac_master.py and sac_pnn.py. To train your own skills:
python3 src/sac.py
To see the configurable parameters, use:
python3 src/sac.py -h
To fit our framework, we modified both the HighwayEnv environments and stable-baselines3.
We added two environments, indiana-v0 and lane-centering-v0. The other environments are similar to the original ones, except that the environment resets when the controlled vehicle goes off-track and the reward function is customized.
We modified stable-baselines3's SAC algorithm so that it acts as the Master Policy: it receives the actions from the skills and combines them to obtain the final action.
@misc{li2024iknowhowcombining,
title={I Know How: Combining Prior Policies to Solve New Tasks},
author={Malio Li and Elia Piccoli and Vincenzo Lomonaco and Davide Bacciu},
year={2024},
eprint={2406.09835},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.09835},
}