IKH-Combining-Prior-Policies-to-Solve-New-Tasks

This is the implementation accompanying the paper "I Know How: Combining Prior Policies to Solve New Tasks", accepted at the IEEE Conference on Games (CoG) 2024.

About

The repository presents a novel way of reusing knowledge that RL agents have learned on previous tasks. RL agents can solve simple tasks efficiently, but as new tasks become more and more complex, training from scratch each time is neither viable nor sustainable.

In the IKH framework, we store a set of policies pre-trained on simple tasks as basic skills that the agent can reuse. For each newly arrived task, we train a higher-level policy that combines the actions proposed by the skills, and the resulting final action is used to interact with the environment.

Illustration of the implementation of the IKH framework used in this work. Given a set of policies pre-trained on auxiliary tasks, the Master Policy predicts the weights w to assign to each action proposed by those policies, which together define the agent's behavior.
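
The weighting step described above can be sketched in a few lines of plain Python. This is only an illustration, not the repository's code: the function names, the softmax normalization of the weights, and the use of a simple weighted sum are all assumptions made for the example, and the actual combination rule is defined in the paper and in the modified SAC implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into non-negative weights that sum to 1 (assumed normalization)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_actions(skill_actions: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Return the weighted sum of the per-skill action vectors (assumed combination rule)."""
    if len(skill_actions) != len(weights):
        raise ValueError("need exactly one weight per skill")
    dim = len(skill_actions[0])
    return [sum(w * a[i] for w, a in zip(weights, skill_actions)) for i in range(dim)]


# Hypothetical example: two skills, each proposing [steering, acceleration].
skill_actions = [[0.2, 1.0], [-0.4, 0.0]]
weights = softmax([0.0, 0.0])  # equal scores give equal weights
final_action = combine_actions(skill_actions, weights)
```

In this sketch the skills stay fixed and only the scores that produce the weights would be learned, which mirrors the idea of training a higher-level policy on top of frozen skills.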

Getting Started

1. Clone the repository

git clone --recursive https://github.com/xiaoli98/IKH-Combining-Prior-Policies-to-Solve-New-Tasks.git
cd IKH-Combining-Prior-Policies-to-Solve-New-Tasks

2. Install the custom versions of HighwayEnv and stable-baselines3

pip install -e ./HighwayEnv -e ./stable-baselines3

3. Install other requirements

pip install -r requirements.txt

4. Running experiments

You can find some examples in the experiments*.sh files. Some pretrained skills can be found in src/checkpoint; they are ready to use with sac_master.py and sac_pnn.py. To train your own skills:

python3 src/sac.py

To see configurable parameters use:

python3 src/sac.py -h

Changes in HighwayEnv and stable-baselines3

To fit our framework, we modified both HighwayEnv and stable-baselines3.

Highway-env

We added two environments: indiana-v0 and lane-centering-v0. The other environments are similar to the original ones, except that the environment is reset when the controlled vehicle goes off-track and the reward function is customized.
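
The off-track reset can be pictured as an extra termination condition. The sketch below is a hypothetical illustration, not code from the modified HighwayEnv: VehicleStub and is_terminated are names invented for the example, and it assumes the vehicle exposes boolean crashed and on_road flags.

```python
from dataclasses import dataclass


@dataclass
class VehicleStub:
    """Hypothetical stand-in for the controlled vehicle."""
    crashed: bool = False
    on_road: bool = True


def is_terminated(vehicle: VehicleStub) -> bool:
    """End the episode on a crash or as soon as the vehicle leaves the track."""
    return vehicle.crashed or not vehicle.on_road


# An episode with a vehicle still on the road continues; one off the road ends.
still_driving = is_terminated(VehicleStub())
went_off_track = is_terminated(VehicleStub(on_road=False))
```

When the check returns True, the training loop would reset the environment and start a new episode.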

indiana-v0

lane-centering-v0

stable-baselines3

We modified stable-baselines3's SAC algorithm so that it can act as the Master Policy: it receives the actions proposed by the skills and combines them to obtain the final action.

Citation

@misc{li2024iknowhowcombining,
    title={I Know How: Combining Prior Policies to Solve New Tasks}, 
    author={Malio Li and Elia Piccoli and Vincenzo Lomonaco and Davide Bacciu},
    year={2024},
    eprint={2406.09835},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2406.09835}, 
}
