Advancing Towards Safe Reinforcement Learning over Sparse Environments with Out-Of-Distribution Observations: Detection and Adaptation Strategies
- Clone this repository (tested with Python 3.10).
- Install the dependencies with `pip3`, so that `gym-minigrid` (the environment), `pytorch`, and the other required packages/libraries are installed:

```
pip3 install -r requirements.txt
```
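A quick way to confirm the install worked is to create one of the MiniGrid environments from Python. This is only a sanity-check sketch and assumes the pinned packages use the classic `gym` API (single-value `reset`, four-value `step`); adjust if your `requirements.txt` pins different versions:

```python
import gym
import gym_minigrid  # noqa: F401 -- importing registers the MiniGrid envs

# Instantiate one of the environments used later in the experiments.
env = gym.make("MiniGrid-KeyCorridorS3R3-v0")
obs = env.reset()
print(obs["image"].shape)  # partial, egocentric grid observation
```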
Note: this code was adapted from https://github.com/lcswillems/rl-starter-files. The `torch_ac` package keeps the same structure but does not behave exactly like the original implementation. Nevertheless, usage is almost identical.
To reproduce the results, run the experiments in the `simulation_scripts` folder in this order:

1. `MN5S8_train.sh`
2. `MN5S8_train_with_ssl.sh`
3. All RQ1 scripts
4. All RQ2 scripts
IMPORTANT NOTE: The GPU ids are arbitrary, so go into the `.sh` scripts and change the `--gpu-id <gpu_id_wanted>` argument at the end of each line to match your setup.
The `simulation_scripts` folder provides the scripts needed to train the agent with several intrinsic-motivation techniques. Use either the `1st` (first-visit) or the episodic variants for better results.
An example of a single training run is as follows:

```
python3 -m scripts.train --model KS3R3_c_im0005_ent00005_1 --seed 1 --save-interval 10 --frames 30000000 --env 'MiniGrid-KeyCorridorS3R3-v0' --intrinsic-motivation 0.005 --im-type 'counts' --entropy-coef 0.0005 --normalize-intrinsic-bonus 0 --separated-networks 0
```
The hyperparameters and other settings can be modified directly from the command line (or by changing the default values in `scripts/train.py`). Some of the most important hyperparameters are:
- `--env`: the environment to be used
- `--frames`: the number of frames/timesteps to run
- `--seed`: the seed used to reproduce results
- `--im-type`: which intrinsic-motivation module is used to compute the intrinsic reward
- `--intrinsic-motivation`: the intrinsic reward coefficient
- `--separated-networks`: whether the actor-critic agent is trained with a single two-head CNN architecture or with two independent networks
- `--model`: the directory where the logs and the models are saved
For example, setting `--intrinsic-motivation 0` means that the agent will be trained without intrinsic rewards.
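To make the role of these two flags concrete, here is a minimal, hypothetical sketch of a count-based bonus in the spirit of `--im-type 'counts'`; the repository's actual module may differ in its details:

```python
from collections import defaultdict

state_counts = defaultdict(int)

def count_bonus(state_key, coef=0.005):
    # `coef` plays the role of --intrinsic-motivation; with coef == 0 the
    # bonus vanishes and the agent trains on extrinsic reward alone.
    state_counts[state_key] += 1
    return coef / (state_counts[state_key] ** 0.5)

# The learner then optimises r_total = r_extrinsic + count_bonus(key),
# so rarely visited states yield larger bonuses than familiar ones.
```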
```
python3 -m scripts.evaluate --model MN7S8_c_1st_im0005_ent00005_1 --env MiniGrid-MultiRoom-N7-S8-v0 --num_episodes 10
```

This evaluates the trained model `MN7S8_c_1st_im0005_ent00005_1` on the environment `MiniGrid-MultiRoom-N7-S8-v0` with 10 different random seeds.
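Conceptually, the evaluation averages returns over distinct episode seeds. The sketch below is a hypothetical illustration of that loop (with a random policy standing in for the trained model), again assuming the classic `gym` API:

```python
import gym
import gym_minigrid  # noqa: F401 -- registers the MiniGrid environments

env = gym.make("MiniGrid-MultiRoom-N7-S8-v0")
returns = []
for episode_seed in range(10):              # one episode per random seed
    env.seed(episode_seed)
    obs = env.reset()
    done, total = False, 0.0
    while not done:
        action = env.action_space.sample()  # stand-in for the trained policy
        obs, reward, done, info = env.step(action)
        total += reward
    returns.append(total)
print(sum(returns) / len(returns))          # mean return across seeds
```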
Alternatively, evaluation accepts:

- `--env-list`: loads an environment/maze encoded in `.npy` format. The file has to be stored at `/numpyworldfiles`. When this flag is used, `--env` is not required.
```
python3 -m scripts.evaluate --model MN3S10_c_1st_im0005_ent00005_1 --env-list MN3S38_test_lava
```
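As a hypothetical illustration of the `.npy` round trip behind `--env-list` (the exact grid encoding is defined by the repository's loader, so the array contents below are only a placeholder):

```python
import os
import numpy as np

# Save a placeholder layout where the evaluation script expects it.
os.makedirs("numpyworldfiles", exist_ok=True)
grid = np.zeros((38, 38), dtype=np.int8)  # placeholder maze contents
np.save("numpyworldfiles/MN3S38_test_lava.npy", grid)

# The loader would read it back the same way.
layout = np.load("numpyworldfiles/MN3S38_test_lava.npy")
print(layout.shape)
```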