Inverse Reinforcement Learning using Diffusion models in Trajectory Space

This repository contains the code for the MSc Thesis on using the Diffuser for Inverse Reinforcement Learning. The thesis can be accessed here.

This repo is a fork of the original Diffuser repository. Our method was developed for all environments on the original maze2d branch and merged into the main branch at the end of the project. The "cluster" branch contains the code for machines with CUDA.

Updates

06/10/2024: Merged development into main branch (from maze2d branch).

Installation

Conda environment

conda env create -f environment.yml
conda activate diffuser
pip install -e .

Mujoco Installation

Download mujoco210 from https://github.com/google-deepmind/mujoco/releases/tag/2.1.0, extract it, and copy it to ~/.mujoco/mujoco210. Download the MuJoCo key file from https://www.roboti.us/file/mjkey.txt and place it in ~/.mujoco.

Run the following three commands:

conda install -c conda-forge glew
conda install -c conda-forge mesalib
conda install -c menpo glfw3

Then add the following three lines to ~/.bashrc:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin
export LD_PRELOAD=$LD_PRELOAD:~/miniconda3/envs/diffuser/lib/libstdc++.so.6
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
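
The installation steps above can be sanity-checked with a short script. The following is a minimal sketch, not part of this repository: the function name check_mujoco_setup is hypothetical, and it only verifies the paths named in this section (the mujoco210 directory, the key file, and the LD_LIBRARY_PATH entry). It does not confirm that MuJoCo itself loads.

```python
import os
from pathlib import Path


def check_mujoco_setup(home=None, environ=None):
    """Return a list of problems found; an empty list means the basic paths look fine.

    Hypothetical helper: it checks only the locations named in this README.
    """
    home = Path(home) if home is not None else Path.home()
    environ = os.environ if environ is None else environ
    problems = []

    # MuJoCo should have been extracted to ~/.mujoco/mujoco210.
    mujoco_dir = home / ".mujoco" / "mujoco210"
    if not mujoco_dir.is_dir():
        problems.append(f"missing directory: {mujoco_dir}")

    # The key file should sit directly in ~/.mujoco.
    key_file = home / ".mujoco" / "mjkey.txt"
    if not key_file.is_file():
        problems.append(f"missing key file: {key_file}")

    # LD_LIBRARY_PATH is colon-separated; expand any literal "~" before comparing.
    bin_dir = mujoco_dir / "bin"
    raw_entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    entries = [Path(os.path.expanduser(entry)) for entry in raw_entries if entry]
    if bin_dir not in entries:
        problems.append(f"LD_LIBRARY_PATH does not include {bin_dir}")

    return problems
```

Calling check_mujoco_setup() with no arguments inspects the current user's home directory and environment. Open a new shell (or source ~/.bashrc) first so the exported variables are visible.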

Running Code

The scripts/ folder contains all the scripts used to generate the results presented in the MSc Thesis. The u_maze/ folder contains the scripts for experiments on the U-Maze Maze2D environment, the large_maze/ folder contains the scripts for experiments on the Large Maze Maze2D environment, the locomotion/ folder contains the scripts for the Mujoco Locomotion environments (including HalfCheetah), and the evaluations/ folder contains code for performance analysis (reward and ERC) and for visualisation of learnt behaviour.

Diffuser Training

To train the Base Diffuser, run the appropriate script for your choice of environment.

python scripts/{CHOICE_OF_ENV}/train.py

For HalfCheetah, you can add a dataset flag such as --dataset halfcheetah-medium-replay-v2 for your choice of dataset.

Reward Model Learning

To learn a reward model, first run the set-up script initiate_value.py, then run either guided_learning_reward.py for MSE Loss or guided_learning_mmd.py for MMD Loss.

python scripts/{CHOICE_OF_ENV}/initiate_value.py
python scripts/{CHOICE_OF_ENV}/guided_learning_reward.py  # MSE Loss
python scripts/{CHOICE_OF_ENV}/guided_learning_mmd.py  # MMD Loss

Guided Planning with Learnt Reward Model

To generate trajectories (rollouts) by planning with the Diffuser under a learnt reward model, run:

python scripts/{CHOICE_OF_ENV}/guided_learnt_reward.py # To generate only 1 trajectory
python scripts/{CHOICE_OF_ENV}/parallel/guided_learnt_reward.py # To generate multiple trajectories
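
The commands in the three sections above can be chained for one environment. The following is a minimal sketch, not part of this repository. It assumes the sections are meant to be run in the order they appear (training, reward model set-up, reward learning, guided planning). The names workflow_commands and run_workflow are hypothetical; the script paths and the --dataset flag are taken from this README, and the flag is passed only to train.py because that is the only use described here.

```python
import subprocess

# Environment folders under scripts/, as listed in "Running Code".
ENV_FOLDERS = ("u_maze", "large_maze", "locomotion")

# Reward-learning script for each loss, as listed in "Reward Model Learning".
LOSS_SCRIPTS = {"mse": "guided_learning_reward.py", "mmd": "guided_learning_mmd.py"}


def workflow_commands(env_folder, loss="mse", parallel=False, dataset=None):
    """Build the ordered list of commands for one environment (hypothetical helper)."""
    if env_folder not in ENV_FOLDERS:
        raise ValueError(f"unknown environment folder: {env_folder}")
    if loss not in LOSS_SCRIPTS:
        raise ValueError(f"unknown loss: {loss}")

    base = f"scripts/{env_folder}"

    # Step 1: train the Base Diffuser, optionally with a dataset flag.
    train = ["python", f"{base}/train.py"]
    if dataset is not None:
        train += ["--dataset", dataset]

    # Step 4: plan with the learnt reward model (one or multiple trajectories).
    plan_script = "parallel/guided_learnt_reward.py" if parallel else "guided_learnt_reward.py"

    return [
        train,
        ["python", f"{base}/initiate_value.py"],       # Step 2: set-up script
        ["python", f"{base}/{LOSS_SCRIPTS[loss]}"],    # Step 3: reward learning
        ["python", f"{base}/{plan_script}"],
    ]


def run_workflow(commands, dry_run=True):
    """Print each command; execute it from the repository root unless dry_run is set."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

For example, run_workflow(workflow_commands("locomotion", loss="mmd", dataset="halfcheetah-medium-replay-v2")) prints the four commands without running them. Pass dry_run=False to execute them in sequence and stop at the first failure.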

Acknowledgements

This project was carried out as my MSc Thesis for the MSc in Machine Learning at University College London (UCL), within Ilija Bogunovic's Group and under the supervision of Dr Bogunovic, William Bankes and Lorenz Wolf.
