A project experimenting with Generative Adversarial Imitation Learning (GAIL) and Formal Methods.
Currently, the container-based environment has been tested to work on both Ubuntu (GPU / CPU) and macOS (CPU-only) hosts.
This repo contains the Docker container and Python code to fully experiment with GAIL. The whole experiment is contained in `GAIL_testing.ipynb`.
This project is based on stable-baselines, OpenAI Gym, MiniGym, tensorflow, PRISM, and wombats.
I will likely switch from stable-baselines to the imitation library for the GAIL implementation: stable-baselines has decided to drop support for GAIL, and imitation provides a PPO-based GAIL learner (definitely better than the older TRPO-based GAIL learner in stable-baselines).
Here are some of the results from the GAIL experiments. Right now there is a small bug somewhere in the GAIL training, so it does not work; I've been trying to fix it for weeks. On the bright side, I think I just accidentally created an extremely powerful, general-purpose reinforcement learning algorithm for becoming the mathematically optimal game troll.
Here are videos of the agents in one of the DeepMind AI Safety environments. Here, the agent must reach the green goal while always avoiding the lava.
Expert Policy
Imitation Learner Policy
To get an expert demonstrator for this environment, I used the stable-baselines PPO2 implementation. See the jupyter notebook for hyperparameters.
Expert Episodic Reward
The final PPO2 training episodic, non-discounted reward as a function of training step.
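("Episodic, non-discounted reward" here means the plain sum of per-step environment rewards over one episode, as opposed to the γ-discounted return that the learner optimizes internally. A tiny sketch of the difference, with a made-up 3-step reward sequence for illustration:)

```python
def episodic_return(rewards, gamma=1.0):
    """Sum of rewards over one episode; gamma < 1 gives the discounted return."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

rewards = [0.0, 0.0, 1.0]             # hypothetical episode: reward only at the goal
print(episodic_return(rewards))       # non-discounted: 1.0
print(episodic_return(rewards, 0.9))  # discounted: 0.9**2 * 1.0 = 0.81
```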
Expert Entropy Loss
The final PPO2 entropy loss as a function of training step.
To train an imitation learner for this environment, I used the stable-baselines GAIL implementation. See the jupyter notebook for hyperparameters.
Learner Episodic Reward
The final GAIL training episodic, non-discounted reward as a function of training step.
Learner Discriminator Classification Loss
The final GAIL discriminator classification loss as a function of training step.
Learner Internal Adversarial Reward
The final GAIL policy network's discounted "reward" signal from the discriminator as a function of training step.
Basically, you first train an expert agent using RL (in this case with PPO2), collect sampled trajectories from the trained expert, and then train the imitation learner (in this case with GAIL) on those state-action pairs. GAIL has access to the environment as a dynamics model, but not to the reward signal; it must train a robust policy using only the expert demonstrations as the specification of the task.
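The adversarial loop above can be sketched numerically: the discriminator D is trained with binary cross-entropy to output 1 on expert (state, action) pairs and 0 on learner pairs, while the policy is rewarded for fooling it via the surrogate signal -log(1 - D(s, a)). A minimal, library-free illustration (the discriminator outputs below are made-up numbers, not results from this repo):

```python
import math

def discriminator_loss(d_expert, d_learner):
    """Binary cross-entropy: expert pairs labeled 1, learner pairs labeled 0."""
    loss_expert = -sum(math.log(d) for d in d_expert) / len(d_expert)
    loss_learner = -sum(math.log(1.0 - d) for d in d_learner) / len(d_learner)
    return loss_expert + loss_learner

def adversarial_reward(d):
    """Surrogate reward for the policy: grows as D mistakes a learner pair for expert."""
    return -math.log(1.0 - d)

# Hypothetical discriminator outputs D(s, a) on a batch of (state, action) pairs
d_expert = [0.9, 0.8, 0.95]   # a well-trained D outputs ~1 on expert pairs
d_learner = [0.2, 0.1, 0.3]   # ...and ~0 on learner pairs

print(round(discriminator_loss(d_expert, d_learner), 3))
print([round(adversarial_reward(d), 3) for d in d_learner])
```

As the learner improves, D(s, a) on learner pairs drifts toward 0.5, the classification loss rises toward 2·log 2, and the adversarial "reward" climbs, which is why that curve is only a surrogate signal and not the environment reward.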
- Run with a GPU-enabled image and start a jupyter notebook server with default network settings:

  ```shell
  ./docker_scripts/run_docker.sh --device=gpu
  ```

- Run with a CPU-only image and start a jupyter notebook server with default network settings:

  ```shell
  ./docker_scripts/run_docker.sh --device=cpu
  ```

- Run with a GPU-enabled image with the jupyter notebook served over a desired host port (in this example, port 8008) and tensorboard configured to run on port 6996. You might do this if you have other services on your host machine running over `localhost:8888` and/or `localhost:6006`:

  ```shell
  ./docker_scripts/run_docker.sh --device=gpu --jupyterport=8008 --tensorboardport=6996
  ```

- Run with a GPU-enabled image and drop into the terminal:

  ```shell
  ./docker_scripts/run_docker.sh --device=gpu bash
  ```

- Run a bash command in a CPU-only image interactively:

  ```shell
  ./docker_scripts/run_docker.sh --device=cpu $OPTIONAL_BASH_COMMAND_FOR_INTERACTIVE_MODE
  ```

- Run a bash command in a GPU-enabled image interactively:

  ```shell
  ./docker_scripts/run_docker.sh --device=gpu $OPTIONAL_BASH_COMMAND_FOR_INTERACTIVE_MODE
  ```
To access the jupyter notebook: make sure you can access port 8008 on the host machine, then modify the generated jupyter URL `http://localhost:8888/?token=TOKEN_STRING` with the new, desired port number, i.e. `http://localhost:8008/?token=TOKEN_STRING`, and paste this URL into the host machine's browser.
To access tensorboard: make sure you can access port 6996 on the host machine, then modify the generated tensorboard URL (e.g. for TensorBoard 1.15.0) `http://0.0.0.0:6006/` with the new, desired port number, i.e. `http://localhost:6996`, and paste this URL into the host machine's browser.
This repo houses a docker container with jupyter and tensorboard services running. If you have an NVIDIA GPU, check here to see if your GPU can support CUDA. If so, you can use the GPU instructions below.
Follow step one (and step two if you have a CUDA-enabled GPU) of this guide from tensorflow to prepare your computer for the tensorflow docker base container images. Don't actually install the tensorflow container; that will happen automatically later.
Follow the *nix docker post-installation guide.
Now that you have docker configured, you need to clone this repo. Pick your favorite directory on your computer (mine is `/$HOME/Downloads` ofc) and run:

```shell
git clone --recurse-submodules https://github.com/nicholasRenninger/GAIL-Formal_Methods
cd GAIL-Formal_Methods
```
The container builder uses `make`:
- If you have a CUDA-enabled GPU and thus followed step 2 of the docker install section above, run:

  ```shell
  make docker-gpu
  ```

- If you don't have a CUDA-enabled GPU and thus didn't follow step 2 of the docker install section above, run:

  ```shell
  make docker-cpu
  ```