
A docker environment and notebooks to experiment with Generative Adversarial Imitation Learning and Formal Methods


GAIL-Formal_Methods

A project experimenting with Generative Adversarial Imitation Learning (GAIL) and Formal Methods.

Currently, the container-based environment has been tested to work on both Ubuntu (GPU / CPU) and macOS (CPU-only) hosts.


About

This repo contains the Docker container and Python code needed to fully experiment with GAIL. The entire experiment is contained in GAIL_testing.ipynb.

This project is based on stable-baselines, OpenAI Gym, MiniGrid, TensorFlow, PRISM, and wombats.

I will likely switch from stable-baselines to the imitation library for the GAIL implementation: stable-baselines has decided to drop support for GAIL, and imitation provides a PPO-based GAIL learner, which should be definitively better than the older TRPO-based GAIL learner in stable-baselines.

Results

Here are some of the results from the GAIL experiments. Right now there's a small bug somewhere in the training of GAIL, so it does not work; I've been trying to fix GAIL for weeks now. On the bright side, I think I just accidentally created an extremely powerful, general-purpose reinforcement learning algorithm to become the mathematically optimal game troll.

Final Policies

Here are videos of the agents in one of the DeepMind AI Safety environments. Here, the agent must reach the green goal while always avoiding the lava.

Expert Policy

Imitation Learner Policy


Expert Demonstrator Training

To get an expert demonstrator for this environment, I used the stable-baselines PPO2 implementation. See the Jupyter notebook for the hyperparameters.

Expert Episodic Reward

The final PPO2 training episodic, non-discounted reward as a function of training step.

Expert Entropy Loss

The final PPO2 entropy loss as a function of training step.


Imitation Learner Training

To train an imitation learner for this environment, I used the stable-baselines GAIL implementation. See the Jupyter notebook for the hyperparameters.

Learner Episodic Reward

The final GAIL training episodic, non-discounted reward as a function of training step.

Learner Discriminator Classification Loss

The final GAIL discriminator classification loss as a function of training step.

Learner Internal Adversarial Reward

The final GAIL policy network's discounted "reward" signal from the discriminator as a function of training step.

Methodology

Basically, you first train an expert agent using RL (in this case with PPO2), collect sampled trajectories from the trained expert, and then train the imitation learner (in this case with GAIL) on those state-action pairs. GAIL has access to the environment as a dynamics model, but not to its reward signal: it must learn a robust policy using only the expert demonstrations as the specification of the task.
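The adversarial part of that loop can be illustrated with a toy, dependency-free sketch. Everything here is made up for illustration and has nothing to do with the actual notebook: the "policy" is a single Bernoulli parameter, the "expert" always picks action 1, and the discriminator is a two-weight logistic classifier. The point is the feedback structure: the discriminator learns to tell expert actions from learner actions, and the learner is rewarded for fooling it.

```python
import math
import random

random.seed(0)

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

theta = 0.0        # learner policy: P(action = 1) = sig(theta)
w, b = 0.0, 0.0    # discriminator D(a) = sig(w*a + b); high D = "looks like the expert"
lr_pi, lr_d = 1.0, 0.5
eps = 1e-8

for _ in range(300):
    p = sig(theta)
    learner = [1 if random.random() < p else 0 for _ in range(32)]
    expert = [1] * 32                    # the toy expert always picks action 1

    # Discriminator step: raise D on expert samples, lower it on learner samples.
    gw = gb = 0.0
    for a in expert:
        d = sig(w * a + b)
        gw += (1 - d) * a
        gb += (1 - d)
    for a in learner:
        d = sig(w * a + b)
        gw -= d * a
        gb -= d
    w += lr_d * gw / 64
    b += lr_d * gb / 64

    # Policy step: the learner never sees a true reward, only the GAIL
    # surrogate r(a) = -log(1 - D(a)), i.e. "how expert-like did I look?"
    rewards = [-math.log(1.0 - sig(w * a + b) + eps) for a in learner]
    baseline = sum(rewards) / len(rewards)
    g = sum((r - baseline) * (a - p) for r, a in zip(rewards, learner)) / len(learner)
    theta += lr_pi * g                   # REINFORCE update toward expert-like actions

print(f"P(expert action) after training: {sig(theta):.3f}")
```

After a few hundred rounds the learner's probability of taking the expert's action approaches 1, driven purely by the discriminator's pseudo-reward rather than by any environment reward.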

Container Usage

  • run with a GPU-enabled image and start a jupyter notebook server with default network settings:

    ./docker_scripts/run_docker.sh --device=gpu
  • run with a CPU-only image and start a jupyter notebook server with default network settings:

    ./docker_scripts/run_docker.sh --device=cpu
  • run with a GPU-enabled image with the jupyter notebook served over a desired host port (in this example, port 8008) and tensorboard configured to run on port 6996. You might do this if you have other services on your host machine already running on localhost:8888 and/or localhost:6006:

    ./docker_scripts/run_docker.sh --device=gpu --jupyterport=8008 --tensorboardport=6996
  • run with a GPU-enabled image and drop into the terminal:

    ./docker_scripts/run_docker.sh --device=gpu bash
  • run a bash command in a CPU-only image interactively:

    ./docker_scripts/run_docker.sh --device=cpu $OPTIONAL_BASH_COMMAND_FOR_INTERACTIVE_MODE
  • run a bash command in a GPU-enabled image interactively:

    ./docker_scripts/run_docker.sh --device=gpu $OPTIONAL_BASH_COMMAND_FOR_INTERACTIVE_MODE

Accessing the Jupyter and Tensorboard Servers

To access the Jupyter notebook: make sure you can reach port 8008 on the host machine, then modify the generated Jupyter URL:

http://localhost:8888/?token=TOKEN_STRING

to use the new, desired port number:

http://localhost:8008/?token=TOKEN_STRING

and paste this URL into the host machine's browser.

To access TensorBoard: make sure you can reach port 6996 on the host machine, then modify the generated TensorBoard URL (printed as, e.g., TensorBoard 1.15.0):

http://0.0.0.0:6006/

to use the new, desired port number:

http://localhost:6996

and paste this URL into the host machine's browser.

Installation

This repo houses a Docker container with Jupyter and TensorBoard services running. If you have an NVIDIA GPU, check here to see if your GPU supports CUDA. If so, you can use the GPU instructions below.

Install Docker and Prerequisites

Follow step one (and step two if you have a CUDA-enabled GPU) from this guide from TensorFlow to prepare your computer for the TensorFlow Docker base container images. Don't actually install the TensorFlow container itself; that will happen automatically later.

Post-installation

Follow the *nix docker post-installation guide.

Building the Container

Now that you have Docker configured, you need to clone this repo. Pick your favorite directory on your computer (mine is $HOME/Downloads, ofc) and run:

git clone --recurse-submodules https://github.com/nicholasRenninger/GAIL-Formal_Methods
cd GAIL-Formal_Methods

The container builder uses make:

  • If you have a CUDA-enabled GPU, and thus followed step 2 of the Docker install section above, run:

    make docker-gpu
  • If you don't have a CUDA-enabled GPU, and thus skipped step 2 of the Docker install section above, run:

    make docker-cpu
