
A docker environment and notebooks to experiment with Generative Adversarial Imitation Learning and Formal Methods


GAIL-Formal_Methods

A project experimenting with Generative Adversarial Imitation Learning (GAIL) and Formal Methods.

Currently, the container-based environment has been tested to work on both Ubuntu (GPU / CPU) and macOS (CPU-only) hosts.


About

This repo contains the Docker container and Python code needed to fully experiment with GAIL. The entire experiment is contained in GAIL_testing.ipynb.

This project is based on stable-baselines, OpenAI Gym, MiniGrid, TensorFlow, PRISM, and wombats.

I will likely switch from stable-baselines to the imitation library for the GAIL implementation: stable-baselines has decided to drop support for GAIL, and imitation provides a PPO-based GAIL learner, which should be definitively better than the older TRPO-based GAIL learner in stable-baselines.

Results

Here are some of the results from the GAIL experiments. Right now there's a small bug somewhere in the training of GAIL, so it does not work; I've been trying to fix GAIL for weeks now. On the bright side, I think I just accidentally created an extremely powerful, general-purpose reinforcement learning algorithm to become the mathematically optimal game troll.

Final Policies

Here are videos of the agents in one of the DeepMind AI Safety environments. Here, the agent must reach the green goal while always avoiding the lava.

Expert Policy

Imitation Learner Policy


Expert Demonstrator Training

To get an expert demonstrator for this environment, I used the stable-baselines PPO2 implementation. See the Jupyter notebook for the hyperparameters.

Expert Episodic Reward

The final PPO2 training episodic, non-discounted reward as a function of training step.

Expert Entropy Loss

The final PPO2 entropy loss as a function of training step.


Imitation Learner Training

To train an imitation learner for this environment, I used the stable-baselines GAIL implementation. See the Jupyter notebook for the hyperparameters.

Learner Episodic Reward

The final GAIL training episodic, non-discounted reward as a function of training step.

Learner Discriminator Classification Loss

The final GAIL discriminator classification loss as a function of training step.

Learner Internal Adversarial Reward

The final GAIL policy network's discounted "reward" signal from the discriminator as a function of training step.

Methodology

Basically, you first train an expert agent using RL (in this case with PPO2), collect sampled trajectories from the trained expert, and then train the imitation learner (in this case with GAIL) on those state-action pairs. GAIL has access to the environment as a dynamics model, but not to its reward signal: it must learn a robust policy using only the expert demonstrations as the specification of the task.
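The adversarial part of that loop can be illustrated with a toy, dependency-free sketch. Everything here is made up for illustration and has nothing to do with the actual notebook: the "policy" is a single Bernoulli parameter, the "expert" always picks action 1, and the discriminator is a two-weight logistic classifier. The point is the feedback structure: the discriminator learns to tell expert actions from learner actions, and the learner is rewarded for fooling it.

```python
import math
import random

random.seed(0)

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

theta = 0.0        # learner policy: P(action = 1) = sig(theta)
w, b = 0.0, 0.0    # discriminator D(a) = sig(w*a + b); high D = "looks like the expert"
lr_pi, lr_d = 1.0, 0.5
eps = 1e-8

for _ in range(300):
    p = sig(theta)
    learner = [1 if random.random() < p else 0 for _ in range(32)]
    expert = [1] * 32                    # the toy expert always picks action 1

    # Discriminator step: raise D on expert samples, lower it on learner samples.
    gw = gb = 0.0
    for a in expert:
        d = sig(w * a + b)
        gw += (1 - d) * a
        gb += (1 - d)
    for a in learner:
        d = sig(w * a + b)
        gw -= d * a
        gb -= d
    w += lr_d * gw / 64
    b += lr_d * gb / 64

    # Policy step: the learner never sees a true reward, only the GAIL
    # surrogate r(a) = -log(1 - D(a)), i.e. "how expert-like did I look?"
    rewards = [-math.log(1.0 - sig(w * a + b) + eps) for a in learner]
    baseline = sum(rewards) / len(rewards)
    g = sum((r - baseline) * (a - p) for r, a in zip(rewards, learner)) / len(learner)
    theta += lr_pi * g                   # REINFORCE update toward expert-like actions

print(f"P(expert action) after training: {sig(theta):.3f}")
```

After a few hundred rounds the learner's probability of taking the expert's action approaches 1, driven purely by the discriminator's pseudo-reward rather than by any environment reward.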

Container Usage

  • run with a GPU-enabled image and start a jupyter notebook server with default network settings:

    ./docker_scripts/run_docker.sh --device=gpu
  • run with a CPU-only image and start a jupyter notebook server with default network settings:

    ./docker_scripts/run_docker.sh --device=cpu
  • run with a GPU-enabled image with the jupyter notebook served over a desired host port (in this example, port 8008) and tensorboard configured to run on port 6996. You might do this if you have other services on your host machine already running on localhost:8888 and/or localhost:6006:

    ./docker_scripts/run_docker.sh --device=gpu --jupyterport=8008 --tensorboardport=6996
  • run with a GPU-enabled image and drop into the terminal:

    ./docker_scripts/run_docker.sh --device=gpu bash
  • run a bash command in a CPU-only image interactively:

    ./docker_scripts/run_docker.sh --device=cpu $OPTIONAL_BASH_COMMAND_FOR_INTERACTIVE_MODE
  • run a bash command in a GPU-enabled image interactively:

    ./docker_scripts/run_docker.sh --device=gpu $OPTIONAL_BASH_COMMAND_FOR_INTERACTIVE_MODE

Accessing the Jupyter and Tensorboard Servers

To access the Jupyter notebook: make sure you can reach port 8008 on the host machine, then modify the generated Jupyter URL:

http://localhost:8888/?token=TOKEN_STRING

to use the new, desired port number:

http://localhost:8008/?token=TOKEN_STRING

and paste this URL into the host machine's browser.

To access TensorBoard: make sure you can reach port 6996 on the host machine, then modify the generated TensorBoard URL (printed as, e.g., TensorBoard 1.15.0):

http://0.0.0.0:6006/

to use the new, desired port number:

http://localhost:6996

and paste this URL into the host machine's browser.

Installation

This repo houses a Docker container with Jupyter and TensorBoard services running. If you have an NVIDIA GPU, check here to see if your GPU supports CUDA. If so, you can use the GPU instructions below.

Install Docker and Prerequisites

Follow step one (and step two if you have a CUDA-enabled GPU) from this guide from TensorFlow to prepare your computer for the TensorFlow Docker base container images. Don't actually install the TensorFlow container itself; that will happen automatically later.

Post-installation

Follow the *nix docker post-installation guide.

Building the Container

Now that you have Docker configured, you need to clone this repo. Pick your favorite directory on your computer (mine is $HOME/Downloads, ofc) and run:

git clone --recurse-submodules https://github.com/nicholasRenninger/GAIL-Formal_Methods
cd GAIL-Formal_Methods

The container builder uses make:

  • If you have a CUDA-enabled GPU, and thus followed step 2 of the Docker install section above, run:

    make docker-gpu
  • If you don't have a CUDA-enabled GPU, and thus skipped step 2 of the Docker install section above, run:

    make docker-cpu
