Example TRPO implementation with ReLAx

This repository contains an implementation of trust region policy optimization (TRPO) with ReLAx.
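
For reference, the standard TRPO update can be written as a constrained optimization problem. This is the textbook formulation, not necessarily the exact form used inside ReLAx. The symbols here (policy $\pi_\theta$, advantage estimate $\hat{A}$, trust-region radius $\delta$) are notation assumed for this sketch and do not come from this repository.

```latex
\begin{aligned}
\max_{\theta}\quad & \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}
  \left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, \hat{A}(s,a) \right] \\
\text{s.t.}\quad & \mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}
  \left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s) \right) \right] \le \delta
\end{aligned}
```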

The TRPO actor was trained on the HalfCheetah-v2 MuJoCo Gym environment for 4M environment steps.

The graph of average return vs training step is shown below (batch_size=40000):

Resulting Policy:

trpo_run.mp4
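
To give a sense of scale for the training run described above, the short calculation below estimates how many policy updates it contains. It assumes that `batch_size` counts the environment steps collected per policy update, which this README does not state explicitly. The variable names are illustrative and not taken from the notebook.

```python
# Assumption: batch_size is the number of environment steps collected
# before each TRPO policy update (not confirmed by the README).
total_env_steps = 4_000_000  # "4M environment steps" from the description above
batch_size = 40_000          # batch size quoted alongside the training graph

# Number of policy updates implied by the training budget
num_updates = total_env_steps // batch_size
print(num_updates)
```

Under that assumption, each point on the training curve would correspond to one batch of 40,000 environment steps.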
