nslyubaykin/relax_trpo_example

Example TRPO implementation with ReLAx

This repository contains an implementation of trust region policy optimization (TRPO) with ReLAx.
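
For readers new to TRPO, the core of each policy update is a natural-gradient step: conjugate gradient approximately solves `F x = g` (Fisher matrix times step equals policy gradient), and the result is scaled to sit on the KL trust-region boundary. The sketch below is a generic NumPy illustration under stated assumptions, not the code in this repository and not the ReLAx API. The names `conjugate_gradient`, `trpo_step`, `fvp`, and `max_kl` are hypothetical, and a small explicit matrix stands in for the Fisher-vector product.

```python
import numpy as np

def conjugate_gradient(fvp, g, n_iters=10, tol=1e-10):
    """Approximately solve F x = g using only products fvp(v) = F v."""
    x = np.zeros_like(g)
    r = g.copy()   # residual g - F x (x starts at zero)
    p = g.copy()   # current search direction
    rr = r @ r
    for _ in range(n_iters):
        Fp = fvp(p)
        alpha = rr / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rr_new = r @ r
        if rr_new < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def trpo_step(fvp, g, max_kl=0.01):
    """Scale the natural-gradient direction so that 0.5 * s^T F s = max_kl."""
    x = conjugate_gradient(fvp, g)
    step_size = np.sqrt(2.0 * max_kl / (x @ fvp(x)))
    return step_size * x

# Toy stand-ins (assumptions): a 2x2 symmetric positive-definite "Fisher"
# matrix and an arbitrary policy-gradient vector.
F = np.array([[2.0, 0.5], [0.5, 1.0]])
g = np.array([1.0, -1.0])
step = trpo_step(lambda v: F @ v, g)
```

A full TRPO update typically follows this with a backtracking line search that shrinks the step until the measured KL divergence and the surrogate objective are acceptable; that part is omitted here.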

The TRPO actor was trained on the HalfCheetah-v2 MuJoCo Gym environment for 4 million environment steps.

The graph of average return vs training step is shown below (batch_size=40000):

[Figure: trpo_training]

Resulting Policy:

[Video: trpo_run.mp4]