This repository contains the implementation of Discriminative Reward Co-Training (DIRECT), a reinforcement learning extension designed to improve policy optimization in challenging environments with sparse rewards, hard exploration, and dynamic conditions. DIRECT combines a self-imitation buffer that stores high-return trajectories with a discriminator that evaluates policy-generated actions against these stored experiences. Using the discriminator's output as a surrogate reward signal lets the policy navigate sparse reward landscapes more efficiently, and DIRECT outperforms existing state-of-the-art methods in several benchmark scenarios. This implementation supports reproducibility and further exploration of DIRECT's capabilities.
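To make the co-training idea concrete, the following is a minimal, self-contained sketch of the buffer/discriminator interplay. It is purely illustrative: the class names, the toy logistic discriminator, and the random "transitions" are stand-ins and do not reflect the repository's actual implementation.

```python
# Illustrative sketch of the DIRECT idea: a self-imitation buffer of
# high-return trajectories plus a discriminator whose output serves as a
# dense surrogate reward. Not the repository's implementation.
import numpy as np

rng = np.random.default_rng(0)

class ReturnBuffer:
    """Keeps the k highest-return trajectories seen so far (self-imitation buffer)."""
    def __init__(self, k=8):
        self.k, self.items = k, []  # items: list of (return, transitions)
    def add(self, ret, transitions):
        self.items.append((ret, transitions))
        self.items = sorted(self.items, key=lambda x: x[0], reverse=True)[:self.k]
    def sample(self, n):
        trans = np.concatenate([t for _, t in self.items])
        return trans[rng.integers(len(trans), size=n)]

class Discriminator:
    """Tiny logistic model scoring how 'buffer-like' a transition looks."""
    def __init__(self, dim, lr=0.1):
        self.w, self.b, self.lr = np.zeros(dim), 0.0, lr
    def prob(self, x):
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))
    def update(self, real, fake):
        # One gradient-descent step on the binary cross-entropy loss:
        # buffered transitions are labeled 1, fresh policy transitions 0.
        for x, y in [(real, 1.0), (fake, 0.0)]:
            grad = self.prob(x) - y
            self.w -= self.lr * (grad[:, None] * x).mean(0)
            self.b -= self.lr * grad.mean()
    def surrogate_reward(self, x):
        # Discriminator output turned into a dense surrogate reward signal.
        return -np.log(1.0 - self.prob(x) + 1e-8)

# Toy usage: random feature vectors stand in for (state, action) transitions.
buffer, disc = ReturnBuffer(), Discriminator(dim=4)
for episode in range(10):
    rollout = rng.normal(size=(32, 4))             # policy-generated transitions
    episode_return = float(rollout.sum())          # stand-in for the episode return
    buffer.add(episode_return, rollout)            # keep high-return experience
    disc.update(real=buffer.sample(32), fake=rollout)
    dense_reward = disc.surrogate_reward(rollout)  # would drive the policy update
```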
- Python 3.10
- hyphi_gym
- Stable-Baselines3
pip install -r requirements.txt
Example of training DIRECT:
from baselines import DIRECT

envs = ['Maze9Sparse']  # environment(s) to train on
epochs = 24             # number of training epochs

model = DIRECT(envs=envs, seed=42, path='results')
model.learn(total_timesteps=epochs * 2048 * 4)
model.save()
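Assuming DIRECT follows the Stable-Baselines3 interface it builds on, a trained model can presumably be reloaded and rolled out roughly as sketched below. The `DIRECT.load` call and the Gymnasium registration of `Maze9Sparse` are assumptions, not documented behavior of this repository.

```python
# Sketch only: assumes DIRECT exposes the usual Stable-Baselines3
# load()/predict() interface and that hyphi_gym environments are
# registered with Gymnasium under their plain names.
import gymnasium as gym
from baselines import DIRECT

model = DIRECT.load('results')   # assumed SB3-style load from the save path above
env = gym.make('Maze9Sparse')    # assumed hyphi_gym registration

obs, info = env.reset(seed=42)
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```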
python -m run DIRECT -e Maze9Sparse -t 24 --path 'results/1-eval'
python -m run [DIRECT|GASIL|SIL|A2C|PPO|VIME|PrefPPO] -e FetchReach -t 96 --path 'results/2-bench'
python -m run -h
./run/1-eval/kappa.sh
./run/1-eval/omega.sh
./run/1-eval/chi.sh
./run/2-bench/maze.sh
./run/2-bench/shift.sh
./run/2-bench/fetch.sh
python -m plot results/1-eval/kappa -m Buffer --merge Training Momentum Scores
python -m plot results/1-eval/omega -m Discriminator --merge Training
python -m plot results/1-eval/chi -m DIRECT --merge Training
python -m plot results/2-bench -e Maze9Sparse -m Training
python -m plot results/2-bench -e HoleyGrid -m Shift --merge Training
python -m plot results/2-bench -e FetchReach -m Training