philippaltmann/DIRECT

Discriminative Reward Co-Training

DOI: 10.1007/s00521-024-10512-8

This repository contains the implementation of Discriminative Reward Co-Training (DIRECT), an extension to reinforcement learning that improves policy optimization in environments with sparse rewards, hard exploration tasks, and dynamic conditions. DIRECT combines a self-imitation buffer, which stores high-return trajectories, with a discriminator that evaluates policy-generated actions against these stored experiences. The discriminator's output serves as a surrogate reward signal, which lets the policy navigate the reward landscape efficiently and outperform existing state-of-the-art methods in several benchmark scenarios. The code is provided to reproduce these results and to support further experiments with DIRECT.
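To make the buffer side of this idea concrete, the following is a minimal sketch of a buffer that retains only the highest-return trajectories seen so far. It is not taken from this repository: the class name `ReturnBuffer`, its methods, and the toy trajectories are hypothetical, and the actual DIRECT buffer may differ in how it stores and samples experience.

```python
import heapq
from itertools import count


class ReturnBuffer:
    """Hypothetical helper: keeps the `capacity` highest-return trajectories."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []            # min-heap of (return, tiebreak, trajectory)
        self._tiebreak = count()   # avoids comparing trajectories on equal returns

    def add(self, trajectory, episode_return):
        """Insert a trajectory if it beats the worst stored one; report success."""
        item = (episode_return, next(self._tiebreak), trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
            return True
        if episode_return > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)  # drop the lowest-return entry
            return True
        return False

    def returns(self):
        """Stored returns, best first."""
        return sorted((r for r, _, _ in self._heap), reverse=True)


# Toy usage with made-up trajectories and returns.
buffer = ReturnBuffer(capacity=2)
for traj, ret in [(["a0", "a1"], 0.0), (["a2"], 1.0), (["a3"], 0.5), (["a4"], -1.0)]:
    buffer.add(traj, ret)
top_returns = buffer.returns()
```

In a setup like the one described above, a discriminator would then be trained to tell samples drawn from such a buffer apart from fresh policy rollouts, and its score would stand in for the environment reward.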

Figures: DIRECT Architecture; Evaluation Results

Setup

Requirements

Installation

pip install -r requirements.txt

Training

Example for training DIRECT:

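from baselines import DIRECT

envs = ['Maze9Sparse']  # environment(s) to train on
epochs = 24             # number of training epochs
model = DIRECT(envs=envs, seed=42, path='results')
# One epoch corresponds to 2048 * 4 timesteps in this example
model.learn(total_timesteps=epochs * 2048 * 4)
model.save()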
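The timestep budget in the example above is plain arithmetic, and a small helper can make it explicit. This is only a sketch: the function name and parameter names are hypothetical, and the README does not state what the factors 2048 and 4 denote (a rollout length and a number of parallel environments would be one plausible reading, but that is an assumption).

```python
def total_timesteps(epochs, steps_per_unit=2048, units_per_epoch=4):
    """Hypothetical helper mirroring `epochs * 2048 * 4` from the training example.

    The meaning of the two factors is an assumption; only their product matters here.
    """
    return epochs * steps_per_unit * units_per_epoch


budget_24 = total_timesteps(24)   # 196608 timesteps for 24 epochs
budget_96 = total_timesteps(96)   # 786432 timesteps for 96 epochs
```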

Running Experiments

Train DIRECT and baselines

python -m run DIRECT -e Maze9Sparse -t 24 --path 'results/1-eval'
python -m run [DIRECT|GASIL|SIL|A2C|PPO|VIME|PrefPPO] -e FetchReach -t 96 --path 'results/2-bench'
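The bracketed list in the second command stands for a choice of one algorithm per call. As a rough sketch, the individual commands could be generated as follows; the helper `bench_command` is hypothetical, and the script only builds and prints the command lines using the flags shown above, without running anything.

```python
import shlex

# Algorithm names as listed in the command above.
ALGORITHMS = ["DIRECT", "GASIL", "SIL", "A2C", "PPO", "VIME", "PrefPPO"]


def bench_command(algorithm, env="FetchReach", t=96, path="results/2-bench"):
    """Hypothetical helper: build the argument list for one `python -m run` call."""
    return ["python", "-m", "run", algorithm, "-e", env, "-t", str(t), "--path", path]


commands = [bench_command(a) for a in ALGORITHMS]
for cmd in commands:
    print(shlex.join(cmd))  # one shell-ready command line per algorithm
```

To actually launch the runs, each argument list could be passed to `subprocess.run(cmd, check=True)` from the repository root.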

Display help for command line arguments

python -m run -h

Run Evaluation Scripts

./run/1-eval/kappa.sh
./run/1-eval/omega.sh
./run/1-eval/chi.sh

Run Benchmark Scripts

./run/2-bench/maze.sh
./run/2-bench/shift.sh
./run/2-bench/fetch.sh

Plotting

Evaluation

Kappa

python -m plot results/1-eval/kappa -m Buffer --merge Training Momentum Scores 

Omega

python -m plot results/1-eval/omega -m Discriminator --merge Training

Chi

python -m plot results/1-eval/chi -m DIRECT --merge Training

Benchmarks

Maze

python -m plot results/2-bench -e Maze9Sparse -m Training

HoleyGrid

python -m plot results/2-bench -e HoleyGrid -m Shift --merge Training

Fetch

python -m plot results/2-bench -e FetchReach -m Training
