VALOR - Variational Option Discovery Algorithms

This is a PyTorch implementation of VALOR.

Motivation of this project

Variational methods have recently been introduced into reinforcement learning research. They allow RL algorithms to learn diverse modes of behavior rather than only maximizing accumulated return. Mutual information (MI) measures how strongly a pre-sampled policy label is related to the states or trajectories that follow; by maximizing MI, we can assign the policy $\pi$ different task labels that lead to different behaviors. VALOR is one of these variational methods, and it concentrates on the MI between task labels and whole trajectories rather than single states (see "Variational Option Discovery Algorithms", Achiam et al., 2018).

Unlike the traditional RL procedure, a decoder is trained to recognize the underlying task label from collected trajectories, and the decoder's performance is added to the reward function. The policy and the decoder therefore collaborate to find a good way to transmit information from the pre-sampled label to the decoder. Unfortunately, no official code has been released so far. This project partially implements the algorithm, including the VPG learning procedure and MI maximization through a decoder (called the discriminator in the code). The curriculum learning mechanism is omitted, and the hyperparameters are not carefully fine-tuned, so you may be able to reach higher scores on MuJoCo tasks by tuning them.
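As a rough illustration of that reward shaping, here is a minimal PyTorch sketch, not taken from this repository: a trajectory decoder scores how recognizable the sampled label is, that log-probability is added to the trajectory's return, and the decoder itself is trained by supervised classification on the same labels. The class and variable names (TrajectoryDecoder, num_contexts, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class TrajectoryDecoder(nn.Module):
    """Guesses the sampled context label c from a whole trajectory of observations."""
    def __init__(self, obs_dim, num_contexts, hidden=64):
        super().__init__()
        # A bidirectional LSTM reads the whole trajectory, as described in the VALOR paper.
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_contexts)

    def forward(self, traj):                       # traj: (batch, T, obs_dim)
        out, _ = self.lstm(traj)
        logits = self.head(out[:, -1])             # features from the last timestep
        return torch.log_softmax(logits, dim=-1)   # log p(c | trajectory)

# Toy usage with made-up dimensions.
decoder = TrajectoryDecoder(obs_dim=11, num_contexts=8)
traj = torch.randn(4, 50, 11)               # 4 trajectories of length 50
context = torch.randint(0, 8, (4,))         # labels sampled before the rollouts
log_probs = decoder(traj)                   # (4, num_contexts)

# MI bonus: how well the decoder recognizes each trajectory's label.
mi_bonus = log_probs.gather(1, context.unsqueeze(1)).squeeze(1)
# The policy's objective adds mi_bonus to the environment return of each trajectory,
# while the decoder is trained with a standard NLL (cross-entropy) loss on the labels.
decoder_loss = nn.NLLLoss()(log_probs, context)
```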

Prerequisites

- OpenAI Gym
- MuJoCo (a valid MuJoCo license is required)
- PyTorch

Let it run!

Just run valor.py
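For example (assuming the prerequisites above are installed in your Python environment):

```
python valor.py
```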
