This is a PyTorch implementation of VALOR.
Variational methods have recently been introduced into reinforcement learning research. They allow RL algorithms to learn diverse modes of behavior (policies) in addition to maximizing accumulated return. Mutual information (MI) measures the dependence between a pre-sampled policy label and the states or trajectories that follow it. By maximizing MI, we can assign
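
As a toy illustration of the mutual-information idea described above, here is a minimal sketch in plain Python. It is not the code in valor.py. It assumes a tiny discrete setting (two policy labels, two states) and uses the standard variational lower bound I(C; S) >= H(C) + E[log q(c | s)], where q is a decoder that guesses the label from the state. All function names and the example table are hypothetical, chosen only for illustration.

```python
import math

def mutual_information(joint):
    """Exact I(C; S) in nats, where joint[c][s] = p(c, s)."""
    p_c = [sum(row) for row in joint]
    p_s = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for c, row in enumerate(joint):
        for s, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p_c[c] * p_s[s]))
    return mi

def variational_lower_bound(joint, decoder):
    """H(C) + E_{p(c,s)}[log q(c | s)], where decoder[s][c] = q(c | s)."""
    p_c = [sum(row) for row in joint]
    entropy_c = -sum(p * math.log(p) for p in p_c if p > 0)
    expected_log_q = sum(
        p * math.log(decoder[s][c])
        for c, row in enumerate(joint)
        for s, p in enumerate(row)
        if p > 0
    )
    return entropy_c + expected_log_q

def true_posterior(joint):
    """Bayes-optimal decoder: posterior[s][c] = p(c | s)."""
    p_s = [sum(col) for col in zip(*joint)]
    return [
        [joint[c][s] / p_s[s] for c in range(len(joint))]
        for s in range(len(p_s))
    ]

# Hypothetical joint distribution: label 0 mostly reaches state 0,
# label 1 mostly reaches state 1.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

# A decoder that ignores the state carries no information about the label.
uniform_decoder = [[0.5, 0.5], [0.5, 0.5]]

mi = mutual_information(joint)
bound_uniform = variational_lower_bound(joint, uniform_decoder)
bound_exact = variational_lower_bound(joint, true_posterior(joint))
```

In this toy case the bound never exceeds the exact MI, and it matches the exact MI when the decoder equals the true posterior. Training a decoder to predict the label therefore tightens the bound, while making states more label-specific raises the MI itself.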
Requirements: OpenAI Gym, a MuJoCo license, PyTorch
Just run valor.py