Implementations of deep reinforcement learning models:
- Soft Actor-Critic (SAC)
- DARC (Domain Adaptation with Rewards from Classifiers)
- GAIL (Generative Adversarial Imitation Learning, in progress)
Soft Actor-Critic (SAC) is an off-policy algorithm that maximizes a single objective: the expected reward plus the entropy of the policy's actions.
This pushes the policy to balance exploration and exploitation of its environment while leaving few hyperparameters to tune.
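
For reference, the maximum-entropy objective is usually written as below. This is the standard form from the SAC literature, not something taken from this repository's code; the temperature symbol $\alpha$ and the state-action distribution $\rho_\pi$ are notation assumed here.

$$
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
$$

A larger $\alpha$ weights entropy more heavily and so favors exploration; as $\alpha \to 0$ the objective reduces to the ordinary expected return.
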
The policy outputs a Gaussian distribution over continuous actions, and the value function uses twin Q-networks, taking the smaller of the two estimates to curb overestimation of Q-values.
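
A minimal sketch of those two pieces for a one-dimensional action, in plain Python, may help. The function names, the tanh squashing of the action, and the default `alpha` and `gamma` values are assumptions made for illustration; they are common in SAC implementations but are not taken from this repository.

```python
import math
import random


def sample_squashed_gaussian(mean, log_std, rng):
    """Sample a = tanh(u) with u ~ N(mean, std^2); return (a, log pi(a))."""
    std = math.exp(log_std)
    u = rng.gauss(mean, std)
    # Log-density of u under the Gaussian.
    log_prob = (-0.5 * ((u - mean) / std) ** 2
                - log_std - 0.5 * math.log(2.0 * math.pi))
    a = math.tanh(u)
    # Change-of-variables correction for the tanh squashing (assumed here).
    log_prob -= math.log(1.0 - a * a + 1e-6)
    return a, log_prob


def soft_td_target(reward, done, q1_next, q2_next, log_prob_next,
                   alpha=0.2, gamma=0.99):
    """Twin-Q target: r + gamma * (1 - done) * (min(Q1, Q2) - alpha * log pi)."""
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - done) * soft_value


rng = random.Random(0)
action, logp = sample_squashed_gaussian(mean=0.3, log_std=-1.0, rng=rng)
target = soft_td_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=4.0,
                        log_prob_next=logp)
```

Taking `min(q1_next, q2_next)` is what keeps a single over-optimistic critic from inflating the bootstrap target.
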
DARC builds on SAC to transfer a policy from a source domain to a target domain by compensating for differences in their transition probabilities. It trains an additional classifier to distinguish source transitions from target transitions and adds a reward term, derived from the classifier output, that penalizes transitions unlikely under the target dynamics.
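
The reward term can be sketched as follows. This assumes the two-classifier formulation described in the DARC paper, where one classifier sees `(s, a, s')` and another sees `(s, a)`, each producing a logit for "target domain"; the function names and the logit inputs are hypothetical and do not come from this repository.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def darc_reward_correction(logit_sas, logit_sa):
    """Delta r = [log p(target|s,a,s') - log p(source|s,a,s')]
               - [log p(target|s,a)    - log p(source|s,a)]."""
    log_p_target_sas = log_sigmoid(logit_sas)
    log_p_source_sas = log_sigmoid(-logit_sas)
    log_p_target_sa = log_sigmoid(logit_sa)
    log_p_source_sa = log_sigmoid(-logit_sa)
    return ((log_p_target_sas - log_p_source_sas)
            - (log_p_target_sa - log_p_source_sa))


def darc_reward(env_reward, logit_sas, logit_sa):
    """Reward used for source-domain transitions: r + Delta r."""
    return env_reward + darc_reward_correction(logit_sas, logit_sa)
```

When the next state makes a transition look more source-like than the state-action pair alone suggests, the correction is negative, which steers the policy away from behavior that relies on source-only dynamics.
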