
Deep Deterministic Policy Gradient (DDPG)


Theory

The agent uses the DDPG algorithm to predict continuous actions in a continuous state space. It has two networks: an Actor and a Critic.

https://towardsdatascience.com/reinforcement-learning-w-keras-openai-actor-critic-models-f084612cfd69

https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404

https://spinningup.openai.com/en/latest/algorithms/ddpg.html

Actor topology

(Actor network diagram)

Critic topology

(Critic network diagram)

Inputs/Outputs

The Actor network has 2 inputs from the game: position and velocity. The output layer is a fully-connected tanh() layer, which produces the action (force) in the range (-1.0, 1.0). The hidden layers use the ReLU activation function.
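A minimal sketch of such an Actor model in TensorFlow 2 / Keras is shown below; the hidden layer sizes (64 units) are assumptions, not taken from this repository.

```python
import tensorflow as tf

def build_actor(state_dim=2, action_dim=1):
    # Inputs from the game: position and velocity
    state_input = tf.keras.layers.Input(shape=(state_dim,))
    # Hidden layers use the ReLU activation (sizes are assumptions)
    x = tf.keras.layers.Dense(64, activation="relu")(state_input)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    # tanh output squashes the action (force) into the range (-1.0, 1.0)
    action_output = tf.keras.layers.Dense(action_dim, activation="tanh")(x)
    return tf.keras.Model(inputs=state_input, outputs=action_output)
```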

The Critic network has 2 inputs from the game (the state) and 1 input from the Actor network (the action). The hidden layers use the ReLU activation function. The purpose of this network is to estimate the quality of action[t] taken in state[t].
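A minimal sketch of such a Critic model follows; how the action is merged with the state features and the layer sizes are assumptions for illustration only.

```python
import tensorflow as tf

def build_critic(state_dim=2, action_dim=1):
    # Inputs: the game state and the action chosen by the Actor
    state_input = tf.keras.layers.Input(shape=(state_dim,))
    action_input = tf.keras.layers.Input(shape=(action_dim,))
    # Hidden layers use the ReLU activation (sizes are assumptions)
    x = tf.keras.layers.Dense(64, activation="relu")(state_input)
    x = tf.keras.layers.Concatenate()([x, action_input])
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    # Linear output: the estimated Q value of action[t] in state[t]
    q_output = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs=[state_input, action_input], outputs=q_output)
```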

The Critic network is trained using the Bellman equation (a training-step sketch follows the definitions below):

Q_target = reward + (1-done) * gamma * Q_next_state

Q_target       ->  Q value to be trained,
reward         ->  reward from the game for the action taken in the state,
gamma          ->  discount factor,
Q_next_state   ->  quality of the action in the next state,
done           ->  1 if the state is terminal, 0 otherwise
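The sketch below shows one possible Critic update built around this Bellman target. The models critic, actor_target and critic_target, and the replay-buffer tensors passed in, are assumed to exist; they are not defined in this README.

```python
import tensorflow as tf

critic_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
gamma = 0.99  # discount factor (value is an assumption)

@tf.function
def train_critic(state, action, reward, next_state, done):
    # Q_next_state: quality of the target Actor's action in the next state
    next_action = actor_target(next_state)
    q_next_state = critic_target([next_state, next_action])
    # Bellman target: Q_target = reward + (1 - done) * gamma * Q_next_state
    q_target = reward + (1.0 - done) * gamma * q_next_state
    with tf.GradientTape() as tape:
        q_value = critic([state, action])
        loss = tf.reduce_mean(tf.square(q_target - q_value))
    grads = tape.gradient(loss, critic.trainable_variables)
    critic_optimizer.apply_gradients(zip(grads, critic.trainable_variables))
    return loss
```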

Summary

(Critic model summary)

Framework: TensorFlow 2.0
Languages: Python 3
Author: Martin Kubovcik