Jupyter notebooks with the algorithms developed in the Self-Optimizing Systems class (HAW Hamburg).
The vehicle starts from an initial position and speed and moves towards a goal in a straight line, adjusting its speed so that it does not invade a person's personal space.
The vehicle state (
Where
The action
The reward (
where
The vehicle receives a reward
While it receives a penalty when it approaches the human
Rewards obtained in an instant of time, when
The end of an episode is defined by the following terminal states:
- The vehicle reaches the goal, i.e. $|\mathbf{P}^g - \mathbf{P}^v| \approx 0$
- The vehicle invades personal space, i.e. $|\mathbf{P}^h - \mathbf{P}^v| < l^2$
- The vehicle is moving backwards, i.e. $\dot{p}^h < 0$
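The three terminal conditions can be sketched as a small check. This is only an illustration, not the code from the notebooks: the function name `is_terminal`, its arguments, and the tolerance `goal_tol` used for "$\approx 0$" are all assumptions. The personal-space threshold follows the formula as written ($l^2$), and the backwards check is applied to the vehicle's speed, following the prose description.

```python
import numpy as np

def is_terminal(p_vehicle, p_goal, p_human, v_vehicle, l, goal_tol=1e-2):
    """Return (done, reason) for a hypothetical vehicle/pedestrian episode.

    goal_tol is an assumed tolerance standing in for "approximately zero".
    """
    p_vehicle = np.asarray(p_vehicle, dtype=float)
    p_goal = np.asarray(p_goal, dtype=float)
    p_human = np.asarray(p_human, dtype=float)

    # Personal space invaded: |P^h - P^v| < l^2 (threshold as stated in the text)
    if np.linalg.norm(p_human - p_vehicle) < l ** 2:
        return True, "personal_space"
    # Goal reached: |P^g - P^v| is approximately 0
    if np.linalg.norm(p_goal - p_vehicle) <= goal_tol:
        return True, "goal"
    # Vehicle moving backwards: negative speed (assumed to mean the vehicle's speed)
    if v_vehicle < 0:
        return True, "backwards"
    return False, None
```

A training loop would call such a check after every step and reset the environment whenever `done` is true.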
Examples of episodes
Restricted to one pedestrian with rectilinear motion.
The experience buffer is composed of the position and speed from 4 observations.
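One way to hold the last 4 observations is a fixed-length queue. The sketch below is an assumption about how such a buffer could look, not the implementation used in the notebooks: the class name `ObservationBuffer`, its methods, and the choice of a 2D position plus a scalar speed per observation are all hypothetical.

```python
from collections import deque
import numpy as np

class ObservationBuffer:
    """Keeps the most recent `size` (position, speed) observations."""

    def __init__(self, size=4):
        self.size = size
        self._obs = deque(maxlen=size)  # oldest entry is dropped automatically

    def push(self, position, speed):
        # Store one observation as a flat vector: [position..., speed]
        obs = np.append(np.asarray(position, dtype=float), float(speed))
        self._obs.append(obs)

    def reset(self, position, speed):
        # Fill the buffer with copies of the first observation (assumed padding rule)
        self._obs.clear()
        for _ in range(self.size):
            self.push(position, speed)

    def as_array(self):
        # Concatenate the observations, oldest first, into a single vector
        return np.concatenate(list(self._obs))

buf = ObservationBuffer()
buf.reset(position=[0.0, 0.0], speed=1.0)
buf.push(position=[1.0, 0.0], speed=2.0)
state = buf.as_array()  # 4 observations x (2 position values + 1 speed)
```

Stacking several consecutive observations lets the agent infer how the pedestrian is moving, which a single position and speed reading may not capture.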
Execution example:
Silver, D. (2015). Lectures on Reinforcement Learning.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.