A gradient-descent Sarsa agent that controls a custom two-degree-of-freedom arm.
- Low-memory-footprint update implementation
- Logging utilities (written in Python) to parse data sent over the serial port
- Plotting utilities
Reinforcement learning (RL) is a powerful and flexible approach to learning from interaction. Embedded RL agents could be a key component in creating engaging, interactive experiences with everyday objects. However, RL methods have not typically been designed with memory constraints in mind. To investigate the issues embedded agents face, I wanted to see how a common learning algorithm would work in the 2 KB of SRAM available on an Atmel ATmega328P (Arduino Uno/Pro Mini).
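As an illustration of how a linear gradient-descent Sarsa update can fit in that budget, here is a minimal sketch. It assumes binary features (for example, tile coding) with a fixed number of active features per state-action pair. The sizes `N_TILINGS` and `N_WEIGHTS` and the functions `q_value` and `sarsa_update` are hypothetical and not taken from this project; the writeup describes the actual feature representation. With binary features, the gradient of Q(s, a) with respect to each active weight is 1, so an update touches only the active weights and needs no scratch arrays. At 4 bytes per `float` on AVR, 256 weights occupy 1 KB, leaving the rest of SRAM for the stack and serial buffers. This is one-step Sarsa without eligibility traces; traces would need a second array of the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen for illustration only. */
#define N_TILINGS 4   /* active binary features per state-action pair */
#define N_WEIGHTS 256 /* 256 floats * 4 bytes = 1 KB of the 2 KB SRAM */

/* Weight vector; static storage starts zero-initialized. */
static float weights[N_WEIGHTS];

/* Linear Q(s, a) with binary features: the sum of the active weights. */
float q_value(const uint16_t feat[N_TILINGS]) {
    float q = 0.0f;
    for (uint8_t i = 0; i < N_TILINGS; i++) {
        assert(feat[i] < N_WEIGHTS); /* debug bounds check */
        q += weights[feat[i]];
    }
    return q;
}

/* One-step gradient-descent Sarsa update.
 * feat:      active features of (s, a)
 * next_feat: active features of (s', a'), or NULL on a terminal transition
 * The step size is divided by N_TILINGS, a common convention for tile coding. */
void sarsa_update(const uint16_t feat[N_TILINGS], float reward,
                  const uint16_t *next_feat, float alpha, float discount) {
    float target = reward;
    if (next_feat != NULL) {
        target += discount * q_value(next_feat);
    }
    float delta = target - q_value(feat);
    float step = (alpha / N_TILINGS) * delta;
    /* Gradient w.r.t. each active weight is 1, so only these change. */
    for (uint8_t i = 0; i < N_TILINGS; i++) {
        weights[feat[i]] += step;
    }
}
```

On the terminal transition, pass `NULL` for `next_feat` so the target is just the reward. Defining `NDEBUG` when building for the board removes the bounds checks.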

The agent controls a two-degree-of-freedom arm. Each joint has 155 degrees of rotation. The elbow joint moves a rod tipped with an LED, which the agent can toggle on and off. A photoresistor on the surface detects whether the agent is pointing the LED at it.
The agent must point the LED at the photoresistor in as few actions as possible. Each episode ends when the photoresistor reading exceeds a threshold; the arm is then reset to a random start position. The agent is penalized for turning on the LED unnecessarily.
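The episode structure described above (terminate on a bright reading, then reset to a random pose) can be sketched as follows. The `ArmState` struct, the joint name `shoulder_deg`, the 0 to 155 degree range, and starting each episode with the LED off are all assumptions for illustration. The reward, including the LED penalty, is specified in the writeup and is deliberately not reproduced here. On the board, the reading would presumably come from `analogRead()` (0 to 1023 on the Uno's 10-bit ADC), and `rand()` would typically be replaced by Arduino's `random()`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define JOINT_MAX_DEG 155 /* assumed range: each joint sweeps 0..155 degrees */

/* Hypothetical arm state; "shoulder" is an illustrative name for the first joint. */
typedef struct {
    uint8_t shoulder_deg;
    uint8_t elbow_deg;
    bool led_on;
} ArmState;

/* Terminal test: the episode ends once the light reading exceeds the threshold. */
bool episode_done(int photoresistor_reading, int threshold) {
    return photoresistor_reading > threshold;
}

/* Reset both joints to a uniformly random angle in 0..JOINT_MAX_DEG.
 * Turning the LED off at the start of an episode is an assumption. */
void random_start(ArmState *s) {
    s->shoulder_deg = (uint8_t)(rand() % (JOINT_MAX_DEG + 1));
    s->elbow_deg = (uint8_t)(rand() % (JOINT_MAX_DEG + 1));
    s->led_on = false;
}
```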
For details on the implementation and approach, including the specification of the reward function, see the writeup. You can also watch a video of the agent in action.