In this paper we test the state-of-the-art reinforcement learning algorithm Proximal Policy Optimisation (PPO) in the robotic control domain for its ability to transfer between similar, albeit different, tasks. We use the two BipedalWalker environments from OpenAI Gym, with the aim of reducing training time and improving performance relative to training from scratch on both tasks.
Our experiments show that weight sharing with all layers transferred can raise the initial level of performance, whether transferring to a more complex or a simpler task. We also show that training on multiple tasks can significantly increase performance on complex environments.
Our results suggest that similar methods could be used to prepare an agent for a very complex task by first training it on a simpler one, thereby speeding up learning of the complex task and raising the final level of performance.
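As a minimal illustration of the weight-sharing scheme described above (a hypothetical sketch, not the implementation used in the paper), transferring all layers amounts to initialising the target policy's parameters from the source policy's before fine-tuning on the new task. Both BipedalWalker variants share a 24-dimensional observation space and a 4-dimensional action space, so the full parameter set can be copied layer for layer:

```python
import numpy as np

def init_policy(layer_sizes, rng):
    """Create a toy MLP policy as a dict of weight matrices and bias vectors.
    (Illustrative stand-in for an actual PPO policy network.)"""
    params = {}
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        params[f"W{i}"] = rng.standard_normal((n_in, n_out)) * 0.1
        params[f"b{i}"] = np.zeros(n_out)
    return params

def transfer_all_layers(source, target):
    """Weight sharing with all layers transferred: copy every source
    parameter into the target policy, which is then fine-tuned on the
    new task instead of starting from a random initialisation."""
    for name, value in source.items():
        target[name] = value.copy()
    return target

rng = np.random.default_rng(0)
# Hypothetical layer sizes: 24 observations in, 4 actions out.
source = init_policy([24, 64, 64, 4], rng)  # trained on the simple task
target = init_policy([24, 64, 64, 4], rng)  # to be trained on the hard task
target = transfer_all_layers(source, target)
```

After the copy, `target` starts from the source task's learned weights rather than from scratch, which is what allows the higher initial performance reported above.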