This project was developed for the "Introduction to Intelligent Autonomous Systems" course and builds a reinforcement learning agent on top of a Gymnasium environment. The task is to introduce specific changes or customizations to the environment and to train an agent on BipedalWalker-v3 using the Stable-Baselines3 library. The goal is to assess how these changes impact the agent's learning process and performance.
- stable-baselines3[extra]
- swig
- gymnasium[box2d]
- sb3-contrib
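A quick way to check that these dependencies are installed correctly is to create the environment once. This is a minimal sketch, not project code:

```python
# Minimal smoke test (a sketch): creating the environment once confirms
# that swig and gymnasium[box2d] were installed correctly.
import gymnasium as gym

env = gym.make("BipedalWalker-v3")
obs, info = env.reset(seed=0)
print(obs.shape)               # (24,) observation vector
print(env.action_space.shape)  # (4,) motor speeds in [-1, 1]
env.close()
```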
States: The state is a continuous vector of 24 dimensions, which includes:
- Angle and angular velocity of the hull (main body): 2 values.
- Horizontal and vertical hull speed: 2 values.
- Joint angles and angular velocities of the legs: 8 values (4 joints × 2 values each).
- Ground contact sensors on the legs: 2 values (indicating whether each leg is in contact with the ground).
- Information about the terrain ahead (LIDAR sensors): 10 values (terrain readings used to anticipate obstacles).
Rewards: Reward is given for moving forward, totalling 300+ points up to the far end of the terrain. If the robot falls, it receives -100. Applying motor torque costs a small number of points.
Percepts: The agent sees everything it needs (state = percepts): the environment is fully observable, so there is no hidden information to infer and no noise to deal with.
Actions: Actions are motor speed values in the [-1, 1] range for each of the 4 joints (two hips and two knees).
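To make the layout above concrete, the observation vector can be sliced into its parts. This sketch assumes the standard ordering of the Gymnasium implementation (hull readings first, then the two legs, then the LIDAR); the index boundaries are that assumption, not project code:

```python
import gymnasium as gym

env = gym.make("BipedalWalker-v3")
obs, _ = env.reset(seed=0)

hull = obs[0:4]     # hull angle, angular velocity, horizontal and vertical speed
legs = obs[4:14]    # per leg: hip angle/speed, knee angle/speed, contact flag
lidar = obs[14:24]  # 10 LIDAR terrain readings ahead of the walker
print(hull, legs, lidar, sep="\n")
env.close()
```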
At an early stage, we tried to decide which RL algorithms would be best suited for BipedalWalker-v3, so we tested several of them in normal mode for 5M timesteps:
After analyzing the results, we chose to use PPO, SAC and TRPO.
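A minimal version of that comparison could look like the sketch below (not the project's exact script; PPO and SAC come from stable-baselines3, TRPO from sb3-contrib, and the save names are illustrative):

```python
from stable_baselines3 import PPO, SAC
from sb3_contrib import TRPO

# Train each candidate algorithm on the unmodified environment for 5M timesteps.
for algo in (PPO, SAC, TRPO):
    model = algo("MlpPolicy", "BipedalWalker-v3", verbose=1)
    model.learn(total_timesteps=5_000_000)
    model.save(f"{algo.__name__}_BipedalWalker")  # e.g. PPO_BipedalWalker.zip
```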
Initially, we tested small rewards to understand the impact each of them had and whether their use was justified, training the models for 30M timesteps. For this we ran two tests:
After detecting the initial errors, we realized that we should encourage the agent to walk forward and remove the alternating use of both feet (a reward-shaping sketch follows the list below):
Rewards:
- Overcoming steep terrain
- Moving forward
Penalties:
- Sudden vertical movements (e.g., falling)
- Severe instability (e.g., torso inclination)
- Standing still for a long time or ceasing to move
- The agent failing
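One way to apply this kind of shaping is a gymnasium.Wrapper around the environment's native reward. The sketch below is illustrative only: the coefficients, the observation indices, and the omission of a terrain-specific term are all assumptions, not the values used in the rewards folder:

```python
import gymnasium as gym

class ShapedRewards(gym.Wrapper):
    """Illustrative shaping: bonus for forward progress, penalties for
    vertical jolts, torso inclination, standing still, and failing."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        hull_angle, vel_x, vel_y = obs[0], obs[2], obs[3]
        reward += 0.5 * max(vel_x, 0.0)   # encourage moving forward
        reward -= 0.5 * abs(vel_y)        # penalize sudden vertical moves
        reward -= 0.5 * abs(hull_angle)   # penalize severe torso inclination
        if abs(vel_x) < 1e-3:
            reward -= 0.1                 # penalize standing still / stopping
        if terminated:
            reward -= 10.0                # extra penalty when the agent fails
        return obs, reward, terminated, truncated, info

env = ShapedRewards(gym.make("BipedalWalker-v3"))
```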
Videos:
| PPO | TRPO |
| --- | --- |
| ![]() | |

| SAC | CONTROL |
| --- | --- |
| ![]() | ![]() |
In an attempt to further improve on the second phase, we ran a third test, changing the rewards again and trying to correct the minor errors detected in the previous phase (a sketch of the new leg-related terms follows the list below).
Rewards:
- Overcoming steep terrain
- Moving forward
- Lifting its legs off the ground
- Using each leg the same number of times
Penalties:
- Sudden vertical movements (e.g., falling)
- Severe instability (e.g., torso inclination)
- Standing still for a long time or ceasing to move
- The agent failing
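The two new leg terms can be sketched with the ground-contact flags in the observation (indices 8 and 13 under the standard layout assumed earlier); the bonus magnitudes here are assumptions:

```python
import gymnasium as gym

class LegUsageRewards(gym.Wrapper):
    """Illustrative terms for the third test: reward lifting legs off the
    ground and keep the usage of the two legs balanced."""

    def reset(self, **kwargs):
        self.contact_steps = [0, 0]  # steps each leg has spent on the ground
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        left, right = obs[8] > 0.5, obs[13] > 0.5  # ground contact flags
        if not (left and right):
            reward += 0.05               # bonus for having a leg in the air
        self.contact_steps[0] += int(left)
        self.contact_steps[1] += int(right)
        # small penalty that grows when one leg is used much more than the other
        reward -= 0.001 * abs(self.contact_steps[0] - self.contact_steps[1])
        return obs, reward, terminated, truncated, info

env = LegUsageRewards(gym.make("BipedalWalker-v3"))
```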
Videos:
| PPO | TRPO |
| --- | --- |
| ![]() | ![]() |

| SAC | CONTROL |
| --- | --- |
| ![]() | ![]() |
In addition to the tests shown above, we also performed two other tests:
To understand whether the agent would move better with feet, we created an agent with feet (a sketch of how such a custom environment can be registered follows below).
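If the custom walker is saved as a Python module, it can be registered under its own environment id. This is only a sketch: the module name bipedal_walker_custom and the class name BipedalWalkerCustom are hypothetical (the repository ships the code as bipedal_walker_custom.txt):

```python
import gymnasium as gym
from gymnasium.envs.registration import register

# Hypothetical names: assumes bipedal_walker_custom.txt was renamed to
# bipedal_walker_custom.py and defines a BipedalWalkerCustom class.
register(
    id="BipedalWalkerCustom-v0",
    entry_point="bipedal_walker_custom:BipedalWalkerCustom",
    max_episode_steps=1600,  # the default horizon of BipedalWalker-v3
)

env = gym.make("BipedalWalkerCustom-v0")
```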
- rewards ➡️ A folder with the Python files for the different rewards that were used;
- Assignment.pdf ➡️ Project statement;
- BipedalWalker.pptx ➡️ A PowerPoint presentation with information about the work developed;
- ExtraBW.pptx ➡️ A PowerPoint presentation with some extra information about the work developed (graphs, videos...);
- bipedal_walker_custom.txt ➡️ The file to use if you want to try the BipedalWalker with feet;
- rewards_train.py ➡️ The code for training the agents with the custom rewards;
- test_model.py ➡️ The code used to test the agents;
- train_models.py ➡️ The code used to train the agents.
Note:
- When training agents, several algorithms are trained at the same time; you can choose which algorithms to use, how many parallel environments each algorithm gets, and whether each algorithm runs on CPU or GPU (see the sketch below).
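A simplified, sequential sketch of that configuration using SB3's vectorized environments (the algorithm choices, environment counts, and devices below are placeholders; the actual script runs the algorithms concurrently):

```python
from stable_baselines3 import PPO, SAC
from stable_baselines3.common.env_util import make_vec_env

# Placeholder configuration: algorithms to run, parallel environments per
# algorithm, and the device (CPU/GPU) each one should use.
config = {PPO: {"n_envs": 8, "device": "cpu"}, SAC: {"n_envs": 4, "device": "cuda"}}

for algo, cfg in config.items():
    vec_env = make_vec_env("BipedalWalker-v3", n_envs=cfg["n_envs"])
    model = algo("MlpPolicy", vec_env, device=cfg["device"], verbose=1)
    model.learn(total_timesteps=1_000_000)
```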
This course is part of the first semester of the third year of the Bachelor's Degree in Artificial Intelligence and Data Science at FCUP and FEUP in the academic year 2024/2025. You can find more information about this course at the following link: