This project was developed for the "Introduction to Intelligent Autonomous Systems" course and builds a reinforcement learning agent on top of a Gymnasium environment. The task is to introduce specific changes or customizations to the environment and to train an agent on BipedalWalker-v3 using the Stable-Baselines3 library. The goal is to assess how these changes impact the agent's learning process and performance.
- stable-baselines3[extra]
- swig
- gymnasium[box2d]
- sb3-contrib
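A quick way to check that these dependencies are installed correctly is to create the environment once. This is a minimal sketch, not project code:

```python
# Minimal smoke test (a sketch): creating the environment once confirms
# that swig and gymnasium[box2d] were installed correctly.
import gymnasium as gym

env = gym.make("BipedalWalker-v3")
obs, info = env.reset(seed=0)
print(obs.shape)               # (24,) observation vector
print(env.action_space.shape)  # (4,) motor speeds in [-1, 1]
env.close()
```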
States: The state is a continuous vector of 24 dimensions, which includes:
- Angle and angular velocity of the hull (main body): 2 values.
- Horizontal and vertical hull speed: 2 values.
- Joint angles and angular velocities of the legs: 8 values (4 joints × 2 values each).
- Ground contact sensors on the legs: 2 values (indicating whether each leg is in contact with the ground).
- Information about the terrain ahead (LIDAR sensors): 10 values (terrain readings used to anticipate obstacles).
Rewards: Reward is given for moving forward, totalling 300+ points up to the far end of the terrain. If the robot falls, it receives -100. Applying motor torque costs a small number of points.
Percepts: The agent sees everything it needs (state = percepts): the environment is fully observable, so there is no hidden information to infer and no noise to deal with.
Actions: Actions are motor speed values in the [-1, 1] range for each of the 4 joints (two hips and two knees).
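To make the layout above concrete, the observation vector can be sliced into its parts. This sketch assumes the standard ordering of the Gymnasium implementation (hull readings first, then the two legs, then the LIDAR); the index boundaries are that assumption, not project code:

```python
import gymnasium as gym

env = gym.make("BipedalWalker-v3")
obs, _ = env.reset(seed=0)

hull = obs[0:4]     # hull angle, angular velocity, horizontal and vertical speed
legs = obs[4:14]    # per leg: hip angle/speed, knee angle/speed, contact flag
lidar = obs[14:24]  # 10 LIDAR terrain readings ahead of the walker
print(hull, legs, lidar, sep="\n")
env.close()
```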
At an early stage, we tried to decide which RL algorithms would be best suited for BipedalWalker-v3, so we tested several of them in normal mode for 5M timesteps:
After analyzing the results, we chose to use PPO, SAC and TRPO.
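A minimal version of that comparison could look like the sketch below (not the project's exact script; PPO and SAC come from stable-baselines3, TRPO from sb3-contrib, and the save names are illustrative):

```python
from stable_baselines3 import PPO, SAC
from sb3_contrib import TRPO

# Train each candidate algorithm on the unmodified environment for 5M timesteps.
for algo in (PPO, SAC, TRPO):
    model = algo("MlpPolicy", "BipedalWalker-v3", verbose=1)
    model.learn(total_timesteps=5_000_000)
    model.save(f"{algo.__name__}_BipedalWalker")  # e.g. PPO_BipedalWalker.zip
```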
Initially, we tested small rewards to understand the impact each of them had and whether their use was justified, training the models for 30M timesteps. For this we ran two tests:
After detecting the initial errors, we realized that we should encourage the agent to walk forward and remove the alternating use of both feet (a reward-shaping sketch follows the list below):
Rewards:
- Overcoming steep terrain
- Moving forward
Penalties:
- Sudden vertical movements (e.g., falling)
- Severe instability (e.g., torso inclination)
- Standing still for a long time or ceasing to move
- The agent failing
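One way to apply this kind of shaping is a gymnasium.Wrapper around the environment's native reward. The sketch below is illustrative only: the coefficients, the observation indices, and the omission of a terrain-specific term are all assumptions, not the values used in the rewards folder:

```python
import gymnasium as gym

class ShapedRewards(gym.Wrapper):
    """Illustrative shaping: bonus for forward progress, penalties for
    vertical jolts, torso inclination, standing still, and failing."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        hull_angle, vel_x, vel_y = obs[0], obs[2], obs[3]
        reward += 0.5 * max(vel_x, 0.0)   # encourage moving forward
        reward -= 0.5 * abs(vel_y)        # penalize sudden vertical moves
        reward -= 0.5 * abs(hull_angle)   # penalize severe torso inclination
        if abs(vel_x) < 1e-3:
            reward -= 0.1                 # penalize standing still / stopping
        if terminated:
            reward -= 10.0                # extra penalty when the agent fails
        return obs, reward, terminated, truncated, info

env = ShapedRewards(gym.make("BipedalWalker-v3"))
```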
Videos:
| PPO | TRPO |
| --- | --- |
| ![]() | |

| SAC | CONTROL |
| --- | --- |
| ![]() | ![]() |
In an attempt to further improve on the second phase, we ran a third test, changing the rewards again and trying to correct the minor errors detected in the previous phase (a sketch of the new leg-related terms follows the list below).
Rewards:
- Overcoming steep terrain
- Moving forward
- Lifting its legs off the ground
- Using each leg the same number of times
Penalties:
- Sudden vertical movements (e.g., falling)
- Severe instability (e.g., torso inclination)
- Standing still for a long time or ceasing to move
- The agent failing
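The two new leg terms can be sketched with the ground-contact flags in the observation (indices 8 and 13 under the standard layout assumed earlier); the bonus magnitudes here are assumptions:

```python
import gymnasium as gym

class LegUsageRewards(gym.Wrapper):
    """Illustrative terms for the third test: reward lifting legs off the
    ground and keep the usage of the two legs balanced."""

    def reset(self, **kwargs):
        self.contact_steps = [0, 0]  # steps each leg has spent on the ground
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        left, right = obs[8] > 0.5, obs[13] > 0.5  # ground contact flags
        if not (left and right):
            reward += 0.05               # bonus for having a leg in the air
        self.contact_steps[0] += int(left)
        self.contact_steps[1] += int(right)
        # small penalty that grows when one leg is used much more than the other
        reward -= 0.001 * abs(self.contact_steps[0] - self.contact_steps[1])
        return obs, reward, terminated, truncated, info

env = LegUsageRewards(gym.make("BipedalWalker-v3"))
```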
Videos:
| PPO | TRPO |
| --- | --- |
| ![]() | ![]() |

| SAC | CONTROL |
| --- | --- |
| ![]() | ![]() |
In addition to the tests shown above, we also performed two other tests:
To understand whether the agent would move better with feet, we created an agent with feet (a sketch of how such a custom environment can be registered follows below).
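If the custom walker is saved as a Python module, it can be registered under its own environment id. This is only a sketch: the module name bipedal_walker_custom and the class name BipedalWalkerCustom are hypothetical (the repository ships the code as bipedal_walker_custom.txt):

```python
import gymnasium as gym
from gymnasium.envs.registration import register

# Hypothetical names: assumes bipedal_walker_custom.txt was renamed to
# bipedal_walker_custom.py and defines a BipedalWalkerCustom class.
register(
    id="BipedalWalkerCustom-v0",
    entry_point="bipedal_walker_custom:BipedalWalkerCustom",
    max_episode_steps=1600,  # the default horizon of BipedalWalker-v3
)

env = gym.make("BipedalWalkerCustom-v0")
```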
- rewards ➡️ A folder with the Python files for the different rewards that were used;
- Assignment.pdf ➡️ Project statement;
- BipedalWalker.pptx ➡️ A PowerPoint presentation with information about the work developed;
- ExtraBW.pptx ➡️ A PowerPoint presentation with some extra information about the work developed (graphs, videos...);
- bipedal_walker_custom.txt ➡️ The file to use if you want to try the BipedalWalker with feet;
- rewards_train.py ➡️ The code for training the agents with the custom rewards;
- test_model.py ➡️ The code used to test the agents;
- train_models.py ➡️ The code used to train the agents.
Note:
- When training agents, several algorithms are trained at the same time; you can choose which algorithms to use, how many parallel environments each algorithm gets, and whether each algorithm runs on CPU or GPU (see the sketch below).
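A simplified, sequential sketch of that configuration using SB3's vectorized environments (the algorithm choices, environment counts, and devices below are placeholders; the actual script runs the algorithms concurrently):

```python
from stable_baselines3 import PPO, SAC
from stable_baselines3.common.env_util import make_vec_env

# Placeholder configuration: algorithms to run, parallel environments per
# algorithm, and the device (CPU/GPU) each one should use.
config = {PPO: {"n_envs": 8, "device": "cpu"}, SAC: {"n_envs": 4, "device": "cuda"}}

for algo, cfg in config.items():
    vec_env = make_vec_env("BipedalWalker-v3", n_envs=cfg["n_envs"])
    model = algo("MlpPolicy", vec_env, device=cfg["device"], verbose=1)
    model.learn(total_timesteps=1_000_000)
```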
This course is part of the first semester of the third year of the Bachelor's Degree in Artificial Intelligence and Data Science at FCUP and FEUP in the academic year 2024/2025. You can find more information about this course at the following link: