This project demonstrates how to train a drone to perform a stable altitude-hold maneuver using Reinforcement Learning. The system integrates the power of the PX4 flight stack, the Gazebo physics simulator, and the ROS 2 robotics framework with a Deep Q-Network (DQN) agent trained using Stable Baselines3.
A key challenge in this project was overcoming the agent's initial tendency to "give up." The GIF below shows the final trained agent successfully controlling its thrust to maintain the target altitude of 20 meters, even after starting from a slightly lower position.
The goal of this project is to replace a traditional PID controller for a single objective (altitude hold) with a Reinforcement Learning agent that learns the optimal control policy from scratch. This serves as a foundational project for more complex autonomous behaviors.
The agent's objective is simple:
- Goal: Maintain the drone at a target altitude of 20.0 meters.
- Actions: The agent can choose to
increase thrust,decrease thrust, orrestartthe episode. - Observations: It perceives the environment through its vertical position (z), vertical velocity (z), and the last applied thrust command.
- End-to-End RL Pipeline: A complete, working pipeline from simulation to a trained AI agent.
- Custom ROS 2 & Gymnasium Environment: A custom
DroneEnvclass that bridges the gap between the ROS 2 ecosystem and the standard Gymnasium (formerly OpenAI Gym) API for RL training. - Reward Shaping: The reward function was carefully designed to provide continuous feedback, rewarding the agent for getting closer to the target altitude.
- Constraint Handling: The environment enforces physical constraints, such as maximum velocity and operational altitude, terminating the episode and penalizing the agent for violations.
- Stable Baselines3 Integration: Leverages the robust and popular Stable Baselines3 library for training a DQN agent.
# create
conda create venv
# activate
conda activate venv
cd ~
git clone https://github.com/PX4/PX4-Autopilot.git --recursive
cd PX4-Autopilot/
git checkout v1.15.2
git submodule update --init --recursive
make px4_sitl
To get the best installation possible check the ROS2 Jazzy website installation guide : https://docs.ros.org/en/jazzy/Installation.html
cd ~
git clone https://github.com/eProsima/Micro-XRCE-DDS-Agent.git
cd Micro-XRCE-DDS-Agent
mkdir build
cd build
cmake ..
make
sudo make install
sudo ldconfig /usr/local/lib/
We will need px4_msgs and px4_ros_com:
mkdir -p ~/ws_ros2/src/
cd ~/ws_ros2/src/
git clone https://github.com/PX4/px4_msgs.git
git clone https://github.com/PX4/px4_ros_com.git
git clone https://github.com/eOvic/rl-drone-control-ros2-jazzy.git
cd ..
source /opt/ros/jazzy/setup.bash
colcon build
source install/setup.bash
pip install mavsdk
pip install aioconsole
pip install pygame
pip install numpy
Navigate to the package folder and run this command :
pip install -r requirements.txt
- Put below lines in your bashrc:
source /opt/ros/jazzy/setup.bash
export GZ_SIM_RESOURCE_PATH=~/.gz/models
- AI / Reinforcement Learning:
- Stable Baselines3 (DQN)
- Gymnasium (OpenAI Gym)
- Python 3
- Robotics & Simulation:
- ROS 2 (Jazzy)
- PX4 Autopilot (Software-in-the-Loop)
- Gazebo Simulator
- Ubuntu 24.04 with a complete ROS 2 Jazzy installation.
- PX4 Autopilot source code cloned and built, with the SITL + Gazebo simulation environment fully set up version 1.15.2 since it's the most stable
- Python packages:
stable-baselines3,gymnasium,rclpy
-
Build the ROS 2 Package: Navigate to the root of your workspace and build the package using
colcon.cd ~/ws_ros2/ colcon build --packages-select px4_rl_project
-
Source the Workspace: In every new terminal you use, you must source the workspace to make the package available.
source ~/ws_ros2/install/setup.bash
Running the project requires three terminals.
First, launch the PX4 SITL simulation. This command will start the PX4 flight controller and open a Gazebo window with the drone model.
# Navigate to your PX4 directory
cd ~/PX4-Autopilot/
# You must choose a world that has a ground plane.
PX4_GZ_WORLD=lawn make px4_sitl gz_x500_depthWait for the simulation to load completely and for the PX4 terminal to be ready for takeoff.
To ensure the connection between Gazebo, ROS and PX4 we use MicroXRCEAgent Open another Terminal and write this command :
MicroXRCEAgent udp4 -p 8888
IMPORTANT NOTE : You must run QGroundControl in the backgroud : https://docs.qgroundcontrol.com/master/en/qgc-user-guide/getting_started/download_and_install.html
Once the simulation is running, open a new terminal, source your ROS 2 workspace, and run the training script.
conda activate venv
# Source your workspace
source ~/ws_ros2/install/setup.bash
# Run the training script
python3 ~/ws_ros2/src/px4_rl_project/px4_rl_project/train_agent.pyYou will see log output from both the drone environment (takeoff sequence, position updates) and the Stable Baselines3 trainer (rollout rewards, episode lengths, etc.).
A fascinating part of this project was debugging the agent's behavior.
- Initial Problem: The agent quickly learned that "restarting" the episode was an easy way to escape situations where it was performing poorly. It got stuck in a loop of immediately choosing the
RESTARTaction. - Solution - Reward Shaping: The
RESTARTaction was given a large negative penalty (-10.0). This immediately taught the agent that restarting was an undesirable outcome.
- Complex Tasks: Extend the agent's capabilities to perform path following or obstacle avoidance.
- Continuous Action Space: Move from a discrete action space (DQN) to a continuous one using an algorithm like PPO or SAC for finer thrust control.
- Sim-to-Real Transfer: Explore techniques to transfer the policy learned in simulation to a real drone and using PX4 ensures that.
This project is licensed under the MIT License. See the LICENSE file for details.
