Implementation of different reinforcement learning algorithms to solve the navigation problem of an unmanned aerial vehicle (UAV) using ROS/Gazebo and Python.
navigation_env.py: The goal of this environment is to navigate a UAV along a
track without crashing into the walls. Initially, the UAV is placed randomly
on the track, at a safe distance from the walls. The state space consists
of 5 range measurements, and the action space consists of 3 actions (move_forward,
rotate_left, rotate_right). Both states and actions are subject to additive
white Gaussian noise. The UAV is rewarded +5 for moving forward and -0.5 for
rotating; if it crashes into a wall, it is penalized with -200.
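As a rough illustration of the reward and noise model just described, here is a minimal, self-contained sketch; the constants (noise standard deviation) and helper names are assumptions for illustration, not the actual navigation_env.py implementation:

import random

# Hypothetical constants mirroring the description above; the actual
# values and names live in navigation_env.py.
ACTIONS = ('move_forward', 'rotate_left', 'rotate_right')
REWARD_FORWARD = 5.0    # reward for moving forward
REWARD_ROTATE = -0.5    # penalty for rotating
REWARD_CRASH = -200.0   # penalty for crashing into a wall
NOISE_STD = 0.05        # assumed standard deviation of the Gaussian noise

def noisy_ranges(ranges):
    """Corrupt the 5 range measurements with additive white Gaussian noise."""
    return [r + random.gauss(0.0, NOISE_STD) for r in ranges]

def reward(action, crashed):
    """Compute the reward according to the rules stated above."""
    if crashed:
        return REWARD_CRASH
    return REWARD_FORWARD if action == 'move_forward' else REWARD_ROTATE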
There are 3 available worlds/tracks: Track1, Track2, and Track3.
Reinforcement learning is an area of machine learning concerned with how autonomous agents learn to behave in unknown environments through trial-and-error. The goal of a reinforcement learning agent is to learn a sequential decision policy that maximizes the notion of cumulative reward through continuous interaction with the unknown environment. A challenging problem in robotics is the autonomous navigation of an Unmanned Aerial Vehicle (UAV) in worlds with no available map. This ability is critical in many applications, such as search and rescue operations or the mapping of geographical areas. In this thesis, we present a map-less approach for the autonomous, safe navigation of a UAV in unknown environments using reinforcement learning. Specifically, we implemented two popular algorithms, SARSA(λ) and Least-Squares Policy Iteration (LSPI), and combined them with tile coding, a parametric, linear approximation architecture for the value function, in order to deal with the 5- or 3-dimensional continuous state space defined by the measurements of the UAV's distance sensors. The final policy of each algorithm, learned over only 500 episodes, was tested in unknown environments more complex than the one used for training in order to evaluate the behavior of each policy. Results show that SARSA(λ) was able to learn a near-optimal policy that performed adequately even in unknown situations, leading the UAV along paths free of collisions with obstacles. LSPI's policy required less learning time and its performance was promising, but not as effective, as it occasionally led to collisions in unknown situations. The whole project was implemented using the Robot Operating System (ROS) framework and the Gazebo robot simulation environment.
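To make the approximation scheme concrete, the following is a minimal sketch of a SARSA(λ) update over binary tile-coding features for the 5-dimensional state space; the tiling layout, learning rate, and all names are illustrative assumptions, not the thesis code:

import numpy as np

NUM_TILINGS = 8             # assumed number of overlapping tilings
TILES_PER_DIM = 6           # assumed tiles per state dimension
STATE_DIM = 5               # the 5 range measurements
NUM_ACTIONS = 3             # move_forward, rotate_left, rotate_right
ALPHA = 0.1 / NUM_TILINGS   # learning rate, divided across tilings
GAMMA = 0.99                # discount factor
LAMBDA = 0.9                # eligibility trace decay

NUM_FEATURES = NUM_TILINGS * TILES_PER_DIM ** STATE_DIM

def active_tiles(state, low, high):
    """Return one active tile index per tiling for a continuous state.

    Each tiling is offset by a fraction of a tile width so the tilings
    overlap, giving a finer effective resolution than a single grid.
    """
    tiles = []
    scaled = (np.asarray(state) - low) / (high - low)  # normalize to [0, 1]
    for t in range(NUM_TILINGS):
        offset = t / NUM_TILINGS / TILES_PER_DIM
        idx = np.floor((scaled + offset) * TILES_PER_DIM).astype(int)
        idx = np.clip(idx, 0, TILES_PER_DIM - 1)
        flat = np.ravel_multi_index(idx, (TILES_PER_DIM,) * STATE_DIM)
        tiles.append(t * TILES_PER_DIM ** STATE_DIM + flat)
    return tiles

# One weight vector and one eligibility-trace vector per action.
weights = np.zeros((NUM_ACTIONS, NUM_FEATURES))
traces = np.zeros_like(weights)

def q_value(tiles, action):
    """Q(s, a) is the sum of the weights of the active features."""
    return weights[action, tiles].sum()

def sarsa_lambda_update(tiles, action, r, next_tiles, next_action, done):
    """One SARSA(lambda) step with replacing eligibility traces."""
    target = r if done else r + GAMMA * q_value(next_tiles, next_action)
    delta = target - q_value(tiles, action)
    traces[:] *= GAMMA * LAMBDA       # decay all traces
    traces[action, tiles] = 1.0       # replacing traces on active features
    weights[:] += ALPHA * delta * traces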
Supervisor: Associate Professor Michail G. Lagoudakis
https://doi.org/10.26233/heallink.tuc.87066
This package is provided as a Docker image.
Build the Docker image:
docker build -t rl-uav .
Run the container:
docker run --name rl-uav -it rl-uav
Because XmlRpcServer queries all possible file descriptors, you may need to lower the file descriptor limit, depending on your system:
docker run --ulimit nofile=1024:524288 --name rl-uav -it rl-uav
Build the ROS package:
source /opt/ros/noetic/setup.bash
cd /home/catkin_ws
catkin_make
Because catkin_make is resource-intensive, you may need to run only one job at a time:
catkin_make -j1
The above image also contains a VNC server for displaying the Gazebo simulation.
First, create a Docker network:
docker network create ros
Then run the container as follows:
docker run --ulimit nofile=1024:524288 --name rl-uav --net=ros --env="DISPLAY=novnc:0.0" --env="RESOLUTION=1920x1080" --env="USER=root" -it rl-uav
Start the VNC server with the following command:
vncserver -geometry $RESOLUTION
Run the noVNC client:
docker run -d --rm --net=ros --env="DISPLAY_WIDTH=1920" --env="DISPLAY_HEIGHT=1800" --env="RUN_XTERM=no" --name=novnc -p=8080:8080 theasp/novnc:latest
Connect to noVNC using the following URL: http://localhost:8080/vnc.html.
To launch a new world, start the train.launch file. You can select the desired track with the world parameter and choose whether to display the GUI with the gui parameter.
source /home/catkin_ws/devel/setup.bash
roslaunch rl_uav train.launch world:=track1 gui:=true
After the world has started, run the train_uav node (train_uav.py) to begin the training process and test the different algorithms:
source /home/catkin_ws/devel/setup.bash
rosrun rl_uav train_uav.py
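For orientation, the training loop inside such a node might look roughly like the following, building on the SARSA(λ) sketch above; env, low, high, and the reset()/step() interface are assumptions, and the actual logic lives in train_uav.py:

import random

NUM_EPISODES = 500   # the thesis trains each policy over 500 episodes
EPSILON = 0.1        # assumed epsilon-greedy exploration rate

def epsilon_greedy(tiles):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.randrange(NUM_ACTIONS)
    return max(range(NUM_ACTIONS), key=lambda a: q_value(tiles, a))

# Hypothetical episode loop; env is assumed to expose reset()/step()
# returning the noisy range measurements described earlier.
for episode in range(NUM_EPISODES):
    state = env.reset()
    tiles = active_tiles(state, low, high)
    action = epsilon_greedy(tiles)
    done = False
    while not done:
        next_state, r, done = env.step(action)
        next_tiles = active_tiles(next_state, low, high)
        next_action = epsilon_greedy(next_tiles)
        sarsa_lambda_update(tiles, action, r, next_tiles, next_action, done)
        tiles, action = next_tiles, next_action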
Under maintenance.
Distributed under the GPL-3.0 License. See LICENSE for more information.