Overview

This package enables the creation of ROS robot environments in OpenAI Gym.

This is a fork of the original openai_ros package from The Construct, found here: https://bitbucket.org/theconstructcore/openai_ros.git

Installation

Installation is similar to other ROS packages. Simply clone it into a catkin workspace and build.

Execute the following commands:
cd ~/ros_ws/src
git clone https://github.com/roboav8r/openai_ros
cd ~/ros_ws
catkin_make
source devel/setup.bash
rosdep install openai_ros

New environments

The utexas fork adds UT-specific robots and environments for training ROS robots in Gazebo simulation. It currently provides two new robot environments and three new task environments, described below.

Robot environments

walrus: A robot environment for the Walrus robot. It launches a typical Walrus robot with two LIDAR scanning rangefinders, an IMU, and odometry. The file is stored at robot_envs/walrus_env.py
walrus_upright: A robot environment for the Walrus robot, which spawns the robot in an upright position. This is used for the self-balancing task described below. The robot has the same sensor suite as in walrus_env, and the only difference is the spawn orientation as specified by the launch file. The file is stored at robot_envs/walrus_upright_env.py

Both Walrus environments depend on the walrus_description and walrus_gazebo packages (openai branch):
https://github.com/UTNuclearRobotics/walrus_description/tree/openai
https://github.com/UTNuclearRobotics/walrus_gazebo/tree/openai

Task environments

WalrusBalance-v0 - An inverted pendulum/self-balancing robot task.
- Defined in task_envs/walrus/walrus_balance.py
- Parameters in task_envs/walrus/config/walrus_balance.yaml
- Observations
  - 16 Scans: 8 each from the LIDAR scan messages on /scan and /scan_1 topics.
  - 2 IMU measurements: pitch attitude (imu/data/orientation/y) and pitch rate (imu/data/angular_velocity/y)
  - 1 Odometry measurement: Horizontal position (/odom/pose/pose/position/x)
- Actions
  - Commanded linear velocity (/cmd_vel/linear/x), with speed range defined by [linear_speed_(min/max)] values in the .yaml file
- Rewards
  - [stay_up_reward] value is awarded each timestep.
  - [position_penalty] is subtracted for each meter of displacement from x = 0. For example, a penalty of 10 results in a reward of -10 when the x-position is 1 m.
  - [ang_velocity_reward] is designed to keep the rotation slow, and avoid jerky or sudden movements. It is awarded when the pitch velocity is less than [ang_velocity_threshold].
- Completion conditions
  - "Crash" when robot acceleration exceeds [max_linear_acceleration] parameter.
  - "Rollover" when pitch attitude (imu/data/orientation/y) is outside [min_pitch_orient, max_pitch_orient], or pitch rate (imu/data/angular_velocity/y) is outside [min_pitch_rate, max_pitch_rate].
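
The WalrusBalance-v0 reward and completion rules can be sketched in plain Python. This is only an illustration, not the code in walrus_balance.py: the function names, argument names, and default values are assumptions, and only the keyword names mirror the bracketed parameters listed above. The sketch also assumes that the angular velocity threshold and the acceleration limit are compared against absolute values.

```python
def balance_reward(x, pitch_rate, stay_up_reward=1.0, position_penalty=10.0,
                   ang_velocity_reward=1.0, ang_velocity_threshold=0.5):
    """Per-timestep reward sketch (default values are placeholders)."""
    reward = stay_up_reward                        # awarded every timestep
    reward -= position_penalty * abs(x)            # grows with distance from x = 0
    if abs(pitch_rate) < ang_velocity_threshold:   # bonus for slow rotation
        reward += ang_velocity_reward
    return reward


def balance_done(linear_accel, pitch, pitch_rate, max_linear_acceleration=10.0,
                 min_pitch_orient=-0.5, max_pitch_orient=0.5,
                 min_pitch_rate=-3.0, max_pitch_rate=3.0):
    """Return True when the episode should end ("Crash" or "Rollover")."""
    crash = abs(linear_accel) > max_linear_acceleration
    rollover = (not (min_pitch_orient <= pitch <= max_pitch_orient)
                or not (min_pitch_rate <= pitch_rate <= max_pitch_rate))
    return crash or rollover
```

With stay_up_reward and ang_velocity_reward set to 0 and position_penalty set to 10, an x-position of 1 m yields -10, matching the example in the Rewards list.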
WalrusStairs-v0 - A task environment designed to teach the robot to climb and descend stairs without rolling over. The stairs and motion are entirely along the x-axis.
- NOTE: This environment needs tuning. The inertial parameters of the Walrus and the friction of the ground_plane and stairs need adjusting so that the robot can gain traction.
- Defined in task_envs/walrus/walrus_stairs.py
- Parameters in task_envs/walrus/config/walrus_stairs.yaml
- Observations
  - 16 Scans: 8 each from the LIDAR scan messages on /scan and /scan_1 topics.
  - 2 IMU measurements: pitch attitude (imu/data/orientation/y) and pitch rate (imu/data/angular_velocity/y)
  - 1 Odometry measurement: Horizontal position (/odom/pose/pose/position/x)
- Actions
  - Commanded linear velocity (/cmd_vel/linear/x), with speed range defined by [linear_speed_(min/max)] values in the .yaml file
- Rewards
  - [stay_alive_reward] value is awarded each timestep.
  - [ang_velocity_reward] is designed to keep the rotation slow, and avoid jerky or sudden movements. It is awarded when the pitch velocity is less than [ang_velocity_threshold].
  - [forward_velocity_reward] is given as a multiple of forward linear speed. If this reward value is positive, forward motion gives a reward. Rearward motion is neither penalized nor rewarded.
  - [position_reward] is awarded at each timestep as a multiple of forward progress in the x-direction. For example, a value of 10 gives a reward of 100 if the robot is 10m from the origin, and a reward of 10 if the robot is 1m away from the origin.
  - TO DO: Add a completion reward when the robot reaches [max_x_disp].
- Completion conditions
  - "Crash" when robot acceleration exceeds [max_linear_acceleration] parameter.
  - "Rollover" when pitch attitude (imu/data/orientation/y) is outside [min_pitch_orient, max_pitch_orient], or pitch rate (imu/data/angular_velocity/y) is outside [min_pitch_rate, max_pitch_rate].
  - x-position exceeds [max_x_disp] parameter (i.e., the robot has completed the entire course of stairs).
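
As a rough sketch of how the WalrusStairs-v0 reward terms combine, the snippet below clips rearward speed to zero and scales the position term by x. It is not taken from walrus_stairs.py: the function names and default values are assumptions, and it assumes that "forward progress" means the raw x-position measured from the origin.

```python
def stairs_reward(x, forward_speed, pitch_rate, stay_alive_reward=1.0,
                  ang_velocity_reward=1.0, ang_velocity_threshold=0.5,
                  forward_velocity_reward=1.0, position_reward=10.0):
    """Per-timestep reward sketch (default values are placeholders)."""
    reward = stay_alive_reward                     # awarded every timestep
    if abs(pitch_rate) < ang_velocity_threshold:   # bonus for slow rotation
        reward += ang_velocity_reward
    # Rearward motion is clipped to zero: no reward, but no penalty either.
    reward += forward_velocity_reward * max(forward_speed, 0.0)
    reward += position_reward * x                  # scales with progress in x
    return reward


def course_completed(x, max_x_disp=10.0):
    """True once the robot has passed the end of the stair course."""
    return x > max_x_disp
```

With the other terms zeroed out, position_reward = 10 gives 100 at x = 10 m and 10 at x = 1 m, as in the Rewards list.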
WalrusNav-v0 - A simple 2D navigation task.
- NOTE: The barriers in the clearpath_playpen environment sometimes spawn in an incorrect orientation. This needs to be fixed.
- Defined in task_envs/walrus/walrus_nav.py
- Parameters in task_envs/walrus/config/walrus_nav.yaml
- Observations
  - 16 Scans: 8 each from the LIDAR scan messages on (/scan) and (/scan_1) topics.
  - 1 yaw orientation measurement (imu/data/orientation/z)
  - 2 Odometry measurements to describe the 2D position: (/odom/pose/pose/position/x) and (/odom/pose/pose/position/y)
- Actions
  - Commanded linear velocity (/cmd_vel/linear/x) and angular velocity (/cmd_vel/angular/y), with speed ranges defined by [linear_speed_(max/min)] and [angular_speed_(max/min)] values in the .yaml file
- Rewards
  - [stay_alive_reward] value is awarded each timestep.
  - [forward_velocity_reward] is given as a multiple of linear speed. If this reward value is positive, forward motion gives a reward and rearward motion gives a penalty.
  - [position_reward]/(distance to goal in m) is awarded at each timestep. For example, a value of 10 gives a reward of 1 if the robot is 10m from the goal, and a reward of 10 if the robot is 1m away from the goal.
  - [goal_reached_reward] is given if the robot position is within [success_radius] meters of [x_goal, y_goal].
- Completion conditions
  - "Crash" when robot acceleration exceeds [max_linear_acceleration] parameter.
  - "Out of bounds" when robot exceeds [(min/max)_(x/y)_disp] parameters.
  - Robot position is within [success_radius] meters of [x_goal, y_goal].
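
The WalrusNav-v0 rules can be sketched the same way. Again, this is an illustration rather than the repository's implementation: the function names and default values are assumptions, and skipping the inverse-distance term when the distance is exactly zero is a choice made here to avoid dividing by zero.

```python
import math


def nav_reward(x, y, forward_speed, x_goal, y_goal, success_radius=0.5,
               stay_alive_reward=1.0, forward_velocity_reward=1.0,
               position_reward=10.0, goal_reached_reward=100.0):
    """Return (reward, goal_reached) for one timestep (placeholder defaults)."""
    distance = math.hypot(x_goal - x, y_goal - y)
    reward = stay_alive_reward
    # Signed speed: forward motion is rewarded, rearward motion is penalized.
    reward += forward_velocity_reward * forward_speed
    if distance > 0.0:                       # guard against division by zero
        reward += position_reward / distance
    goal_reached = distance <= success_radius
    if goal_reached:
        reward += goal_reached_reward
    return reward, goal_reached


def out_of_bounds(x, y, min_x_disp, max_x_disp, min_y_disp, max_y_disp):
    """True when the robot has left the allowed rectangle."""
    return not (min_x_disp <= x <= max_x_disp and min_y_disp <= y <= max_y_disp)
```

With the other terms zeroed out, position_reward = 10 gives a reward of 1 at 10 m from the goal and 10 at 1 m, as in the Rewards list.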
