This repository contains a ROS 2 and Gazebo Sim project for a small SCARA sorting robot. The robot detects a wood cube and a steel cube from a fixed top-down table camera, converts the detected image pixels into robot-base x and y coordinates, and sorts the cubes into their matching bins.
The project was made for the MOGI Robotrendszerek laboratórium and Kognitív robotika subjects by Csongor Telegdi and Zsombor Veszprémi.
The meshes for the cubes and collection bins were modelled in Blender and are used by Gazebo through the model files in projekt/meshes.
- Assumptions
- Repository Layout
- Quick Start With The Pretrained Detector
- Install Dependencies
- Run Checks And Retesting
- Launch Files
- World Content
- Robot URDF
- Sorting Pipeline
- YOLO Cube Detector
- Image Processing And Mask Logic
- Pixel To Robot Coordinates
- Inverse Kinematics
- Pick And Place Logic
- Safety And Security Steps
- Attach And Detach Controller
- Training Images
- Training YOLO
- Common Problems
- Development Notes
- Figure Files
This README assumes you already have a Linux system with ROS 2 installed. The project has been developed for ROS 2 Jazzy on Ubuntu 24.04 with Gazebo Harmonic. If you use another ROS 2 distribution, replace jazzy in the package names and check the matching Gazebo version. The URDF currently references the Jazzy gz_ros2_control plugin path, so Jazzy is the expected setup.
Official installation and reference pages used by this guide:
- ROS 2 Jazzy Ubuntu installation
- Using colcon to build ROS 2 packages
- Gazebo Harmonic binary installation
- Gazebo and ROS 2 integration
- RViz user guide
- Python virtual environments
- Ultralytics quickstart
- Ultralytics detection datasets
- Ultralytics training mode
- Ultralytics export mode
ROS package links in the sections below point to their ROS Index pages.
SCARA_projekt/
├── docs/
│ └── images/ # README screenshots and detector debug image
├── README.md
└── projekt/
├── config/ # ros2_control and Gazebo bridge configuration
├── datasets/scara_cubes/ # YOLO-format cube dataset
├── launch/ # check_urdf, world, spawn_robot, start_sorting
├── meshes/ # Blender-made cube/bin Gazebo models
├── msg/ # PixelDetection custom ROS messages
├── runs/ # YOLO training outputs and exported detector
├── rviz/ # RViz configs
├── scripts/ # sorting, detector, attach-detach, image tools
├── training_images/ # source class-folder images for annotation
├── urdf/scara.urdf # robot, camera, sensors, ros2_control plugins
├── worlds/world.sdf # table, cubes, bins, lights, physics
├── yolo26n.pt # optional pretrained starting model
└── yolov8n.pt # pretrained starting model used for training
The default trained detector used by the sorting launch is:
projekt/runs/detect/train-3/weights/best.onnx
Use this path if you only want to run the sorting demo with the pretrained neural network already included in the repository and you do not want to train a new model.
Install dependencies, clone, build, and source the workspace as described in Install Dependencies. Then open two terminals.
Terminal 1 starts Gazebo, RViz, the robot, the controllers, and the camera bridge:
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
source .venv/bin/activate
source install/setup.bash
ros2 launch projekt spawn_robot.launch.py
Wait until Gazebo is open and the robot has moved to its home position. Terminal 2 starts sorting with the included ONNX model:
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
source .venv/bin/activate
source install/setup.bash
ros2 launch projekt start_sorting.launch.py
start_sorting.launch.py uses projekt/runs/detect/train-3/weights/best.onnx by default, so no detector_model argument is needed for the pretrained repo model. If you want to be explicit, run:
ros2 launch projekt start_sorting.launch.py \
detector_model:="$PWD/projekt/runs/detect/train-3/weights/best.onnx" \
detector_backend:=onnxruntime
The detector will take one masked table-camera snapshot while the robot is home, publish the detected wood and steel cube pixels, and the sorter will move each reachable cube to its matching bin. To test another arrangement, stop only the sorting launch with Ctrl+C, move the cubes in Gazebo, and run ros2 launch projekt start_sorting.launch.py again.
If the large Gazebo model pack is not already present after cloning, download it here:
Extract or copy the model folders into ~/gazebo_models. The launch files add this directory to GZ_SIM_RESOURCE_PATH automatically.
Start from a terminal with ROS 2 Jazzy available:
source /opt/ros/jazzy/setup.bash
Install the ROS, Gazebo, RViz, build, and Python packages used by the project:
sudo apt update
sudo apt install -y \
git curl lsb-release gnupg unzip \
python3-pip python3-venv python3-colcon-common-extensions python3-rosdep \
python3-numpy python3-opencv python3-pyqt5 python3-pyqt5.qtsvg \
ros-jazzy-ament-cmake \
ros-jazzy-ament-index-python \
ros-jazzy-control-msgs \
ros-jazzy-controller-manager \
ros-jazzy-cv-bridge \
ros-jazzy-gz-ros2-control \
ros-jazzy-joint-state-broadcaster \
ros-jazzy-joint-state-publisher \
ros-jazzy-joint-state-publisher-gui \
ros-jazzy-joint-trajectory-controller \
ros-jazzy-launch \
ros-jazzy-launch-ros \
ros-jazzy-rclpy \
ros-jazzy-robot-state-publisher \
ros-jazzy-ros-gz \
ros-jazzy-ros-gz-bridge \
ros-jazzy-ros-gz-image \
ros-jazzy-ros-gz-interfaces \
ros-jazzy-ros-gz-sim \
ros-jazzy-rosgraph-msgs \
ros-jazzy-rosidl-default-generators \
ros-jazzy-rosidl-default-runtime \
ros-jazzy-ros2-control \
ros-jazzy-ros2-controllers \
ros-jazzy-rqt-image-view \
ros-jazzy-rviz2 \
ros-jazzy-sensor-msgs \
ros-jazzy-std-msgs \
ros-jazzy-tf2-ros \
ros-jazzy-topic-tools \
ros-jazzy-trajectory-msgs \
ros-jazzy-urdf-launch \
ros-jazzy-xacro
If gz sim --version does not work after installing the ROS Gazebo packages, install Gazebo Harmonic from the official Gazebo repository:
sudo apt-get update
sudo apt-get install -y curl lsb-release gnupg
sudo curl https://packages.osrfoundation.org/gazebo.gpg --output /usr/share/keyrings/pkgs-osrf-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/pkgs-osrf-archive-keyring.gpg] https://packages.osrfoundation.org/gazebo/ubuntu-stable $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/gazebo-stable.list > /dev/null
sudo apt-get update
sudo apt-get install -y gz-harmonic
If this is the first time rosdep is used on the machine, initialize it once:
sudo rosdep init
rosdep update
Clone the repository:
mkdir -p ~/projekt_ws
cd ~/projekt_ws
git clone https://github.com/telegdicsongor/SCARA_projekt.git SCARA_projekt
cd SCARA_projekt
Install any dependency that is declared in projekt/package.xml and not already installed:
rosdep install --from-paths projekt --ignore-src -r -y
The ROS package is built with ament_cmake, and installed resources are found at runtime with ament_index_python.
Create the Python virtual environment used by YOLO training, YOLO export, and ONNX Runtime inference. The project uses --system-site-packages because ROS 2 Python packages such as rclpy, cv_bridge, and generated message modules are installed by apt under /opt/ros/jazzy:
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
python3 -m venv --system-site-packages .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install ultralytics onnx onnxruntime
Check that the environment can see both the ROS Python packages and the ML packages:
python -c "import rclpy, cv2, onnxruntime; from ultralytics import YOLO; print('virtual environment OK')"
If that command fails because a ROS Python package such as rclpy is missing, recreate the environment with --system-site-packages. If it fails because onnxruntime or ultralytics is missing, activate .venv again and rerun the pip install command.
Build the ROS 2 package:
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
colcon build --symlink-install --packages-select projekt
source install/setup.bash
For every new terminal that runs the detector, sorter, training, or export commands, source ROS, activate .venv, and source the built workspace:
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
source .venv/bin/activate
source install/setup.bash
For GUI-only tools such as rqt_image_view, use a separate terminal without .venv if Qt bindings are not visible:
deactivate # only if .venv is active
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
source install/setup.bash
After the two quick-start launch commands are running, the detector reads /table_camera/image/compressed, publishes one home-position detection snapshot to /sorting/pixel_detections, and the sorter moves every reachable cube to its bin. If a detected cube is outside the SCARA workspace, the sorter prints a warning and tries the next reachable detection. If no reachable cube remains, it returns to home.
Useful checks:
ros2 topic list
ros2 topic hz /table_camera/image
ros2 topic echo /table_camera/camera_info --once
ros2 topic echo /sorting/pixel_detections --once
ros2 action list
To test whether image detection works repeatedly, keep spawn_robot.launch.py running in the first terminal. After the sorting motion sequence finishes and the robot returns home, stop only the sorting launch in the second terminal with Ctrl+C. In Gazebo, move the wood and/or steel cube to new positions on the table, outside the masked bin areas and inside the robot workspace. Then start the sorting launch again:
ros2 launch projekt start_sorting.launch.py
This restarts yolo_cube_detector.py, takes a new home-position camera snapshot, publishes new pixel detections, and lets scara_sorter.py run another sorting sequence without respawning the robot or restarting Gazebo.
The launch files are ROS 2 Python launch descriptions built with launch and launch_ros.
check_urdf.launch.py
ros2 launch projekt check_urdf.launch.py
This launch file uses the standard urdf_launch display.launch.py helper. It loads projekt/urdf/scara.urdf, starts robot_state_publisher, starts joint_state_publisher_gui by default, and opens rviz2 with projekt/rviz/urdf.rviz. The GUI sliders let you move joint1, joint2, and joint3 without Gazebo, which is useful for checking the URDF frames, joint limits, TF tree, link origins, and visual/collision geometry.
Optional arguments:
ros2 launch projekt check_urdf.launch.py gui:=false
ros2 launch projekt check_urdf.launch.py model:=scara.urdf
world.launch.py
ros2 launch projekt world.launch.py
This starts Gazebo Sim with projekt/worlds/world.sdf. It sets GZ_SIM_RESOURCE_PATH to include:
- projekt/meshes
- the installed package parent directory
- ~/gazebo_models
The world launch then includes ros_gz_sim gz_sim.launch.py with render settings for Gazebo Sim.
spawn_robot.launch.py
ros2 launch projekt spawn_robot.launch.py
This is the main simulation launch. It includes world.launch.py, expands the URDF with xacro, publishes /robot_description, spawns the robot in Gazebo through ros_gz_sim create, starts the Gazebo to ROS bridge, starts RViz, starts the table camera image bridge, and starts the arm controllers.
The most important nodes are:
- robot_state_publisher: publishes the robot TF tree from the URDF and joint states.
- ros_gz_sim create: inserts the SCARA robot into the Gazebo world from /robot_description.
- ros_gz_bridge parameter_bridge: bridges clock with rosgraph_msgs, contact through Gazebo message types, detachable-joint commands, and camera-info topics from projekt/config/gz_bridge.yaml.
- ros_gz_image image_bridge: bridges the Gazebo table camera image into ROS.
- controller_manager spawner: starts joint_state_broadcaster and joint_trajectory_controller.
- topic_tools relay: republishes table camera info for tools that expect it beside the image topic.
- controller_state_to_joint_states.py: fallback joint-state relay if the broadcaster package is missing.
- attach_detach_controller.py: starts after spawn so the cubes begin detached.
Useful options:
ros2 launch projekt spawn_robot.launch.py rviz:=false
ros2 launch projekt spawn_robot.launch.py x:=0.0 y:=-0.3 z:=1.02 yaw:=1.5708
ros2 launch projekt spawn_robot.launch.py fake_joint_states:=true
fake_joint_states:=true starts joint_state_publisher for URDF-only debugging.
There is also a legacy static sorting mode:
ros2 launch projekt spawn_robot.launch.py sorting:=true static_pixel_detections:=true
That mode publishes demo detections from world.sdf. For the neural-network task, use spawn_robot.launch.py first and then start_sorting.launch.py.
start_sorting.launch.py
ros2 launch projekt start_sorting.launch.py
This launch assumes Gazebo, the robot, controllers, TF, and the camera bridge are already running. It starts:
- attach_detach_controller.py
- yolo_cube_detector.py
- scara_sorter.py
By default, it does not use cube poses from world.sdf for picking. Instead, yolo_cube_detector.py detects the cubes in the compressed table-camera stream and publishes pixel detections. The sorter then projects the pixel centers onto the known cube-top plane and uses the resulting base-frame coordinates for the motion sequence.
The sorting order is confidence-based, not hard-coded by material. If both the wood cube and the steel cube are detected, scara_sorter.py first tries the reachable detection with the highest YOLO confidence. After that cube is completed or skipped, it tries the next reachable unfinished detection from the same home-position snapshot.
Default detector model:
projekt/runs/detect/train-3/weights/best.onnx
Equivalent launch argument:
ros2 launch projekt start_sorting.launch.py \
detector_model:="$PWD/projekt/runs/detect/train-3/weights/best.onnx"
Other useful arguments:
ros2 launch projekt start_sorting.launch.py detector_confidence:=0.45
ros2 launch projekt start_sorting.launch.py detector_backend:=onnxruntime
ros2 launch projekt start_sorting.launch.py detector_publish_once:=false
ros2 launch projekt start_sorting.launch.py static_pixel_detections:=true
The default launch masks the robot home area and the two bin areas with mask_base_rectangles. These rectangles are written in robot base coordinates, then projected into the camera image. This prevents the detector from trying to pick the robot itself or cubes that are already inside a bin.
world.sdf defines the physical environment:
- A ground plane.
- A table with a 1.5 m by 0.8 m top surface at table height.
- A wood cube named wood_cube_5cm.
- A steel cube named steel_cube_5cm.
- A wood bin named wood_collection_bin.
- A steel bin named steel_collection_bin.
- Gazebo physics, sensors, contact, user commands, scene broadcaster, and lighting systems.
The starting object poses are:
| Entity | Pose in world (x y z r p y) | Purpose |
|---|---|---|
| wood_cube_5cm | -0.20 0.10 1.015 0 0 0 | Wood target cube |
| steel_cube_5cm | 0.12 0.18 1.015 0 0 0 | Steel target cube |
| wood_collection_bin | -0.30 -0.15 1.015 0 0 0 | Wood drop target |
| steel_collection_bin | 0.30 -0.15 1.015 0 0 0 | Steel drop target |
The sorter loads bin positions from world.sdf, transforms them into the robot base frame, and uses them as drop targets. Cube positions are not loaded from world.sdf during neural-network sorting.
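The world-to-base conversion mentioned above can be sketched as a planar rigid transform. This is a minimal illustration and not the code in scara_sorter.py: the function name world_to_base_xy is hypothetical, and the base pose used in the example (x = 0.0, y = -0.3, yaw = pi/2) is borrowed from the spawn_robot.launch.py example arguments in this README, which may differ from the launch defaults.

```python
import math


def world_to_base_xy(world_x, world_y, base_x, base_y, base_yaw):
    """Express a world-frame XY point in a base frame at (base_x, base_y) with heading base_yaw."""
    dx = world_x - base_x
    dy = world_y - base_y
    cos_yaw = math.cos(base_yaw)
    sin_yaw = math.sin(base_yaw)
    # Apply the inverse (transposed) planar rotation to the offset.
    return (cos_yaw * dx + sin_yaw * dy, -sin_yaw * dx + cos_yaw * dy)


# Wood bin pose from world.sdf, with the assumed (hypothetical) base pose.
wood_bin_in_base = world_to_base_xy(-0.30, -0.15, 0.0, -0.3, math.pi / 2)
```

With those assumed values the wood bin lands at roughly (0.15, 0.30) in the base frame. Roll, pitch, and height are ignored here because the bins and the robot base are both upright on the table.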
The robot is a SCARA robot, short for Selective Compliance Assembly Robot Arm. A SCARA arm is stiff vertically but compliant in the horizontal plane, which makes it suitable for fast pick-and-place tasks on a table.
scara.urdf models a simple 3-DOF SCARA:
| Joint | Type | Motion | Limit |
|---|---|---|---|
| joint1 | revolute | base rotation around Z | -2.5 to 2.5 rad |
| joint2 | revolute | elbow rotation around Z | -2.5 to 2.5 rad |
| joint3 | prismatic | vertical end-effector motion | -0.15 to 0.05 m |
The first link length is 0.30 m and the second link length is 0.20 m. The ideal planar reach is therefore between about 0.10 m and 0.50 m from the base before joint limits and configured pick limits are considered.
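The reach figures above follow directly from the two link lengths. The sketch below only illustrates that arithmetic; the function names are hypothetical and it ignores joint limits and the configured pick region, which the real sorter also checks.

```python
import math

L1 = 0.30  # first link length in metres (from this README)
L2 = 0.20  # second link length in metres (from this README)


def planar_reach_limits(l1, l2):
    """Return (inner_radius, outer_radius) of the ideal annular workspace."""
    return abs(l1 - l2), l1 + l2


def is_within_ideal_reach(x, y, l1=L1, l2=L2):
    """True if the base-frame point (x, y) lies inside the ideal annulus."""
    inner, outer = planar_reach_limits(l1, l2)
    radius = math.hypot(x, y)
    return inner <= radius <= outer
```

For example, is_within_ideal_reach(0.30, 0.20) is true because the point is about 0.36 m from the base, while a point 0.05 m from the base is too close for the arm to fold onto.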
The URDF also contains:
- A fixed world -> base_footprint -> base_link transform with launch-configurable base pose.
- Inertial, visual, and collision geometry for each link.
- A top-down table camera mounted to base_footprint.
- table_camera_link_optical, used for camera projection.
- A ros2_control Gazebo system with position command interfaces through gz_ros2_control.
- Two Gazebo detachable-joint plugins, one for each cube.
- A contact sensor on the end-effector collision geometry.
The table camera is configured as a 640 by 480 RGB camera at 20 Hz:
<sensor name="camera" type="camera">
  <camera>
    <image>
      <width>640</width>
      <height>480</height>
      <format>R8G8B8</format>
    </image>
    <optical_frame_id>table_camera_link_optical</optical_frame_id>
    <camera_info_topic>table_camera/camera_info</camera_info_topic>
  </camera>
  <topic>table_camera/image</topic>
</sensor>
The neural-network sorting data flow is:
Gazebo table camera
-> /table_camera/image/compressed
-> yolo_cube_detector.py
-> /sorting/pixel_detections
-> scara_sorter.py
-> /arm_controller/follow_joint_trajectory
-> contact sensor and attach_detach_controller.py
-> cube attaches, moves, releases in bin
The custom detection messages are defined in projekt/msg:
PixelDetection:
object_id, object_class
center_x, center_y
bbox_width, bbox_height
confidence
target_bin
PixelDetectionArray:
header
detections[]
The motion command path uses control_msgs for FollowJointTrajectory actions and trajectory_msgs for joint trajectory points.
yolo_cube_detector.py is an rclpy node that subscribes to sensor_msgs camera topics /table_camera/image/compressed and /table_camera/camera_info. It can load:
- .onnx models with ONNX Runtime, used by default.
- .onnx models with OpenCV DNN if selected manually.
- .pt models through Ultralytics if selected manually.
The launch file passes these important parameters:
"model_path": LaunchConfiguration("detector_model"),
"backend": LaunchConfiguration("detector_backend"),
"class_names": ["wood_cube", "steel_cube"],
"target_bins": ["wood_collection_bin", "steel_collection_bin"],
"confidence_threshold": LaunchConfiguration("detector_confidence"),
"publish_once": LaunchConfiguration("detector_publish_once"),
"publish_debug_image": LaunchConfiguration("detector_debug_image"),
"debug_image_topic": "/sorting/yolo_debug_image",
"mask_base_rectangles": (
"-0.08,0.08,-0.58,0.08;"
"0.02,0.28,0.18,0.42;"
"0.02,0.28,-0.42,-0.18"
),
The detector waits for the startup delay, builds a mask in image space, runs YOLO on the masked frame, filters detections whose centers fall inside masked areas, and publishes the best detections. With detector_publish_once:=true, detection happens once while the robot is still in the home pose at the beginning of the sequence.
For each detection, labels are normalized to sorting classes:
if "wood" in label:
    return "wood"
if "steel" in label or "metal" in label:
    return "steel"
That class is then used to select wood_collection_bin or steel_collection_bin.
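A self-contained version of that normalization might look like the sketch below. The function names, the lowercasing step, and the None result for unknown labels are assumptions made for illustration; only the substring rules and the two bin names come from this README.

```python
from typing import Optional

# Bin names taken from the launch parameters shown in this README.
BIN_FOR_CLASS = {
    "wood": "wood_collection_bin",
    "steel": "steel_collection_bin",
}


def normalize_label(label: str) -> Optional[str]:
    """Map a raw detector label to a sorting class, or None if unknown."""
    label = label.lower()  # assumption: matching is case-insensitive
    if "wood" in label:
        return "wood"
    if "steel" in label or "metal" in label:
        return "steel"
    return None


def target_bin_for(label: str) -> Optional[str]:
    """Return the bin name for a raw label, or None if the class is unknown."""
    sorting_class = normalize_label(label)
    if sorting_class is None:
        return None
    return BIN_FOR_CLASS[sorting_class]
```

Called with the two trained class names, target_bin_for("wood_cube") gives wood_collection_bin and target_bin_for("steel_cube") gives steel_collection_bin.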
The image processing is intentionally simple around the neural network:
- Decode the compressed image from /table_camera/image/compressed.
- Build a binary mask that is white in valid pick areas and black in ignored areas.
- Apply the mask to the RGB camera frame with cv2.bitwise_and.
- Run YOLO on the masked frame.
- Remove any detection whose center pixel still falls inside a black mask area.
- Publish the remaining detections as PixelDetectionArray.
The default mask is defined in base-frame coordinates:
"mask_base_rectangles": (
"-0.08,0.08,-0.58,0.08;"
"0.02,0.28,0.18,0.42;"
"0.02,0.28,-0.42,-0.18"
),
Those three rectangles cover:
- the robot/end-effector home area
- the wood bin area
- the steel bin area
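The rectangle string can be parsed with a few lines of Python. This sketch assumes each rectangle is written as x_min,x_max,y_min,y_max in base-frame metres; that ordering is inferred from the values and is not confirmed by the source, and both function names are hypothetical. The real detector applies the mask in image space after projection, whereas this example only tests base-frame points.

```python
def parse_mask_rectangles(spec):
    """Parse 'x_min,x_max,y_min,y_max;...' into a list of 4-tuples (assumed ordering)."""
    rectangles = []
    for chunk in spec.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing semicolon
        values = [float(value) for value in chunk.split(",")]
        if len(values) != 4:
            raise ValueError(f"expected 4 numbers per rectangle, got {chunk!r}")
        x_a, x_b, y_a, y_b = values
        rectangles.append((min(x_a, x_b), max(x_a, x_b), min(y_a, y_b), max(y_a, y_b)))
    return rectangles


def point_is_masked(x, y, rectangles):
    """True if the base-frame point lies inside any mask rectangle."""
    return any(
        x_min <= x <= x_max and y_min <= y <= y_max
        for x_min, x_max, y_min, y_max in rectangles
    )


DEFAULT_MASK = (
    "-0.08,0.08,-0.58,0.08;"
    "0.02,0.28,0.18,0.42;"
    "0.02,0.28,-0.42,-0.18"
)
default_rectangles = parse_mask_rectangles(DEFAULT_MASK)
```

Under that assumed ordering, point_is_masked(0.15, 0.30, default_rectangles) is true, so a cube resting at that base-frame position would be ignored.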
Because the table camera is fixed to the robot base, the detector can project each rectangle corner from base coordinates to camera pixels:
u, v = project_point_to_pixel(point, self._camera_info, base_to_camera)
cv2.fillConvexPoly(mask, np.array(points, dtype=np.int32), 0)
This matters for two security cases. First, when the robot is at home, it should not be detected as a cube. Second, once a cube is already in a bin, it should be treated as sorted and should not be picked again. If a cube in a bin is visually detected by YOLO, the bin mask removes it before it reaches the sorter.
The node uses cv_bridge to publish the OpenCV debug view as a ROS image. To see this graphically, open the debug image topic with rqt_image_view while start_sorting.launch.py is running:
ros2 run rqt_image_view rqt_image_view /sorting/yolo_debug_image
Run rqt_image_view from a terminal without the YOLO virtual environment activated if Qt cannot be found:
deactivate # only if a virtual environment such as (tf) or .venv is active
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
source install/setup.bash
ros2 run rqt_image_view rqt_image_view /sorting/yolo_debug_image
If the GUI still reports Could not find Qt binding, install the Qt binding packages:
sudo apt update
sudo apt install -y python3-pyqt5 python3-pyqt5.qtsvg ros-jazzy-rqt-image-view
The debug image has three panels:
- raw camera image with red overlay where the mask ignores pixels
- binary mask, where black means ignored and white means valid
- masked YOLO result with bounding boxes, labels, confidence values, and center crosses
The debug topic is enabled by default through detector_debug_image:=true and is republished periodically so it remains visible after the one-shot detection snapshot. You can also view it in RViz by adding an Image display and selecting /sorting/yolo_debug_image.
For a local OpenCV popup window instead of a ROS image topic, launch sorting with:
ros2 launch projekt start_sorting.launch.py detector_debug_window:=true
Use detector_debug_image:=false if you want to disable the debug topic.
The detector only publishes image pixels. scara_sorter.py converts a detection center (u, v) into a base-frame point by using:
- camera intrinsics from /table_camera/camera_info
- tf2_ros transforms between table_camera_link_optical and base_link
- the known cube-top plane cube_top_z
The projection logic is in project_pixel_to_plane:
ray_camera = ((u - cx) / fx, (v - cy) / fy, 1.0)
origin_base = camera_to_base.translation
direction_base = rotate_vector(camera_to_base.rotation, ray_camera)
scale = (plane_z - origin_base[2]) / direction_base[2]
The result is the point where the camera ray intersects the cube-top plane. This gives the x and y position used for inverse kinematics.
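The ray-plane intersection can be written as a small standalone function. This is a sketch under stated assumptions, not the project's project_pixel_to_plane: it takes the camera-to-base rotation as a 3x3 matrix instead of a quaternion, the helper rotate is hypothetical, and the example intrinsics and camera pose are made-up values for a camera looking straight down from 1 m.

```python
def rotate(rotation, vector):
    """Multiply a 3x3 row-major rotation matrix by a 3-vector."""
    return tuple(
        sum(rotation[row][col] * vector[col] for col in range(3))
        for row in range(3)
    )


def project_pixel_to_plane(u, v, fx, fy, cx, cy, rotation, translation, plane_z):
    """Intersect the camera ray through pixel (u, v) with the plane z = plane_z."""
    # Ray through the pixel in the optical frame (z forward).
    ray_camera = ((u - cx) / fx, (v - cy) / fy, 1.0)
    direction_base = rotate(rotation, ray_camera)
    if abs(direction_base[2]) < 1e-9:
        return None  # ray is parallel to the plane
    scale = (plane_z - translation[2]) / direction_base[2]
    if scale <= 0.0:
        return None  # plane is behind the camera
    return tuple(translation[i] + scale * direction_base[i] for i in range(3))


# Hypothetical setup: optical x -> base x, optical y -> base -y, optical z -> base -z.
DOWN_LOOKING = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))
point = project_pixel_to_plane(
    420.0, 240.0, 500.0, 500.0, 320.0, 240.0, DOWN_LOOKING, (0.0, 0.0, 1.0), 0.0
)
```

In this made-up setup a pixel 100 columns right of the image center maps to a point 0.2 m along the base x axis, because the focal length is 500 pixels and the plane is 1 m below the camera.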
The SCARA arm uses a two-link planar inverse kinematics solution for joint1 and joint2, then a fixed vertical position for travel, pick, and drop height.
For target point (x, y):
r = sqrt(x^2 + y^2)
c2 = (x^2 + y^2 - l1^2 - l2^2) / (2 l1 l2)
q2 = atan2(sqrt(1 - c2^2), c2)
q1 = atan2(y, x) - atan2(l2 sin(q2), l1 + l2 cos(q2))
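The formulas above translate directly into Python. The sketch below uses the link lengths from this README and adds a forward-kinematics helper so the result can be checked; the function names are hypothetical, and joint limits and pick-region limits are not enforced here. Because q2 is taken from the positive square root, only one elbow configuration is returned.

```python
import math


def scara_ik(x, y, l1=0.30, l2=0.20):
    """Return (q1, q2) in radians for target (x, y), or None if out of reach."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        return None  # target is outside the ideal annular workspace
    q2 = math.atan2(math.sqrt(1.0 - c2 * c2), c2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def scara_fk(q1, q2, l1=0.30, l2=0.20):
    """Forward kinematics of the two planar links, used to verify the IK result."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y
```

Feeding the angles from scara_ik(0.30, 0.20) back into scara_fk reproduces the target point, which is a quick sanity check when changing link lengths.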
The implementation checks:
- finite target coordinates
- minimum and maximum reach
- joint1 limits
- joint2 limits
- configured pick region limits
If a detected cube is outside the motion range, the sorter logs a message like:
Detected target wood_cube_1 at base XY (...) is out of motion range: ...
That detection is skipped. If another reachable cube exists, the sorter continues with it. If not, the robot moves back to home.
The full sorting motion sequence starts with the robot in home position. While the robot is home, the detector publishes the initial camera snapshot. The sorter then repeatedly selects the best remaining reachable candidate and runs one pick-and-place cycle.
Candidate order is controlled here:
for detection in sorted(
    detections, key=lambda item: item.confidence, reverse=True
):
So the first cube is the highest-confidence reachable detection, regardless of whether it is wood or steel. If that detection is unreachable, already completed, too low-confidence, or inside a forbidden pick region, the sorter skips it and evaluates the next detection. Wood always goes to the wood bin and steel always goes to the steel bin because the detector fills target_bin from the predicted class.
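A standalone sketch of that selection rule is shown here. It is an illustration only: the Detection dataclass is a simplified stand-in for the PixelDetection message, next_candidate and its parameters are hypothetical names, and the 0.45 threshold is just the example value used for detector_confidence elsewhere in this README.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Simplified stand-in for the PixelDetection message."""
    object_id: str
    object_class: str
    confidence: float


def next_candidate(detections, is_reachable, finished_ids, min_confidence=0.45):
    """Return the highest-confidence detection that can still be picked, or None."""
    for detection in sorted(
        detections, key=lambda item: item.confidence, reverse=True
    ):
        if detection.object_id in finished_ids:
            continue  # already sorted or skipped
        if detection.confidence < min_confidence:
            continue  # too uncertain to act on
        if not is_reachable(detection):
            continue  # outside the arm workspace or pick region
        return detection
    return None
```

The caller would add each completed or skipped object_id to finished_ids and call next_candidate again until it returns None, at which point the robot can go home.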
For each reachable candidate, scara_sorter.py runs this sequence:
- Stay at or return to home while waiting for a detection snapshot.
- Project the chosen detection center pixel to a base-frame (x, y) pick point.
- Solve inverse kinematics for the pick point.
- Move above the cube at travel_joint3.
- Move down to pick_joint3.
- Wait for the contact-based attach controller to report an attached cube.
- Lift back to travel height.
- Solve inverse kinematics for the target bin center.
- Move above the target bin.
- Move down to drop_joint3.
- Publish /gripper/release.
- Wait for the detachable joint to detach.
- Lift from the bin and mark the detection complete.
- Try the next reachable unfinished detection.
- Return home when no reachable candidate remains.
The sorter sends goals to the FollowJointTrajectory action server:
goal_msg = FollowJointTrajectory.Goal()
trajectory = JointTrajectory()  # from trajectory_msgs.msg
trajectory.joint_names = list(self._joint_names)
point = JointTrajectoryPoint()  # from trajectory_msgs.msg
point.positions = list(positions)
trajectory.points = [point]
goal_msg.trajectory = trajectory
Drop targets are selected from target_bin first. If that is missing, labels are matched against the configured bin names. If no matching bin is found, the sorter falls back to shared_bin_x and shared_bin_y.
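That three-step fallback could be expressed as follows. This is a hedged sketch rather than the sorter's code: the function name is hypothetical, the label-matching rule (class name contained in the bin name) is an assumption because the source does not spell it out, and the coordinates in the usage note are illustrative.

```python
def select_drop_target(target_bin, object_class, bins, shared_bin_xy):
    """Pick a drop point: explicit bin, then label match, then the shared bin."""
    # 1. Prefer the bin named directly by the detector.
    if target_bin in bins:
        return bins[target_bin]
    # 2. Otherwise look for a bin whose name contains the class label (assumed rule).
    label = (object_class or "").lower()
    if label:
        for name, xy in bins.items():
            if label in name.lower():
                return xy
    # 3. Fall back to the shared bin position.
    return shared_bin_xy
```

For example, with bins = {"wood_collection_bin": (0.15, 0.30), "steel_collection_bin": (0.15, -0.30)} and an empty target_bin, a detection of class "wood" still resolves to the wood bin, while an unknown class resolves to the shared position.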
Several checks keep the robot from doing unsafe or useless motions:
- Unreachable cube: after pixel projection, the sorter checks the configured pick region and the SCARA inverse kinematics. If the target is outside the arm range or violates joint limits, it logs an out-of-motion-range warning, remembers that detection, and continues with another reachable cube if one exists.
- No reachable cube remains: if all detections are completed, unreachable, masked, or below confidence, the sorter sends the robot back to home.
- Cube already in a bin: the bin rectangles are masked before YOLO inference and detections with centers inside masked pixels are filtered out. This prevents already-sorted cubes from being moved again.
- Robot visible in the camera: the home-position robot area is masked, so the detector does not treat robot parts or their shadows as cubes.
- No physical contact at pick: direct attachment is disabled by default. The sorter lowers to the pick pose, but the cube is attached only after the contact sensor reports contact with a configured cube. If no attachment is reported before attach_timeout, the sorter lifts, marks that detection as skipped, and continues.
- Release in the bin: when /gripper/release is published, the attach-detach controller repeatedly sends detach commands and suppresses immediate reattachment for a short time.
The unreachable handling is implemented as:
self.get_logger().warning(
    f"Detected target {key} at base XY ({x:.3f}, {y:.3f}) "
    f"is out of motion range: {reason}"
)
That warning is the message that appears when the robot cannot reach a detected cube.
Gazebo's detachable-joint plugin can attach a cube to the end effector when a message is sent to the cube's attach topic. The project does not blindly attach from software. Instead, attach_detach_controller.py requires physical contact from ros_gz_interfaces contact messages before requesting attachment. Attach, detach, release, and state topics use std_msgs Empty and String messages.
Important topics:
| Topic | Direction | Purpose |
|---|---|---|
| /contact_end_effector | Gazebo to ROS | Contact sensor reports collisions |
| /wood_cube_5cm/attach | ROS to Gazebo | Request wood detachable joint attach |
| /wood_cube_5cm/detach | ROS to Gazebo | Request wood detach |
| /steel_cube_5cm/attach | ROS to Gazebo | Request steel detachable joint attach |
| /steel_cube_5cm/detach | ROS to Gazebo | Request steel detach |
| /gripper/attached_object | ROS | Current attached cube name |
| /gripper/release | ROS | Sorter asks the controller to detach |
The key safety idea is:
if self._attached_target or self._pending_attach_target or not msg.contacts:
    return
for contact in msg.contacts:
    target = self._contact_target(contact)
    if target:
        self._publish_attach(target, ...)
Contact detection is important because it prevents the robot from attaching a cube that was only detected visually but was not actually touched. This matters when the detector is uncertain, when a cube has already moved, or when the robot is outside a reachable pose. On release, the controller repeatedly publishes detach messages and temporarily suppresses contact handling so the cube does not immediately reattach while it is being dropped into a bin.
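The contact gating and release suppression can be modelled without ROS as a small state machine. This is a simplified sketch, not attach_detach_controller.py: the class name ContactGate, the explicit now timestamps, and the 1.0 s suppression window are all hypothetical, and the pending-attach state of the real controller is collapsed into a single attached state.

```python
class ContactGate:
    """Attach only on contact with a known cube, and pause after a release."""

    def __init__(self, targets, suppress_seconds=1.0):
        self.targets = set(targets)
        self.suppress_seconds = suppress_seconds  # hypothetical default
        self.attached = None
        self._suppressed_until = 0.0

    def on_contact(self, contact_names, now):
        """Return the cube to attach for this contact report, or None."""
        if self.attached is not None or now < self._suppressed_until:
            return None  # already holding a cube, or still releasing one
        for name in contact_names:
            if name in self.targets:
                self.attached = name
                return name
        return None

    def release(self, now):
        """Drop the current cube and ignore contacts for a short time."""
        released = self.attached
        self.attached = None
        self._suppressed_until = now + self.suppress_seconds
        return released
```

Passing the time in explicitly keeps the sketch deterministic; a ROS node would read it from its clock instead.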
The training helper is save_training_images.py. It supports two workflows:
- live collection from the ROS table camera
- offline annotation of images already sorted into class folders
The YOLO classes are:
names:
  0: wood_cube
  1: steel_cube
The generated YOLO dataset layout is:
projekt/datasets/scara_cubes/
├── data.yaml
├── images/
│ ├── train/
│ └── val/
└── labels/
├── train/
└── val/
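Each file under labels/ holds one line per box in the standard YOLO detection format: class index followed by the box center and size, all normalized by the image size. The helper below is a minimal sketch of that conversion from a pixel-space box; the function name is hypothetical and it is not taken from save_training_images.py.

```python
def yolo_label_line(class_id, x_min, y_min, x_max, y_max, image_width, image_height):
    """Format one pixel-space box as a normalized YOLO label line."""
    center_x = (x_min + x_max) / 2.0 / image_width
    center_y = (y_min + y_max) / 2.0 / image_height
    width = abs(x_max - x_min) / image_width
    height = abs(y_max - y_min) / image_height
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"


# A 40 by 40 pixel wood cube (class 0) centered in the 640 by 480 table image.
example_line = yolo_label_line(0, 300, 220, 340, 260, 640, 480)
```

A background image saved with the n key has an empty label file, so no line is written for it.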
Start the simulation with the Terminal 1 command from Quick Start With The Pretrained Detector. In another sourced terminal, collect frames from the table camera:
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
source .venv/bin/activate
source install/setup.bash
ros2 run projekt save_training_images.py --ros-args \
-p save_root:="$PWD/projekt/datasets/scara_cubes"
OpenCV controls:
| Key or action | Meaning |
|---|---|
| drag left mouse | draw bounding box around the cube |
| w | save box as wood_cube |
| s | save box as steel_cube |
| n | save image as background, with an empty label file |
| t | save future samples into train |
| v | save future samples into val |
| c or right click | clear box |
| q | quit |
For already captured images stored in class-named folders such as projekt/training_images/wood_cube, projekt/training_images/steel_cube, and projekt/training_images/not_cube, run offline annotation:
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
source .venv/bin/activate
source install/setup.bash
ros2 run projekt save_training_images.py \
--source-root "$PWD/projekt/training_images" \
--save-root "$PWD/projekt/datasets/scara_cubes"
For good training data, capture:
- both cube materials in many table positions
- both cubes near shadows and away from shadows
- background frames with no cube in the pick region
- scenes with the robot in home pose, because detection is done at home
- cubes in bins as background or ignored samples, because bin areas are masked during sorting
Activate the virtual environment:
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
source .venv/bin/activate
Train from the included YOLOv8 nano starting weights:
yolo detect train \
model=projekt/yolov8n.pt \
data=projekt/datasets/scara_cubes/data.yaml \
epochs=80 \
imgsz=640 \
project=projekt/runs/detect \
name=train-3 \
exist_ok=True
You can also start from the included YOLO26 nano weights:
yolo detect train \
model=projekt/yolo26n.pt \
data=projekt/datasets/scara_cubes/data.yaml \
epochs=80 \
imgsz=640 \
project=projekt/runs/detect \
name=train-3 \
exist_ok=True
Training outputs are written under:
projekt/runs/detect/train-3/
The most important output files are:
projekt/runs/detect/train-3/weights/best.pt
projekt/runs/detect/train-3/weights/last.pt
projekt/runs/detect/train-3/results.png
projekt/runs/detect/train-3/confusion_matrix.png
projekt/runs/detect/train-3/val_batch0_pred.jpg
Export the trained PyTorch model to ONNX:
yolo export \
model=projekt/runs/detect/train-3/weights/best.pt \
format=onnx \
imgsz=640
The export creates:
projekt/runs/detect/train-3/weights/best.onnx
Use it in sorting:
ros2 launch projekt start_sorting.launch.py \
detector_model:="$PWD/projekt/runs/detect/train-3/weights/best.onnx" \
detector_backend:=onnxruntime
If training creates a different run folder, find it with:
find projekt/runs/detect -path "*/weights/best.pt" -print
Then export that best.pt or pass the matching best.onnx path to detector_model.
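If several runs exist, the same search can be done from Python and sorted so the most recently written weights come first. This is an optional convenience sketch; the function name is hypothetical and it relies only on pathlib from the standard library.

```python
from pathlib import Path


def find_trained_weights(runs_root):
    """Return every */weights/best.pt under runs_root, newest first."""
    root = Path(runs_root)
    candidates = [
        path
        for path in root.rglob("best.pt")
        if path.parent.name == "weights" and path.is_file()
    ]
    return sorted(candidates, key=lambda path: path.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Run from the repository root, as with the find command.
    for weights_path in find_trained_weights("projekt/runs/detect"):
        print(weights_path)
```

Sorting by modification time is a design choice for convenience; it does not say which run has the best validation metrics.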
Dataset 'datasets/scara_cubes/data.yaml' does not exist
Run training from the repository root and use the correct path:
cd ~/projekt_ws/SCARA_projekt
yolo detect train model=projekt/yolov8n.pt data=projekt/datasets/scara_cubes/data.yaml epochs=80 imgsz=640
best.pt is missing during export
Check which training run actually contains weights:
find projekt/runs/detect -path "*/weights/best.pt" -print
Then export the exact file that exists.
OpenCV DNN ONNX shape error
Use the default onnxruntime backend instead of forcing opencv:
ros2 launch projekt start_sorting.launch.py detector_backend:=onnxruntime
No detections are published
Check that the camera and model path are valid:
ros2 topic hz /table_camera/image
ros2 topic hz /table_camera/image/compressed
ls projekt/runs/detect/train-3/weights/best.onnx
Robot does not attach a cube
The sorter waits for /gripper/attached_object. Check that contact messages arrive when the end effector touches a cube:
ros2 topic echo /contact_end_effector --once
ros2 topic echo /gripper/attached_object
Gazebo cannot find models or textures
Make sure the external models are in ~/gazebo_models, or use the models under projekt/meshes. Then rebuild and source:
colcon build --symlink-install --packages-select projekt
source install/setup.bash
RViz does not show the robot
Check that /robot_description, /joint_states, and TF are available:
ros2 topic echo /robot_description --once
ros2 topic hz /joint_states
ros2 run tf2_ros tf2_echo base_link end_effector
Run the package tests:
source /opt/ros/jazzy/setup.bash
cd ~/projekt_ws/SCARA_projekt
source install/setup.bash
colcon test --packages-select projekt
colcon test-result --verbose
Rebuild after changing messages, launch files, URDF, or installed scripts:
colcon build --symlink-install --packages-select projekt
source install/setup.bash
Because PixelDetection.msg and PixelDetectionArray.msg are custom interfaces, rosidl_default_generators creates the Python message modules at build time and rosidl_default_runtime provides them at runtime. Rebuild after message changes before Python nodes import the generated modules.
The README references the uploaded screenshots with these filenames:
docs/images/start_sorting.launch.py_3.png
docs/images/start_sorting.launch.py_2.png
docs/images/start_sorting.launch.py_1.png
docs/images/check_urdf.launch.py.png
docs/images/save_training_images.py.png
docs/images/spawn_robot.launch.py.png
docs/images/world.launch.py.png
docs/images/detector_debug_image.png
docs/images/presentation_video.png
If the images do not appear on GitHub, check that the files above are committed exactly with those names. GitHub image links are case-sensitive.









