This repository contains the full implementation of my Sensing and Perception Group Project at King’s College London:
NAO Robot Autonomous Ball Retrieval System
Sensing and Perception Group Project | King's College London | August 2025
This project develops a comprehensive sensing and perception framework for the NAO V5 humanoid robot to autonomously detect, track, navigate to, and kick a tennis ball. Inspired by RoboCup Soccer and ball-kid assistance on tennis courts, the system integrates multiple robotics domains:
- Computer Vision: Real-time ball detection and tracking using OpenCV
- Path Planning: Dynamic obstacle avoidance with the A* algorithm
- SLAM: Sparse 3D reconstruction inspired by ORB-SLAM2
- Motion Planning: Custom kick kinematics with balance constraints
- Human-Robot Interaction: Voice command recognition system
The robot was tested in Quad Lab, King's College London.
- Project Objectives
- System Architecture
- Simulation Environment
- Technical Implementations
- Results & Performance
- Installation & Setup
- Demo Videos
- Challenges & Solutions
- Future Work
- Acknowledgments
- References
This project implements a fully autonomous navigation pipeline for the NAO humanoid robot, enabling the robot to:
- Detect a target object (tennis ball)
- Build and maintain a grid-based world representation
- Compute an optimal path using the A* algorithm
- Avoid static obstacles and reach the target reliably
- Execute the computed path in simulation and on a real NAO robot
```
┌─────────────────┐
│ Voice Command │
│ Recognition │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌──────────────┐
│ Ball Detection │◄─────┤ NAO Camera │
│ (OpenCV) │ └──────────────┘
└────────┬────────┘
│
▼
┌─────────────────┐ ┌──────────────┐
│ Visual Tracking│◄─────┤ Head Control │
│ (Proportional) │ │ (ALProxy) │
└────────┬────────┘ └──────────────┘
│
▼
┌─────────────────┐ ┌──────────────┐
│ SLAM System │◄─────┤ Feature │
│ (ORB-based) │ │ Extraction │
└────────┬────────┘ └──────────────┘
│
▼
┌─────────────────┐
│ Path Planning │
│ (A* Algorithm) │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌──────────────┐
│ Motion │◄─────┤ Kick │
│ Execution │ │ Kinematics │
└─────────────────┘ └──────────────┘
```
The system consists of four primary layers:
- Perception: image-based ball detection (optional extension), occupancy grid generation (sketched below), and static obstacle identification
- Planning: A*-based global path planner with a Manhattan distance heuristic, handling node expansion and open/closed set management
- Simulation & Visualisation: Webots for physics-based robot simulation; RViz/Foxglove for visualising the grid and planned path
- Execution: NAOqi API for body movement, with path smoothing and waypoint tracking
The project integrates multiple tools:
- Simulator (Webots/Gazebo):
  - Full NAO model
  - Obstacle environment
  - Tennis-ball placement
  - Kinematic control
- RViz/Foxglove:
  - Grid visualisation
  - Path expansion timeline
  - Debugging of occupancy cells
  - Real-time monitoring
  - Playback of navigation logs
- Gazebo + ROS:
  - More realistic integration with ROS tools and MoveIt planning
  - More fragile on newer Ubuntu versions
Algorithm Steps:
- Image Acquisition: Capture RGB frames from NAO's camera (320×240 resolution)
- Color Filtering: Apply HSV color space conversion and yellow mask
- Noise Reduction: Morphological operations (erosion + dilation)
- Contour Detection: Identify closed contours using OpenCV
- Circle Validation: Filter circular contours and compute center coordinates
Proportional control for head tracking:
θ = k × (x - x_center)
Where:
- θ = angular adjustment
- k = proportional gain constant
- x = ball center x-coordinate
- x_center = image frame center
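A minimal sketch of how this proportional law could drive NAO's head via ALMotion. The gain value, image centre, and proxy setup are illustrative assumptions, not the tuned project values:

```python
from naoqi import ALProxy

motionProxy = ALProxy("ALMotion", "<nao_ip>", 9559)  # robot IP is a placeholder

K_GAIN = 0.001   # assumed proportional gain (rad per pixel)
X_CENTER = 160   # image centre for 320x240 frames

def track_ball(x_ball):
    """Adjust head yaw proportionally to the ball's horizontal offset."""
    theta = K_GAIN * (x_ball - X_CENTER)
    # Relative yaw change; sign depends on the camera frame convention
    motionProxy.changeAngles("HeadYaw", -theta, 0.1)  # 0.1 = fraction of max speed
```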
Distance estimation from radius:
distance ≈ f(radius) [inverse relationship]
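Under a pinhole-camera assumption, the inverse relationship can be made concrete: distance ≈ f · R / r, where R is the physical ball radius and r its apparent radius in pixels. The focal length below is an illustrative assumption (it would come from camera calibration); a standard tennis ball is roughly 3.3 cm in radius:

```python
BALL_RADIUS_M = 0.033  # physical tennis-ball radius (~3.3 cm)
FOCAL_PX = 280.0       # assumed focal length in pixels (from calibration)

def estimate_distance(radius_px):
    """Estimate ball distance (m) from its apparent radius in pixels."""
    if radius_px <= 0:
        return float('inf')
    return (FOCAL_PX * BALL_RADIUS_M) / radius_px
```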
Input: frames from the NAO camera. Output: the ball's center coordinates and radius.
```python
# Ball Detection Core Logic
import cv2
import numpy as np

lower_yellow = np.array([20, 100, 100])  # HSV bounds for yellow (illustrative; tuned per lighting)
upper_yellow = np.array([35, 255, 255])
kernel = np.ones((5, 5), np.uint8)       # structuring element for noise removal
min_radius, k, x_center = 5, 0.001, 160  # illustrative detection/control constants

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, lower_yellow, upper_yellow)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)       # erosion + dilation
_, contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,  # OpenCV 3.x returns 3 values
                                  cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    ((x, y), radius) = cv2.minEnclosingCircle(contour)
    if radius > min_radius:
        cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 0), 2)
        theta = k * (x - x_center)  # Proportional control
```
The A* implementation uses a Manhattan distance heuristic for efficient pathfinding on a 2D grid:
h(n) = |x_n - x_goal| + |y_n - y_goal|
- 8-way movement (diagonal movement allowed)
- Dynamic obstacle detection and avoidance
- Real-time path replanning (15-50ms per obstacle)
- Optimal path reconstruction via parent node tracking
| Metric | A* Algorithm | Dijkstra Algorithm | Improvement |
|---|---|---|---|
| Success Rate | 92% (50+ runs) | 88% | +4.5% |
| Path Length | Optimized | Baseline | 12% shorter |
| Replanning Time | 15-50ms | 25-70ms | 40% faster |
| Memory Usage | Moderate | High | Lower |
```python
from Queue import PriorityQueue  # Python 2.7 ('queue' on Python 3)

def a_star(start, goal, grid):
    open_set = PriorityQueue()
    open_set.put((0, start))  # entries ordered by f-score
    came_from = {}
    g_score = {start: 0}
    while not open_set.empty():
        current = open_set.get()[1]
        if current == goal:
            return reconstruct_path(came_from, current)
        for neighbor in get_neighbors(current, grid):
            tentative_g = g_score[current] + 1  # uniform step cost
            if tentative_g < g_score.get(neighbor, float('inf')):
                came_from[neighbor] = current
                g_score[neighbor] = tentative_g
                f_score = tentative_g + manhattan_distance(neighbor, goal)
                open_set.put((f_score, neighbor))
    return None  # open set exhausted: no path exists
```
This visual SLAM system adapts the ORB-SLAM2 architecture to Python 2.7 constraints:
Pipeline Stages:
- Feature Extraction: ORB (Oriented FAST and Rotated BRIEF) feature detection (up to 3000 features)
- Feature Matching: FLANN-based descriptor matching across frames
- Motion Estimation: Essential matrix computation with RANSAC outlier rejection
- Keyframe Selection: Add keyframes on significant camera translation (see the sketch after this list)
- Triangulation: 3D point reconstruction from matched features
- Loop Closure: Periodic global optimization (threshold: 10+ keyframes)
- Map Building: Covisibility graph construction
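A sketch of the keyframe-selection rule from stage 4, assuming a simple translation-magnitude test; the 0.05 m threshold is an illustrative assumption:

```python
import numpy as np

KF_TRANSLATION_THRESHOLD = 0.05  # assumed minimum translation (m) between keyframes

def should_add_keyframe(t_current, t_last_keyframe):
    """Add a keyframe when the camera has translated significantly (3-vectors in)."""
    return np.linalg.norm(t_current - t_last_keyframe) > KF_TRANSLATION_THRESHOLD
```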
(1) NAO in Gazebo environment with ball; (2) covisibility graph of landmarks and robot camera trajectory
ORB Feature Detection:
```python
orb = cv2.ORB_create(nfeatures=3000)
keypoints, descriptors = orb.detectAndCompute(image, None)
```
Feature Matching with FLANN:
```python
FLANN_INDEX_LSH = 6
index_params = dict(algorithm=FLANN_INDEX_LSH, table_number=6,
                    key_size=12, multi_probe_level=1)
flann = cv2.FlannBasedMatcher(index_params, {})
matches = flann.knnMatch(desc1, desc2, k=2)
```
Essential Matrix & Camera Motion:
```python
E, mask = cv2.findEssentialMat(pts1, pts2, focal=focal, pp=(cx, cy),
                               method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, focal=focal, pp=(cx, cy))
```
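Stage 5 (triangulation) can then recover sparse 3D points from the estimated pose, sketched here with OpenCV's `cv2.triangulatePoints`. The intrinsic matrix `K` is assumed to come from camera calibration; `R`, `t`, `pts1`, and `pts2` are from the snippet above:

```python
import numpy as np
import cv2

# Triangulate matched points into 3D (first camera at the origin)
P1 = np.dot(K, np.hstack((np.eye(3), np.zeros((3, 1)))))  # projection of camera 1
P2 = np.dot(K, np.hstack((R, t)))                         # projection of camera 2
points_4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T) # pts1/pts2: (N, 2) arrays
points_3d = (points_4d[:3] / points_4d[3]).T              # dehomogenise to (N, 3)
```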
Particle Filter SLAM:
Particle filter-based SLAM showing belief map evolution and robot state estimation
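The belief-map evolution above follows the standard predict–weight–resample cycle. A minimal, illustrative particle-filter update is sketched below; the particle count, motion noise, and measurement handling are assumptions, not the project's tuned values:

```python
import numpy as np

N_PARTICLES = 500
particles = np.zeros((N_PARTICLES, 3))  # (x, y, heading) hypothesis per particle
weights = np.ones(N_PARTICLES) / N_PARTICLES

def predict(particles, dx, dtheta, noise=(0.02, 0.01)):
    """Propagate particles through a noisy odometry motion model."""
    n = len(particles)
    particles[:, 2] += dtheta + np.random.normal(0, noise[1], n)
    particles[:, 0] += dx * np.cos(particles[:, 2]) + np.random.normal(0, noise[0], n)
    particles[:, 1] += dx * np.sin(particles[:, 2]) + np.random.normal(0, noise[0], n)

def update(weights, likelihoods):
    """Re-weight particles by measurement likelihoods and normalise."""
    weights *= likelihoods
    weights += 1e-300  # guard against all-zero weights
    weights /= weights.sum()

def resample(particles, weights):
    """Duplicate high-weight particles; reset weights to uniform."""
    idx = np.random.choice(len(particles), len(particles), p=weights)
    return particles[idx], np.ones(len(particles)) / len(particles)
```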
Development Steps:
- Physical Teaching: Manually guide NAO's leg through desired kick motion
- Joint Recording: Capture joint angles using Choregraphe timeline
- Motion Refinement: Fine-tune keyframes for smooth trajectory
- Balance Constraint: Weight shift to right leg + CoM recentering
- Cartesian Control: End-effector position interpolation
- Testing & Iteration: Validate stability and kick effectiveness
NAO Leg Degrees of Freedom (6 DOF per leg):
- Hip Yaw/Pitch: Position adjustment
- Hip Roll: Lateral movement
- Knee Pitch: Leg extension
- Ankle Pitch/Roll: Foot orientation
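The recorded Choregraphe keyframes can be replayed on these joints with ALMotion's `angleInterpolation`. The joint angles and timings below are placeholders for illustration, not the taught kick trajectory:

```python
from naoqi import ALProxy

motionProxy = ALProxy("ALMotion", "<nao_ip>", 9559)  # robot IP is a placeholder

# Illustrative keyframes for the kicking leg (radians); the real values were
# captured by physically guiding the leg and recording in Choregraphe.
names  = ["LHipPitch", "LKneePitch", "LAnklePitch"]
angles = [[-0.4, 0.2], [0.9, 0.3], [-0.5, -0.1]]  # one keyframe list per joint
times  = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]     # seconds at which keyframes are reached

motionProxy.angleInterpolation(names, angles, times, True)  # True = absolute angles
```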
Kick motion visualization in Choregraphe showing successful execution
```python
# Balance and kick execution (motionProxy = ALProxy("ALMotion", ip, 9559);
# postureProxy = ALProxy("ALRobotPosture", ip, 9559))
import motion  # NAOqi module providing FRAME_ROBOT

motionProxy.wbEnable(True)                           # activate whole-body balancer
motionProxy.wbFootState("Fixed", "RLeg")             # fix the supporting foot
motionProxy.wbEnableBalanceConstraint(True, "Legs")

# Cartesian interpolation for kick: 6D poses (x, y, z, wx, wy, wz)
effector = "LLeg"
space = motion.FRAME_ROBOT
path = [
    [0.0,  0.1, 0.05, 0.0, 0.0, 0.0],  # Retract
    [0.15, 0.1, 0.05, 0.0, 0.0, 0.0],  # Forward kick
    [0.0,  0.1, 0.0,  0.0, 0.0, 0.0]   # Return
]
times = [1.0, 2.0, 3.0]
motionProxy.positionInterpolation(effector, space, path, 0x3f, times, True)

# goToPosture belongs to ALRobotPosture, not ALMotion
postureProxy.post.goToPosture("StandInit", 1.0)
```
Challenges:
- Center of gravity balance during single-leg support
- Preventing robot fall-over post-kick
- Timing coordination between leg and arm movements
Because the NAO is constrained to Python 2.7, a dual-script system was implemented.
System Flow:
- Script 1 (Python 3.12): Runs on laptop, captures microphone input
- Speech Recognition: Processes audio using Google Speech API
- File I/O: Writes transcription to shared .txt file
- Script 2 (Python 2.7): Polls file, executes NAO commands via NAOqi
- Cleanup: Clears file after command execution to manage memory
```python
# Python 3.12 - Speech Recognition Script
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)
text = recognizer.recognize_google(audio)
with open("command.txt", "w") as f:
    f.write(text)
```
```python
# Python 2.7 - NAO Command Polling Script
import os
import time

while True:
    if os.path.exists("command.txt"):
        with open("command.txt", "r") as f:
            command = f.read().strip()
        if command == "go get the ball":
            execute_ball_retrieval()      # project routine that starts the pipeline
        open("command.txt", "w").close()  # Clear file
    time.sleep(0.5)
```
Limitations:
- Unable to run directly on NAO due to microphone compatibility issues
- Choregraphe simulation software incompatibility
- Workaround demonstrates concept but not fully integrated
| Component | Metric | Performance | Notes |
|---|---|---|---|
| Ball Detection | Accuracy | 95%+ | Controlled lighting conditions |
| | Frame Rate | 15-20 FPS | 320×240 resolution |
| | Detection Range | 0.5m - 3m | Based on ball size |
| Path Planning | Success Rate | 92% | 50+ test runs |
| | Path Optimality | 12% better than Dijkstra | Length comparison |
| | Replanning Time | 15-50ms | Per obstacle update |
| SLAM | Feature Detection | Up to 3000 ORB features | Per frame |
| | Keyframe Threshold | 10+ frames | For global optimization |
| | Map Density | Sparse | Monocular constraints |
| Kick Kinematics | Success in Simulation | 100% | Choregraphe testing |
| | Real-world Stability | Unstable | Falls post-kick (needs tuning) |
Strengths:
- Robust ball detection under varying ball positions
- Efficient path planning with obstacle avoidance
- Successful SLAM feature extraction and matching
- Modular, maintainable codebase
- Comprehensive documentation
Limitations:
- Legacy Python 2.7 constraints limit modern libraries
- Kick kinematics require fine-tuning for stability
- SLAM trajectory distortion due to incomplete loop closure
- Speech recognition not fully integrated with NAO
- Limited testing time with physical robot
- NAO V5 Humanoid Robot
- Computer running Ubuntu 14.04 (for ROS Indigo compatibility)
- Minimum 4GB RAM, 20GB storage
- Python 2.7.x (NAO compatibility)
- Python 3.12+ (Speech recognition)
- ROS Indigo
- NAOqi SDK 2.1.4.13
- OpenCV 3.x
- NumPy 1.x
- Gazebo 2.x
- MoveIt
- Choregraphe 2.1.4

- Clone Repository
```bash
git clone https://github.com/Degas01/nao_robot.git
cd nao_robot
```
- Set Up Python 2.7 Environment (NAO)
```bash
virtualenv -p python2.7 venv_nao
source venv_nao/bin/activate
pip install -r requirements.txt
```
- Set Up Python 3.12 Environment (Speech)
```bash
python3.12 -m venv venv_speech
source venv_speech/bin/activate
pip install -r requirements_py312.txt
```
- Install ROS Indigo & Dependencies
```bash
sudo sh -c 'echo "deb http://packages.ros.org/ros/ubuntu trusty main" > /etc/apt/sources.list.d/ros-latest.list'
sudo apt-get update
sudo apt-get install ros-indigo-desktop-full
sudo apt-get install ros-indigo-naoqi-driver
sudo apt-get install ros-indigo-moveit
```
- Build ROS Workspace
```bash
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src
catkin_init_workspace
ln -s /path/to/nao-autonomous-ball-retrieval .
cd ~/catkin_ws
catkin_make
source devel/setup.bash
```
- Install Gazebo & NAO Models
```bash
sudo apt-get install gazebo2
cd ~/catkin_ws/src
git clone https://github.com/ros-naoqi/nao_meshes.git
git clone https://github.com/ros-naoqi/nao_robot.git
catkin_make
```
- Initialize NAO robot connection
- Start ball detection module
- Wait for voice command "go get the ball"
- Begin visual tracking and SLAM
- Compute path using A*
- Navigate to ball location
- Execute kick when in range
- Return to start position
VID-20250329-WA0012.mp4
VID-20250329-WA0014.mp4
Nao_Astar.mp4
kick_sim.mp4
Speech.Recognition.Showcase.mp4
Problem:
- NAO requires Python 2.7 and the NAOqi SDK, incompatible with modern libraries (YOLO, TensorFlow)
- The pip package ecosystem is deprecated for Python 2.7
Solution:
- Use OpenCV 3.x (the last version supporting Python 2.7) for ball detection
- Implement the ORB-SLAM2 pipeline from scratch using available libraries
- Create a dual-script architecture for speech recognition (Python 3.12 ↔ Python 2.7)
Outcome: Increased development complexity but ensured NAO compatibility
Problem:
- NAO falls over after executing the kick motion in the real world
- Center of gravity shifts excessively during single-leg balance
Solution:
- Implemented weight shift to the supporting leg using wbFootState
- Added balance constraints with wbEnableBalanceConstraint
- Manual joint fine-tuning (ongoing)
- Future: predictive balance model with IMU integration
Outcome: Works in simulation; requires further real-robot tuning
Problem:
- Camera trajectory shows significant drift over time
- Loop closure mechanism is incomplete, causing accumulated error
Solution:
- Implement a bag-of-words (BoW) approach for better loop detection
- Integrate IMU data for motion prediction (ORB-SLAM3 approach)
- Add bundle adjustment optimization after loop closure
Outcome: Sparse map still useful for local navigation (0-5m range)
Problem:
- MoveIt is unable to update NAO joint poses dynamically in Gazebo
- Planned trajectories execute in RViz but not in the simulated robot
Cause: ROS Indigo + Gazebo 2.x compatibility issues with the NAO controller
Solution:
- Test kick planning separately in RViz (visual validation)
- Execute pre-computed trajectories via Python scripts
- Use Choregraphe for kinematic validation
Future: Upgrade to ROS Noetic + Gazebo 11 (requires a NAO SDK update)
Problem:
- NAO's onboard microphone is undetectable by speech recognition libraries
- Choregraphe audio modules are incompatible with external Python scripts
Solution:
- Use the laptop microphone for speech capture (Python 3.12)
- File-based communication between the Python 3.12 and Python 2.7 scripts
- NAO executes commands from the parsed text file
Outcome: Not fully autonomous (requires an external laptop)
- Kick Stability Enhancement
- Integrate Kalman filter for balance prediction
- Add ZMP (Zero Moment Point) calculation for dynamic stability
- Implement adaptive kick force based on ball distance
- Test with various ball positions and weights
- SLAM Optimization
- Implement bag-of-words for robust loop closure
- Add bundle adjustment after every N keyframes
- Integrate IMU data for motion prior (ORB-SLAM3 style)
- Dense reconstruction using patch-based stereo
- Path Planning Enhancements
- Add dynamic replanning for moving obstacles
- Implement RRT* for complex environments
- Integrate SLAM map directly into A* cost function
- Test in outdoor tennis court environment
- Multi-Ball Tracking
- Extend detection to handle multiple balls simultaneously
- Prioritize closest ball using depth estimation
- Implement ball sorting strategy (e.g., nearest-first)
- Human Interaction
- Gesture recognition for commands (waving, pointing)
- Ball handoff detection using pressure sensors
- Natural language dialogue system
- Energy Efficiency
- Optimize gait for battery conservation
- Sleep mode when idle
- Periodic recharging behavior
- RoboCup Soccer Integration
- Multi-agent coordination with other NAO robots
- Opponent detection and avoidance
- Goal recognition and scoring strategy
- Deep Learning Integration
- Replace OpenCV with YOLO v8 ball detection (requires Python 3.x migration)
- Deep reinforcement learning for kick optimization
- Neural SLAM (e.g., NeuralRecon)
- Full Autonomy
- Eliminate external laptop dependency for speech
- Onboard edge computing module (e.g., Jetson Nano)
- 5G connectivity for cloud offloading
- Multi-modal Fusion: Combine vision, IMU, and pressure sensors for robust state estimation
- Sim-to-Real Transfer: Train policies in simulation, deploy on real robot
- Explainable AI: Visualize decision-making process for debugging and trust
*King's College London — provided the NAO robot and lab facilities (Quad Lab)*
*SoftBank Robotics (Aldebaran) — NAO robot platform and NAOqi SDK*
*Open Source Robotics Foundation — ROS Indigo, Gazebo, MoveIt packages*
*Team members — Harry Braganza, Hitesh Anavai, Mohammad Islam and Kriti Chauhan*
*OpenCV — computer vision library*
*Raúl Mur-Artal and Juan D. Tardós — SLAM architecture inspiration*
*Dr. Oya Celiktutan and teaching assistants — guidance, resources and documentation*
- RoboCup Standard Platform League. https://spl.robocup.org/
- Li, Q. & Zhao, Y. (2024). "Tennis Ball Recognition in Complex Scenes Based on Improved YOLOv5." ICAACE. DOI: 10.1109/icaace61206.2024.10548503
- Leiva, L.A. et al. (2018). "Playing soccer without colors in the SPL: A convolutional neural network approach." arXiv:1811.12493.
- Bradski, G. (2008). Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly.
- Baevski, A. et al. (2020). "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations." arXiv:2006.11477.
- Hart, P., Nilsson, N., & Raphael, B. (1968). "A Formal Basis for the Heuristic Determination of Minimum Cost Paths." IEEE Transactions on Systems Science and Cybernetics, 4(2), 100-107.
- Kalman, R.E. (1960). "A New Approach to Linear Filtering and Prediction Problems." Journal of Basic Engineering, 82(1), 35-45.

