This is a 3D Human Pose Estimation (HPE) Source plugin for MADS (Modular Architecture for Distributed Systems). It performs real-time 3D human pose estimation using OpenVINO-optimized deep learning models and supports multiple input sources: regular webcams, Raspberry Pi cameras, and Microsoft Azure Kinect sensors.
The HPE plugin implements a computer vision pipeline that:
- Captures video input from various camera sources
- Processes RGB and depth data (when available)
- Runs OpenPose-based human pose estimation using Intel's OpenVINO framework
- Generates 3D skeletal representations with 18 key body joints
- Publishes results as JSON data for downstream processing in the MADS ecosystem

Key features:
- Multi-platform support: Linux, macOS, and Windows
- Multiple input sources: USB cameras, Raspberry Pi cameras, Azure Kinect
- Real-time processing: Optimized inference using OpenVINO
- 3D pose estimation: Combines RGB and depth data for spatial positioning
- Flexible configuration: Extensive parameter customization
- Debug capabilities: Comprehensive debugging options for development
- High accuracy: Uses Intel's pre-trained human-pose-estimation-0001 model
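
The "3D pose estimation" feature combines a 2D keypoint with a depth reading. As a rough illustration only (not the plugin's actual code), the sketch below back-projects a pixel and its depth into camera coordinates using a pinhole model. The `Intrinsics` type, the `back_project` function, and the numeric intrinsics are hypothetical; real values would come from the camera calibration.

```python
from typing import NamedTuple, Optional, Tuple


class Intrinsics(NamedTuple):
    """Pinhole camera parameters in pixels (hypothetical, normally from calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(u: float, v: float, depth_m: float,
                 k: Intrinsics) -> Optional[Tuple[float, float, float]]:
    """Map pixel (u, v) with depth in metres to camera-frame (x, y, z)."""
    if depth_m <= 0.0:
        # Depth sensors commonly report 0 when no reading is available.
        return None
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Made-up intrinsics for a 1280x720 image, for illustration only.
k = Intrinsics(fx=600.0, fy=600.0, cx=640.0, cy=360.0)
nose_3d = back_project(700.0, 360.0, 2.0, k)
missing = back_project(700.0, 360.0, 0.0, k)
```

A keypoint without a valid depth reading yields `None` here, so a caller can decide whether to skip it or fall back to a 2D-only result.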
┌─────────────────────────────────────────────────────────────────────────────────┐
│                             HPE Plugin Architecture                             │
└─────────────────────────────────────────────────────────────────────────────────┘
        ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
        │  Camera Input   │    │  Azure Kinect   │    │   Dummy/File    │
        │    (USB/CSI)    │    │   (RGB+Depth)   │    │      Input      │
        └─────────┬───────┘    └─────────┬───────┘    └─────────┬───────┘
                  │                      │                      │
                  └──────────────────────┼──────────────────────┘
                                         │
                                ┌────────▼────────┐
                                │  Video Capture  │
                                │      Setup      │
                                │  (Resolution,   │
                                │  FPS, Format)   │
                                └────────┬────────┘
                                         │
                                ┌────────▼────────┐
                                │      Frame      │
                                │   Acquisition   │
                                │ & Preprocessing │
                                └────────┬────────┘
                                         │
                       ┌─────────────────┼─────────────────┐
                       │                 │                 │
              ┌────────▼────────┐  ┌─────▼──────┐   ┌──────▼──────┐
              │ RGB Processing  │  │   Depth    │   │ Point Cloud │
              │    Pipeline     │  │ Processing │   │ Generation  │
              └────────┬────────┘  └─────┬──────┘   └──────┬──────┘
                       │                 │                 │
                       └─────────────────┼─────────────────┘
                                         │
                                ┌────────▼────────┐
                                │    OpenVINO     │
                                │  Model Loading  │
                                │ (HPE OpenPose)  │
                                └────────┬────────┘
                                         │
                                ┌────────▼────────┐
                                │    Inference    │
                                │    Pipeline     │
                                │  (Async/Sync)   │
                                └────────┬────────┘
                                         │
                  ┌──────────────────────┼──────────────────────┐
                  │                      │                      │
         ┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
         │    Heat Maps    │    │      PAFs       │    │   Embeddings    │
         │   Generation    │    │ (Part Affinity  │    │  (Associative)  │
         │                 │    │     Fields)     │    │                 │
         └────────┬────────┘    └────────┬────────┘    └────────┬────────┘
                  │                      │                      │
                  └──────────────────────┼──────────────────────┘
                                         │
                                ┌────────▼────────┐
                                │   Peak/Joint    │
                                │    Detection    │
                                │  & Association  │
                                └────────┬────────┘
                                         │
                                ┌────────▼────────┐
                                │  Pose Assembly  │
                                │  & Refinement   │
                                │ (18 Keypoints)  │
                                └────────┬────────┘
                                         │
                       ┌─────────────────┼─────────────────┐
                       │                 │                 │
              ┌────────▼────────┐  ┌─────▼──────┐   ┌──────▼──────┐
              │   2D Skeleton   │  │ 3D Spatial │   │ Confidence  │
              │    Rendering    │  │  Mapping   │   │   Scoring   │
              └────────┬────────┘  └─────┬──────┘   └──────┬──────┘
                       │                 │                 │
                       └─────────────────┼─────────────────┘
                                         │
                                ┌────────▼────────┐
                                │   JSON Output   │
                                │   Generation    │
                                │  (MADS Format)  │
                                └────────┬────────┘
                                         │
                                ┌────────▼────────┐
                                │  Debug/Viewer   │
                                │   (Optional)    │
                                └─────────────────┘
Key Components:
- Input Sources: USB Camera, Raspberry Pi Camera, Azure Kinect
- Processing: OpenVINO inference engine with async pipeline
- Model: Intel's human-pose-estimation-0001 (OpenPose architecture)
- Output: 18-point skeletal representation with 3D coordinates
- Keypoints: Nose, Neck, Shoulders, Elbows, Wrists, Hips, Knees, Ankles, Eyes, Ears
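
The "Peak/Joint Detection" stage in the diagram turns a per-joint heat map into candidate joint locations. The sketch below shows one simple way to do that, assuming a heat map stored as a list of rows and a plain 4-neighbour local-maximum test. The function name `find_peaks` and the threshold values are hypothetical, and the plugin's own implementation may differ (for example by smoothing first or refining to sub-pixel positions).

```python
from typing import List, Tuple

Heatmap = List[List[float]]


def find_peaks(heatmap: Heatmap, threshold: float = 0.1) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) for cells at or above threshold that beat all 4-neighbours."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(rows):
        for x in range(cols):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Collect the up/down/left/right neighbours that lie inside the map.
            neighbours = [
                heatmap[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < rows and 0 <= nx < cols
            ]
            if all(score > n for n in neighbours):
                peaks.append((x, y, score))
    return peaks


# Toy 3x4 heat map with two well-separated maxima.
toy = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.6],
]
peaks = find_peaks(toy, threshold=0.5)
```

Each peak is only a candidate joint; associating candidates into per-person skeletons is the separate "Association" step.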

Hardware requirements:
- CPU: Intel x86_64 or ARM64 processor (Raspberry Pi 4+ supported)
- RAM: Minimum 4GB, recommended 8GB+
- Camera: USB webcam, Raspberry Pi camera, or Azure Kinect
- GPU: Optional but recommended for CUDA acceleration (when available)

Software dependencies:
- OpenCV (≥4.0): Computer vision library for image processing
- OpenVINO (≥2024.0): Intel's deep learning inference framework
- VTK (≥9.0): Visualization toolkit for 3D graphics
- PCL (≥1.13): Point Cloud Library for 3D data processing
- Eigen3 (≥3.4): Linear algebra library
- nlohmann/json (≥3.11): JSON parsing and generation
- pugg: Plugin architecture framework
Linux (including Raspberry Pi):
- libcamera-dev: Camera abstraction layer
- LCCV: LibCamera Computer Vision wrapper
- MPI: Message Passing Interface
Windows:
- Azure Kinect SDK (optional): For Kinect sensor support
- Visual Studio 2019+: Build tools
macOS:
- Homebrew: Package manager for dependencies

Supported operating systems:
- Linux (Ubuntu 20.04+, Debian 11+, Raspberry Pi OS)
- macOS (10.15+)
- Windows (10/11 with Visual Studio 2019+)
# Install basic dependencies
sudo apt update
sudo apt install -y build-essential cmake pkg-config git
# Install OpenCV
sudo apt install -y libopencv-dev
# Install VTK
sudo apt install -y libvtk9-dev
# Install PCL
sudo apt install -y libpcl-dev
# Install libcamera (for Raspberry Pi)
sudo apt install -y libcamera-dev
# Install MPI
sudo apt install -y libopenmpi-dev

# Download OpenVINO 2024.0+
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.0/linux/l_openvino_toolkit_ubuntu20_2024.0.0.14509.34caeefd078_x86_64.tgz
# Extract and install
tar -xzf l_openvino_toolkit_ubuntu20_2024.0.0.14509.34caeefd078_x86_64.tgz
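# Create the target directory first; mv fails if /opt/intel does not exist
sudo mkdir -p /opt/intel
sudo mv l_openvino_toolkit_ubuntu20_2024.0.0.14509.34caeefd078_x86_64 /opt/intel/openvino_2024.0.0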
# Setup environment
echo 'source /opt/intel/openvino_2024.0.0/setupvars.sh' >> ~/.bashrc
source ~/.bashrc

# Set environment variables for OpenCV and OpenVINO (Windows, PowerShell)
$env:OpenCV_DIR = "C:\opencv\build"
$env:OpenVINO_DIR = "C:\Program Files (x86)\Intel\openvino_2024"
# Install Azure Kinect SDK (optional)
# Download from: https://github.com/microsoft/Azure-Kinect-Sensor-SDK

# Install Homebrew dependencies (macOS)
brew install opencv vtk pcl eigen pkg-config cmake
# Install OpenVINO
# Download from Intel's website and follow installation instructions

# Change into the plugin repository (clone it first if not already done)
cd /path/to/plugin_skeletonizer_3D_mads
# Configure build
cmake -Bbuild -DCMAKE_INSTALL_PREFIX="$(mads -p)"
# Build with parallel jobs
cmake --build build -j$(nproc)
# Install
sudo cmake --install build

# Open PowerShell as Administrator
cd C:\path\to\plugin_skeletonizer_3D_mads
# Configure build
cmake -Bbuild -DCMAKE_INSTALL_PREFIX="$(mads -p)"
# Build release version
cmake --build build --config Release
# Install
cmake --install build --config Release

After building, ensure the plugin files are in the correct MADS directory:
- Copy hpe.exe and hpe.plugin from usr/bin to usr/local/bin in your MADS installation
- Verify the OpenVINO model files are downloaded to the models/ directory
The plugin supports extensive configuration through INI files:
[hpe]
# Publication settings
pub_topic = "hpe" # MADS topic for publishing results
period = 30 # Processing period in milliseconds
# Camera configuration
camera_device = 0 # Camera device ID (0 for default)
azure_device = 0 # Azure Kinect device ID
fps = 25 # Target frame rate
resolution_rgb = "1280x720" # RGB camera resolution
# Model configuration
model_file = "models/human-pose-estimation-0001.xml" # OpenVINO model path
CUDA = false # Enable CUDA acceleration (if available)
dummy = false # Use dummy input for testing
# Debug configuration
debug = {
skeleton_from_depth_compute = false, # Enable depth-based skeleton computation
skeleton_from_rgb_compute = false, # Enable RGB-based skeleton computation
hessian_compute = false, # Enable Hessian matrix computation
cov3D_compute = false, # Enable 3D covariance computation
consistency_check = false, # Enable consistency checking
point_cloud_filter = false, # Enable point cloud filtering
coordinate_transfrom = false, # Enable coordinate transformation
viewer = false # Enable visualization window
}

The plugin automatically downloads the Intel OpenVINO human pose estimation model:
- Model: human-pose-estimation-0001
- Format: OpenVINO IR (XML + BIN)
- Input: 256x456 (height x width) RGB image
- Output: 18 keypoint coordinates + confidence scores
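
Because the network runs at 256x456 while the camera delivers a larger frame, keypoint positions found on the network's output grid have to be scaled back to frame pixels. The sketch below illustrates that arithmetic under two assumptions that are not stated in this README: the raw heat maps are 57 columns by 32 rows (the input size divided by a stride of 8, as described in the Open Model Zoo model card), and the frame is resized to the network input without letterboxing. The function name `heatmap_to_pixel` is hypothetical.

```python
from typing import Tuple


def heatmap_to_pixel(hx: int, hy: int, frame_w: int, frame_h: int,
                     map_w: int = 57, map_h: int = 32) -> Tuple[float, float]:
    """Map a heat-map cell (hx, hy) to the pixel at the centre of that cell in the frame.

    map_w and map_h are assumed grid sizes (456 / 8 and 256 / 8).
    """
    if not (0 <= hx < map_w and 0 <= hy < map_h):
        raise ValueError("heat-map cell is outside the grid")
    # Normalise to [0, 1] using the cell centre, then scale to the frame size.
    px = (hx + 0.5) / map_w * frame_w
    py = (hy + 0.5) / map_h * frame_h
    return (px, py)


# Cell (28, 16) on a 57x32 grid, mapped into a 1280x720 frame.
pixel = heatmap_to_pixel(28, 16, 1280, 720)
```

If the capture pipeline pads the frame to preserve aspect ratio, the padding offsets would need to be subtracted before scaling; that case is not covered here.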
The plugin detects 18 human body keypoints:
// OpenPose keypoint mapping
0: "NOS_" (Nose)
1: "NEC_" (Neck)
2: "SHOR" (Right Shoulder)
3: "ELBR" (Right Elbow)
4: "WRIR" (Right Wrist)
5: "SHOL" (Left Shoulder)
6: "ELBL" (Left Elbow)
7: "WRIL" (Left Wrist)
8: "HIPR" (Right Hip)
9: "KNER" (Right Knee)
10: "ANKR" (Right Ankle)
11: "HIPL" (Left Hip)
12: "KNEL" (Left Knee)
13: "ANKL" (Left Ankle)
14: "EYER" (Right Eye)
15: "EYEL" (Left Eye)
16: "EARR" (Right Ear)
17: "EARL" (Left Ear)

# Start MADS with HPE plugin
mads -c config.ini
# The plugin will automatically:
# 1. Initialize camera capture
# 2. Load OpenVINO model
# 3. Start processing frames
# 4. Publish pose data to specified topic

# Run as standalone application
./hpe
# Or with custom parameters
./hpe --config params.json

The plugin publishes JSON data with the following structure:
{
"timestamp": "2024-01-01T12:00:00.000Z",
"frame_id": 12345,
"poses": [
{
"person_id": 0,
"keypoints": [
{
"name": "NOS_",
"position_2d": {"x": 320.5, "y": 240.2},
"position_3d": {"x": 0.1, "y": 0.0, "z": 2.5},
"confidence": 0.95
},
// ... additional keypoints
],
"total_score": 0.85
}
// ... additional persons
],
"processing_time_ms": 15.2,
"camera_info": {
"resolution": "1280x720",
"fps": 25,
"device_id": 0
}
}

Enable debug visualization to see real-time processing:
# Enable debug viewer
# Set debug.viewer = true in configuration

The debug viewer shows:
- Original camera feed
- Detected keypoints (colored circles)
- Skeletal connections (colored lines)
- Confidence scores
- Processing statistics
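
A downstream consumer can derive the same keypoint and connection overlay from the published JSON. The sketch below parses a message shaped like the documented output and lists the skeletal connections whose two endpoints both pass a confidence threshold. The keypoint order follows the mapping in this README; the `LIMBS` pairs are the conventional OpenPose 18-keypoint connections and are an assumption about what the viewer draws, and `visible_limbs` and `min_conf` are hypothetical names.

```python
import json
from typing import Dict, List, Tuple

# Index order follows the keypoint mapping listed in this README.
KEYPOINT_NAMES = [
    "NOS_", "NEC_", "SHOR", "ELBR", "WRIR", "SHOL", "ELBL", "WRIL", "HIPR",
    "KNER", "ANKR", "HIPL", "KNEL", "ANKL", "EYER", "EYEL", "EARR", "EARL",
]

# Assumed connections: conventional OpenPose 18-keypoint limb pairs.
LIMBS = [
    (1, 2), (1, 5), (2, 3), (3, 4), (5, 6), (6, 7), (1, 8), (8, 9), (9, 10),
    (1, 11), (11, 12), (12, 13), (1, 0), (0, 14), (14, 16), (0, 15), (15, 17),
]


def visible_limbs(pose: Dict, min_conf: float = 0.5) -> List[Tuple[str, str]]:
    """Return name pairs for limbs whose two keypoints both reach min_conf."""
    conf = {kp["name"]: kp["confidence"] for kp in pose["keypoints"]}
    result = []
    for a, b in LIMBS:
        name_a, name_b = KEYPOINT_NAMES[a], KEYPOINT_NAMES[b]
        # Keypoints absent from the message are treated as confidence 0.
        if conf.get(name_a, 0.0) >= min_conf and conf.get(name_b, 0.0) >= min_conf:
            result.append((name_a, name_b))
    return result


# Trimmed-down message using only the fields this sketch needs.
message = json.loads("""
{"poses": [{"person_id": 0, "keypoints": [
    {"name": "NOS_", "confidence": 0.95},
    {"name": "NEC_", "confidence": 0.90},
    {"name": "SHOR", "confidence": 0.30}
]}]}
""")
limbs = visible_limbs(message["poses"][0])
```

Here only the neck-to-nose connection survives, because the right shoulder falls below the threshold and the remaining keypoints are missing.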

Performance tips:
- Model Precision: Use FP16 precision for faster inference on supported hardware
- Batch Processing: Process multiple frames simultaneously when possible
- Async Pipeline: Utilizes OpenVINO's asynchronous inference for better throughput
# Enable CUDA acceleration (NVIDIA GPUs)
CUDA = true
# Or use Intel GPU acceleration
# Set OpenVINO device to GPU in model configuration

- Resolution: Lower resolution improves speed but reduces accuracy
- FPS: Adjust target FPS based on hardware capabilities
- Threading: Utilize multiple CPU cores for parallel processing
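
When choosing a target FPS, it helps to compare the measured per-frame processing time with the time available per frame. The sketch below does that arithmetic for the values used elsewhere in this README (fps = 25, processing_time_ms = 15.2). The function names and the `headroom` factor are hypothetical and only illustrate the reasoning.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def fits_budget(processing_time_ms: float, fps: float, headroom: float = 0.8) -> bool:
    """True if processing uses at most the given fraction of the frame budget."""
    return processing_time_ms <= headroom * frame_budget_ms(fps)


budget = frame_budget_ms(25)      # 40.0 ms per frame at 25 FPS
ok = fits_budget(15.2, 25)        # 15.2 ms is well inside 80% of 40 ms
too_slow = fits_budget(35.0, 25)  # 35 ms exceeds the 32 ms allowance
```

If the check fails, lowering the resolution or the target FPS are the two adjustments listed above.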

Troubleshooting:
- OpenVINO Model Not Found
# Manually download model
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/human-pose-estimation-0001/FP32/human-pose-estimation-0001.xml
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2023.0/models_bin/1/human-pose-estimation-0001/FP32/human-pose-estimation-0001.bin

- Camera Access Issues
# Linux: Check camera permissions
sudo usermod -a -G video $USER
# Log out and log in again for the group change to take effect

- Build Errors
# Verify OpenVINO path in CMakeLists.txt
# Update paths to match your installation

- Performance Issues
# Check CPU/GPU utilization
htop
nvidia-smi # For NVIDIA GPUs
# Reduce resolution or FPS if needed

Enable verbose logging for troubleshooting:
export OPENVINO_LOG_LEVEL=1
./hpe --verbose

- Purpose: Main plugin class implementing the MADS Source interface
- Key Methods:
  - setup_VideoCapture(): Initialize camera input
  - setup_OpenPoseModel(): Load OpenVINO model
  - setup_Pipeline(): Configure inference pipeline
  - process_frame(): Process single frame
  - publish_results(): Send results to MADS
- Purpose: OpenVINO model wrapper for pose estimation
- Key Methods:
  - preprocess(): Prepare input data
  - postprocess(): Parse model output
  - extractPoses(): Generate pose structures
| Parameter | Type | Default | Description |
|---|---|---|---|
| camera_device | int | 0 | Camera device ID |
| azure_device | int | 0 | Azure Kinect device ID |
| fps | int | 25 | Target frame rate |
| resolution_rgb | string | "1280x720" | Camera resolution |
| model_file | string | "models/..." | OpenVINO model path |
| CUDA | bool | false | Enable CUDA acceleration |
| dummy | bool | false | Use dummy input |
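
The resolution_rgb parameter is a "WIDTHxHEIGHT" string, so any tool that generates or checks a configuration has to split it into two integers. The sketch below shows one way to validate such a value before handing it to the plugin. The function name `parse_resolution` is hypothetical, and the plugin's own parsing may accept other forms.

```python
from typing import Tuple


def parse_resolution(value: str) -> Tuple[int, int]:
    """Parse a 'WIDTHxHEIGHT' string such as '1280x720' into (width, height)."""
    width_text, sep, height_text = value.strip().lower().partition("x")
    if not sep:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    # int() raises ValueError on non-numeric parts such as '1280xabc'.
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"resolution must be positive, got {value!r}")
    return (width, height)


width, height = parse_resolution("1280x720")
```

Rejecting malformed values early gives a clearer error than a camera that silently falls back to a default mode.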
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Note: This plugin requires proper camera setup and OpenVINO installation. Ensure all dependencies are correctly installed before building.