Improvement and Implementation of a Skeleton Data Extraction Algorithm Based on Kinect Depth Information
This project addresses the inaccurate joint localization of Kinect V2's built-in skeleton tracking under arm self-occlusion, and proposes an improved algorithm that exploits the characteristics of the depth image.
Key Contributions:
- Foreground human body extraction from depth images: combines Kinect's user index image (`IBodyIndexFrame`) with threshold segmentation to extract the human body region, followed by median filtering and morphological preprocessing.
- Self-occluded arm joint extraction: leverages the depth difference between the arm and the torso to isolate occluded arms, applies Guo-Hall image thinning to obtain the arm skeleton, and locates the hand joint using the positional stability of the elbow joint. Supports left-arm, right-arm, and both-arm occlusion scenarios.
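The first contribution can be illustrated with a minimal, dependency-free C++ sketch. It assumes the depth (millimetres) and body-index buffers have already been copied out of the SDK's `IDepthFrame` and `IBodyIndexFrame` into plain vectors, and relies on the Kinect V2 convention that a body-index pixel holds 0 to 5 for a tracked body and 255 for background. The function names and the `minMm`/`maxMm` depth gate are hypothetical and are not taken from `main.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch only; names and the depth gate are hypothetical.
// Kinect V2 body-index convention: 0..5 = pixel belongs to tracked body k,
// 255 (0xFF) = no tracked body at this pixel.
constexpr std::uint8_t kNoBody = 0xFF;

// Binary foreground mask (1 = body, 0 = background) from a body-index buffer
// and a depth buffer in millimetres (512x424 each on the real sensor).
// [minMm, maxMm] is a hypothetical depth gate around the subject.
std::vector<std::uint8_t> extractBodyMask(const std::vector<std::uint8_t>& bodyIndex,
                                          const std::vector<std::uint16_t>& depthMm,
                                          std::uint16_t minMm, std::uint16_t maxMm) {
    assert(bodyIndex.size() == depthMm.size());
    std::vector<std::uint8_t> mask(bodyIndex.size(), 0);
    for (std::size_t i = 0; i < bodyIndex.size(); ++i) {
        const bool isBody = bodyIndex[i] != kNoBody;
        const bool inRange = depthMm[i] >= minMm && depthMm[i] <= maxMm;
        mask[i] = (isBody && inRange) ? 1 : 0;
    }
    return mask;
}

// 3x3 median filter for a 0/1 mask: the median of nine binary values is a
// majority vote (1 when at least 5 of the 9 are 1). Border pixels are copied as-is.
std::vector<std::uint8_t> median3x3(const std::vector<std::uint8_t>& mask, int width, int height) {
    assert(static_cast<int>(mask.size()) == width * height);
    std::vector<std::uint8_t> out(mask);
    for (int y = 1; y < height - 1; ++y)
        for (int x = 1; x < width - 1; ++x) {
            int ones = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    ones += mask[(y + dy) * width + (x + dx)];
            out[y * width + x] = ones >= 5 ? 1 : 0;
        }
    return out;
}
```

In an OpenCV-based build these steps would more likely be `cv::medianBlur(mask, filtered, 3)` followed by `cv::morphologyEx` with `cv::MORPH_OPEN` or `cv::MORPH_CLOSE`; the sketch simply makes explicit that a 3x3 median on a binary mask removes isolated specks and fills single-pixel holes.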
To address the hand-joint drift of Kinect V2's random-decision-forest skeleton tracking under arm self-occlusion, this project proposes a lightweight, retraining-free geometric post-processing method. Building on the native skeleton output, it exploits the physical prior that a self-occluded arm is closer to the camera than the torso to segment the arm from the depth image, then relocates the hand joint via skeleton thinning and elbow anchoring. Key features:
- Lightweight and deployable: no classifier retraining, runs in real time, no extra hardware required;
- Physically grounded and interpretable: every stage has a clear geometric basis, making results traceable and debuggable;
- Closed-loop quantitative validation: average hand-joint localization error under self-occlusion drops from ~7.8 px to ~3.2 px.
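The README does not spell out exactly how the elbow anchors the hand. One plausible reading, sketched here under that assumption, is: find the endpoints of the thinned arm skeleton (pixels with exactly one 8-connected neighbour) and take the endpoint farthest from the Kinect elbow joint as the hand. `skeletonEndpoints`, `locateHand`, and the farthest-endpoint rule are hypothetical illustrations, not the project's confirmed logic.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative sketch only; the names and the farthest-endpoint rule are hypothetical.
struct Pixel { int x; int y; };

// A skeleton endpoint is a foreground pixel with exactly one 8-connected foreground neighbour.
std::vector<Pixel> skeletonEndpoints(const std::vector<std::uint8_t>& skel, int width, int height) {
    assert(static_cast<int>(skel.size()) == width * height);
    std::vector<Pixel> ends;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            if (!skel[y * width + x]) continue;
            int neighbours = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if ((dx || dy) && nx >= 0 && ny >= 0 && nx < width && ny < height &&
                        skel[ny * width + nx])
                        ++neighbours;
                }
            if (neighbours == 1) ends.push_back({x, y});
        }
    return ends;
}

// Hypothetical hand rule: among the arm-skeleton endpoints, pick the one farthest
// from the elbow, whose Kinect estimate is assumed to stay stable under self-occlusion.
std::optional<Pixel> locateHand(const std::vector<Pixel>& endpoints, Pixel elbow) {
    std::optional<Pixel> best;
    long bestD2 = -1;
    for (const Pixel& p : endpoints) {
        const long dx = p.x - elbow.x, dy = p.y - elbow.y;
        const long d2 = dx * dx + dy * dy;
        if (d2 > bestD2) { bestD2 = d2; best = p; }
    }
    return best;
}
```

Returning `std::optional` leaves room for a fallback: when no endpoint is found, the caller can keep Kinect's original hand joint.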
Algorithm pipeline:
```
Kinect captures depth image + body index image
↓
Body extraction via user index → Binarization → Preprocessing
↓
Crop torso region → Compute average depth → Threshold to extract arm
↓
Check if hand joint is inside torso rectangle (occlusion detection)
↓
Yes → Guo-Hall thinning → Skeleton endpoints → Hand joint localization
No  → Use Kinect's original skeleton data
↓
Merge joints → Complete body skeleton
```
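The middle of the pipeline (torso average depth, arm thresholding, and the torso-rectangle occlusion test) can be sketched as follows. This is a minimal illustration under stated assumptions: the torso box is given in depth-image pixels, zero-depth pixels are treated as invalid readings, and the arm is taken to be body pixels at least `marginMm` closer to the camera than the torso mean. The margin value and all function names are hypothetical; the calibrated crop and threshold used in `main.cpp` are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch only; names and the margin parameter are hypothetical.
struct Rect { int x0, y0, x1, y1; };  // half-open pixel box [x0, x1) x [y0, y1)

// Mean depth (mm) of valid body pixels inside the torso box; 0 if there are none.
double meanTorsoDepth(const std::vector<std::uint16_t>& depthMm, const std::vector<std::uint8_t>& body,
                      int width, Rect torso) {
    double sum = 0.0;
    long count = 0;
    for (int y = torso.y0; y < torso.y1; ++y)
        for (int x = torso.x0; x < torso.x1; ++x) {
            const int i = y * width + x;
            if (body[i] && depthMm[i] > 0) { sum += depthMm[i]; ++count; }  // depth 0 = invalid
        }
    return count ? sum / count : 0.0;
}

// Arm mask: valid body pixels at least marginMm closer to the camera than the torso mean.
std::vector<std::uint8_t> segmentArm(const std::vector<std::uint16_t>& depthMm,
                                     const std::vector<std::uint8_t>& body,
                                     double torsoMeanMm, double marginMm) {
    assert(depthMm.size() == body.size());
    std::vector<std::uint8_t> arm(body.size(), 0);
    for (std::size_t i = 0; i < body.size(); ++i)
        arm[i] = (body[i] && depthMm[i] > 0 && depthMm[i] < torsoMeanMm - marginMm) ? 1 : 0;
    return arm;
}

// Occlusion test: the hand joint (in pixel coordinates) falls inside the torso box.
bool handInsideTorso(int hx, int hy, Rect torso) {
    return hx >= torso.x0 && hx < torso.x1 && hy >= torso.y0 && hy < torso.y1;
}
```

Excluding zero-depth pixels matters in both functions: Kinect reports 0 where it has no valid measurement, and those pixels would otherwise drag the torso mean down and be misread as "closer than the torso".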
| File | Description |
|---|---|
| `main.cpp` | Main program: full pipeline (body extraction, arm separation, occlusion detection, joint localization, performance timing) |
| `Depth_Information.cpp` | Reference program for drawing Kinect's native skeleton on color images |
| `markedCoordinate.cpp` | Manual annotation tool for clicking joint positions on saved images |
| `norm_L2.cpp` | Error calculation tool: computes the L2 norm between manual annotations and algorithm outputs |
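The error metric computed by `norm_L2.cpp` can be sketched as the mean Euclidean distance, in pixels, between each manually annotated joint and the corresponding algorithm output. Averaging per-point distances this way is an assumption about how the reported means were aggregated; `Point2` and `meanL2Error` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch only; names and the averaging scheme are assumptions.
struct Point2 { double x; double y; };

// Mean Euclidean (L2) pixel distance between annotated and predicted joint positions.
double meanL2Error(const std::vector<Point2>& annotated, const std::vector<Point2>& predicted) {
    assert(annotated.size() == predicted.size() && !annotated.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < annotated.size(); ++i)
        total += std::hypot(annotated[i].x - predicted[i].x, annotated[i].y - predicted[i].y);
    return total / static_cast<double>(annotated.size());
}
```

For example, errors of 5 px and 1 px on two annotated frames average to 3 px.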
Hardware:
- Kinect V2 sensor + Kinect Adapter for Windows
- Windows PC with USB 3.0
Software Dependencies:
| Dependency | Version | Notes |
|---|---|---|
| Visual Studio | 2019+ | C++ development |
| Kinect for Windows SDK 2.0 | v2.0 | Download |
| OpenCV | 4.x | Core image processing |
| opencv_contrib | Same as OpenCV | Provides ximgproc::thinning (Guo-Hall algorithm) |
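In the project the thinning step is a single call, `cv::ximgproc::thinning(src, dst, cv::ximgproc::THINNING_GUOHALL)`, which expects a binary 0/255 image. To show what that call computes without building opencv_contrib, here is a plain C++ sketch of Guo-Hall thinning that uses the two-subiteration deletion conditions as they appear in OpenCV's implementation, to the best of our reading. It operates on 0/1 masks, never deletes pixels on the 1-pixel image border, and `guoHallThinning` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative Guo-Hall thinning on a 0/1 mask (row-major, width x height).
// Hypothetical helper, not the project's code; border pixels are never deleted.
std::vector<std::uint8_t> guoHallThinning(std::vector<std::uint8_t> img, int width, int height) {
    assert(static_cast<int>(img.size()) == width * height);
    auto at = [&](int y, int x) -> int { return img[y * width + x] ? 1 : 0; };
    bool changed = true;
    while (changed) {
        changed = false;
        for (int iter = 0; iter < 2; ++iter) {
            std::vector<int> toDelete;  // mark first, delete after the scan
            for (int y = 1; y < height - 1; ++y)
                for (int x = 1; x < width - 1; ++x) {
                    if (!at(y, x)) continue;
                    // Neighbours clockwise from north: p2 = N, p3 = NE, ..., p9 = NW.
                    const int p2 = at(y - 1, x), p3 = at(y - 1, x + 1);
                    const int p4 = at(y, x + 1), p5 = at(y + 1, x + 1);
                    const int p6 = at(y + 1, x), p7 = at(y + 1, x - 1);
                    const int p8 = at(y, x - 1), p9 = at(y - 1, x - 1);
                    const int C = ((!p2) & (p3 | p4)) + ((!p4) & (p5 | p6)) +
                                  ((!p6) & (p7 | p8)) + ((!p8) & (p9 | p2));
                    const int N1 = (p9 | p2) + (p3 | p4) + (p5 | p6) + (p7 | p8);
                    const int N2 = (p2 | p3) + (p4 | p5) + (p6 | p7) + (p8 | p9);
                    const int N = N1 < N2 ? N1 : N2;
                    const int m = iter == 0 ? ((p6 | p7 | (!p9)) & p8)
                                            : ((p2 | p3 | (!p5)) & p4);
                    if (C == 1 && N >= 2 && N <= 3 && m == 0) toDelete.push_back(y * width + x);
                }
            for (int i : toDelete) img[i] = 0;
            if (!toDelete.empty()) changed = true;
        }
    }
    return img;
}
```

Each sub-iteration marks pixels first and deletes them afterwards, so every decision within a sub-iteration sees the same image. The loop stops once a full pass deletes nothing, which is why thinning an already-thinned result returns it unchanged.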
VS Project Configuration:
After installing the Kinect SDK, the environment variable `KINECTSDK20_DIR` is set automatically. Configure the following in the VS project properties:
- C/C++ → Additional Include Directories: `$(KINECTSDK20_DIR)\inc;` plus the OpenCV include path
- Linker → Additional Library Directories: `$(KINECTSDK20_DIR)\lib\x64;` plus the OpenCV lib path
- Linker → Additional Dependencies: `kinect20.lib` plus the OpenCV libs
Note: opencv_contrib must be downloaded separately and built together with OpenCV using CMake (set `OPENCV_EXTRA_MODULES_PATH` to the `modules` directory of opencv_contrib).
For optimal results, the following conditions should be met:
- Kinect sensor height: ~1.1–1.15 m above ground
- Subject distance: 2.35–2.5 m from the sensor
- Sensor placed horizontally, subject facing the sensor
Under arm self-occlusion, the improved algorithm achieves an average hand joint localization error of ~3.2 pixels, compared to ~7.8 pixels with Kinect's built-in algorithm.
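Expressed as a relative change, the two reported averages correspond to a reduction of roughly 59% in mean hand-joint localization error:

$$\frac{7.8 - 3.2}{7.8} = \frac{4.6}{7.8} \approx 0.59$$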
The torso cropping range and depth segmentation threshold are currently calibrated for a fixed camera placement and subject distance, ensuring accuracy and stability in the target scenario. Future work includes: scene-adaptive parameterization driven by shoulder/hip joints and torso depth distribution; larger-scale validation across more subjects and poses with statistical significance analysis; joint optimization of the wrist and palm along the arm kinematic chain with reprojection to 3D coordinates; and integration with learning-based pose estimators to balance efficiency, accuracy, and generalization.
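As an illustration of the first future-work item, the torso crop could be derived per frame from the shoulder and hip joints instead of fixed constants. The sketch assumes the four joints (e.g. `JointType_ShoulderLeft`, `JointType_ShoulderRight`, `JointType_HipLeft`, `JointType_HipRight`) have already been mapped to depth-image pixels, for instance with `ICoordinateMapper::MapCameraPointToDepthSpace`; the padding value and the function and struct names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Illustrative sketch of a hypothetical adaptive torso crop; not part of the current code.
struct JointPx { float x; float y; };       // joint position already mapped to depth-image pixels
struct TorsoRect { int x0, y0, x1, y1; };  // half-open pixel box [x0, x1) x [y0, y1)

// Bounding box of the two shoulder and two hip joints, padded by padPx
// and clamped to a width x height image.
TorsoRect torsoFromJoints(JointPx shoulderL, JointPx shoulderR, JointPx hipL, JointPx hipR,
                          int padPx, int width, int height) {
    assert(padPx >= 0 && width > 0 && height > 0);
    const float minX = std::min({shoulderL.x, shoulderR.x, hipL.x, hipR.x});
    const float maxX = std::max({shoulderL.x, shoulderR.x, hipL.x, hipR.x});
    const float minY = std::min({shoulderL.y, shoulderR.y, hipL.y, hipR.y});
    const float maxY = std::max({shoulderL.y, shoulderR.y, hipL.y, hipR.y});
    TorsoRect r;
    r.x0 = std::max(0, static_cast<int>(std::floor(minX)) - padPx);
    r.y0 = std::max(0, static_cast<int>(std::floor(minY)) - padPx);
    r.x1 = std::min(width, static_cast<int>(std::floor(maxX)) + 1 + padPx);
    r.y1 = std::min(height, static_cast<int>(std::floor(maxY)) + 1 + padPx);
    return r;
}
```

Because the box follows the tracked joints, it would scale automatically with subject distance, which is the dependence the fixed crop currently handles by calibration.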
This project is for academic and educational purposes.