A real-time human action recognition pipeline focused on sport activities using skeleton-based graph neural networks.
This project develops a complete pipeline for detecting and classifying human actions in real-time video streams. The system is specifically designed for sport activities and emphasizes the balance between accuracy and inference efficiency on consumer-grade hardware.
- Real-time processing: Optimized for consumer-grade GPUs (e.g., Apple M-series chips)
- Skeleton-based approach: Uses 2D pose estimation followed by graph-based action classification
- Sport-focused: Specialized for athletic movements and exercises
- Single-person detection: Designed for scenarios with one person in the frame
- Video Input: Real-time video stream or recorded video
- Pose Estimation: Extract 2D skeleton keypoints from each frame
- Graph Construction: Convert skeleton sequences into spatial-temporal graphs (see the sketch after this list)
- Action Classification: Classify actions using graph neural networks
- Output: Real-time action predictions with confidence scores
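To make the graph-construction step concrete, here is a minimal sketch assuming COCO 17-keypoint poses (the format produced by common 2D pose estimators such as YOLO pose). The `COCO_EDGES`, `spatial_adjacency`, and `build_st_input` names are illustrative only, not part of the project API:

```python
# Illustrative sketch: turning a sequence of 2D skeletons into the
# adjacency matrix + feature tensor that ST-GCN style models consume.
import numpy as np

# COCO 17-keypoint skeleton: pairs of joint indices joined by a bone.
COCO_EDGES = [
    (0, 1), (0, 2), (1, 3), (2, 4),           # head
    (5, 6), (5, 7), (7, 9), (6, 8), (8, 10),  # arms
    (5, 11), (6, 12), (11, 12),               # torso
    (11, 13), (13, 15), (12, 14), (14, 16),   # legs
]
NUM_JOINTS = 17

def spatial_adjacency(num_joints=NUM_JOINTS, edges=COCO_EDGES):
    """Symmetric adjacency matrix with self-loops for one skeleton frame."""
    A = np.eye(num_joints, dtype=np.float32)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def build_st_input(keypoint_seq):
    """Stack per-frame (x, y, confidence) keypoints from shape (T, V, C)
    into the (C, T, V) layout: channels, frames, joints."""
    x = np.asarray(keypoint_seq, dtype=np.float32)
    return np.transpose(x, (2, 0, 1))

# Example: 30 frames of random keypoints stand in for pose-estimator output.
seq = np.random.rand(30, NUM_JOINTS, 3)
A = spatial_adjacency()
features = build_st_input(seq)
print(A.shape, features.shape)  # (17, 17) (3, 30, 17)
```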
We utilize a curated dataset combining three major action recognition datasets, keeping the sport-related classes from each:
- PennAction Dataset: Human actions with detailed annotations
- UCF101: Large-scale action recognition dataset
- Kinetics400: Diverse human action video dataset (5% subset)
The following sport activities are included in our unified dataset:
| Action | PennAction | UCF101 | Kinetics400 | Count |
|---|---|---|---|---|
| squat | ✓ | ✓ | ✓ | 389 |
| bench_press | ✓ | ✓ | ✓ | 344 |
| pullup | ✓ | ✓ | ✓ | 341 |
| pushup | ✓ | ✓ | ✓ | 333 |
| jump_rope | ✓ | ✓ | ✗ | 241 |
| jumping_jacks | ✓ | ✓ | ✗ | 235 |
| clean_and_jerk | ✓ | ✓ | ✗ | 235 |
| lunges | ✗ | ✓ | ✓ | 154 |
| wall_pushups | ✗ | ✓ | ✗ | 130 |
| situp | ✓ | ✗ | ✓ | 130 |
| handstand_pushups | ✗ | ✓ | ✗ | 128 |
| handstand_walking | ✗ | ✓ | ✗ | 111 |
| snatch_weight_lifting | ✗ | ✗ | ✓ | 37 |
| running_on_treadmill | ✗ | ✗ | ✓ | 11 |
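For reference, the unified label set can be represented as a simple lookup table. The `ACTIONS` list and `LABEL_TO_ID` mapping below are an illustrative sketch mirroring the table above (the index order is arbitrary), not the project's actual label file:

```python
# Hypothetical label map for the unified dataset; names match the table.
ACTIONS = [
    "squat", "bench_press", "pullup", "pushup", "jump_rope",
    "jumping_jacks", "clean_and_jerk", "lunges", "wall_pushups",
    "situp", "handstand_pushups", "handstand_walking",
    "snatch_weight_lifting", "running_on_treadmill",
]
LABEL_TO_ID = {name: i for i, name in enumerate(ACTIONS)}
```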
- Install all dependencies and the main package:

  ```bash
  pip install -e .
  ```

- Run the data pipeline described in `prepare_data.md`.
- Run the training pipeline described in `train_pipeline.md`.
Launch the FastAPI service that wraps the YOLO + SODE inference stack:

```bash
python -m act_rec.api.app --host 0.0.0.0 --port 8000
```

With the API running, start the demo UI:

```bash
streamlit run src/act_rec/app/app.py
```

By default, the client looks for the API at http://localhost:8000. Point it elsewhere by exporting `ACT_REC_API_URL=http://your-host:8000` before launching, or by editing the base URL field in the app sidebar.
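Once the service is up, you can also query it programmatically. A minimal sketch, assuming a hypothetical `/predict` endpoint that accepts an uploaded video clip (FastAPI serves the actual route list at http://localhost:8000/docs):

```python
import requests

API_URL = "http://localhost:8000"  # or the value of ACT_REC_API_URL

# Hypothetical endpoint and payload shape; check /docs for the real schema.
with open("clip.mp4", "rb") as f:
    resp = requests.post(f"{API_URL}/predict", files={"file": f})
resp.raise_for_status()
print(resp.json())  # e.g. {"action": "pushup", "confidence": 0.93}
```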
This project targets consumer-grade GPUs (Apple Silicon, desktop NVIDIA cards, etc.). Docker on macOS does not expose the host GPU, and the Linux GPU support matrix varies between vendors. To keep the pipeline runnable on any machine with an accessible GPU, we encourage running the code directly in the host environment instead of relying on Docker images.
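In practice, running on the host means the code can pick whichever accelerator is present. A minimal sketch, assuming a PyTorch-based stack (the `pick_device` helper is illustrative, not part of the package):

```python
import torch

def pick_device():
    """Select the best available accelerator, falling back to CPU."""
    if torch.cuda.is_available():           # desktop NVIDIA cards
        return torch.device("cuda")
    if torch.backends.mps.is_available():   # Apple Silicon GPUs
        return torch.device("mps")
    return torch.device("cpu")              # portable fallback

print(pick_device())
```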