WHC

Waving Hand Classification. Ultrafast 1x3x4x32x32 3DConv gesture estimation.

output_1.mp4
output_2.mp4
| Variant | Size | Seq | F1 | CPU inference latency | ONNX (static seq) | ONNX (dynamic seq) |
|---|---|---|---|---|---|---|
| S | 1.1 MB | 4 | 0.9821 | 0.31 ms | Download | Download |
| M | 1.1 MB | 6 | 0.9916 | 0.46 ms | Download | Download |
| L | 1.1 MB | 8 | 0.9940 | 0.37 ms | Download | Download |
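
The 1x3x4x32x32 in the project description reads most naturally as batch x channels x frames x height x width for the S variant (Seq 4). The sketch below packs four 32x32 RGB crops into that layout with NumPy. The axis order, the scaling to [0, 1], and the helper name `pack_clip` are assumptions made for illustration, not taken from the repository, so check the preprocessing in demo_whc.py before relying on them.

```python
import numpy as np


def pack_clip(frames):
    """Stack T frames of shape (H, W, 3) into a (1, 3, T, H, W) float32 tensor.

    The channel-first, time-second layout and the /255 scaling are assumptions.
    """
    clip = np.stack(frames, axis=0)          # (T, H, W, 3)
    clip = clip.astype(np.float32) / 255.0   # assumed scaling to [0, 1]
    clip = clip.transpose(3, 0, 1, 2)        # (3, T, H, W)
    return clip[np.newaxis, ...]             # (1, 3, T, H, W)


# Four dummy 32x32 RGB crops standing in for consecutive hand crops.
frames = [np.zeros((32, 32, 3), dtype=np.uint8) for _ in range(4)]
print(pack_clip(frames).shape)  # (1, 3, 4, 32, 32)
```

Under the same assumptions, the M and L variants would take 6 and 8 frames in the third axis.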

Data sample

[Images: four sample frames, numbered 1 to 4]

Setup

git clone https://github.com/PINTO0309/WHC.git && cd WHC
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate

Inference

CUDA:

uv run python demo_whc.py \
-wm whc_seq_3dcnn_4x32x32.onnx \
-v 0 \
-ep cuda \
-dlr -dnm -dgm -dhm -dhd

TensorRT:

uv run python demo_whc.py \
-wm whc_seq_3dcnn_4x32x32.onnx \
-v 0 \
-ep tensorrt \
-dlr -dnm -dgm -dhm -dhd
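
The two commands above differ only in the `-ep` value, which presumably selects an ONNX Runtime execution provider. How demo_whc.py maps that value internally is not shown here, so the following is only a sketch of one common pattern: translate the short name to ONNX Runtime's provider name and fall back to CPU when the requested provider is unavailable. The function name `choose_providers` and the fallback behaviour are hypothetical.

```python
# ONNX Runtime's registered provider names for the short names used with -ep.
PROVIDER_NAMES = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "tensorrt": "TensorrtExecutionProvider",
}


def choose_providers(requested, available):
    """Return an ordered provider list: the requested one first, CPU as fallback.

    `available` would normally come from onnxruntime.get_available_providers().
    """
    wanted = [PROVIDER_NAMES[requested], "CPUExecutionProvider"]
    chosen = []
    for name in wanted:
        if name in available and name not in chosen:
            chosen.append(name)
    return chosen


# Example: a machine whose ONNX Runtime build only has the CPU provider.
print(choose_providers("cuda", ["CPUExecutionProvider"]))  # ['CPUExecutionProvider']
```

The resulting list can be passed as the `providers` argument of `onnxruntime.InferenceSession`.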

Dataset Preparation

uv run python 01_data_prep_realdata.py
[Figure: class_distribution]

Training Pipeline

3DCNN:

SEQ=4
SIZE=32x32
uv run python -m whc train \
--data_root data/dataset.parquet \
--output_dir runs/whc_seq_3dcnn_${SEQ}x${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--seed 42 \
--device auto \
--use_amp \
--use_sequence 3dcnn \
--sequence_len ${SEQ}

LSTM:

SEQ=4
SIZE=32x32
uv run python -m whc train \
--data_root data/dataset.parquet \
--output_dir runs/whc_seq_lstm_${SEQ}x${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--seed 42 \
--device auto \
--use_amp \
--use_sequence lstm \
--sequence_len ${SEQ}
  • Outputs include the 10 most recent whc_epoch_*.pt checkpoints, the 10 most recent whc_best_epochXXXX_f1_YYYY.pt checkpoints (selected by highest validation F1, or by training F1 when there is no validation split), history.json, summary.json, optional test_predictions.csv, and train.log.
  • After every epoch, a confusion matrix and an ROC curve are saved under runs/whc/diagnostics/<split>/ as confusion_<split>_epochXXXX.png and roc_<split>_epochXXXX.png.
  • --image_size accepts either a single integer for square crops (e.g. --image_size 32) or HEIGHTxWIDTH to resize non-square frames (e.g. --image_size 64x48).
  • Add --resume <checkpoint> to continue from an earlier epoch. Note that --epochs is the desired total epoch count, not the number of additional epochs (e.g. resuming with --epochs 40 after training to epoch 30 runs 10 more epochs).
  • Launch TensorBoard with:
    tensorboard --logdir runs/whc
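
Two of the conventions in the notes above, the `--image_size` format and the meaning of `--epochs` when resuming, can be illustrated in a few lines. This is a sketch of the described behaviour only; the function names `parse_image_size` and `remaining_epochs` are hypothetical and are not the repository's actual implementation.

```python
def parse_image_size(value):
    """Return (height, width) from either '32' or 'HEIGHTxWIDTH' such as '64x48'."""
    text = str(value).lower()
    if "x" in text:
        height, width = text.split("x", 1)
        return int(height), int(width)
    side = int(text)
    return side, side


def remaining_epochs(total_epochs, completed_epochs):
    """--epochs is the total target, so only the difference is still to run."""
    return max(total_epochs - completed_epochs, 0)


print(parse_image_size("32"))     # (32, 32)
print(parse_image_size("64x48"))  # (64, 48)
print(remaining_epochs(40, 30))   # 10
```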

ONNX Export

uv run python -m whc exportonnx \
--checkpoint runs/whc_seq_3dcnn_${SEQ}x${SIZE}/whc_best_epoch0049_f1_0.9939.pt \
--output whc_seq_3dcnn_4x32x32.onnx \
--opset 17
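
After exporting, it can be useful to confirm that the ONNX file loads and accepts an input of the expected shape. The sketch below assumes that the model has a single input shaped 1x3xSEQx32x32 in float32 (inferred from the project description, not verified against the exported graph) and that onnxruntime is installed. The helper names are hypothetical, and the random input only checks that the graph runs, not that predictions are meaningful.

```python
import numpy as np


def make_dummy_input(seq_len=4, height=32, width=32):
    """Random float32 tensor in the assumed (1, 3, T, H, W) layout."""
    return np.random.rand(1, 3, seq_len, height, width).astype(np.float32)


def run_once(model_path, seq_len=4):
    """Load the exported model on CPU and return the shapes of its outputs."""
    import onnxruntime as ort  # imported lazily so the helper above works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: make_dummy_input(seq_len)})
    return [output.shape for output in outputs]


print(make_dummy_input().shape)  # (1, 3, 4, 32, 32)
```

Calling `run_once("whc_seq_3dcnn_4x32x32.onnx")` from the repository root would then exercise the exported file. For the M and L variants, pass `seq_len=6` or `seq_len=8` to match the Seq column in the table.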

Arch

[Figure: whc_seq_3dcnn_4x32x32 architecture]

Ultra-lightweight classification model series

  1. VSDLM: Visual-only speech detection driven by lip movements - MIT License
  2. OCEC: Open closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
  3. PGC: Ultrafast pointing gesture classification - MIT License
  4. SC: Ultrafast sitting classification - MIT License
  5. PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
  6. HSC: Happy smile classifier - MIT License
  7. WHC: Waving Hand Classification - MIT License
  8. UHD: Ultra-lightweight human detection - MIT License

Citation

If you find this project useful, please consider citing:

@software{hyodo2025whc,
  author    = {Katsuya Hyodo},
  title     = {PINTO0309/WHC},
  month     = {11},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17690769},
  url       = {https://github.com/PINTO0309/whc},
  abstract  = {Waving Hand Classification.},
}

Acknowledgments

  • https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34: Apache 2.0 License
    @software{DEIMv2-Wholebody34,
      author={Katsuya Hyodo},
      title={Lightweight human detection models generated on high-quality human data sets. It can detect objects with high accuracy and speed in a total of 28 classes: body, adult, child, male, female, body_with_wheelchair, body_with_crutches, head, front, right-front, right-side, right-back, back, left-back, left-side, left-front, face, eye, nose, mouth, ear, collarbone, shoulder, solar_plexus, elbow, wrist, hand, hand_left, hand_right, abdomen, hip_joint, knee, ankle, foot.},
      url={https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34},
      year={2025},
      month={10},
      doi={10.5281/zenodo.17625710}
    }
