HSC

Happy smile classifier. The estimation is performed on the entire head (a 48x48-pixel crop) rather than on the face alone.

[Demo video: output_.mp4]
Variant  Size    F1      CPU inference latency  ONNX
P        115 KB  0.4841  0.23 ms                Download
N        176 KB  0.5849  0.41 ms                Download
T        280 KB  0.6701  0.52 ms                Download
S        495 KB  0.7394  0.64 ms                Download
C        876 KB  0.7344  0.69 ms                Download
M        1.7 MB  0.8144  0.85 ms                Download
L        6.4 MB  0.8293  1.03 ms                Download
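
The latency column reports single-image CPU inference. A minimal sketch for reproducing a comparable measurement with ONNX Runtime; the input/output names (images, prob_smiling) match the ONNX Export notes below, while the 1x3x48x48 NCHW float32 layout is an assumption:

import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("hsc_l_48x48.onnx", providers=["CPUExecutionProvider"])
x = np.random.rand(1, 3, 48, 48).astype(np.float32)  # assumed NCHW RGB input

for _ in range(10):                                   # warm-up before timing
    session.run(["prob_smiling"], {"images": x})

n = 100
t0 = time.perf_counter()
for _ in range(n):
    session.run(["prob_smiling"], {"images": x})
print(f"mean latency: {(time.perf_counter() - t0) / n * 1e3:.2f} ms")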

Data sample

[Four sample head-crop images]

Setup

# clone the repository and enter it
git clone https://github.com/PINTO0309/HSC.git && cd HSC
# install the uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# create the virtual environment from the lockfile and activate it
uv sync
source .venv/bin/activate

Inference

# run the demo on video source 0 (-v 0) with the CUDA execution provider
uv run python demo_hsc.py \
-hm hsc_l_48x48.onnx \
-v 0 \
-ep cuda \
-dlr -dnm -dgm -dhm -dhd

# the same demo with the TensorRT execution provider
uv run python demo_hsc.py \
-hm hsc_l_48x48.onnx \
-v 0 \
-ep tensorrt \
-dlr -dnm -dgm -dhm -dhd
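
The -ep flag presumably selects the ONNX Runtime execution provider. A hedged sketch of that mapping (an assumption about what demo_hsc.py does internally, not its actual code):

import onnxruntime as ort

# Hypothetical mapping from -ep values to ONNX Runtime provider lists;
# later entries act as fallbacks when the preferred provider is unavailable.
EP_MAP = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "tensorrt": ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
}

session = ort.InferenceSession("hsc_l_48x48.onnx", providers=EP_MAP["cuda"])
print(session.get_providers())  # shows which providers were actually loaded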

Dataset Preparation

uv run python 01_build_smile_parquet.py
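
The script presumably writes data/dataset.parquet, the path consumed by the training commands below. A quick way to sanity-check the output (column names other than still_image, which the training notes mention, are not documented here):

import pandas as pd

df = pd.read_parquet("data/dataset.parquet")
print(df.shape)
print(df.columns.tolist())  # expect at least a still_image key and a smile label
print(df.head())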

Training Pipeline

  • Use the images located under dataset/output/002_xxxx_front_yyyyyy together with their annotations in dataset/output/002_xxxx_front.csv.

  • Every augmented image that originates from the same still_image stays in the same split to prevent leakage (see the group-aware split sketch after this list).

  • The training loop relies on BCEWithLogitsLoss plus a class-balanced pos_weight to stabilise optimisation under class imbalance; inference produces sigmoid probabilities (a minimal pos_weight sketch also follows this list). Use --train_resampling weighted to re-enable the earlier WeightedRandomSampler behaviour, or --train_resampling balanced to physically duplicate minority classes before shuffling.

  • Training history, validation metrics, optional test predictions, checkpoints, configuration JSON, and ONNX exports are produced automatically.

  • Per-epoch checkpoints named like hsc_epoch_0001.pt are retained (the latest 10), as well as best checkpoints named like hsc_best_epoch0004_f1_0.9321.pt (also the latest 10).

  • The backbone can be switched with --arch_variant. Supported combinations with --head_variant are:

    --arch_variant  Default (--head_variant auto)  Explicitly selectable heads
    baseline        avg                            avg, avgmax_mlp
    inverted_se     avgmax_mlp                     avg, avgmax_mlp
    convnext        transformer                    avg, avgmax_mlp, transformer, mlp_mixer

    Remarks: when using the transformer or mlp_mixer heads, the height and width of the feature map must be divisible by --token_mixer_grid; leaving a mismatched grid in place raises an exception during ONNX conversion or inference. The default 3x2 grid fits a 30x48 input.
  • The classification head is selected with --head_variant (avg, avgmax_mlp, transformer, mlp_mixer, or auto which derives a sensible default from the backbone).

  • Pass --rgb_to_yuv_to_y to convert RGB crops to YUV, keep only the Y (luma) channel inside the network, and train a single-channel stem without modifying the dataloader.

  • Alternatively, use --rgb_to_lab or --rgb_to_luv to convert inputs to CIE Lab/Luv (3-channel) before the stem; these options are mutually exclusive with each other and with --rgb_to_yuv_to_y.

  • Mixed precision can be enabled with --use_amp when CUDA is available.

  • Resume training with --resume path/to/hsc_epoch_XXXX.pt; all optimiser/scheduler/AMP states and history are restored.

  • Loss/accuracy/F1 metrics are logged to TensorBoard under output_dir, and tqdm progress bars expose per-epoch progress for train/val/test loops.
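
As referenced in the leakage note above, here is a minimal sketch of a group-aware split with scikit-learn's GroupShuffleSplit; the still_image column comes from the notes, everything else (split ratio, seed) is illustrative:

import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.read_parquet("data/dataset.parquet")
# Rows derived from the same still_image share one group id, so every
# augmented copy lands in the same split and cannot leak across train/val.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(df, groups=df["still_image"]))
train_df, val_df = df.iloc[train_idx], df.iloc[val_idx]
assert set(train_df["still_image"]).isdisjoint(set(val_df["still_image"]))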

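And the pos_weight recipe from the loss note above, as a minimal PyTorch sketch (the toy labels are illustrative; in practice the weight is computed from the training split):

import torch
import torch.nn as nn

labels = torch.tensor([0., 0., 0., 1.])  # toy batch: 3 negatives, 1 positive
pos_weight = (labels == 0.).sum() / (labels == 1.).sum()  # negatives / positives = 3.0
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)   # up-weights the positive term

logits = torch.randn(4)                  # raw model outputs, no sigmoid applied
loss = criterion(logits, labels)         # numerically stable logit-space BCE
probs = torch.sigmoid(logits)            # sigmoid probabilities at inference
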
Baseline depthwise-separable CNN:

SIZE=48x48
uv run python -m hsc train \
--data_root data/dataset.parquet \
--output_dir runs/hsc_${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant baseline \
--seed 42 \
--device auto \
--use_amp

Inverted residual + SE variant (recommended for higher capacity):

SIZE=48x48
VAR=s
uv run python -m hsc train \
--data_root data/dataset.parquet \
--output_dir runs/hsc_is_${VAR}_${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant inverted_se \
--head_variant avgmax_mlp \
--seed 42 \
--device auto \
--use_amp

ConvNeXt-style backbone with transformer head over pooled tokens:

SIZE=48x48
uv run python -m hsc train \
--data_root data/dataset.parquet \
--output_dir runs/hsc_convnext_${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant convnext \
--head_variant transformer \
--token_mixer_grid 3x3 \
--seed 42 \
--device auto \
--use_amp
  • Outputs include the latest 10 hsc_epoch_*.pt, the latest 10 hsc_best_epochXXXX_f1_YYYY.pt (highest validation F1, or training F1 when no validation split), history.json, summary.json, optional test_predictions.csv, and train.log.
  • After every epoch a confusion matrix and ROC curve are saved under runs/hsc/diagnostics/<split>/confusion_<split>_epochXXXX.png and roc_<split>_epochXXXX.png.
  • --image_size accepts either a single integer for square crops (e.g. --image_size 48) or HEIGHTxWIDTH to resize non-square frames (e.g. --image_size 64x48); a tiny parser sketch follows this list.
  • Add --resume <checkpoint> to continue from an earlier epoch. Remember that --epochs indicates the desired total epoch count (e.g. resuming --epochs 40 after training to epoch 30 will run 10 additional epochs).
  • Launch TensorBoard with:
    tensorboard --logdir runs/hsc
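
A tiny sketch of the --image_size parsing rule described above (a hypothetical helper, not HSC's actual code):

def parse_image_size(value: str) -> tuple[int, int]:
    """Accept '48' (square) or 'HEIGHTxWIDTH' such as '64x48'; return (H, W)."""
    if "x" in value:
        h, w = value.split("x")
        return int(h), int(w)
    side = int(value)
    return side, side

assert parse_image_size("48") == (48, 48)
assert parse_image_size("64x48") == (64, 48)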

ONNX Export

uv run python -m hsc exportonnx \
--checkpoint runs/hsc_is_s_48x48/hsc_best_epoch0049_f1_0.9939.pt \
--output hsc_s_48x48.onnx \
--opset 17
  • The saved graph exposes images as input and prob_smiling as output (batch dimension is dynamic); probabilities can be consumed directly.
  • After exporting, the tool runs onnxsim for simplification and rewrites any remaining BatchNormalization nodes into affine Mul/Add primitives (see the folding sketch below). If simplification fails, a warning is emitted and the unsimplified model is preserved.
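
The rewrite works because inference-time BatchNormalization is an affine map. A minimal numpy sketch of the folding arithmetic (illustrative, not the exporter's actual code):

import numpy as np

gamma, beta = np.float32(1.2), np.float32(-0.3)  # learned BN scale and shift
mean, var, eps = np.float32(0.5), np.float32(4.0), np.float32(1e-5)

# BN(x) = gamma * (x - mean) / sqrt(var + eps) + beta  ==  scale * x + shift
scale = gamma / np.sqrt(var + eps)
shift = beta - mean * scale

x = np.random.randn(8).astype(np.float32)
reference = gamma * (x - mean) / np.sqrt(var + eps) + beta
folded = scale * x + shift               # the Mul/Add replacement
assert np.allclose(reference, folded, atol=1e-6)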

Arch

[Architecture diagram: hsc_p_48x48]

Ultra-lightweight classification model series

  1. VSDLM: Visual-only speech detection driven by lip movements - MIT License
  2. OCEC: Open/closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
  3. PGC: Ultrafast pointing gesture classification - MIT License
  4. SC: Ultrafast sitting classification - MIT License
  5. PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
  6. HSC: Happy smile classifier - MIT License

Citation

If you find this project useful, please consider citing:

@software{hyodo2025hsc,
  author    = {Katsuya Hyodo},
  title     = {PINTO0309/HSC},
  month     = {11},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17670546},
  url       = {https://github.com/PINTO0309/hsc},
  abstract  = {Happy smile classifier.},
}

Acknowledgments

  • https://github.com/microsoft/FERPlus: MIT License
    @inproceedings{BarsoumICMI2016,
        title={Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution},
        author={Barsoum, Emad and Zhang, Cha and Canton Ferrer, Cristian and Zhang, Zhengyou},
        booktitle={ACM International Conference on Multimodal Interaction (ICMI)},
        year={2016}
    }
  • https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34: Apache 2.0 License
    @software{DEIMv2-Wholebody34,
      author={Katsuya Hyodo},
      title={Lightweight human detection models generated on high-quality human data sets. It can detect objects with high accuracy and speed in a total of 28 classes: body, adult, child, male, female, body_with_wheelchair, body_with_crutches, head, front, right-front, right-side, right-back, back, left-back, left-side, left-front, face, eye, nose, mouth, ear, collarbone, shoulder, solar_plexus, elbow, wrist, hand, hand_left, hand_right, abdomen, hip_joint, knee, ankle, foot.},
      url={https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34},
      year={2025},
      month={10},
      doi={10.5281/zenodo.17625710}
    }
