Open closed eyes classification. Ultra-fast wink and blink estimation model.
In real-world footage, eyes rarely appear larger than 20 pixels high and 40 pixels wide, so spending compute on detecting larger eye regions is wasted effort.
| Variant | Size | F1 | CPU inference latency | ONNX |
|---|---|---|---|---|
| P | 112 KB | 0.9924 | 0.16 ms | Download |
| N | 176 KB | 0.9933 | 0.25 ms | Download |
| S | 494 KB | 0.9943 | 0.41 ms | Download |
| C | 875 KB | 0.9947 | 0.49 ms | Download |
| M | 1.7 MB | 0.9949 | 0.57 ms | Download |
| L | 6.4 MB | 0.9954 | 0.80 ms | Download |
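The exported models can be consumed with plain `onnxruntime`. Below is a minimal inference sketch, not the repository's demo code: it assumes the `images` input and `prob_open` output described in the export section, an RGB crop in NCHW layout at 24x40 (height x width), and simple [0, 1] scaling, so verify the preprocessing against the actual training pipeline before relying on it.

```python
# Minimal OCEC inference sketch. Assumptions (not taken from the repo):
# RGB input, [0, 1] scaling, NCHW layout, 24x40 (HxW) input resolution.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("ocec_s.onnx", providers=["CPUExecutionProvider"])

def classify_eye(crop_bgr: np.ndarray) -> float:
    """Return the open-eye probability for a single BGR eye crop."""
    rgb = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (40, 24))           # cv2 takes (width, height)
    tensor = resized.astype(np.float32) / 255.0   # assumed [0, 1] normalization
    tensor = tensor.transpose(2, 0, 1)[None]      # HWC -> 1x3xHxW
    prob_open = session.run(["prob_open"], {"images": tensor})[0]
    return float(prob_open.squeeze())

print(classify_eye(cv2.imread("eye_crop.png")))   # e.g. 0.98 for an open eye
```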
Install dependencies and activate the environment:

```
git clone https://github.com/PINTO0309/OCEC.git && cd OCEC
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate
```

Run the demo with the CUDA execution provider:

```
uv run python demo_ocec.py \
-v 0 \
-m deimv2_dinov3_s_wholebody34_1750query_n_batch_640x640.onnx \
-om ocec_l.onnx \
-ep cuda
```

Or with the TensorRT execution provider:

```
uv run python demo_ocec.py \
-v 0 \
-m deimv2_dinov3_s_wholebody34_1750query_n_batch_640x640.onnx \
-om ocec_l.onnx \
-ep tensorrt
```

Browse the dataset, optionally visualizing or extracting crops:

```
uv run python 01_dataset_viewer.py --split train
uv run python 01_dataset_viewer.py --split train --visualize
uv run python 01_dataset_viewer.py --split train --extract
```

Measure real-world eye sizes in the reference videos:

```
uv run python 02_real_data_size_hist.py \
-v real_data/open.mp4 \
-oep tensorrt \
-dvw
# [Eye Analysis] open
# Total frames processed: 930
# Frames with Eye detections: 930
# Frames without Eye detections: 0
# Frames with ≥3 Eye detections: 2
# Total Eye detections: 1818
# Histogram PNG: output_eye_analysis/open_eye_size_hist.png
# Width -> mean=20.94, median=22.00
# Height -> mean=11.39, median=11.00
uv run python 02_real_data_size_hist.py \
-v real_data/closed.mp4 \
-oep tensorrt \
-dvw
# [Eye Analysis] closed
# Total frames processed: 1016
# Frames with Eye detections: 1016
# Frames without Eye detections: 0
# Frames with ≥3 Eye detections: 38
# Total Eye detections: 1872
# Histogram PNG: output_eye_analysis/closed_eye_size_hist.png
# Width -> mean=15.25, median=14.00
# Height -> mean=8.17, median=7.00
```

Considering practical real-world sizes, I adopt an input resolution of height x width = 24 x 40.
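As an illustration of preparing crops for that 24x40 input, here is one aspect-preserving approach. The letterboxing itself is an assumption: the repo's scripts may simply stretch-resize, which can match the training pipeline more closely.

```python
# Hypothetical letterbox resize to the 24x40 (HxW) OCEC input.
# The repo's pipeline may simply stretch-resize; this variant avoids
# distorting unusually narrow or tall eye crops.
import cv2
import numpy as np

TARGET_H, TARGET_W = 24, 40

def letterbox_eye(crop: np.ndarray) -> np.ndarray:
    h, w = crop.shape[:2]
    scale = min(TARGET_H / h, TARGET_W / w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    resized = cv2.resize(crop, (nw, nh))
    canvas = np.zeros((TARGET_H, TARGET_W, 3), dtype=crop.dtype)
    top, left = (TARGET_H - nh) // 2, (TARGET_W - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```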
Extract eye crops from the source datasets with the wholebody34 detector:

```
uv run python 03_wholebody34_data_extractor.py \
-ea \
-m deimv2_dinov3_x_wholebody34_680query_n_batch_640x640.onnx \
-oep tensorrt
# Eye-only detection summary
# Total images: 131174
# Images with detection: 130596
# Images without detection: 578
# Images with >=3 detections: 1278
# Crops per label:
# closed: 134522
# open: 110796
# Eye-only detection summary
# Total images: 144146
# Images with detection: 143364
# Images without detection: 782
# Images with >=3 detections: 1221
# Crops per label:
# closed: 136347
# open: 135319
```

| Label | ex1 | ex2 | ex3 | ex4 | ex5 | ex6 | ex7 | ex8 | ex9 | ex10 |
|---|---|---|---|---|---|---|---|---|---|---|
| open | | | | | | | | | | |
| closed | | | | | | | | | | |
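Conceptually, the extractor runs the wholebody34 detector over each image, keeps the eye-class boxes, and writes every crop under its open/closed label. The loop below only illustrates that flow: `detect`, `EYE_CLASS_ID`, and the pixel-space `(x1, y1, x2, y2, class_id)` box format are placeholder assumptions, not the script's real API.

```python
# Illustrative eye-crop extraction loop. `detect` and EYE_CLASS_ID are
# placeholders; the real script's detector interface differs.
from pathlib import Path
import cv2

EYE_CLASS_ID = 0  # placeholder, not the actual wholebody34 class index
MARGIN = 0.1      # relative margin added around each detected eye box

def extract_eye_crops(image_path: str, label: str, out_dir: Path, detect) -> None:
    image = cv2.imread(image_path)
    ih, iw = image.shape[:2]
    for i, (x1, y1, x2, y2, class_id) in enumerate(detect(image)):
        if class_id != EYE_CLASS_ID:
            continue
        mx, my = (x2 - x1) * MARGIN, (y2 - y1) * MARGIN
        x1, y1 = max(0, int(x1 - mx)), max(0, int(y1 - my))
        x2, y2 = min(iw, int(x2 + mx)), min(ih, int(y2 + my))
        crop_dir = out_dir / label            # e.g. data/cropped/open
        crop_dir.mkdir(parents=True, exist_ok=True)
        cv2.imwrite(str(crop_dir / f"{Path(image_path).stem}_{i}.png"),
                    image[y1:y2, x1:x2])
```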
Convert the crops to a single parquet file:

```
uv run python 04_dataset_convert_to_parquet.py \
--annotation data/cropped/annotation.csv \
--output data/dataset.parquet \
--train-ratio 0.8 \
--seed 42 \
--embed-images
# Split summary: {'train_total': 196253, 'train_closed': 107617, 'train_open': 88636, 'val_total': 49065, 'val_closed': 26905, 'val_open': 22160}
# Saved dataset to data/dataset.parquet (245318 rows).
# Split summary: {'train_total': 217332, 'train_closed': 109077, 'train_open': 108255, 'val_total': 54334, 'val_closed': 27270, 'val_open': 27064}
# Saved dataset to data/dataset.parquet (271666 rows).
```

Generated parquet schema (`split`, `label`, `class_id`, `image_path`, `source`):
- `split`: `train` or `val`, assigned with an 80/20 stratified split per label.
- `label`: string eye state (`open`, `closed`); inferred from the filename or class id.
- `class_id`: integer class id (`0` = closed, `1` = open) maintained from the annotation.
- `image_path`: path to the cropped PNG stored under `data/cropped/...`.
- `source`: `train_dataset` for `000000001`-prefixed folders, `real_data` for `100000001`+, `unknown` otherwise.
- `image_bytes` (optional): raw PNG bytes for each crop when `--embed-images` is supplied.
Rows are stratified within each label before concatenation, so both splits keep similar open/closed proportions. Class counts per split are printed when the conversion script runs.
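The resulting parquet is easy to inspect directly. A minimal pandas sketch, assuming the schema listed above (the `image_bytes` column exists only when `--embed-images` was passed):

```python
# Inspect dataset.parquet (assumes the schema documented above).
import io
import pandas as pd
from PIL import Image

df = pd.read_parquet("data/dataset.parquet")
print(df["split"].value_counts())             # train/val row counts
print(df.groupby(["split", "label"]).size())  # per-split class balance

row = df[df["split"] == "val"].iloc[0]
if "image_bytes" in df.columns:               # only present with --embed-images
    Image.open(io.BytesIO(row["image_bytes"])).show()
else:
    Image.open(row["image_path"]).show()
```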
- Use the images located under `dataset/output/002_xxxx_front_yyyyyy` together with their annotations in `dataset/output/002_xxxx_front.csv`.
- Every augmented image that originates from the same `still_image` stays in the same split to prevent leakage.
- The training loop relies on `BCEWithLogitsLoss` with `pos_weight` and a `WeightedRandomSampler` to stabilise optimisation under class imbalance; inference produces sigmoid probabilities (see the sketch after this list).
- Training history, validation metrics, optional test predictions, checkpoints, configuration JSON, and ONNX exports are produced automatically.
- Per-epoch checkpoints named like `ocec_epoch_0001.pt` are retained (latest 10), as are the best checkpoints named like `ocec_best_epoch0004_f1_0.9321.pt` (also latest 10).
- The backbone can be switched with `--arch_variant`. Supported combinations with `--head_variant` are:

  | `--arch_variant` | Default (`--head_variant auto`) | Explicitly selectable heads | Remarks |
  |---|---|---|---|
  | `baseline` | `avg` | `avg`, `avgmax_mlp` | When using `transformer`/`mlp_mixer`, the height and width of the feature map must be divisible by `--token_mixer_grid` (if left as is, an exception will occur during ONNX conversion or inference). |
  | `inverted_se` | `avgmax_mlp` | `avg`, `avgmax_mlp` | When using `transformer`/`mlp_mixer`, adjust `--token_mixer_grid` as above. |
  | `convnext` | `transformer` | `avg`, `avgmax_mlp`, `transformer`, `mlp_mixer` | For both heads, the feature map must be divisible by the grid (the default `3x2` fits with a 30x48 input). |

- The classification head is selected with `--head_variant` (`avg`, `avgmax_mlp`, `transformer`, `mlp_mixer`, or `auto`, which derives a sensible default from the backbone).
- Mixed precision can be enabled with `--use_amp` when CUDA is available.
- Resume training with `--resume path/to/ocec_epoch_XXXX.pt`; all optimiser/scheduler/AMP states and history are restored.
- Loss/accuracy/F1 metrics are logged to TensorBoard under `output_dir`, and `tqdm` progress bars expose per-epoch progress for the train/val/test loops.
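As referenced in the list above, here is a minimal sketch of the imbalance-handling pattern: `BCEWithLogitsLoss` with `pos_weight` plus a `WeightedRandomSampler`. The toy tensors and the exact weighting scheme are assumptions, not the repo's code.

```python
# Sketch of imbalance handling with pos_weight + WeightedRandomSampler.
# Toy stand-in data; the repo's actual weighting scheme may differ.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

images = torch.randn(1000, 3, 24, 40)              # dummy 24x40 crops
labels = torch.randint(0, 2, (1000,)).float()      # 0 = closed, 1 = open
dataset = TensorDataset(images, labels)

n_pos = float(labels.sum())
n_neg = len(labels) - n_pos

# pos_weight up-weights the positive term of the loss when positives are rare.
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(n_neg / n_pos))

# Draw samples inversely to class frequency so each batch is roughly balanced.
class_weights = torch.tensor([1.0 / n_neg, 1.0 / n_pos])
sampler = WeightedRandomSampler(class_weights[labels.long()],
                                num_samples=len(labels), replacement=True)
loader = DataLoader(dataset, batch_size=256, sampler=sampler)

for batch_images, batch_labels in loader:
    logits = batch_images.mean(dim=(1, 2, 3))      # stand-in for the model
    loss = criterion(logits, batch_labels)
    break
```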
Baseline depthwise-separable CNN:

```
uv run python -m ocec train \
--data_root data/dataset.parquet \
--output_dir runs/ocec \
--epochs 50 \
--batch_size 256 \
--train_ratio 0.8 \
--val_ratio 0.2 \
--image_size 24x40 \
--base_channels 32 \
--num_blocks 4 \
--arch_variant baseline \
--seed 42 \
--device auto \
--use_amp
```

Inverted residual + SE variant (recommended for higher capacity):

```
uv run python -m ocec train \
--data_root data/dataset.parquet \
--output_dir runs/ocec_is_s \
--epochs 50 \
--batch_size 256 \
--train_ratio 0.8 \
--val_ratio 0.2 \
--image_size 24x40 \
--base_channels 32 \
--num_blocks 4 \
--arch_variant inverted_se \
--head_variant avgmax_mlp \
--seed 42 \
--device auto \
--use_amp
```

ConvNeXt-style backbone with transformer head over pooled tokens:

```
uv run python -m ocec train \
--data_root data/dataset.parquet \
--output_dir runs/ocec_convnext \
--epochs 50 \
--batch_size 256 \
--train_ratio 0.8 \
--val_ratio 0.2 \
--image_size 24x40 \
--base_channels 32 \
--num_blocks 4 \
--arch_variant convnext \
--head_variant transformer \
--token_mixer_grid 3x2 \
--seed 42 \
--device auto \
--use_amp
```

- Outputs include the latest 10 `ocec_epoch_*.pt`, the latest 10 `ocec_best_epochXXXX_f1_YYYY.pt` (highest validation F1, or training F1 when there is no validation split), `history.json`, `summary.json`, an optional `test_predictions.csv`, and `train.log`.
- After every epoch a confusion matrix and ROC curve are saved under `runs/ocec/diagnostics/<split>/confusion_<split>_epochXXXX.png` and `roc_<split>_epochXXXX.png`.
- `--image_size` accepts either a single integer for square crops (e.g. `--image_size 48`) or `HEIGHTxWIDTH` to resize non-square frames (e.g. `--image_size 64x48`).
- Add `--resume <checkpoint>` to continue from an earlier epoch. Remember that `--epochs` indicates the desired total epoch count (e.g. resuming with `--epochs 40` after training to epoch 30 will run 10 additional epochs).
- Launch TensorBoard with `tensorboard --logdir runs/ocec`.
Export a trained checkpoint to ONNX:

```
uv run python -m ocec exportonnx \
--checkpoint runs/ocec_is_s/ocec_best_epoch0049_f1_0.9939.pt \
--output ocec_s.onnx \
--opset 17
```

- The saved graph exposes `images` as input and `prob_open` as output (the batch dimension is dynamic); the probabilities can be consumed directly.
- After exporting, the tool runs `onnxsim` for simplification and rewrites any remaining BatchNormalization nodes into affine `Mul`/`Add` primitives. If simplification fails, a warning is emitted and the unsimplified model is preserved.
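As a quick post-export sanity check, the sketch below confirms the I/O names and that the dynamic batch axis accepts more than one crop at once. The 3-channel 24x40 shape is an assumption; match it to your trained input size.

```python
# Post-export sanity check for the ONNX graph's names and dynamic batch.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("ocec_s.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in session.get_inputs()])    # expect ['images']
print([o.name for o in session.get_outputs()])   # expect ['prob_open']

batch = np.random.rand(4, 3, 24, 40).astype(np.float32)  # assumed 3x24x40
probs = session.run(["prob_open"], {"images": batch})[0]
print(probs.shape)  # one open-eye probability per image in the batch
```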
- VSDLM: Visual-only speech detection driven by lip movements - MIT License
- OCEC: Open closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
- PGC: Ultrafast pointing gesture classification - MIT License
- SC: Ultrafast sitting classification - MIT License
- PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
- HSC: Happy smile classifier - MIT License
If you find this project useful, please consider citing:
```
@software{hyodo2025ocec,
author = {Katsuya Hyodo},
title = {PINTO0309/OCEC},
month = {10},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.17505461},
url = {https://github.com/PINTO0309/ocec},
abstract = {Open closed eyes classification. Ultra-fast wink/blink estimation model.},
}
```

- https://huggingface.co/datasets/MichalMlodawski/closed-open-eyes: Open Data Commons Attribution License (ODC-By) v1.0
```
@misc{open_closed_eyes2024,
  author = {Michał Młodawski},
  title = {Open and Closed Eyes Dataset},
  month = {July},
  year = {2024},
  url = {https://huggingface.co/datasets/MichalMlodawski/closed-open-eyes},
}
```
- https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34 - Apache 2.0