Open closed eyes classification. Ultra-fast wink and blink estimation model.
In real-world footage, eyes rarely appear larger than 20 pixels high and 40 pixels wide, so spending compute on detecting larger eye regions is wasted effort.
| Variant | Size | F1 | CPU inference latency | ONNX |
|---|---|---|---|---|
| P | 112 KB | 0.9924 | 0.16 ms | Download |
| N | 176 KB | 0.9933 | 0.25 ms | Download |
| S | 494 KB | 0.9943 | 0.41 ms | Download |
| C | 875 KB | 0.9947 | 0.49 ms | Download |
| M | 1.7 MB | 0.9949 | 0.57 ms | Download |
| L | 6.4 MB | 0.9954 | 0.80 ms | Download |
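The exported models can be consumed with plain `onnxruntime`. Below is a minimal inference sketch, not the repository's demo code: it assumes the `images` input and `prob_open` output described in the export section, an RGB crop in NCHW layout at 24x40 (height x width), and simple [0, 1] scaling, so verify the preprocessing against the actual training pipeline before relying on it.

```python
# Minimal OCEC inference sketch. Assumptions (not taken from the repo):
# RGB input, [0, 1] scaling, NCHW layout, 24x40 (HxW) input resolution.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("ocec_s.onnx", providers=["CPUExecutionProvider"])

def classify_eye(crop_bgr: np.ndarray) -> float:
    """Return the open-eye probability for a single BGR eye crop."""
    rgb = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (40, 24))           # cv2 takes (width, height)
    tensor = resized.astype(np.float32) / 255.0   # assumed [0, 1] normalization
    tensor = tensor.transpose(2, 0, 1)[None]      # HWC -> 1x3xHxW
    prob_open = session.run(["prob_open"], {"images": tensor})[0]
    return float(prob_open.squeeze())

print(classify_eye(cv2.imread("eye_crop.png")))   # e.g. 0.98 for an open eye
```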
Install dependencies and activate the environment:

```
git clone https://github.com/PINTO0309/OCEC.git && cd OCEC
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate
```

Run the demo with the CUDA execution provider:

```
uv run python demo_ocec.py \
-v 0 \
-m deimv2_dinov3_s_wholebody34_1750query_n_batch_640x640.onnx \
-om ocec_l.onnx \
-ep cuda
```

Or with the TensorRT execution provider:

```
uv run python demo_ocec.py \
-v 0 \
-m deimv2_dinov3_s_wholebody34_1750query_n_batch_640x640.onnx \
-om ocec_l.onnx \
-ep tensorrt
```

Browse the dataset, optionally visualizing or extracting crops:

```
uv run python 01_dataset_viewer.py --split train
uv run python 01_dataset_viewer.py --split train --visualize
uv run python 01_dataset_viewer.py --split train --extract
```

Measure real-world eye sizes in the reference videos:

```
uv run python 02_real_data_size_hist.py \
-v real_data/open.mp4 \
-oep tensorrt \
-dvw
# [Eye Analysis] open
# Total frames processed: 930
# Frames with Eye detections: 930
# Frames without Eye detections: 0
# Frames with ≥3 Eye detections: 2
# Total Eye detections: 1818
# Histogram PNG: output_eye_analysis/open_eye_size_hist.png
# Width -> mean=20.94, median=22.00
# Height -> mean=11.39, median=11.00
uv run python 02_real_data_size_hist.py \
-v real_data/closed.mp4 \
-oep tensorrt \
-dvw
# [Eye Analysis] closed
# Total frames processed: 1016
# Frames with Eye detections: 1016
# Frames without Eye detections: 0
# Frames with ≥3 Eye detections: 38
# Total Eye detections: 1872
# Histogram PNG: output_eye_analysis/closed_eye_size_hist.png
# Width -> mean=15.25, median=14.00
# Height -> mean=8.17, median=7.00
```

Considering practical real-world sizes, I adopt an input resolution of height x width = 24 x 40.
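As an illustration of preparing crops for that 24x40 input, here is one aspect-preserving approach. The letterboxing itself is an assumption: the repo's scripts may simply stretch-resize, which can match the training pipeline more closely.

```python
# Hypothetical letterbox resize to the 24x40 (HxW) OCEC input.
# The repo's pipeline may simply stretch-resize; this variant avoids
# distorting unusually narrow or tall eye crops.
import cv2
import numpy as np

TARGET_H, TARGET_W = 24, 40

def letterbox_eye(crop: np.ndarray) -> np.ndarray:
    h, w = crop.shape[:2]
    scale = min(TARGET_H / h, TARGET_W / w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    resized = cv2.resize(crop, (nw, nh))
    canvas = np.zeros((TARGET_H, TARGET_W, 3), dtype=crop.dtype)
    top, left = (TARGET_H - nh) // 2, (TARGET_W - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```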
Extract eye crops from the source datasets with the wholebody34 detector:

```
uv run python 03_wholebody34_data_extractor.py \
-ea \
-m deimv2_dinov3_x_wholebody34_680query_n_batch_640x640.onnx \
-oep tensorrt
# Eye-only detection summary
# Total images: 131174
# Images with detection: 130596
# Images without detection: 578
# Images with >=3 detections: 1278
# Crops per label:
# closed: 134522
# open: 110796
# Eye-only detection summary
# Total images: 144146
# Images with detection: 143364
# Images without detection: 782
# Images with >=3 detections: 1221
# Crops per label:
# closed: 136347
# open: 135319
```

| Label | ex1 | ex2 | ex3 | ex4 | ex5 | ex6 | ex7 | ex8 | ex9 | ex10 |
|---|---|---|---|---|---|---|---|---|---|---|
| open | | | | | | | | | | |
| closed | | | | | | | | | | |
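Conceptually, the extractor runs the wholebody34 detector over each image, keeps the eye-class boxes, and writes every crop under its open/closed label. The loop below only illustrates that flow: `detect`, `EYE_CLASS_ID`, and the pixel-space `(x1, y1, x2, y2, class_id)` box format are placeholder assumptions, not the script's real API.

```python
# Illustrative eye-crop extraction loop. `detect` and EYE_CLASS_ID are
# placeholders; the real script's detector interface differs.
from pathlib import Path
import cv2

EYE_CLASS_ID = 0  # placeholder, not the actual wholebody34 class index
MARGIN = 0.1      # relative margin added around each detected eye box

def extract_eye_crops(image_path: str, label: str, out_dir: Path, detect) -> None:
    image = cv2.imread(image_path)
    ih, iw = image.shape[:2]
    for i, (x1, y1, x2, y2, class_id) in enumerate(detect(image)):
        if class_id != EYE_CLASS_ID:
            continue
        mx, my = (x2 - x1) * MARGIN, (y2 - y1) * MARGIN
        x1, y1 = max(0, int(x1 - mx)), max(0, int(y1 - my))
        x2, y2 = min(iw, int(x2 + mx)), min(ih, int(y2 + my))
        crop_dir = out_dir / label            # e.g. data/cropped/open
        crop_dir.mkdir(parents=True, exist_ok=True)
        cv2.imwrite(str(crop_dir / f"{Path(image_path).stem}_{i}.png"),
                    image[y1:y2, x1:x2])
```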
Convert the crops to a single parquet file:

```
uv run python 04_dataset_convert_to_parquet.py \
--annotation data/cropped/annotation.csv \
--output data/dataset.parquet \
--train-ratio 0.8 \
--seed 42 \
--embed-images
# Split summary: {'train_total': 196253, 'train_closed': 107617, 'train_open': 88636, 'val_total': 49065, 'val_closed': 26905, 'val_open': 22160}
# Saved dataset to data/dataset.parquet (245318 rows).
# Split summary: {'train_total': 217332, 'train_closed': 109077, 'train_open': 108255, 'val_total': 54334, 'val_closed': 27270, 'val_open': 27064}
# Saved dataset to data/dataset.parquet (271666 rows).
```

Generated parquet schema (`split`, `label`, `class_id`, `image_path`, `source`):
- `split`: `train` or `val`, assigned with an 80/20 stratified split per label.
- `label`: string eye state (`open`, `closed`); inferred from the filename or class id.
- `class_id`: integer class id (`0` = closed, `1` = open) maintained from the annotation.
- `image_path`: path to the cropped PNG stored under `data/cropped/...`.
- `source`: `train_dataset` for `000000001`-prefixed folders, `real_data` for `100000001`+, `unknown` otherwise.
- `image_bytes` (optional): raw PNG bytes for each crop when `--embed-images` is supplied.
Rows are stratified within each label before concatenation, so both splits keep similar open/closed proportions. Class counts per split are printed when the conversion script runs.
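The resulting parquet is easy to inspect directly. A minimal pandas sketch, assuming the schema listed above (the `image_bytes` column exists only when `--embed-images` was passed):

```python
# Inspect dataset.parquet (assumes the schema documented above).
import io
import pandas as pd
from PIL import Image

df = pd.read_parquet("data/dataset.parquet")
print(df["split"].value_counts())             # train/val row counts
print(df.groupby(["split", "label"]).size())  # per-split class balance

row = df[df["split"] == "val"].iloc[0]
if "image_bytes" in df.columns:               # only present with --embed-images
    Image.open(io.BytesIO(row["image_bytes"])).show()
else:
    Image.open(row["image_path"]).show()
```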
- Use the images located under `dataset/output/002_xxxx_front_yyyyyy` together with their annotations in `dataset/output/002_xxxx_front.csv`.
- Every augmented image that originates from the same `still_image` stays in the same split to prevent leakage.
- The training loop relies on `BCEWithLogitsLoss` with `pos_weight` and a `WeightedRandomSampler` to stabilise optimisation under class imbalance; inference produces sigmoid probabilities (see the sketch after this list).
- Training history, validation metrics, optional test predictions, checkpoints, configuration JSON, and ONNX exports are produced automatically.
- Per-epoch checkpoints named like `ocec_epoch_0001.pt` are retained (latest 10), as are the best checkpoints named like `ocec_best_epoch0004_f1_0.9321.pt` (also latest 10).
- The backbone can be switched with `--arch_variant`. Supported combinations with `--head_variant` are:

  | `--arch_variant` | Default (`--head_variant auto`) | Explicitly selectable heads | Remarks |
  |---|---|---|---|
  | `baseline` | `avg` | `avg`, `avgmax_mlp` | When using `transformer`/`mlp_mixer`, the height and width of the feature map must be divisible by `--token_mixer_grid` (if left as is, an exception will occur during ONNX conversion or inference). |
  | `inverted_se` | `avgmax_mlp` | `avg`, `avgmax_mlp` | When using `transformer`/`mlp_mixer`, adjust `--token_mixer_grid` as above. |
  | `convnext` | `transformer` | `avg`, `avgmax_mlp`, `transformer`, `mlp_mixer` | For both heads, the feature map must be divisible by the grid (the default `3x2` fits with a 30x48 input). |

- The classification head is selected with `--head_variant` (`avg`, `avgmax_mlp`, `transformer`, `mlp_mixer`, or `auto`, which derives a sensible default from the backbone).
- Mixed precision can be enabled with `--use_amp` when CUDA is available.
- Resume training with `--resume path/to/ocec_epoch_XXXX.pt`; all optimiser/scheduler/AMP states and history are restored.
- Loss/accuracy/F1 metrics are logged to TensorBoard under `output_dir`, and `tqdm` progress bars expose per-epoch progress for the train/val/test loops.
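As referenced in the list above, here is a minimal sketch of the imbalance-handling pattern: `BCEWithLogitsLoss` with `pos_weight` plus a `WeightedRandomSampler`. The toy tensors and the exact weighting scheme are assumptions, not the repo's code.

```python
# Sketch of imbalance handling with pos_weight + WeightedRandomSampler.
# Toy stand-in data; the repo's actual weighting scheme may differ.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

images = torch.randn(1000, 3, 24, 40)              # dummy 24x40 crops
labels = torch.randint(0, 2, (1000,)).float()      # 0 = closed, 1 = open
dataset = TensorDataset(images, labels)

n_pos = float(labels.sum())
n_neg = len(labels) - n_pos

# pos_weight up-weights the positive term of the loss when positives are rare.
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(n_neg / n_pos))

# Draw samples inversely to class frequency so each batch is roughly balanced.
class_weights = torch.tensor([1.0 / n_neg, 1.0 / n_pos])
sampler = WeightedRandomSampler(class_weights[labels.long()],
                                num_samples=len(labels), replacement=True)
loader = DataLoader(dataset, batch_size=256, sampler=sampler)

for batch_images, batch_labels in loader:
    logits = batch_images.mean(dim=(1, 2, 3))      # stand-in for the model
    loss = criterion(logits, batch_labels)
    break
```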
Baseline depthwise-separable CNN:

```
uv run python -m ocec train \
--data_root data/dataset.parquet \
--output_dir runs/ocec \
--epochs 50 \
--batch_size 256 \
--train_ratio 0.8 \
--val_ratio 0.2 \
--image_size 24x40 \
--base_channels 32 \
--num_blocks 4 \
--arch_variant baseline \
--seed 42 \
--device auto \
--use_amp
```

Inverted residual + SE variant (recommended for higher capacity):

```
uv run python -m ocec train \
--data_root data/dataset.parquet \
--output_dir runs/ocec_is_s \
--epochs 50 \
--batch_size 256 \
--train_ratio 0.8 \
--val_ratio 0.2 \
--image_size 24x40 \
--base_channels 32 \
--num_blocks 4 \
--arch_variant inverted_se \
--head_variant avgmax_mlp \
--seed 42 \
--device auto \
--use_amp
```

ConvNeXt-style backbone with transformer head over pooled tokens:

```
uv run python -m ocec train \
--data_root data/dataset.parquet \
--output_dir runs/ocec_convnext \
--epochs 50 \
--batch_size 256 \
--train_ratio 0.8 \
--val_ratio 0.2 \
--image_size 24x40 \
--base_channels 32 \
--num_blocks 4 \
--arch_variant convnext \
--head_variant transformer \
--token_mixer_grid 3x2 \
--seed 42 \
--device auto \
--use_amp
```

- Outputs include the latest 10 `ocec_epoch_*.pt`, the latest 10 `ocec_best_epochXXXX_f1_YYYY.pt` (highest validation F1, or training F1 when there is no validation split), `history.json`, `summary.json`, an optional `test_predictions.csv`, and `train.log`.
- After every epoch a confusion matrix and ROC curve are saved under `runs/ocec/diagnostics/<split>/confusion_<split>_epochXXXX.png` and `roc_<split>_epochXXXX.png`.
- `--image_size` accepts either a single integer for square crops (e.g. `--image_size 48`) or `HEIGHTxWIDTH` to resize non-square frames (e.g. `--image_size 64x48`).
- Add `--resume <checkpoint>` to continue from an earlier epoch. Remember that `--epochs` indicates the desired total epoch count (e.g. resuming with `--epochs 40` after training to epoch 30 will run 10 additional epochs).
- Launch TensorBoard with `tensorboard --logdir runs/ocec`.
Export a trained checkpoint to ONNX:

```
uv run python -m ocec exportonnx \
--checkpoint runs/ocec_is_s/ocec_best_epoch0049_f1_0.9939.pt \
--output ocec_s.onnx \
--opset 17
```

- The saved graph exposes `images` as input and `prob_open` as output (the batch dimension is dynamic); the probabilities can be consumed directly.
- After exporting, the tool runs `onnxsim` for simplification and rewrites any remaining BatchNormalization nodes into affine `Mul`/`Add` primitives. If simplification fails, a warning is emitted and the unsimplified model is preserved.
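As a quick post-export sanity check, the sketch below confirms the I/O names and that the dynamic batch axis accepts more than one crop at once. The 3-channel 24x40 shape is an assumption; match it to your trained input size.

```python
# Post-export sanity check for the ONNX graph's names and dynamic batch.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("ocec_s.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in session.get_inputs()])    # expect ['images']
print([o.name for o in session.get_outputs()])   # expect ['prob_open']

batch = np.random.rand(4, 3, 24, 40).astype(np.float32)  # assumed 3x24x40
probs = session.run(["prob_open"], {"images": batch})[0]
print(probs.shape)  # one open-eye probability per image in the batch
```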
- VSDLM: Visual-only speech detection driven by lip movements - MIT License
- OCEC: Open closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
- PGC: Ultrafast pointing gesture classification - MIT License
- SC: Ultrafast sitting classification - MIT License
- PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
- HSC: Happy smile classifier - MIT License
If you find this project useful, please consider citing:
```
@software{hyodo2025ocec,
author = {Katsuya Hyodo},
title = {PINTO0309/OCEC},
month = {10},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.17505461},
url = {https://github.com/PINTO0309/ocec},
abstract = {Open closed eyes classification. Ultra-fast wink/blink estimation model.},
}
```

- https://huggingface.co/datasets/MichalMlodawski/closed-open-eyes: Open Data Commons Attribution License (ODC-By) v1.0
```
@misc{open_closed_eyes2024,
  author = {Michał Młodawski},
  title = {Open and Closed Eyes Dataset},
  month = {July},
  year = {2024},
  url = {https://huggingface.co/datasets/MichalMlodawski/closed-open-eyes},
}
```
- https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34 - Apache 2.0