
Vision Transformer (ViT) & EfficientNetV2 Models for Age Classification

Overview

This repository contains implementations of Vision Transformer (ViT) and EfficientNetV2 models for binary age classification. The models are trained to classify whether an individual is below 18 years old (label 0) or 18 and above (label 1) from panoramic dental X-rays.

The scripts use PyTorch and torchvision to fine-tune pre-trained backbones fitted with custom classification heads, training for 25 epochs with binary cross-entropy loss (BCEWithLogitsLoss).

Features

  • Uses ViT-B-16 and EfficientNetV2-S from torchvision.models
  • Supports training with and without data augmentation (a transform sketch follows this list)
  • Applies data transformations and normalization for training stability
  • Supports GPU acceleration for faster training
  • Implements data filtering to train only on ages 14-24
  • Uses AdamW optimizer with weight decay for stability
  • Stores model checkpoints during training
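
As a minimal sketch, the two transform pipelines and the age filter might look like the following. The exact transforms, image size, and column names (age in a labels.csv file) are assumptions for illustration, not values copied from the scripts:

import pandas as pd
from torchvision import transforms

# ImageNet normalization, as expected by the pre-trained backbones
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Without augmentation: resize and normalize only
base_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize,
])

# With augmentation: add light geometric jitter before normalizing
augmented_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(5),
    transforms.ToTensor(),
    normalize,
])

# Keep only subjects aged 14-24; "age" is a hypothetical column name
df = pd.read_csv("labels.csv")
df = df[(df["age"] >= 14) & (df["age"] <= 24)]
df["label"] = (df["age"] >= 18).astype(int)  # 0: under 18, 1: 18 and above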

Installation

Ensure you have the necessary dependencies installed:

pip install torch torchvision numpy pandas scikit-learn pillow matplotlib
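
To confirm the install and check GPU availability before training, a quick sanity check:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"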

How to Run the Scripts

  1. Prepare your dataset: Place your dataset under the ./data/DentAgePooledDatav2/ directory or set the DATA_PATH environment variable (a usage example follows this list).

  2. Run the desired training script:

    • Vision Transformer (ViT) - No Augmentation

      python model_vit_no_augmentation.py

    • Vision Transformer (ViT) - With Augmentation

      python model_vit_augmented.py

    • EfficientNetV2 - No Augmentation

      python model_efficientnetv2_no_augmentation.py

    • EfficientNetV2 - With Augmentation

      python model_efficientnetv2_augmented.py
  3. Monitor training output: Loss and progress are printed to the console during training.

  4. Check model checkpoints: Trained models are saved under ./models/ (or the directory set via MODEL_PATH).
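
For example, to train the augmented ViT model with a custom dataset location (the paths here are placeholders):

export DATA_PATH=/path/to/DentAgePooledDatav2
export MODEL_PATH=./models
python model_vit_augmented.py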

Model Architectures

Vision Transformer (ViT)

  • A pre-trained ViT-B-16 backbone
  • A custom classifier head with multiple fully connected layers and dropout
  • An output layer with 1 neuron for binary classification

EfficientNetV2

  • A pre-trained EfficientNetV2-S backbone
  • A custom classification head with multiple fully connected layers and dropout
  • An output layer with 1 neuron for binary classification
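
A minimal sketch of how such heads can be attached in torchvision; the hidden sizes and dropout rates are illustrative assumptions, not the exact values used in the scripts:

import torch.nn as nn
from torchvision import models

def build_vit():
    # Pre-trained ViT-B-16 backbone
    model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
    # Replace the classification head; ViT-B-16 exposes a 768-dim embedding
    model.heads = nn.Sequential(
        nn.Linear(768, 256),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(256, 1),  # single logit for binary classification
    )
    return model

def build_efficientnetv2():
    # Pre-trained EfficientNetV2-S backbone
    model = models.efficientnet_v2_s(weights=models.EfficientNet_V2_S_Weights.IMAGENET1K_V1)
    # Replace the classifier; EfficientNetV2-S produces a 1280-dim feature vector
    model.classifier = nn.Sequential(
        nn.Dropout(0.3),
        nn.Linear(1280, 256),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(256, 1),  # single logit for binary classification
    )
    return model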

Configuration

The scripts use environment variables for flexibility:

  • DATA_PATH: Path to the dataset (default: ./data)
  • MODEL_PATH: Path to save models and logs (default: ./models/)
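
A sketch of how the scripts can pick these up, assuming the standard os.environ pattern:

import os

DATA_PATH = os.environ.get("DATA_PATH", "./data")
MODEL_PATH = os.environ.get("MODEL_PATH", "./models/")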

Training Details

  • Batch size: 16
  • Epochs: 25
  • Optimizer: AdamW (lr=5e-6, weight_decay=5e-7)
  • Loss function: BCEWithLogitsLoss
  • GPU support: Yes (CUDA enabled)
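
Putting these settings together, a condensed training loop might look as follows. build_vit is the placeholder from the model sketch above, train_dataset is a placeholder dataset, and the checkpoint naming is an assumption:

import os
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = build_vit().to(device)  # or build_efficientnetv2()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6, weight_decay=5e-7)
loader = DataLoader(train_dataset, batch_size=16, shuffle=True)  # train_dataset: placeholder

for epoch in range(25):
    model.train()
    running_loss = 0.0
    for images, labels in loader:
        images = images.to(device)
        labels = labels.float().unsqueeze(1).to(device)  # shape (B, 1) to match the logit

        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    print(f"epoch {epoch + 1}: loss {running_loss / len(loader):.4f}")
    # Save a checkpoint after each epoch
    torch.save(model.state_dict(), os.path.join("./models", f"checkpoint_epoch_{epoch + 1}.pt"))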

License

This project is open-source under the MIT License.

Contact

For questions, reach out via GitHub issues or discussions.
