Built for the Kaggle Facial Keypoints Detection competition. Implements both a PyTorch ResNet with custom NaN-aware loss and a Keras CNN with two-phase training, achieving RMSE 2.10 on the leaderboard.
| Model | Framework | Kaggle RMSE | Parameters | Strategy |
|---|---|---|---|---|
| ResNet | PyTorch | 2.10 | ~4.2M | Adam + StepLR + NaN-aware MSE |
| CNN | TensorFlow/Keras | 2.55 | ~1.5M | Two-phase: Adam → SGD + Huber |
Only ~2,140 of 7,049 training samples have all 15 keypoints labeled. Rather than discarding ~70% of the data, the custom MSELossIgnoreNan masks missing targets and computes gradients only on available labels:
class MSELossIgnoreNan(nn.Module):
def forward(self, pred, target):
mask = torch.isfinite(target)
count = mask.sum()
if count == 0:
return torch.tensor(0.0, requires_grad=True)
return ((pred[mask] - target[mask]) ** 2).sum() / count- Dual-framework architecture — PyTorch ResNet and Keras CNN sharing data loading and configuration
- NaN-aware loss — Custom
MSELossIgnoreNantrains on all 7,049 samples including partially-labeled ones - Two-phase CNN training — Adam for fast convergence, then SGD with ReduceLROnPlateau for fine-tuning
- Deep residual learning — 6-stage ResNet (32→512 channels) with batch normalization and skip connections
- Centralized YAML config — All hyperparameters in
config/default.yamlwith typed frozen dataclass validation - Unit tests — pytest suite covering models, loss functions, dataset utilities, and config loading
├── config/default.yaml # Hyperparameters for both models
├── keypoints/
│ ├── config.py # Frozen dataclasses + YAML loader
│ ├── models/
│ │ ├── cnn.py # Keras CNN (3 conv blocks + 2 FC layers)
│ │ └── resnet.py # PyTorch ResNet (12 residual blocks)
│ └── utils/
│ ├── dataset.py # Data loading, preprocessing, PyTorch Dataset
│ ├── losses.py # MSELossIgnoreNan
│ └── visualization.py # Keypoint plotting, training curves
├── tests/
│ └── test_models.py # Unit tests for models, loss, dataset, config
├── train_cnn.py # CNN training entry point
├── train_resnet.py # ResNet training entry point
├── predict.py # Inference + Kaggle submission
└── assets/ # Sample output visualizations
git clone https://github.com/allureking/FacialKeypointsDetection.git
cd FacialKeypointsDetection
# Install with both frameworks
pip install -e ".[all]"
# Or install for one framework only
pip install -e ".[pytorch]" # PyTorch ResNet only
pip install -e ".[tensorflow]" # Keras CNN onlyDownload the Kaggle dataset and place files in data/:
data/
├── training.csv
├── test.csv
└── IdLookupTable.csv
# Train ResNet (PyTorch)
python train_resnet.py --config config/default.yaml
# Train CNN (Keras/TensorFlow)
python train_cnn.py --config config/default.yaml# Generate Kaggle submission
python predict.py --model resnet --weights best_model.pth
python predict.py --model cnn --weights sgd_best.h5pytest tests/ -vInput (1×96×96)
└── Stem: Conv(1→32) + BN + ReLU + MaxPool
└── Stage 1: ResBlock×2 (32→32)
└── Stage 2: ResBlock×2 (32→64)
└── Stage 3: ResBlock×2 (64→128)
└── Stage 4: ResBlock×2 (128→256)
└── Stage 5: ResBlock×2 (256→512)
└── AdaptiveAvgPool → Linear(512→30)
Each residual block: x → Conv3×3 → BN → ReLU → Conv3×3 → BN → (+shortcut) → ReLU
Input (96×96×1)
└── Conv2D(32) → LeakyReLU → MaxPool → Dropout(0.05)
└── Conv2D(64) → LeakyReLU → MaxPool → Dropout(0.01)
└── Conv2D(128) → LeakyReLU → MaxPool → Dropout(0.15)
└── Dense(500) → Dense(500) → Dense(30)
Two-phase training: Adam (lr=5e-4) with Huber loss → SGD (lr=1e-3) with ReduceLROnPlateau.
All hyperparameters are centralized in config/default.yaml and loaded into frozen dataclasses:
resnet:
batch_size: 16
learning_rate: 0.0001
epochs: 100
patience: 5
step_lr_size: 5
step_lr_gamma: 0.1- He, K. et al. (2016). Deep Residual Learning for Image Recognition. CVPR. arXiv:1512.03385
- Kaggle. Facial Keypoints Detection. Competition Page
CNN implementation developed in collaboration with Felix-hyy for the USF 276DS course final project. ResNet architecture inspired by the Advanced Machine Learning course at USF.
MIT — see LICENSE for details.

