This repository contains a PyTorch implementation of an enhanced Deep Attention Network (DAN) for facial expression recognition. While many approaches leverage attention for this task, this implementation introduces several key architectural improvements to boost performance.
Our model builds upon the common paradigm of using attention mechanisms for facial expression recognition but introduces a more robust and effective architecture. The key upgrades in this implementation are:
- Upgraded Backbone: The previous ResNet-18 backbone has been replaced with a modern
ConvNeXt Tinybackbone, providing a significant boost in feature extraction capabilities. - Multi-Head Cross-Attention: Instead of a single attention mechanism, our model uses multiple cross-attention heads. This allows the network to focus on different facial regions simultaneously, capturing a richer set of features for more accurate classification.
- Composite Loss Function: To further enhance performance, we employ a composite loss function:
- PartitionLoss: This novel loss function encourages diversity among the attention heads, ensuring that each head learns unique and complementary features.
- CenterLoss: This loss function improves the discriminative power of the learned features by minimizing intra-class variations.
These architectural enhancements have led to a 12% increase in accuracy compared to baseline implementations.
-
Clone the repository:
git clone https://github.com/yourusername/dan-improvement.git cd dan-improvement -
Install the required dependencies:
pip install torch torchvision pandas numpy tqdm Pillow
dan-improvement/
├── datasets/ # Dataset directory
├── networks/ # Model architecture definitions
│ └── dan.py # DAN model with CenterLoss and PartitionLoss
├── utils/ # Utility scripts
├── affectnet.py # Main training script for AffectNet
├── demo.py # Inference script for single images
├── evaluate_backbones.py # Script to compare different backbones
├── run_grad_cam.py # Grad-CAM visualization script
├── verify_dan.py # Verification script for the model
└── README.md # Project documentation
To train the model, you will need the AffectNet dataset. Once the dataset is in place, you can run the training script:
python affectnet.py --aff_path datasets/AffectNet/ --epochs 40 --batch_size 128To run inference on a single image, use the demo.py script. Please note that you need a trained model checkpoint.
python demo.py --image <path_to_your_image>While the current implementation is robust, there are several areas for future improvement:
- Command-Line Arguments: The current scripts use hardcoded paths. These will be replaced with command-line arguments for better flexibility.
- Modern Face Detector: The face detector in
demo.pywill be upgraded to a more modern, deep learning-based detector. - Unified Data Loading: The data loading logic will be unified and better documented.