This research aims to bridge the capacity gap between a heavy Teacher model and a lightweight Student model without sacrificing accuracy. The proposed ASAC framework integrates three complementary distillation losses: Smooth L1, SSIM, and Asymmetric Contrastive Loss (ACL). A Convolutional Block Attention Module (CBAM) directs feature extraction toward relevant crowd areas, in particular human head regions, while suppressing complex background noise. An Asymmetric Projector aligns features from the two architectures in a shared latent space. Experimental results on the ShanghaiTech and UCF-QNRF datasets show that the 1/4 Student model offers the best trade-off between accuracy and efficiency: it reduces the parameter count by over 90%, drastically lowers GFLOPs, and still outperforms the Teacher model in accuracy. This gain is attributed to the integration of CBAM.
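As a rough illustration of how the three distillation terms could be combined into one objective, here is a minimal pure-Python sketch. It is not the repository's implementation: the function names, the weights `w_l1`, `w_ssim`, and `w_acl`, and the use of a global (non-windowed) SSIM are all assumptions made for illustration, and the ACL value is simply passed in as a precomputed number because its exact formulation is not described here.

```python
from statistics import fmean


def smooth_l1(student, teacher, beta=1.0):
    """Mean Smooth L1 loss between two equal-length sequences of floats."""
    if len(student) != len(teacher):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for s, t in zip(student, teacher):
        d = abs(s - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(student)


def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole map at once.

    Simplification (assumption): no sliding window, values in [0, 1].
    """
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) * (a - mx) for a in x])
    vy = fmean([(b - my) * (b - my) for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def distillation_loss(student, teacher, acl_term=0.0,
                      w_l1=1.0, w_ssim=1.0, w_acl=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_l1 * smooth_l1(student, teacher)
            + w_ssim * (1.0 - global_ssim(student, teacher))
            + w_acl * acl_term)


if __name__ == "__main__":
    student_map = [0.10, 0.35, 0.80, 0.05]
    teacher_map = [0.12, 0.40, 0.75, 0.00]
    print(distillation_loss(student_map, teacher_map, acl_term=0.2))
```

In a real training loop these terms would typically be computed on feature tensors in a deep learning framework, with a windowed SSIM; the flat lists here only keep the sketch dependency-free.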
The best trained teacher networks and the distilled student networks can be accessed at Google Drive. If you use this code or the released models in your research, please cite our paper:
@inproceedings{wicaksana2026,
title={ASAC: Attention-Based Structural Asymmetric Contrastive for Efficient Crowd Counting},
author={Saiful Irham Wicaksana and I Made Artha Agastya},
booktitle={JUTIF},
year={2026}
}
ShanghaiTech: Google Drive
UCF-QNRF: Link
We strongly recommend Anaconda or Miniconda for managing the virtual environment.
Python: 3.12.3
With Miniconda, create and activate the environment, then install the dependencies:
$ conda create --name asac python==3.12.3
$ conda activate asac
$ pip install -r requirements.txt
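After installation, a quick check can confirm that the activated environment runs the expected interpreter. The snippet below is only a convenience sketch; the helper name `python_matches` is hypothetical and not part of this repository.

```python
import sys

# Major and minor version taken from the "Python: 3.12.3" line above.
EXPECTED = (3, 12)


def python_matches(version_info=sys.version_info, expected=EXPECTED):
    """Return True when the major.minor version equals the expected one."""
    return tuple(version_info[:2]) == tuple(expected)


status = "matches" if python_matches() else "differs from"
print(f"Python {sys.version.split()[0]} {status} the recommended 3.12 series")
```

Run it inside the activated `asac` environment; a mismatch usually means a different interpreter is first on `PATH`.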