Description
Feature type?
Algorithm request
A proposal draft
The proposal is to create a custom trainer extending Ultralytics' BaseTrainer within Nyuntam that can distill different models from the Ultralytics library, including YOLOv5, YOLOv8, and other variants. Model distillation is an important technique for compressing larger models (teachers) into smaller, more efficient models (students) while retaining most of their performance. This is especially useful when deploying models in resource-constrained environments such as edge devices or mobile applications.
The trainer should (a rough sketch follows the list below):
- Support both teacher-student architectures for distillation.
- Allow flexibility in defining the distillation loss function (e.g., KL divergence, MSE).
- Be compatible with existing Ultralytics models, making it easy to swap in any YOLO model for distillation.
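
A minimal sketch of the pluggable loss part of the proposal, assuming the custom trainer adds a distillation term on top of the standard detection loss. The `DistillationCriterion` class, its `kind`/`tau`/`weight` parameters, and the final composition line are illustrative assumptions, not existing Nyuntam or Ultralytics APIs:

```python
import torch
import torch.nn.functional as F


class DistillationCriterion(torch.nn.Module):
    """Pluggable distillation loss: KL divergence on logits or MSE on features."""

    def __init__(self, kind: str = "kl", tau: float = 2.0, weight: float = 1.0):
        super().__init__()
        self.kind, self.tau, self.weight = kind, tau, weight

    def forward(self, student_out: torch.Tensor, teacher_out: torch.Tensor) -> torch.Tensor:
        if self.kind == "kl":
            # Soften both distributions with temperature tau, then KL(student || teacher).
            p_t = F.softmax(teacher_out / self.tau, dim=-1)
            log_p_s = F.log_softmax(student_out / self.tau, dim=-1)
            loss = F.kl_div(log_p_s, p_t, reduction="batchmean") * (self.tau ** 2)
        elif self.kind == "mse":
            loss = F.mse_loss(student_out, teacher_out)
        else:
            raise ValueError(f"unknown distillation loss: {self.kind}")
        return self.weight * loss


# Example usage on dummy classification logits (teacher kept frozen in practice).
criterion = DistillationCriterion(kind="kl", tau=2.0, weight=0.5)
distill_loss = criterion(torch.randn(8, 80), torch.randn(8, 80))

# Per step, the trainer would optimise something like:
# total = detection_loss(student_preds, targets) + criterion(student_logits, teacher_logits)
```

Keeping the criterion as a standalone module lets the trainer swap loss functions (or teacher/student pairs) through configuration without touching the training loop.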
Additional context
YOLOv8 variants can have differing feature channel dimensions; I suggest looking at MMYOLO's RTMDet distillation recipe, where CWD and PKD are combined to account for the varying dimensionality. You can also read their papers: PKD and CWD.
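
To make the channel-mismatch point concrete, here is a self-contained sketch of a channel-wise distillation (CWD) loss with a 1x1 convolution adapter that projects the student feature map to the teacher's width. The adapter and the `CWDLoss` class are illustrative assumptions for this proposal, not part of the Ultralytics, MMYOLO, or Nyuntam APIs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CWDLoss(nn.Module):
    """Channel-wise distillation: KL divergence over per-channel spatial softmax maps."""

    def __init__(self, student_channels: int, teacher_channels: int, tau: float = 1.0):
        super().__init__()
        self.tau = tau
        # Align channel counts when student and teacher widths differ.
        self.adapter = (
            nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
            if student_channels != teacher_channels
            else nn.Identity()
        )

    def forward(self, feat_s: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
        feat_s = self.adapter(feat_s)
        n, c, _, _ = feat_t.shape
        # One softmax distribution over spatial locations per channel.
        p_t = F.softmax(feat_t.view(n, c, -1) / self.tau, dim=-1)
        log_p_s = F.log_softmax(feat_s.view(n, c, -1) / self.tau, dim=-1)
        return F.kl_div(log_p_s, p_t, reduction="batchmean") * (self.tau ** 2)


# Example usage on dummy FPN features with mismatched widths.
loss_fn = CWDLoss(student_channels=128, teacher_channels=256)
loss = loss_fn(torch.randn(2, 128, 40, 40), torch.randn(2, 256, 40, 40))
```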