Description
Search before asking
- I have searched the YOLOv5 issues and found no similar bug report.
YOLOv5 Component
Training, Export
Bug
If you specify the LeakyReLU activation function (or just a plain ReLU) in the .yaml, training goes well, but the exported TFLite model is broken:
Running:
python3 train.py --data coco.yaml --epochs 50 --weights '' --cfg ./hub/yolov5n-LeakyReLU.yaml --batch-size 204
where yolov5n-LeakyReLU.yaml is the same as yolov5s-LeakyReLU.yaml, except for:
width_multiple: 0.25 # layer channel multiple
The metrics I get after training are all good:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.206
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.359
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.209
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.106
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.231
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.265
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.211
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.374
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.430
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.240
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.477
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.561
After exporting:
python3 export.py --weights runs/train/exp20/weights/best.pt --include tflite --int8
and testing the TFLite model with:
python3 val.py --weights runs/train/exp20/weights/best-int8.tflite
I get these results:
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 128/128 [00:16<00:00,7.54it/s]
all 128 929 0.104 0.0495 0.00417 0.000997
I know quantization is expected to reduce accuracy, but here it is somehow breaking the network.
Exporting to TFLite in floating point, or to ONNX, does not hurt the model.
Any idea what is going on?
In particular, looking at the output, I see that the network outputs for the box width and height are always zero after conversion to TFLite. The rest of the output seems okay.
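To make the width/height observation easier to check, here is a minimal sketch in plain Python. It applies the affine dequantization that TFLite uses for int8 tensors, `real = scale * (q - zero_point)`, and reports the fraction of exact zeros in each output column. The array `raw`, the `scale` and `zero_point` values, and the column order (x, y, w, h, objectness) are hypothetical placeholders for illustration, not values taken from my model. In a real check they would presumably come from `tf.lite.Interpreter` (the raw output tensor, plus the `"quantization"` entry of `get_output_details()`).

```python
def dequantize(q_rows, scale, zero_point):
    """Affine dequantization: real = scale * (q - zero_point)."""
    return [[scale * (q - zero_point) for q in row] for row in q_rows]


def zero_fraction_per_column(rows):
    """Return, for each column, the fraction of rows whose value is exactly 0."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    return [
        sum(1 for row in rows if row[c] == 0) / n_rows
        for c in range(n_cols)
    ]


# Hypothetical raw int8 predictions; assumed columns: x, y, w, h, objectness.
raw = [
    [-28, 10, -128, -128, 90],
    [40, -60, -128, -128, -100],
    [0, 25, -128, -128, 12],
]

# Assumed quantization parameters (illustrative only).
scale, zero_point = 1 / 255, -128

pred = dequantize(raw, scale, zero_point)
fractions = zero_fraction_per_column(pred)
print(fractions)
```

A column whose fraction is 1.0 means every raw value sits at the zero point, which is what the w and h outputs look like in my int8 model. This only confirms the symptom; it does not say which layer or which quantization range causes it.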
Environment
YOLOv5 latest
Ubuntu 22.04
Python 3.10
Nvidia A10G Driver Version: 535.129.03 CUDA Version: 12.2
tensorflow-cpu==2.15.0
torch==2.1.1
torchvision==0.16.1
Minimal Reproducible Example
(I ran this on yolov5n, but the same happens with yolov5s.)
python3 train.py --data coco.yaml --epochs 30 --weights '' --cfg ./hub/yolov5s-LeakyReLU.yaml --batch-size 128
(replace exp20 with your folder)
python3 export.py --weights runs/train/exp20/weights/best.pt --include tflite --int8
python3 val.py --weights runs/train/exp20/weights/best-int8.tflite
Additional
No response
Are you willing to submit a PR?
- Yes I'd like to help by submitting a PR!