TensorFlow SegmentationModel support #9472
Conversation
@zldrobit I'm working on adding TF support for Segmentation models, and I had a question. In TFDetect we normalize the bounding box coordinates here (Lines 308 to 309 in 92b5242), and then we have to denormalize them during inference (Line 546 in 92b5242). This is a problem now because I want to use DetectMultiBackend for ClassificationModel and SegmentationModel support in addition to DetectionModels, and the denormalization op will hurt ClassificationModels.
Can we remove the normalize-denormalize op? Is it only there to improve quantization? Are you sure it's helping the quantization?
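For context, the normalize/denormalize round trip being discussed behaves roughly like the following minimal numeric sketch (not the actual repository code; the 640x640 input size and the [x, y, w, h, conf, cls] output layout are assumptions):

```python
import numpy as np

imgsz = (640, 640)  # assumed (height, width)

# Export side (as in TFDetect): scale pixel-space xywh into [0, 1] before the
# concatenated [x, y, w, h, conf, cls] output tensor is returned
pred_pixels = np.array([[320.0, 240.0, 64.0, 48.0, 0.9, 0.8]])  # toy prediction
pred_norm = pred_pixels.copy()
pred_norm[:, :4] /= [imgsz[1], imgsz[0], imgsz[1], imgsz[0]]

# Inference side (as in DetectMultiBackend): undo the scaling before NMS
y = pred_norm.copy()
y[:, :4] *= [imgsz[1], imgsz[0], imgsz[1], imgsz[0]]
assert np.allclose(y, pred_pixels)

# A ClassificationModel output has no box columns, so blindly applying the
# "*= [w, h, w, h]" step to it would corrupt its class scores.
```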
If YOLOv5 has to support detection model export for int8 TFLite models, the normalize/denormalize code has to be kept.
Yes. Removing the normalize/denormalize code does not affect export/inference for TFLite models on fp32/fp16 precision.
Yes, I can confirm that. With TF 2.4/2.5/2.6, removing the normalize/denormalize code makes the accuracy drop drastically after int8 quantization. I also tested with TF 2.9.2/2.10.0: int8 TFLite models without the normalization code have zero mAP. The reason for the normalization is that TensorFlow keeps only one set of bias/multiplication factors in (de)quantization for a tensor's input/output. Thus, all input/output values of a tensor have to be normalized to the same range (e.g. 0-1). YOLOv5 currently concatenates bbox coordinates, bbox confidence and class probability into one tensor by …
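To make that per-tensor constraint concrete, here is a small arithmetic illustration of my own (not code from the PR) of what a single shared int8 scale does to a tensor that mixes pixel coordinates with 0-1 probabilities:

```python
import numpy as np

def int8_affine_params(vmin, vmax):
    """Per-tensor affine parameters for int8, following real = scale * (q - zero_point)."""
    scale = (vmax - vmin) / 255.0
    zero_point = int(round(-128 - vmin / scale))
    return scale, zero_point

# Tensor mixing un-normalized pixel coords (0..640) with probabilities (0..1):
scale_mixed, _ = int8_affine_params(0.0, 640.0)
# Tensor where everything is normalized to 0..1:
scale_norm, _ = int8_affine_params(0.0, 1.0)

print(f"mixed-range step size: {scale_mixed:.3f}")   # ~2.51 -> probabilities collapse toward 0
print(f"normalized step size:  {scale_norm:.5f}")    # ~0.00392 -> probabilities keep resolution
```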
@zldrobit I see. It's difficult then, not sure what to do. I was looking at this. Are we using per-axis or per-tensor quantization? If we moved to per-axis, would that help? I've seen that even with the current TFLite INT8 method we lose significant mAP vs FP16, which doesn't happen with CoreML. I don't have the validation results handy, but I'll re-run them now and post. I think the drop may be about 20%.
@zldrobit ok, here's my test:
PyTorch
CoreML
TFLite (TODO)

```
!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
!python export.py --include tflite
!python export.py --include tflite --int8
```
```
!python val.py --weights yolov5s-fp16.tflite --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-fp16.tflite'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.2-149-g77dcf55 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)
Loading yolov5s-fp16.tflite for TensorFlow Lite inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/content/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100% 128/128 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 100% 128/128 [00:40<00:00, 3.15it/s]
all 128 929 0.68 0.653 0.71 0.471
Speed: 0.4ms pre-process, 303.4ms inference, 1.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp2
```
```
!python val.py --weights yolov5s-int8.tflite --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-int8.tflite'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.2-149-g77dcf55 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)
Loading yolov5s-int8.tflite for TensorFlow Lite inference...
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/content/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100% 128/128 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 100% 128/128 [48:12<00:00, 22.60s/it]
all 128 929 0.709 0.58 0.68 0.425
Speed: 0.4ms pre-process, 22581.7ms inference, 1.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp3
```
TensorFlow may use per-axis quantization for weights, but it uses per-tensor quantization for computation input/output according to https://www.tensorflow.org/lite/performance/quantization_spec. I excerpt the relevant paragraph as follows:
This is a limitation of TensorFlow, so we cannot move the computation input/output to a per-axis scheme.
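A quick way to observe this per-tensor behaviour on the exported model itself (a sketch assuming the yolov5s-int8.tflite file produced by the export commands above) is to read the output tensor's quantization parameters with the TFLite interpreter:

```python
import tensorflow as tf

# Inspect the quantization parameters of the exported int8 model's output tensor.
interpreter = tf.lite.Interpreter(model_path="yolov5s-int8.tflite")
interpreter.allocate_tensors()

out = interpreter.get_output_details()[0]
scale, zero_point = out["quantization"]  # one scale/zero_point for the whole tensor
print(f"output dtype={out['dtype']}, scale={scale}, zero_point={zero_point}")

# Every element of the concatenated [xywh, conf, cls] output is dequantized as
# real = scale * (int8_value - zero_point), which is why all values must live
# in a comparable range for int8 export to work well.
```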
According to Performance Evaluation of INT8 Quantized Inference on Mobile GPUs, …
@zldrobit understood, thanks for the analysis. I guess I'll leave the normalization alone for now until I can find a better solution. For PyTorch model inference we can determine the type of model, i.e. ClassificationModel, SegmentationModel or DetectionModel, and use this to decide whether to de-normalize boxes, but if someone loads an exported TF model I'm not exactly sure how to do that, since they are all TFModel types. Do you know if we can search the model for TFDetect or TFSegment classes, which would confirm that it needs denormalization? If they're missing, it's likely a classification model and we can skip the denormalization.
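A minimal sketch of what such a check might look like, assuming the model was loaded as a Keras/SavedModel object that still exposes its submodules (the helper name is hypothetical, and this would not work for GraphDef or TFLite formats):

```python
def needs_denormalization(model):
    """Hypothetical check: return True if an exported Keras/SavedModel contains a
    TFDetect or TFSegment submodule (i.e. its outputs are normalized boxes).

    Relies on tf.Module.submodules, so it only works for formats that keep
    Python class information -- not GraphDef (.pb) or TFLite flatbuffers.
    """
    names = {type(m).__name__ for m in model.submodules}
    return bool(names & {"TFDetect", "TFSegment"})
```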
@glenn-jocher The class information (e.g. TFDetect or TFSegment) can be saved in the TF SavedModel format, just like in a PyTorch model. The TF GraphDef (.pb) format does not hold class information. One can also save class information in a TFLite model's metadata (https://www.tensorflow.org/lite/models/convert/metadata). I am wondering if using a filename with the …
@zldrobit That's a great suggestion. Adding a filename-based naming convention would give DetectMultiBackend a simple way to tell the model types apart. I'll explore this approach further and see how we can incorporate it into the model loading process. Thanks for the input!
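A rough sketch of such a convention (the specific suffixes below are assumptions for illustration, not necessarily what the PR finally adopted):

```python
from pathlib import Path

def guess_task_from_filename(weights: str) -> str:
    # Illustrative only: infer the model type from a filename suffix convention,
    # e.g. yolov5s-seg.tflite -> "segment", yolov5s-cls.tflite -> "classify".
    stem = Path(weights).stem.lower()
    if "-seg" in stem:
        return "segment"
    if "-cls" in stem:
        return "classify"
    return "detect"

print(guess_task_from_filename("yolov5s-seg.tflite"))   # segment
print(guess_task_from_filename("yolov5s-int8.tflite"))  # detect
```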
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Improvements in TensorFlow model export and inference functionalities in the YOLOv5 repository.
📊 Key Changes
- Updated `ci-testing.yml` to add a `--hard-fail` threshold for segmentation model benchmarking.
- Updated `export.py` to handle TensorFlow SavedModel outputs more robustly and support models with variable output structures.
- Updated `common.py` and `tf.py` inference code to improve the processing of TensorFlow model outputs and adapt to different types of model architectures.
🎯 Purpose & Impact
- Adding `--hard-fail` in the benchmark testing ensures that the segmentation models meet a minimum performance threshold, enhancing quality control.