Add EdgeTPU support #3630

Merged 107 commits on Dec 31, 2021

@zldrobit (Contributor) commented Jun 16, 2021

The tf-edgetpu branch was created from the tf-android-tfl-detect branch by deleting the android directory.
tf-android-tfl-detect is deprecated, since the tf-android branch can be used for deploying YOLOv5 models on Android devices.

Deprecated: the EdgeTPU compiler could not convert the detection box reconstruction part, so `--no-tfl-detect` was mandatory when using `tf.py`. With `--tf-raw-resize`, resize ops can be mapped to the EdgeTPU, and the inference time drops from ~60 ms to ~50 ms (with an Intel i9 3.3 GHz CPU and a USB-connected EdgeTPU). With `Focus` layers replaced by `Conv` layers, the inference time decreases to ~30 ms with v6.0 models.

Before export, install edgetpu-compiler (https://coral.ai/software/#debian-packages), and install pycoral (https://coral.ai/software/#pycoral-api).

Export EdgeTPU models using:

python export.py --weights yolov5s.pt --include edgetpu --img 320 --data data/coco128.yaml

--data is used for calibration in int8 quantization.
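
To illustrate why int8 quantization needs calibration data at all, here is a toy sketch of affine quantization in plain Python. It is not the code TFLite or `export.py` runs; the helper names (`calibrate`, `quantize`, `dequantize`) are hypothetical, and the point is only that the scale and zero point are derived from the value range observed in representative samples.

```python
# Toy affine int8 quantization. This is an illustration only, not TFLite's calibration code.
QMIN, QMAX = -128, 127


def calibrate(samples):
    """Derive (scale, zero_point) from the value range seen in calibration samples."""
    rmin = min(0.0, min(samples))  # include 0 so that 0.0 maps to an exact integer
    rmax = max(0.0, max(samples))
    if rmax == rmin:  # degenerate case: every sample is 0
        return 1.0, 0
    scale = (rmax - rmin) / (QMAX - QMIN)
    zero_point = round(QMIN - rmin / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to int8, saturating at the ends of the int8 range."""
    q = round(x / scale) + zero_point
    return max(QMIN, min(QMAX, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return (q - zero_point) * scale


# Calibrate on values in [0, 1], as for normalized image pixels.
scale, zp = calibrate([0.0, 0.25, 0.9, 1.0])
print(scale, zp, quantize(0.5, scale, zp))
```

Values outside the calibrated range saturate, which is why the calibration images should resemble the images seen at inference time.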

Detect objects using:

python detect.py --weights yolov5s-int8_edgetpu.tflite --img 320 --data data/coco128.yaml

--data is used to show real class names.
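
The two commands above can be chained from Python. The sketch below only builds the argument lists; the helper names are hypothetical, and the output file name is inferred from the single example in this post (`yolov5s.pt` becoming `yolov5s-int8_edgetpu.tflite`), so treat it as an assumption rather than a documented rule.

```python
import subprocess
import sys
from pathlib import Path


def export_cmd(weights, img=320, data="data/coco128.yaml"):
    """Argument list for the Edge TPU export command shown above."""
    return [sys.executable, "export.py", "--weights", str(weights),
            "--include", "edgetpu", "--img", str(img), "--data", data]


def edgetpu_model_path(weights):
    """Assumed output name: <stem>-int8_edgetpu.tflite next to the weights file."""
    p = Path(weights)
    return p.with_name(p.stem + "-int8_edgetpu.tflite")


def detect_cmd(weights, img=320, data="data/coco128.yaml"):
    """Argument list for running detect.py on the exported Edge TPU model."""
    return [sys.executable, "detect.py", "--weights", str(edgetpu_model_path(weights)),
            "--img", str(img), "--data", data]


def export_then_detect(weights, img=320):
    """Run both steps from the yolov5 directory; raises if either command fails."""
    subprocess.run(export_cmd(weights, img), check=True)
    subprocess.run(detect_cmd(weights, img), check=True)


print(" ".join(detect_cmd("yolov5s.pt")[1:]))
```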

A Colab notebook for exporting Edge TPU models, provided by @phodgers:
# Install the Compiler
! curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
! echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
! sudo apt-get update
! sudo apt-get install edgetpu-compiler	

# Install Branch https://github.com/zldrobit/yolov5/tree/tf-edgetpu 
!git clone https://github.com/zldrobit/yolov5.git
%cd yolov5
!git checkout tf-edgetpu
%pip install -qr requirements.txt
from yolov5 import utils
display = utils.notebook_init()  # checks

# Export file Compiled for Coral Edge TPU
%pip install --upgrade flatbuffers==1.12 # downgrade from v2 to v1.12
!python export.py --weights yolov5s.pt --include edgetpu --img 320 --data data/coco128.yaml
On an Intel i9 3.3 GHz CPU and a USB-connected EdgeTPU, the inference speeds at different input resolutions are as follows:

| Input resolution | Inference time (batch 1) |
|---|---|
| 320 | 31.7 ms |
| 448 | 74.3 ms |
| 576 | 155 ms |
| 640 | 271.6 ms |
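
As a quick reading aid for these timings, the sketch below converts each latency to frames per second and compares the growth in latency with the growth in pixel count. The numbers are the ones reported above; the script only does arithmetic on them.

```python
# Latencies reported above (ms per image, batch 1), keyed by square input size.
latency_ms = {320: 31.7, 448: 74.3, 576: 155.0, 640: 271.6}


def fps(ms):
    """Frames per second for a given per-image latency in milliseconds."""
    return 1000.0 / ms


base_size = 320
base_ms = latency_ms[base_size]
for size, ms in latency_ms.items():
    pixel_ratio = (size / base_size) ** 2  # growth in pixel count vs. 320x320
    latency_ratio = ms / base_ms           # growth in latency vs. 320x320
    print(f"{size}: {fps(ms):5.1f} FPS, pixels x{pixel_ratio:.2f}, latency x{latency_ratio:.2f}")
```

In these measurements the latency grows faster than the pixel count: going from 320 to 640 quadruples the pixels but multiplies the latency by roughly 8.6.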

EDIT:

  • Merge master, so Edge TPU models can be exported with export.py and validated with detect.py
  • Add --data option to show real class names in detection.
  • 640x640 input resolution is supported.
  • Edge TPU models contain bbox reconstruction, and this feature is not backward-compatible. Edge TPU models have to be re-exported!

TODO:

  • Add TensorFlow Raw Resize back (no longer needed: Edge TPU compiler v16 supports native Resize ops).

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced model loading and export functionality in YOLOv5.

📊 Key Changes

  • Added data parameter to load custom class names from a .yaml file during model initialization.
  • Introduced Edge TPU export support to convert models for use with Google's Coral devices.
  • Integrated the passing of the data argument to backend detection methods for model initialization, allowing custom class names to be used across different scripts (detect.py, val.py).

🎯 Purpose & Impact

  • 🎨 Allows users to easily load custom class names during model inference, which enhances usability for different datasets.
  • 🚀 Supports conversion to Edge TPU format, extending the range of devices YOLOv5 can run on, particularly those designed for high-speed, low-power on-device ML inference.
  • 🤝 Ensures consistency of class names throughout various scripts -- making it easier for users to work with different datasets and maintain the integrity of their model's output across use cases.

zldrobit and others added 30 commits November 28, 2020 13:53
TensorFlow 2.3.1 -> 2.4.0 to avoid int8 quantization error
Move C3 from models.experimental to models.common
@keesschollaart81

Poah! Cheers to this one @zldrobit @glenn-jocher, great effort and wonderful to have YoloV5 on TF Lite ánd EdgeTPU! Thank you and happy 2022 🍻!

@glenn-jocher
Member

glenn-jocher commented Jan 2, 2022

@zldrobit I've added EdgeTPU documentation in the comments in #6151 and updated the Export tutorial table in https://docs.ultralytics.com/yolov5/tutorials/model_export

@zldrobit
Contributor Author

zldrobit commented Jan 2, 2022

> tensorflow.tflite would reduce the dependency requirements also right? Does it have a performance impact? What's the benefit of tflite_runtime vs tensorflow.tflite? Is it just a lighter deployment package?

@glenn-jocher tflite_runtime is a separate build of TFLite from TensorFlow. It only contains the code needed to run TFLite models and has a significantly smaller package size. Compared with tensorflow.tflite, it has a shorter load time. The inference latencies of the two are the same.

@glenn-jocher @keesschollaart81 Happy new year!

@glenn-jocher
Member

> @glenn-jocher tflite_runtime is a separate build of tflite from Tensorflow. It only contains the code to run tflite and has a significant smaller package size. Compared with tensorflow.tflite, it has a shorter load time. The inference latencies between them are the same.

@zldrobit ok got it, thanks! Yes, it makes sense then to use tf.lite.Interpreter for TFLite inference and tflite_runtime for Edge TPU, so I guess we should leave it as is. If we wanted to make a change, perhaps we could fall back to tf.lite.Interpreter if the tflite_runtime import fails, but I'm not sure if this is necessary.
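
A minimal sketch of the fallback idea floated here, not what the PR implements: try `tflite_runtime` first and fall back to `tf.lite.Interpreter` only if the import fails. The helper names (`first_available`, `INTERPRETER_LOADERS`, `edgetpu_delegate_name`) are hypothetical. The per-OS delegate library names are the ones listed in the Coral documentation.

```python
import platform


def first_available(loaders):
    """Return (name, result) from the first loader that does not raise ImportError."""
    for name, load in loaders:
        try:
            return name, load()
        except ImportError:
            continue
    return None, None


def _load_tflite_runtime():
    from tflite_runtime.interpreter import Interpreter  # lightweight runtime-only package
    return Interpreter


def _load_tensorflow_lite():
    import tensorflow as tf  # full TensorFlow, slower to import
    return tf.lite.Interpreter


# Preference order: the small runtime first, full TensorFlow as the fallback.
INTERPRETER_LOADERS = [
    ("tflite_runtime", _load_tflite_runtime),
    ("tensorflow", _load_tensorflow_lite),
]

# Edge TPU delegate shared-library names per operating system.
EDGETPU_DELEGATES = {
    "Linux": "libedgetpu.so.1",
    "Darwin": "libedgetpu.1.dylib",
    "Windows": "edgetpu.dll",
}


def edgetpu_delegate_name(system=None):
    """Delegate library name for the given (or current) operating system."""
    return EDGETPU_DELEGATES[system or platform.system()]


# Usage (requires one of the packages and, for Edge TPU models, the Edge TPU runtime):
# backend, Interpreter = first_available(INTERPRETER_LOADERS)
# from tflite_runtime.interpreter import load_delegate
# interpreter = Interpreter(model_path="yolov5s-int8_edgetpu.tflite",
#                           experimental_delegates=[load_delegate(edgetpu_delegate_name())])
```

Keeping the loaders as small functions means the heavy imports happen only when a loader is actually tried.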

@zldrobit
Contributor Author

zldrobit commented Jan 5, 2022

@glenn-jocher Neither am I. On the other hand, users could use https://github.com/jveitchmichaelis/edgetpu-yolo to deploy Edge TPU models after conversion and avoid the segfault.

@dariogonle

dariogonle commented Jan 7, 2022

Hi @glenn-jocher and @zldrobit, have you tried it on the Google Coral Dev Board? I'm trying and I'm having some trouble (see #6234). In the end I managed to run it with https://github.com/jveitchmichaelis/edgetpu-yolo
image

But not with YOLOv5's detect.py script. I think it may be because YOLOv5's detect script needs to import torch, which is not available for the Coral.
image

I've also seen that the inference times are:
image

But with an input size of 320, I'm getting 58 ms (and not 31 ms). Is this because you did not run it on the Google Coral Dev Board but with the Coral USB Accelerator? Are those times for the S model?
image

EDIT:
Can this be used on other platforms such as the Jetson Nano? I suppose not, because it has no TPU.

@keesschollaart81

@zldrobit why did you remove the -a flag from the edgetpu_compiler call in zldrobit@177f01e#diff-74fc7d08922278c1afa6f5b36d1450c965185dbcbdb77a99e432b0f4edaaade2L246?

When running the edgetpu_compiler with the -a flag, many more operations run on the TPU (left is with -a):
image
In my few experiments it seems a few percent faster (which obviously depends on your CPU); for me, with a 512x512 input, it was about 75 vs 83 ms.
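
For reference, the relative difference between the two timings quoted above works out as follows. This assumes 75 ms is the figure with -a and 83 ms the one without, which the comment implies but does not state outright.

```python
def latency_reduction_percent(baseline_ms, improved_ms):
    """Percentage by which latency drops relative to the baseline."""
    return (baseline_ms - improved_ms) / baseline_ms * 100.0


# 83 ms without -a vs 75 ms with -a (assumed assignment, see above).
print(f"{latency_reduction_percent(83, 75):.1f}% lower latency")
```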

@keesschollaart81

Second question about the latest version (with detect included), I see you were able to export to 576 and 640 input resolution, for me the edgetpu_compiler throws an error for that size (and up):

~/edgetpu_compiler -s ./best-int8.tflite 
Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.
Compilation child process completed within timeout period.
Compilation failed! 

512px is the highest I can get... Is this specific to my RAM/Machine? Or should I address this question to the Tensorflow team (as the error is quite vague). This is the model that I'm trying to export link with this command:
python export.py --weights best.pt --img 640 --include edgetpu

@NandhiniN85

@zldrobit I was trying to convert the yolov5n model to TFLite format and then compile the TFLite model with the Edge TPU compiler using the experimental -a flag. But not all ops are mapped to the TPU, and the result varies with the input size. For instance, with a 416 input more operations are left unmapped, whereas with 512 only one op is not mapped to the TPU.

Below are the commands I used:
!python /content/yolov5/export.py --weights /content/yolov5/runs/train/exp2/weights/best.pt --img 416 --include tflite --int8 --data /content/yolov5/data.yaml

!python /content/yolov5/export.py --weights /content/yolov5/runs/train/exp/weights/best.pt --img 512 --include tflite --int8 --data /content/yolov5/data.yaml

!edgetpu_compiler --min_runtime_version 13 -s -a -d /content/yolov5/runs/train/exp2/weights/best-int8.tflite

best-int8_edgetpu_input512.log
best-int8_edgetpu_input416.log

The quantized models are present here

@zldrobit
Contributor Author

zldrobit commented Jan 10, 2022

@dariogonle I haven't tried it on the Coral Dev Board. You're right: detect.py uses PyTorch for inference. The detection speed also depends on the CPU, because some ops run outside of the TPU. I measured the times with the yolov5s model. The Jetson Nano has no Edge TPU, so it cannot use the Edge TPU model. For the Jetson Nano, you could try exporting a TensorRT model with python export.py --weights yolov5s.pt --include engine (https://docs.ultralytics.com/yolov5/tutorials/model_export).

@zldrobit
Contributor Author

zldrobit commented Jan 10, 2022

@keesschollaart81 I removed the -a option to avoid Edge TPU compilation error. With the -a option, I cannot compile the yolov5s model even with 320x320 input. With the v16 Edge TPU compiler and without the -a option, I can export a yolov5s model up to 640x640 input.

As I understand it, the limitation is in the Edge TPU and its compiler. I downloaded and inspected your model, and it has only one class. I succeeded in compiling a 512 input model and failed to compile a 640 input model (with the -s and -a options). I could successfully compile the model using

edgetpu_compiler -s -i "model/tf_detect/Reshape,model/tf_detect/Reshape_2,model/tf_detect/Reshape_4" best-int8.tflite

(https://coral.ai/docs/edgetpu/compiler/#usage)

image

Using the -i option tells the Edge TPU compiler to separate the detect layer during compilation, so that the reshape and transpose ops on large tensors are not contained in the Edge TPU custom ops. This does help the compilation, yet it will generate a larger model if the reshape and transpose ops in the detect layer are small enough to be embedded in an Edge TPU custom op, e.g. with 320 input.
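
A small sketch of how such a compiler invocation could be assembled from Python. Only the flags that appear in this thread (-s, -a, -i) are used; the function name and its parameters are hypothetical, and the tensor names are simply the ones from the command above.

```python
import shlex


def compiler_cmd(model, intermediate_tensors=(), show_ops=True, multiple_subgraphs=False):
    """Build an edgetpu_compiler argument list using the flags discussed in this thread."""
    cmd = ["edgetpu_compiler"]
    if show_ops:
        cmd.append("-s")  # print the operator mapping table
    if multiple_subgraphs:
        cmd.append("-a")  # experimental: enable multiple subgraphs
    if intermediate_tensors:
        # -i takes a comma-separated list of tensor names
        cmd += ["-i", ",".join(intermediate_tensors)]
    cmd.append(model)
    return cmd


detect_tensors = [
    "model/tf_detect/Reshape",
    "model/tf_detect/Reshape_2",
    "model/tf_detect/Reshape_4",
]
print(shlex.join(compiler_cmd("best-int8.tflite", detect_tensors)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a machine where the compiler is installed.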

EDIT:
Wow, I found using the -a option generates a more compact Edge TPU model

edgetpu_compiler -s -a -i "model/tf_detect/Reshape,model/tf_detect/Reshape_2,model/tf_detect/Reshape_4" best-int8.tflite

image

@zldrobit
Contributor Author

@NandhiniN85 I used Edge TPU compiler several times with -s, -a, and -d options for the yolov5s model, and it only produces sub-optimal solutions. It's hard to inspect the Edge TPU models. Could you share your TFLite int8 models (python export.py --include tflite --int8) before compilation, so I could do some experiments for the Edge TPU compiler?

@NandhiniN85

> !edgetpu_compiler --min_runtime_version 13 -s -a -d /content/yolov5/runs/train/exp2/weights/best-int8.tflite

@zldrobit I have placed the Tflite int8 models before edge compilation https://drive.google.com/drive/folders/1CVezjb_kEgOmyrKFz9yJlyN6hY2v3XLO. Kindly check it. Thanks!

@zldrobit
Contributor Author

@NandhiniN85 I could manually convert your 416 input model to an Edge TPU model with only one op operating on CPU with the log:

Edge TPU Compiler version 16.0.384591198                                                                                                                   
Started a compilation timeout timer of 180 seconds.                                                                                                        
                                                                                                                                                           
Model compiled successfully in 1753 ms.                                                                                                                    
                                                                                                                                                           
Input model: best-int8_input416.tflite                                                                                                                     
Input size: 1.94MiB                                                                                                                                        
Output model: best-int8_input416_edgetpu.tflite                                                                                                            
Output size: 2.54MiB                                                                                                                                       
On-chip memory used for caching model parameters: 1.88MiB                                                                                                  
On-chip memory remaining for caching model parameters: 5.14MiB                                                                                             
Off-chip memory used for streaming uncached model parameters: 41.94KiB                                                                                     
Number of Edge TPU subgraphs: 2                                                                                                                            
Total number of operations: 283                                                                                                                            
Operation log: best-int8_input416_edgetpu.log                                                                                                              
                                                                                                                                                           
Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 282                                                                                                        
Number of operations that will run on CPU: 1                                                                                                               
                                                                                                                                                           
Operator                       Count      Status                                                                                                           
                                                                                                                                                           
CONV_2D                        60         Mapped to Edge TPU                                                                                               
MUL                            78         Mapped to Edge TPU                                                                                               
ADD                            10         Mapped to Edge TPU                                                                                               
RESHAPE                        6          Mapped to Edge TPU                                                                                               
SUB                            3          Mapped to Edge TPU                                                                                               
PAD                            7          Mapped to Edge TPU                                                                                               
LOGISTIC                       60         Mapped to Edge TPU                                                                                               
STRIDED_SLICE                  9          Mapped to Edge TPU                                                                                               
QUANTIZE                       25         Mapped to Edge TPU                                                                                               
MAX_POOL_2D                    3          Mapped to Edge TPU                                                                                               
TRANSPOSE                      2          Mapped to Edge TPU                                                                                               
TRANSPOSE                      1          More than one subgraph is not supported                                                                          
CONCATENATION                  17         Mapped to Edge TPU                                                                                               
RESIZE_NEAREST_NEIGHBOR        2          Mapped to Edge TPU                                                                                               
Compilation child process completed within timeout period.                                                                                                 
Compilation succeeded!                                                                                                                                     

The command for compilation is

edgetpu_compiler -s -a -i "model/tf_detect/Sigmoid;model/tf_detect/transpose;model/tf_detect/Reshape1" best-int8_input416.tflite

image

The intermediate tensors (-i) were copied manually from those of the 512 input Edge TPU model, and it's hard to say whether there is any programmatic way to find them.
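
When comparing compilations like the ones in this thread, it can help to total the operator table from the compiler output instead of reading it by eye. The sketch below assumes the three-column `Operator / Count / Status` layout shown in the log above; the function name is hypothetical and the sample is a shortened excerpt of that log.

```python
import re
from collections import Counter

# One table row: OPERATOR_NAME, count, then the status text.
ROW = re.compile(r"^([A-Z][A-Z0-9_]*)\s+(\d+)\s+(\S.*?)\s*$")


def summarize_ops(log_text):
    """Return (ops mapped to Edge TPU, {(operator, status): count} for everything else)."""
    mapped = 0
    unmapped = Counter()
    for line in log_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers, blank lines and other log messages
        op, count, status = m.group(1), int(m.group(2)), m.group(3)
        if status == "Mapped to Edge TPU":
            mapped += count
        else:
            unmapped[(op, status)] += count
    return mapped, dict(unmapped)


SAMPLE_LOG = """\
Operator                       Count      Status

CONV_2D                        60         Mapped to Edge TPU
TRANSPOSE                      2          Mapped to Edge TPU
TRANSPOSE                      1          More than one subgraph is not supported
RESIZE_NEAREST_NEIGHBOR        2          Mapped to Edge TPU
"""

mapped, unmapped = summarize_ops(SAMPLE_LOG)
print(mapped, unmapped)
```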

@NandhiniN85

Thanks @zldrobit for the update. "The intermediate tensors (-i) are manually copied from that of the 512 input Edge TPU model": do the weight parameters correspond to 416 or 512 in this case?

@zldrobit
Contributor Author

@NandhiniN85 The weight parameters correspond to the 416 input model. I should have been clearer: only the names of the intermediate tensors are copied from those of the 512 input Edge TPU model, i.e. "model/tf_detect/Sigmoid;model/tf_detect/transpose;model/tf_detect/Reshape1".

@NandhiniN85

@zldrobit Thanks a lot!

@dariogonle

dariogonle commented Jan 12, 2022

> edgetpu_compiler -s -a -i "model/tf_detect/Sigmoid;model/tf_detect/transpose;model/tf_detect/Reshape1" best-int8_input416.tflite

@zldrobit I'm trying to compile yolov5n weights with -a -i but I get the following error:

image

The command I used is:

edgetpu_compiler -s -a -i "model/tf_detect/Reshape,model/tf_detect/Reshape_2,model/tf_detect/Reshape_4" yolov5n-320-int8.tflite

First, I convert the yolov5n.pt model to tflite

python3 export.py --weights yolov5n.pt --img-size 320 --include tflite --int8 --data data/coco128.yaml

Do you know what's happening?

I can compile it with

edgetpu_compiler -s yolov5n-320-int8.tflite

But there are many operations not mapped to the TPU

image

@zldrobit
Contributor Author

@dariogonle It seems that adding the -a option does make compilation harder, and the -a option is an experimental feature (see edgetpu_compiler --help).

-a, --enable_multiple_subgraphs
Enable multiple(all) subgraphs (experimental flag).

This is not a stable feature of the compiler, though I have seen some smaller models, with fewer branches, layers, classes and/or a smaller input size, that can be compiled with the -a option, e.g. a 512-input yolov5n model with 3 classes. Maybe you could train a new yolov5n model with fewer classes and try compiling with the -a option again.

@jveitchmichaelis
Contributor

@dariogonle while the Jetson doesn't have a TPU, you can install one. I used one for benchmarking and testing in my repo. The trouble is that right now the M.2 cards are out of stock everywhere due to the global chip shortage.

See https://github.com/jveitchmichaelis/edgetpu-yolo/blob/main/hardware.md for info on the hardware setup, if you do have a Jetson and want to install a TPU on it.

bfineran added a commit to neuralmagic/yolov5 that referenced this pull request Apr 8, 2022
* Fix TensorRT potential unordered binding addresses (ultralytics#5826)

* feat: change file suffix in pythonic way

* fix: enforce binding addresses order

* fix: enforce binding addresses order

* Handle non-TTY `wandb.errors.UsageError` (ultralytics#5839)

* `try: except (..., wandb.errors.UsageError)`

* bug fix

* Avoid inplace modifying`imgs` in `LoadStreams` (ultralytics#5850)

When OpenCV fails to retrieve an image, the original code modifies the source images **in place**, which may result in plotting bounding boxes on a black image. That is, the source image `im0s[i]` is OK before inference, but may have been changed after inference and before `Process predictions`.

* Update `LoadImages` `ret_val=False` handling (ultralytics#5852)

Video errors may occur.

* Update val.py (ultralytics#5838)

* Update val.py

Solving Non-ASCII character '\xf0' error during runtime

* Update val.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update TorchScript suffix to `*.torchscript` (ultralytics#5856)

* Add `--workers 8` argument to val.py (ultralytics#5857)

* Update val.py

Add an option to choose number of workers if not called by train.py

* Update comment

* 120 char line width

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update `plot_lr_scheduler()` (ultralytics#5864)

shallow copy modify originals

* Update `nl` after `cutout()` (ultralytics#5873)

* `AutoShape()` models as `DetectMultiBackend()` instances (ultralytics#5845)

* Update AutoShape()

* autodownload ONNX

* Cleanup

* Finish updates

* Add Usage

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* fix device

* Update hubconf.py

* Update common.py

* smart param selection

* autodownload all formats

* autopad only pytorch models

* new_shape edits

* stride tensor fix

* Cleanup

* Single-command multiple-model export (ultralytics#5882)

* Export multiple models in series

Export multiple models in series by adding additional `*.pt` files to the `--weights` argument, i.e.:

```bash
python export.py --include tflite --weights yolov5n.pt  # export 1 model
python export.py --include tflite --weights yolov5n.pt yolov5s.pt yolov5m.pt yolov5l.pt yolov5x.pt  # export 5 models
```

* Update export.py

* Update README.md

* `Detections().tolist()` explicit argument fix (ultralytics#5907)

debugged for missigned Detections attributes

* Update wandb_utils.py (ultralytics#5908)

* Add *.engine (TensorRT extensions) to .gitignore (ultralytics#5911)

* Add *.engine (TensorRT extensions) to .gitignore

* Update .dockerignore

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add ONNX inference providers (ultralytics#5918)

* Add ONNX inference providers

Fix for ultralytics#5916

* Update common.py

* Add hardware checks to `notebook_init()` (ultralytics#5919)

* Update notebook

* Update notebook

* update string

* update string

* Updates

* Updates

* Updates

* check both ipython and psutil

* remove sample_data if is_colab

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Update `plot_lr_scheduler()` (ultralytics#5864)" (ultralytics#5920)

This reverts commit 360eec6.

* Absolute '/content/sample_data' (ultralytics#5922)

* Default PyTorch Hub to `autocast(False)` (ultralytics#5926)

* Fix ONNX opset inconsistency with parseargs and run args (ultralytics#5937)

* Make `select_device()` robust to `batch_size=-1` (ultralytics#5940)

* Found a bug when setting batch_size = -1 to use autobatch.

reproduce:

* Fix type conflict

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* fix .gitignore not tracking existing folders (ultralytics#5946)

* fix .gitignore not tracking existing folders

fix .gitignore so that the files that are in the repository are actually being tracked.

Everything in the data/ folder is ignored, which also means the subdirectories are ignored. Fix so that the subdirectories and their contents are still tracked.

* Remove data/trainings

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update `strip_optimizer()` (ultralytics#5949)

Replace 'training_result' with 'best_fitness' in strip_optimizer() to match key with ckpt from train.py

* Add nms and agnostic nms to export.py (ultralytics#5938)

* add nms and agnostic nms to export.py

* fix agnostic implies nms

* reorder args to group TF args

* PEP8 120 char

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Refactor NUM_THREADS (ultralytics#5954)

* Fix Detections class `tolist()` method (ultralytics#5945)

* Fix tolist() to add the file for each Detection

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix PEP8 requirement for 2 spaces before an inline comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Fix `imgsz` bug (ultralytics#5948)

* fix imgsz bug

* Update detect.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* `pretrained=False` fix (ultralytics#5966)

* `pretrained=False` fix

Fix for ultralytics#5964

* CI speed improvement

* make parameter ignore epochs (ultralytics#5972)

* make parameter ignore epochs

ignore epochs functionality add to prevent spikes at the beginning when fitness spikes and decreases after.
Discussed at ultralytics#5971

* Update train.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* YOLOv5s6 params and FLOPs fix (ultralytics#5977)

* Update callbacks.py with `__init__()` (ultralytics#5979)

Add __init__() function.

* Increase `ar_thr` from 20 to 100 for better detection on slender (high aspect ratio) objects (ultralytics#5556)

* Making `ar_thr` available as a hyperparameter

* Disabling ar_thr as hyperparameter and computing from the dataset instead

* Fixing bug in ar_thr computation

* Fix `ar_thr` to 100

* Allow `--weights URL` (ultralytics#5991)

* Recommend `jar xf file.zip` for zips (ultralytics#5993)

* *.torchscript inference `self.jit` fix (ultralytics#6007)

* Check TensorRT>=8.0.0 version (ultralytics#6021)

* Check TensorRT>=8.0.0 version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Multi-layer capable `--freeze` argument (ultralytics#6019)

* support specifying multiple frozen layers

* fix bug

* Cleanup Freeze section

* Cleanup argument

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* train -> val comment fix (ultralytics#6024)

* Add dataset source citations (ultralytics#6032)

* Kaggle `LOGGER` fix (ultralytics#6041)

* Simplify `set_logging()` indexing (ultralytics#6042)

* `--freeze` fix (ultralytics#6044)

Fix for ultralytics#6038

* OpenVINO Export (ultralytics#6057)

* OpenVINO export

* Remove timeout

* Add 3 files

* str

* Constrain opset to 12

* Default ONNX opset to 12

* Make dir

* Make dir

* Cleanup

* Cleanup

* check_requirements(('openvino-dev',))

* Reduce G/D/CIoU logic operations (ultralytics#6074)

Since the default is CIoU, reordering the checks reduces the number of comparisons.
Also, “elif CIoU:” does not need the 'if'.

Co-authored-by: 李杰 <360751194@qq.comqq.com>

* Init tensor directly on device (ultralytics#6068)

Slightly more efficient than .to(device)

* W&B: track batch size after autobatch (ultralytics#6039)

* track batch size after autobatch

* remove redundant import

* Update __init__.py

* Update __init__.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* W&B: Log best results after training ends (ultralytics#6120)

* log best.pt metrics at train end

* update

* Update __init__.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Log best results (ultralytics#6085)

* log best result in summary

* comment added

* add space for `flake8`

* log `best/epoch`

* fix `dimension` for epoch

ValueError: all the input arrays must have same number of dimensions

* log `best/` in `utils.logger.__init__`

* fix pre-commit

1. missing whitespace around operator
2.  over-indented

* Refactor/reduce G/C/D/IoU `if: else` statements (ultralytics#6087)

* Refactor the code to reduce else

* Update metrics.py

* Cleanup

Co-authored-by: Cmos <gen.chen@ubisoft.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add EdgeTPU support (ultralytics#3630)

* Add models/tf.py for TensorFlow and TFLite export

* Set auto=False for int8 calibration

* Update requirements.txt for TensorFlow and TFLite export

* Read anchors directly from PyTorch weights

* Add --tf-nms to append NMS in TensorFlow SavedModel and GraphDef export

* Remove check_anchor_order, check_file, set_logging from import

* Reformat code and optimize imports

* Autodownload model and check cfg

* update --source path, img-size to 320, single output

* Adjust representative_dataset

* Put representative dataset in tfl_int8 block

* detect.py TF inference

* weights to string

* weights to string

* cleanup tf.py

* Add --dynamic-batch-size

* Add xywh normalization to reduce calibration error

* Update requirements.txt

TensorFlow 2.3.1 -> 2.4.0 to avoid int8 quantization error

* Fix imports

Move C3 from models.experimental to models.common

* Add models/tf.py for TensorFlow and TFLite export

* Set auto=False for int8 calibration

* Update requirements.txt for TensorFlow and TFLite export

* Read anchors directly from PyTorch weights

* Add --tf-nms to append NMS in TensorFlow SavedModel and GraphDef export

* Remove check_anchor_order, check_file, set_logging from import

* Reformat code and optimize imports

* Autodownload model and check cfg

* update --source path, img-size to 320, single output

* Adjust representative_dataset

* detect.py TF inference

* Put representative dataset in tfl_int8 block

* weights to string

* weights to string

* cleanup tf.py

* Add --dynamic-batch-size

* Add xywh normalization to reduce calibration error

* Update requirements.txt

TensorFlow 2.3.1 -> 2.4.0 to avoid int8 quantization error

* Fix imports

Move C3 from models.experimental to models.common

* implement C3() and SiLU()

* Add TensorFlow and TFLite Detection

* Add --tfl-detect for TFLite Detection

* Add int8 quantized TFLite inference in detect.py

* Add --edgetpu for Edge TPU detection

* Fix --img-size to add rectangle TensorFlow and TFLite input

* Add --no-tf-nms to detect objects using models combined with TensorFlow NMS

* Fix --img-size list type input

* Update README.md

* Add Android project for TFLite inference

* Upgrade TensorFlow v2.3.1 -> v2.4.0

* Disable normalization of xywh

* Rewrite names init in detect.py

* Change input resolution 640 -> 320 on Android

* Disable NNAPI

* Update README.md --img 640 -> 320

* Update README.md for Edge TPU

* Update README.md

* Fix reshape dim to support dynamic batching

* Fix reshape dim to support dynamic batching

* Add epsilon argument in tf_BN, which is different between TF and PT

* Set stride to None if not using PyTorch, and do not warmup without PyTorch

* Add list support in check_img_size()

* Add list input support in detect.py

* sys.path.append('./') to run from yolov5/

* Add int8 quantization support for TensorFlow 2.5

* Add get_coco128.sh

* Remove --no-tfl-detect in models/tf.py (Use tf-android-tfl-detect branch for EdgeTPU)

* Update requirements.txt

* Replace torch.load() with attempt_load()

* Update requirements.txt

* Add --tf-raw-resize to set half_pixel_centers=False

* Remove android directory

* Update README.md

* Update README.md

* Add multiple OS support for EdgeTPU detection

* Fix export and detect

* Export 3 YOLO heads with Edge TPU models

* Remove xywh denormalization with Edge TPU models in detect.py

* Fix saved_model and pb detect error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix pre-commit.ci failure

* Add edgetpu in export.py docstring

* Fix Edge TPU model detection exported by TF 2.7

* Add class names for TF/TFLite in DetectMultibackend

* Fix assignment with nl in TFLite Detection

* Add check when getting Edge TPU compiler version

* Add UTF-8 encoding in opening --data file for Windows

* Remove redundant TensorFlow import

* Add Edge TPU in export.py's docstring

* Add the detect layer in Edge TPU model conversion

* Default `dnn=False`

* Cleanup data.yaml loading

* Update detect.py

* Update val.py

* Comments and generalize data.yaml names

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: unknown <fangjiacong@ut.cn>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Enable AdamW optimizer (ultralytics#6152)

* Update export format docstrings (ultralytics#6151)

* Update export documentation

* Cleanup

* Update export.py

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update README.md

* Update README.md

* Update README.md

* Update train.py

* Update train.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update greetings.yml (ultralytics#6165)

* [pre-commit.ci] pre-commit suggestions (ultralytics#6177)

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.0.1 → v4.1.0](pre-commit/pre-commit-hooks@v4.0.1...v4.1.0)
- [github.com/asottile/pyupgrade: v2.23.1 → v2.31.0](asottile/pyupgrade@v2.23.1...v2.31.0)
- [github.com/PyCQA/isort: 5.9.3 → 5.10.1](PyCQA/isort@5.9.3...5.10.1)
- [github.com/PyCQA/flake8: 3.9.2 → 4.0.1](PyCQA/flake8@3.9.2...4.0.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update NMS `max_wh=7680` for 8k images (ultralytics#6178)

* Add OpenVINO inference (ultralytics#6179)

* Ignore `*_openvino_model/` dir (ultralytics#6180)

* Global export format sort (ultralytics#6182)

* Global export sort

* Cleanup

* Fix TorchScript on mobile export (ultralytics#6183)

* fix export of TorchScript on mobile

* Cleanup

Co-authored-by: yinrong <yinrong@xiaomi.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* TensorRT 7 `anchor_grid` compatibility fix (ultralytics#6185)

* fix: TensorRT 7 incompatible

* Add comment

* Add if: else and comment

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add `tensorrt>=7.0.0` checks (ultralytics#6193)

* Add `tensorrt>=7.0.0` checks

* Update export.py

* Update common.py

* Update export.py

* Add CoreML inference (ultralytics#6195)

* Add Apple CoreML inference

* Cleanup

* Fix `nan`-robust stream FPS (ultralytics#6198)

Fix for Webcam stop working suddenly (Issue ultralytics#6197)

* Edge TPU compiler comment (ultralytics#6196)

* Edge TPU compiler comment

* 7 to 8 fix

* TFLite `--int8` 'flatbuffers==1.12' fix (ultralytics#6216)

* TFLite `--int8` 'flatbuffers==1.12' fix

Temporary workaround for TFLite INT8 export.

* Update export.py

* Update export.py

* TFLite `--int8` 'flatbuffers==1.12' fix 2 (ultralytics#6217)

* TFLite `--int8` 'flatbuffers==1.12' fix 2

Reorganizes ultralytics#6216 fix to update before `tensorflow` import so no restart required.

* Update export.py

* Add `edgetpu_compiler` checks (ultralytics#6218)

* Add `edgetpu_compiler` checks

* Update export.py

* Update export.py

* Update export.py

* Update export.py

* Update export.py

* Update export.py

* Attempt `edgetpu-compiler` autoinstall (ultralytics#6223)

* Attempt `edgetpu-compiler` autoinstall

Attempt to install edgetpu-compiler dependency if missing on Linux.

* Update export.py

* Update export.py

* Update README speed reproduction command (ultralytics#6228)

* Update P2-P7 `models/hub` variants (ultralytics#6230)

* Update p2-p7 `models/hub` variants

* Update common.py

* AutoAnchor camelcase corrections

* TensorRT 7 export fix (ultralytics#6235)

* Fix `cmd` string on `tfjs` export (ultralytics#6243)

* Fix cmd string on tfjs export

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* TensorRT pip install

* Enable ONNX `--half` FP16 inference (ultralytics#6268)

* Enable ONNX `--half` FP16 inference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update `export.py` with Detect, Validate usages (ultralytics#6280)

* Add `is_kaggle()` function (ultralytics#6285)

* Add `is_kaggle()` function

Return True if environment is Kaggle Notebook.

* Remove root loggers only if is_kaggle() == True

* Update general.py

* Fix `device` count check (ultralytics#6290)

* Fix device count check()

* Update torch_utils.py

* Update torch_utils.py

* Update hubconf.py

* Fixing bug multi-gpu training (ultralytics#6299)

* Fixing bug multi-gpu training

This solves this issue: ultralytics#6297 (comment)

* Update torch_utils.py for pep8

* `select_device()` cleanup (ultralytics#6302)

* `select_device()` cleanup

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Fix `train.py` parameter groups desc error (ultralytics#6318)

* Fix `train.py` parameter groups desc error

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Remove `dataset_stats()` autodownload capability (ultralytics#6303)

* Remove `dataset_stats()` autodownload capability

@kalenmike security update per Slack convo

* Update datasets.py

* Console corrupted -> corrupt (ultralytics#6338)

* Console corrupted -> corrupt 

Minor style changes.

* Update export.py

* TensorRT `assert im.device.type != 'cpu'` on export (ultralytics#6340)

* TensorRT `assert im.device.type != 'cpu'` on export

* Update export.py

* `export.py` return exported files/dirs (ultralytics#6343)

* `export.py` return exported files/dirs

* Path to str

* Created using Colaboratory

* `export.py` automatic `forward_export` (ultralytics#6352)

* `export.py` automatic `forward_export`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* New environment variable `VERBOSE` (ultralytics#6353)

New environment variable `VERBOSE`

* Reuse `de_parallel()` rather than `is_parallel()` (ultralytics#6354)

* `DEVICE_COUNT` instead of `WORLD_SIZE` to calculate `nw` (ultralytics#6324)

* Flush callbacks when on `--evolve` (ultralytics#6374)

* log best.pt metrics at train end

* update

* Update __init__.py

* flush callbacks when using evolve

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* FROM nvcr.io/nvidia/pytorch:21.12-py3 (ultralytics#6377)

* FROM nvcr.io/nvidia/pytorch:21.10-py3 (ultralytics#6379)

21.12 generates dockerhub errors so rolling back to 21.10 with latest pytorch install. Not sure if this torch install will work on non-GPU dockerhub autobuild so this is an experiment.

* Add `albumentations` to Dockerfile (ultralytics#6392)

* Add `stop_training=False` flag to callbacks (ultralytics#6365)

* New flag 'stop_training' in the utils.callbacks.Callbacks class to stop training early from a callback handler

* Removed most of the new checks, leaving only the one after calling 'on_train_batch_end'

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
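
The `stop_training` flag from ultralytics#6365 can be illustrated with a toy loop. This is a minimal sketch, not the code from the PR: the `Callbacks` class below is a hypothetical, stripped-down stand-in, and `train`, `register_action` as written here, and `stop_after_third` are illustrative names. It assumes only what the commit messages state, namely that the flag defaults to `False` and is checked once, right after `on_train_batch_end`.

```python
class Callbacks:
    """Hypothetical, stripped-down stand-in for a callbacks handler."""

    def __init__(self):
        self._hooks = {"on_train_batch_end": []}
        self.stop_training = False  # any hook may set this to True to end training early

    def register_action(self, hook, callback):
        self._hooks[hook].append(callback)

    def run(self, hook, *args):
        for callback in self._hooks[hook]:
            callback(*args)


def train(callbacks, num_batches):
    """Toy loop: returns how many batches were processed before stopping."""
    done = 0
    for batch in range(num_batches):
        done += 1  # stand-in for the forward/backward pass
        callbacks.run("on_train_batch_end", batch)
        if callbacks.stop_training:  # the single check kept after this hook
            break
    return done


callbacks = Callbacks()


def stop_after_third(batch):
    # Request a stop once the third batch (index 2) has finished.
    if batch == 2:
        callbacks.stop_training = True


callbacks.register_action("on_train_batch_end", stop_after_third)
processed = train(callbacks, num_batches=10)
```

Keeping a single check after the per-batch hook means a handler can interrupt training at batch granularity without the loop polling the flag in several places.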

* Add `detect.py` GIF video inference (ultralytics#6410)

* Add detect.py GIF video inference

* Cleanup

* Update `greetings.yaml` email address (ultralytics#6412)

* Update `greetings.yaml` email address

* Update greetings.yml

* Rename logger from 'utils.logger' to 'yolov5' (ultralytics#6421)

* Gave a more explicit name to the logger

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Prefer `tflite_runtime` for TFLite inference if installed (ultralytics#6406)

* import tflite_runtime if tensorflow not installed

* rename tflite to tfli

* Attempt tflite_runtime for all TFLite workflows

Also rename tfli to tfl

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
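
The import preference in ultralytics#6406 amounts to "try the lightweight runtime first, fall back to full TensorFlow". A minimal sketch of that pattern follows; the helper name `first_importable` and the candidate tuple are assumptions for illustration and are not taken from the PR.

```python
import importlib


def first_importable(candidates):
    """Return (name, module) for the first entry in `candidates` that imports cleanly."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
    raise ImportError(f"none of {list(candidates)} could be imported")


# Assumed preference order, following the commit title: the small runtime
# package first, the full TensorFlow package as the fallback.
TFLITE_CANDIDATES = ("tflite_runtime.interpreter", "tensorflow")

# Usage (needs one of the two packages installed):
#   name, mod = first_importable(TFLITE_CANDIDATES)
#   Interpreter = mod.Interpreter if name.startswith("tflite_runtime") else mod.lite.Interpreter
```

The practical benefit is that an inference-only device, such as a board with an Edge TPU attached, does not need the full TensorFlow package installed.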

* Update workflows (ultralytics#6427)

* Workflow updates

* quotes fix

* best to weights fix

* Namespace `VERBOSE` env variable to `YOLOv5_VERBOSE` (ultralytics#6428)

* Verbose updates

* Verbose updates

* Add `*.asf` video support (ultralytics#6436)

* Revert "Remove `dataset_stats()` autodownload capability (ultralytics#6303)" (ultralytics#6442)

This reverts commit 3119b2f.

* Fix `select_device()` for Multi-GPU (ultralytics#6434)

* Fix `select_device()` for Multi-GPU

Possible fix for ultralytics#6431

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix2 `select_device()` for Multi-GPU (ultralytics#6461)

* Fix2 select_device() for Multi-GPU

* Cleanup

* Cleanup

* Simplify error message

* Improve assert

* Update torch_utils.py

* Add Product Hunt social media icon (ultralytics#6464)

* Social media icons update

* fix URL

* Update README.md

* Resolve dataset paths (ultralytics#6489)

* Simplify TF normalized to pixels (ultralytics#6494)

* Improved `export.py` usage examples (ultralytics#6495)

* Improved `export.py` usage examples

* Cleanup

* CoreML inference fix `list()` -> `sorted()` (ultralytics#6496)

* Suppress `torch.jit.TracerWarning` on export (ultralytics#6498)

* Suppress torch.jit.TracerWarning

TracerWarnings can be safely ignored.

* Cleanup

* Suppress export.run() TracerWarnings (ultralytics#6499)

Suppresses warnings when calling export.run() directly, not just CLI python export.py.

Also adds Requirements examples for CPU and GPU backends

* W&B: Remember batchsize on resuming (ultralytics#6512)

* log best.pt metrics at train end

* update

* Update __init__.py

* flush callbacks when using evolve

* remember batch size on resuming

* Update train.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update hyp.scratch-high.yaml (ultralytics#6525)

Update `lrf: 0.1`, tested on YOLOv5x6 to 55.0 mAP@0.5:0.95, slightly higher than current.

* TODO issues exempt from stale action (ultralytics#6530)

* Update val_batch*.jpg for Chinese fonts (ultralytics#6526)

* Update plots for Chinese fonts

* make is_chinese() non-str safe

* Add global FONT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update general.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Social icons after text (ultralytics#6473)

* Social icons after text

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update README.md

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Edge TPU compiler `sudo` fix (ultralytics#6531)

* Edge TPU compiler sudo fix

Allows for auto-install of Edge TPU compiler on non-sudo systems like the YOLOv5 Docker image.

@kalenmike

* Update export.py

* Update export.py

* Update export.py

* Edge TPU export 'list index out of range' fix (ultralytics#6533)

* Edge TPU `tf.lite.experimental.load_delegate` fix (ultralytics#6536)

* Edge TPU `tf.lite.experimental.load_delegate` fix

Fix attempt for ultralytics#6535

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixing minor multi-streaming issues with TensorRT engine (ultralytics#6504)

* Update batch-size in model.warmup() + indentation for logging inference results

* These changes are in response to PR comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Load checkpoint on CPU instead of on GPU (ultralytics#6516)

* Load checkpoint on CPU instead of on GPU

* refactor: simplify code

* Cleanup

* Update train.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* flake8: code meanings (ultralytics#6481)

* Fix 6 Flake8 issues (ultralytics#6541)

* F541

* F821

* F841

* E741

* E302

* E722

* Apply suggestions from code review

* Update general.py

* Update datasets.py

* Update export.py

* Update plots.py

* Update plots.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Edge TPU TF imports fix (ultralytics#6542)

* Edge TPU TF imports fix

Fix for ultralytics#6535 (comment)

* Update common.py

* Move trainloader functions to class methods (ultralytics#6559)

* Move trainloader functions to class methods

* results = ThreadPool(NUM_THREADS).imap(self.load_image, range(n))

* Cleanup

* Improved AutoBatch DDP error message (ultralytics#6568)

* Improved AutoBatch DDP error message

* Cleanup

* Fix zero-export handling with `if any(f):` (ultralytics#6569)

* Fix zero-export handling with `if any(f):`

Partial fix for ultralytics#6563

* Cleanup

* Fix `plot_labels()` colored histogram bug (ultralytics#6574)

* Fix `plot_labels()` colored histogram bug

* Cleanup

* Allow custom `--evolve` project names (ultralytics#6567)

* Update train.py

As seen in ultralytics#6463, modify train.py in the evolve process to allow a custom save directory.

* fix val

* PEP8

whitespace around operator

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add `DATASETS_DIR` global in general.py (ultralytics#6578)

* return `opt` from `train.run()` (ultralytics#6581)

* Fix YouTube dislike button bug in `pafy` package (ultralytics#6603)

Per ultralytics#6583 (comment) by @alicera

* Update train.py

* Fix `hyp_evolve.yaml` indexing bug (ultralytics#6604)

* Fix `hyp_evolve.yaml` indexing bug

Bug caused hyp_evolve.yaml to display latest generation result rather than best generation result.

* Update plots.py

* Update general.py

* Update general.py

* Update general.py

* Fix `ROOT / data` when running W&B `log_dataset()` (ultralytics#6606)

* Fix missing data folder when running log_dataset

* Use ROOT/'data'

* PEP8 whitespace

* YouTube dependency fix `youtube_dl==2020.12.2` (ultralytics#6612)

Per ultralytics#5860 (comment) by @hdnh2006

* Add YOLOv5n to Reproduce section (ultralytics#6619)

* W&B: Improve resume stability (ultralytics#6611)

* log best.pt metrics at train end

* update

* Update __init__.py

* flush callbacks when using evolve

* remember batch size on resuming

* Update train.py

* improve stability of resume

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* W&B: don't log media in evolve (ultralytics#6617)

* YOLOv5 Export Benchmarks (ultralytics#6613)

* Add benchmarks.py

* Update

* Add requirements

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* dataset autodownload from root

* Update

* Redirect to /dev/null

* sudo --help

* Cleanup

* Add exports pd df

* Updates

* Updates

* Updates

* Cleanup

* dir handling fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

* Cleanup2

* Cleanup3

* Cleanup model_type

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix ConfusionMatrix scale `vmin=0.0` (ultralytics#6638)

Fix attempt for ultralytics#6626

* Fixed wandb logger KeyError (ultralytics#6637)

* Fix yolov3.yaml remove list (ultralytics#6655)

Per ultralytics/yolov3#1887 (comment)

* Validate with 2x `--workers` (ultralytics#6658)

* Validate with 2x `--workers` single-GPU/CPU fix (ultralytics#6659)

Fix for ultralytics#6658 for single-GPU and CPU training use cases

* Add `--cache val` (ultralytics#6663)

New `--cache val` argument will cache validation set only into RAM. Should help multi-GPU training speeds without consuming as much RAM as full `--cache ram`.

* Robust `scipy.cluster.vq.kmeans` too few points (ultralytics#6668)

* Handle `scipy.cluster.vq.kmeans` too few points

Resolves ultralytics#6664

* Update autoanchor.py

* Cleanup

* Update Dockerfile `torch==1.10.2+cu113` (ultralytics#6669)

* FROM nvcr.io/nvidia/pytorch:22.01-py3 (ultralytics#6670)

* FROM nvcr.io/nvidia/pytorch:21.10-py3 (ultralytics#6671)

22.01 returns 'no space left on device' error message.

Seems like a bug at docker. Raised issue in docker/hub-feedback#2209

* Update Dockerfile reorder installs (ultralytics#6672)

Also `nvidia-tensorboard-plugin-dlprof`, `nvidia-tensorboard` are no longer installed in NVCR base.

* FROM nvcr.io/nvidia/pytorch:21.10-py3 (ultralytics#6673)

Reordered installation may help reduce resource usage in autobuild

* FROM nvcr.io/nvidia/pytorch:21.10-py3 (ultralytics#6677)

Revert to 21.10 on autobuild fail

* Fix TF exports >= 2GB (ultralytics#6292)

* Fix exporting saved_model: pb exceeds 2GB

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Replace TF v1.x API with TF v2.x API for saved_model export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean up

* Remove lambda in tf.function()

* Revert "Remove lambda in tf.function()" to be compatible with TF v2.4

This reverts commit 46c7931f11dfdea6ae340c77287c35c30b9e0779.

* Fix for pre-commit.ci

* Cleanup1

* Cleanup2

* Backwards compatibility update

* Update common.py

* Update common.py

* Cleanup3

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Fix `--evolve --bucket gs://...` (ultralytics#6698)

* Fix CoreML P6 inference (ultralytics#6700)

* Fix CoreML P6 inference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix floating point in number of workers `nw` (ultralytics#6701)

Integer division by a float yields a (rounded) float. This causes
the dataloader to crash when creating a range.
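
The failure mode described above is easy to reproduce in plain Python. The snippet below is a minimal sketch with made-up values (`cpu_count`, `device_count`); it is not the dataloader code itself, and casting with `int()` is only one plausible way to fix it.

```python
cpu_count = 8
device_count = 2.0  # a float sneaks in, for example from an earlier true division

# Floor division rounds down, but the result keeps the float type.
nw_float = cpu_count // device_count

try:
    list(range(nw_float))  # range() only accepts integers
    range_error = None
except TypeError as exc:
    range_error = exc

# Casting to int restores a value that range() accepts.
nw_int = int(cpu_count // device_count)
workers = list(range(nw_int))
```

Here `nw_float` is `4.0` rather than `4`, and `range(4.0)` raises `TypeError`, which is the kind of crash the commit message refers to.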

* Edge TPU inference fix (ultralytics#6686)

* refactor: use edgetpu flag

* fix: remove bitwise and assignation to tflite

* Cleanup and fix tflite

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Use `export_formats()` in export.py (ultralytics#6705)

* Use `export_formats()` in export.py

* list fix

* Suppress `torch` AMP-CPU warnings (ultralytics#6706)

This is a torch bug, but they seem unable or unwilling to fix it so I'm creating a suppression in YOLOv5. 

Resolves ultralytics#6692

* Update `nw` to `max(nd, 1)` (ultralytics#6714)

* GH: add PR template (ultralytics#6482)

* GH: add PR template

* Update CONTRIBUTING.md

* Update PULL_REQUEST_TEMPLATE.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update PULL_REQUEST_TEMPLATE.md

* Update PULL_REQUEST_TEMPLATE.md

* Update PULL_REQUEST_TEMPLATE.md

* Update PULL_REQUEST_TEMPLATE.md

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Switch default LR scheduler from cos to linear (ultralytics#6729)

* Switch default LR scheduler from cos to linear

Based on empirical results of training both ways on all YOLOv5 models.

* linear bug fix
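
To make the difference between the two schedules concrete, here is a minimal sketch of a linear and a cosine learning-rate multiplier. The exact formulas are an assumption (the PR text does not spell them out); `epochs` is the total number of epochs and `lrf` is taken to be the final multiplier, as in the `lrf` hyperparameter mentioned elsewhere in this log.

```python
import math


def linear_lf(epoch, epochs, lrf):
    """Linear decay of the LR multiplier from 1.0 at epoch 0 to lrf at the last epoch."""
    return (1 - epoch / epochs) * (1.0 - lrf) + lrf


def cosine_lf(epoch, epochs, lrf):
    """Half-cosine decay of the LR multiplier from 1.0 to lrf over the same span."""
    return ((1 + math.cos(epoch * math.pi / epochs)) / 2) * (1.0 - lrf) + lrf


# Compare the two schedules at a few points of a 100-epoch run.
schedule = [(e, linear_lf(e, 100, 0.1), cosine_lf(e, 100, 0.1)) for e in (0, 25, 50, 75, 100)]
```

Both curves start at 1.0 and end at `lrf`. The cosine curve stays above the linear one in the first half of training and drops below it in the second half. Either function could be passed to a scheduler such as `torch.optim.lr_scheduler.LambdaLR` via `lr_lambda=lambda e: linear_lf(e, epochs, lrf)`.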

* Updated VOC hyperparameters (ultralytics#6732)

* Update hyps

* Update hyp.VOC.yaml

* Update pathlib

* Update hyps

* Update hyps

* Update hyps

* Update hyps

* YOLOv5 v6.1 release (ultralytics#6739)

* Pre-commit table fix (ultralytics#6744)

* Update tutorial.ipynb (2 CPUs, 12.7 GB RAM, 42.2/166.8 GB disk) (ultralytics#6767)

* Update min warmup iterations from 1k to 100 (ultralytics#6768)

* Default `OMP_NUM_THREADS=8` (ultralytics#6770)

* Update tutorial.ipynb (ultralytics#6771)

* Update hyp.VOC.yaml (ultralytics#6772)

* Fix export for 1-channel images (ultralytics#6780)

Export failed for 1-channel input shape, 1-liner fix

* Update EMA decay `tau` (ultralytics#6769)

* Update EMA

* Update EMA

* ratio invert

* fix ratio invert

* fix2 ratio invert

* warmup iterations to 100

* ema_k

* implement tau

* implement tau

* YOLOv5s6 params FLOPs fix (ultralytics#6782)

* Update PULL_REQUEST_TEMPLATE.md (ultralytics#6783)

* Update autoanchor.py (ultralytics#6794)

* Update autoanchor.py

* Update autoanchor.py

* Update sweep.yaml (ultralytics#6825)

* Update sweep.yaml

Changed focal loss gamma search range between 1 and 4

* Update sweep.yaml

lowered the min value to match default

* AutoAnchor improved initialization robustness (ultralytics#6854)

* Update AutoAnchor

* Update AutoAnchor

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add `*.ts` to `VID_FORMATS` (ultralytics#6859)

* Update `--cache disk` deprecate `*_npy/` dirs (ultralytics#6876)

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Cleanup

* Cleanup

* Update yolov5s.yaml (ultralytics#6865)

* Update yolov5s.yaml

* Update yolov5s.yaml

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Default FP16 TensorRT export (ultralytics#6798)

* Assert engine precision ultralytics#6777

* Default to FP32 inputs for TensorRT engines

* Default to FP16 TensorRT exports ultralytics#6777

* Remove wrong line ultralytics#6777

* Automatically adjust detect.py input precision ultralytics#6777

* Automatically adjust val.py input precision ultralytics#6777

* Add missing colon

* Cleanup

* Cleanup

* Remove default trt_fp16_input definition

* Experiment

* Reorder detect.py if statement to after half checks

* Update common.py

* Update export.py

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Bump actions/setup-python from 2 to 3 (ultralytics#6880)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 3.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v2...v3)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/checkout from 2 to 3 (ultralytics#6881)

Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix TRT `max_workspace_size` deprecation notice (ultralytics#6856)

* Fix TRT `max_workspace_size` deprecation notice

* Update export.py

* Update export.py

* Update bytes to GB with bitshift (ultralytics#6886)
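
For readers unfamiliar with the idiom, a bit shift by 30 is a compact way to express the factor 2**30 between bytes and binary gigabytes. The sketch below is illustrative only; the helper names are hypothetical and the actual call sites changed in ultralytics#6886 are not shown here.

```python
GIB = 1 << 30  # 2**30 = 1,073,741,824 bytes in one binary gigabyte


def bytes_to_gb(n_bytes):
    """Convert a byte count to (binary) gigabytes as a float."""
    return n_bytes / GIB


def gb_to_bytes(n_gb):
    """Convert a whole number of (binary) gigabytes to bytes."""
    return n_gb << 30


total_bytes = gb_to_bytes(4)
total_gb = bytes_to_gb(total_bytes)
```

Note that this differs from dividing by `1E9`: a shift by 30 uses 1024-based units, so the reported figure is about 7% smaller for the same byte count.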

* Move `git_describe()` to general.py (ultralytics#6918)

* Move `git_describe()` to general.py

* Move `git_describe()` to general.py

* PyTorch 1.11.0 compatibility updates (ultralytics#6932)

Resolves `AttributeError: 'Upsample' object has no attribute 'recompute_scale_factor'` first raised in ultralytics#5499

* Optimize PyTorch 1.11.0 compatibility update (ultralytics#6933)

* Allow 3-point segments (ultralytics#6938)

May resolve ultralytics#6931

* Fix PyTorch Hub export inference shapes (ultralytics#6949)

May resolve ultralytics#6947

* DetectMultiBackend() `--half` handling (ultralytics#6945)

* DetectMultiBackend() `--half` handling

* CI fixes

* rename .half to .fp16 to avoid conflict

* warmup fix

* val update

* engine update

* engine update

* Update Dockerfile `torch==1.11.0+cu113` (ultralytics#6954)

* New val.py `cuda` variable (ultralytics#6957)

* New val.py `cuda` variable

Fix for ONNX GPU val.

* Update val.py

* DetectMultiBackend() return `device` update (ultralytics#6958)

Fixes ONNX validation that returns outputs on CPU.

* Tensor initialization on device improvements (ultralytics#6959)

* Update common.py speed improvements

Eliminate .to() ops where possible for reduced data transfer overhead. Primarily affects warmup and PyTorch Hub inference.

* Updates

* Updates

* Update detect.py

* Update val.py

* EdgeTPU optimizations (ultralytics#6808)

* removed transpose op for better edgetpu support

* fix for training case

* enabled experimental new quantizer flag

* precalculate add and mul ops at compile time

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Model `ema` key backward compatibility fix (ultralytics#6972)

Fix for older model loading issue in ultralytics@d3d9cbc#commitcomment-68622388

* pt model to cpu on TF export

* YOLOv5 Export Benchmarks for GPU (ultralytics#6963)

* Add benchmarks.py GPU support

* Updates

* Updates

* Updates

* Updates

* Add --half

* Add TRT requirements

* Cleanup

* Add TF to warmup types

* Update export.py

* Update export.py

* Update benchmarks.py

* Update TQDM bar format (ultralytics#6988)

* Conditional `Timeout()` by OS (disable on Windows) (ultralytics#7013)

* Conditional `Timeout()` by OS (disable on Windows)

* Update general.py

* fix: add default PIL font as fallback  (ultralytics#7010)

* fix: add default font as fallback

Add default font as fallback if the downloading of the Arial.ttf font
fails for some reason, e.g. no access to public internet.

* Update plots.py

Co-authored-by: Maximilian Strobel <Maximilian.Strobel@infineon.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Consistent saved_model output format (ultralytics#7032)

* `ComputeLoss()` indexing/speed improvements (ultralytics#7048)

* device as class attribute

* Update loss.py

* Update loss.py

* improve zeros

* tensor split

* Update Dockerfile to `git clone` instead of `COPY` (ultralytics#7053)

Resolves git command errors that currently happen in image, i.e.:

```bash
root@382ae64aeca2:/usr/src/app# git pull
Warning: Permanently added the ECDSA host key for IP address '140.82.113.3' to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
```

* Create SECURITY.md (ultralytics#7054)

* Create SECURITY.md

Resolves ultralytics#7052

* Move into ./github

* Update SECURITY.md

* Fix incomplete URL substring sanitation (ultralytics#7056)

Resolves code scanning alert in ultralytics#7055

* Use PIL to eliminate chroma subsampling in crops (ultralytics#7008)

* use pillow to save higher-quality jpg (w/o color subsampling)

* Cleanup and doc issue

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Fix `check_anchor_order()` in pixel-space not grid-space (ultralytics#7060)

* Update `check_anchor_order()`

Use mean area per output layer for added stability.

* Check in pixel-space not grid-space fix

* Update detect.py non-inplace with `y.tensor_split()` (ultralytics#7062)

* Update common.py lists for tuples (ultralytics#7063)

Improved profiling.

* Update W&B message to `LOGGER.info()` (ultralytics#7064)

* Update __init__.py (ultralytics#7065)

* Add non-zero `da` `check_anchor_order()` condition (ultralytics#7066)

* Fix2 `check_anchor_order()` in pixel-space not grid-space (ultralytics#7067)

Follows ultralytics#7060, which provided only a partial solution to this issue. ultralytics#7060 resolved occurrences in yolo.py; this applies the same fix in autoanchor.py.

* Revert "Update detect.py non-inplace with `y.tensor_split()` (ultralytics#7062)" (ultralytics#7074)

This reverts commit d5e363f.

* Update loss.py with `if self.gr < 1:` (ultralytics#7087)

* Update loss.py with `if self.gr < 1:`

* Update loss.py

* Update loss for FP16 `tobj` (ultralytics#7088)

* Update model summary to display model name (ultralytics#7101)

* `torch.split()` 1.7.0 compatibility fix (ultralytics#7102)

* Update loss.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update loss.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update benchmarks significant digits (ultralytics#7103)

* Model summary `pathlib` fix (ultralytics#7104)

Stems not working correctly for YOLOv5l with current .rstrip() implementation. After fix:
```
YOLOv5l summary: 468 layers, 46563709 parameters, 46563709 gradients, 109.3 GFLOPs
```

* Remove named arguments where possible (ultralytics#7105)

* Remove named arguments where possible

Speed improvements.

* Update yolo.py

* Update yolo.py

* Update yolo.py

* Multi-threaded VisDrone and VOC downloads (ultralytics#7108)

* Multi-threaded VOC download

* Update VOC.yaml

* Update

* Update general.py

* Update general.py

* `np.fromfile()` Chinese image paths fix (ultralytics#6979)

* 🎉 🆕 Can now read Chinese image paths.

Use `cv2.imdecode(np.fromfile(f, np.uint8), cv2.IMREAD_COLOR)` instead of `cv2.imread(f)` for Chinese image paths.

* Update datasets.py

* Update __init__.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add PyTorch Hub `results.save(labels=False)` option (ultralytics#7129)

Resolves ultralytics#388 (comment)

* SparseML integration

* Add SparseML dependency

* Update: add missing files

* Update requirements.txt

* Update: sparseml-nightly support

* Update: remove model versioning

* Partial update for multi-stage recipes

* Update: multi-stage recipe support

* Update: remove sparseml dep

* Fix: multi-stage recipe handling

* Fix: multi stage support

* Fix: non-recipe runs

* Add: legacy hyperparam files

* Fix: add copy-paste to hyps

* Fix: nit

* apply structure fixes

* Squashed rebase to v6.1 upstream

* Update SparseML Integration to V6.1 (#26)

* SparseML integration

* Add SparseML dependency

* Update: add missing files

* Update requirements.txt

* Update: sparseml-nightly support

* Update: remove model versioning

* Partial update for multi-stage recipes

* Update: multi-stage recipe support

* Update: remove sparseml dep

* Fix: multi-stage recipe handling

* Fix: multi stage support

* Fix: non-recipe runs

* Add: legacy hyperparam files

* Fix: add copy-paste to hyps

* Fix: nit

* apply structure fixes

* manager fixes

* Update function name

Co-authored-by: Konstantin <konstantin@neuralmagic.com>
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>
KSGulin added a commit to neuralmagic/yolov5 that referenced this pull request Apr 14, 2022
* Fix TensorRT potential unordered binding addresses (ultralytics#5826)

* feat: change file suffix in pythonic way

* fix: enforce binding addresses order

* fix: enforce binding addresses order

* Handle non-TTY `wandb.errors.UsageError` (ultralytics#5839)

* `try: except (..., wandb.errors.UsageError)`

* bug fix

* Avoid inplace modifying `imgs` in `LoadStreams` (ultralytics#5850)

When OpenCV fails to retrieve an image, the original code modified the source images **in place**, which could result in bounding boxes being plotted on a black image. That is, the source image `im0s[i]` is fine before inference, but may have been changed after inference and before `Process predictions`.

* Update `LoadImages` `ret_val=False` handling (ultralytics#5852)

Video errors may occur.

* Update val.py (ultralytics#5838)

* Update val.py

Solving Non-ASCII character '\xf0' error during runtime

* Update val.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update TorchScript suffix to `*.torchscript` (ultralytics#5856)

* Add `--workers 8` argument to val.py (ultralytics#5857)

* Update val.py

Add an option to choose number of workers if not called by train.py

* Update comment

* 120 char line width

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update `plot_lr_scheduler()` (ultralytics#5864)

Shallow copies still modify the originals.
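
A minimal sketch of that behaviour, using a hypothetical `ToyOptimizer` class as a stand-in for any object with nested mutable state (it is not the PyTorch optimizer API): `copy.copy()` creates a new outer object but shares the nested list, so writes through the copy reach the original, while `copy.deepcopy()` does not.

```python
import copy


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer holding nested mutable state."""

    def __init__(self, lr):
        self.param_groups = [{'lr': lr}]


original = ToyOptimizer(lr=0.01)
shallow = copy.copy(original)    # new object, but shares the same param_groups list
deep = copy.deepcopy(original)   # fully independent nested state

shallow.param_groups[0]['lr'] = 0.5
deep.param_groups[0]['lr'] = 0.9

print(original.param_groups[0]['lr'])  # 0.5 (changed through the shallow copy)
```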

* Update `nl` after `cutout()` (ultralytics#5873)

* `AutoShape()` models as `DetectMultiBackend()` instances (ultralytics#5845)

* Update AutoShape()

* autodownload ONNX

* Cleanup

* Finish updates

* Add Usage

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* fix device

* Update hubconf.py

* Update common.py

* smart param selection

* autodownload all formats

* autopad only pytorch models

* new_shape edits

* stride tensor fix

* Cleanup

* Single-command multiple-model export (ultralytics#5882)

* Export multiple models in series

Export multiple models in series by adding additional `*.pt` files to the `--weights` argument, i.e.:

```bash
python export.py --include tflite --weights yolov5n.pt  # export 1 model
python export.py --include tflite --weights yolov5n.pt yolov5s.pt yolov5m.pt yolov5l.pt yolov5x.pt  # export 5 models
```

* Update export.py

* Update README.md

* `Detections().tolist()` explicit argument fix (ultralytics#5907)

Debugged misassigned Detections attributes

* Update wandb_utils.py (ultralytics#5908)

* Add *.engine (TensorRT extensions) to .gitignore (ultralytics#5911)

* Add *.engine (TensorRT extensions) to .gitignore

* Update .dockerignore

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add ONNX inference providers (ultralytics#5918)

* Add ONNX inference providers

Fix for ultralytics#5916

* Update common.py

* Add hardware checks to `notebook_init()` (ultralytics#5919)

* Update notebook

* Update notebook

* update string

* update string

* Updates

* Updates

* Updates

* check both ipython and psutil

* remove sample_data if is_colab

* cleanup

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Update `plot_lr_scheduler()` (ultralytics#5864)" (ultralytics#5920)

This reverts commit 360eec6.

* Absolute '/content/sample_data' (ultralytics#5922)

* Default PyTorch Hub to `autocast(False)` (ultralytics#5926)

* Fix ONNX opset inconsistency with parseargs and run args (ultralytics#5937)

* Make `select_device()` robust to `batch_size=-1` (ultralytics#5940)

* Found a bug when setting batch_size = -1 to use autobatch.

reproduce:

* Fix type conflict

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* fix .gitignore not tracking existing folders (ultralytics#5946)

* fix .gitignore not tracking existing folders

fix .gitignore so that the files that are in the repository are actually being tracked.

Everything in the data/ folder is ignored, which also means the subdirectories are ignored. Fix so that the subdirectories and their contents are still tracked.

* Remove data/trainings

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update `strip_optimizer()` (ultralytics#5949)

Replace 'training_result' with 'best_fitness' in strip_optimizer() to match key with ckpt from train.py

* Add nms and agnostic nms to export.py (ultralytics#5938)

* add nms and agnostic nms to export.py

* fix agnostic implies nms

* reorder args to group TF args

* PEP8 120 char

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Refactor NUM_THREADS (ultralytics#5954)

* Fix Detections class `tolist()` method (ultralytics#5945)

* Fix tolist() to add the file for each Detection

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix PEP8 requirement for 2 spaces before an inline comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Fix `imgsz` bug (ultralytics#5948)

* fix imgsz bug

* Update detect.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* `pretrained=False` fix (ultralytics#5966)

* `pretrained=False` fix

Fix for ultralytics#5964

* CI speed improvement

* make parameter ignore epochs (ultralytics#5972)

* make parameter ignore epochs

Adds ignore-epochs functionality to prevent spikes at the beginning, when fitness spikes and then decreases.
Discussed at ultralytics#5971

* Update train.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* YOLOv5s6 params and FLOPs fix (ultralytics#5977)

* Update callbacks.py with `__init__()` (ultralytics#5979)

Add __init__() function.

* Increase `ar_thr` from 20 to 100 for better detection on slender (high aspect ratio) objects (ultralytics#5556)

* Making `ar_thr` available as a hyperparameter

* Disabling ar_thr as hyperparameter and computing from the dataset instead

* Fixing bug in ar_thr computation

* Fix `ar_thr` to 100

* Allow `--weights URL` (ultralytics#5991)

* Recommend `jar xf file.zip` for zips (ultralytics#5993)

* *.torchscript inference `self.jit` fix (ultralytics#6007)

* Check TensorRT>=8.0.0 version (ultralytics#6021)

* Check TensorRT>=8.0.0 version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Multi-layer capable `--freeze` argument (ultralytics#6019)

* support specifying multiple frozen layers

* fix bug

* Cleanup Freeze section

* Cleanup argument

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* train -> val comment fix (ultralytics#6024)

* Add dataset source citations (ultralytics#6032)

* Kaggle `LOGGER` fix (ultralytics#6041)

* Simplify `set_logging()` indexing (ultralytics#6042)

* `--freeze` fix (ultralytics#6044)

Fix for ultralytics#6038

* OpenVINO Export (ultralytics#6057)

* OpenVINO export

* Remove timeout

* Add 3 files

* str

* Constrain opset to 12

* Default ONNX opset to 12

* Make dir

* Make dir

* Cleanup

* Cleanup

* check_requirements(('openvino-dev',))

* Reduce G/D/CIoU logic operations (ultralytics#6074)

Since CIoU is the default, reordering the checks reduces the number of comparisons.
The `elif CIoU:` branch also did not need its `if`.

Co-authored-by: 李杰 <360751194@qq.comqq.com>

* Init tensor directly on device (ultralytics#6068)

Slightly more efficient than .to(device)

* W&B: track batch size after autobatch (ultralytics#6039)

* track batch size after autobatch

* remove redundant import

* Update __init__.py

* Update __init__.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* W&B: Log best results after training ends (ultralytics#6120)

* log best.pt metrics at train end

* update

* Update __init__.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Log best results (ultralytics#6085)

* log best result in summary

* comment added

* add space for `flake8`

* log `best/epoch`

* fix `dimension` for epoch

ValueError: all the input arrays must have same number of dimensions

* log `best/` in `utils.logger.__init__`

* fix pre-commit

1. missing whitespace around operator
2.  over-indented

* Refactor/reduce G/C/D/IoU `if: else` statements (ultralytics#6087)

* Refactor the code to reduce else

* Update metrics.py

* Cleanup

Co-authored-by: Cmos <gen.chen@ubisoft.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add EdgeTPU support (ultralytics#3630)

* Add models/tf.py for TensorFlow and TFLite export

* Set auto=False for int8 calibration

* Update requirements.txt for TensorFlow and TFLite export

* Read anchors directly from PyTorch weights

* Add --tf-nms to append NMS in TensorFlow SavedModel and GraphDef export

* Remove check_anchor_order, check_file, set_logging from import

* Reformat code and optimize imports

* Autodownload model and check cfg

* update --source path, img-size to 320, single output

* Adjust representative_dataset

* Put representative dataset in tfl_int8 block

* detect.py TF inference

* weights to string

* weights to string

* cleanup tf.py

* Add --dynamic-batch-size

* Add xywh normalization to reduce calibration error

* Update requirements.txt

TensorFlow 2.3.1 -> 2.4.0 to avoid int8 quantization error

* Fix imports

Move C3 from models.experimental to models.common

* Add models/tf.py for TensorFlow and TFLite export

* Set auto=False for int8 calibration

* Update requirements.txt for TensorFlow and TFLite export

* Read anchors directly from PyTorch weights

* Add --tf-nms to append NMS in TensorFlow SavedModel and GraphDef export

* Remove check_anchor_order, check_file, set_logging from import

* Reformat code and optimize imports

* Autodownload model and check cfg

* update --source path, img-size to 320, single output

* Adjust representative_dataset

* detect.py TF inference

* Put representative dataset in tfl_int8 block

* weights to string

* weights to string

* cleanup tf.py

* Add --dynamic-batch-size

* Add xywh normalization to reduce calibration error

* Update requirements.txt

TensorFlow 2.3.1 -> 2.4.0 to avoid int8 quantization error

* Fix imports

Move C3 from models.experimental to models.common

* implement C3() and SiLU()

* Add TensorFlow and TFLite Detection

* Add --tfl-detect for TFLite Detection

* Add int8 quantized TFLite inference in detect.py

* Add --edgetpu for Edge TPU detection

* Fix --img-size to add rectangle TensorFlow and TFLite input

* Add --no-tf-nms to detect objects using models combined with TensorFlow NMS

* Fix --img-size list type input

* Update README.md

* Add Android project for TFLite inference

* Upgrade TensorFlow v2.3.1 -> v2.4.0

* Disable normalization of xywh

* Rewrite names init in detect.py

* Change input resolution 640 -> 320 on Android

* Disable NNAPI

* Update README.md --img 640 -> 320

* Update README.md for Edge TPU

* Update README.md

* Fix reshape dim to support dynamic batching

* Fix reshape dim to support dynamic batching

* Add epsilon argument in tf_BN, which is different between TF and PT

* Set stride to None if not using PyTorch, and do not warmup without PyTorch

* Add list support in check_img_size()

* Add list input support in detect.py

* sys.path.append('./') to run from yolov5/

* Add int8 quantization support for TensorFlow 2.5

* Add get_coco128.sh

* Remove --no-tfl-detect in models/tf.py (Use tf-android-tfl-detect branch for EdgeTPU)

* Update requirements.txt

* Replace torch.load() with attempt_load()

* Update requirements.txt

* Add --tf-raw-resize to set half_pixel_centers=False

* Remove android directory

* Update README.md

* Update README.md

* Add multiple OS support for EdgeTPU detection

* Fix export and detect

* Export 3 YOLO heads with Edge TPU models

* Remove xywh denormalization with Edge TPU models in detect.py

* Fix saved_model and pb detect error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix pre-commit.ci failure

* Add edgetpu in export.py docstring

* Fix Edge TPU model detection exported by TF 2.7

* Add class names for TF/TFLite in DetectMultibackend

* Fix assignment with nl in TFLite Detection

* Add check when getting Edge TPU compiler version

* Add UTF-8 encoding in opening --data file for Windows

* Remove redundant TensorFlow import

* Add Edge TPU in export.py's docstring

* Add the detect layer in Edge TPU model conversion

* Default `dnn=False`

* Cleanup data.yaml loading

* Update detect.py

* Update val.py

* Comments and generalize data.yaml names

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: unknown <fangjiacong@ut.cn>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Enable AdamW optimizer (ultralytics#6152)

* Update export format docstrings (ultralytics#6151)

* Update export documentation

* Cleanup

* Update export.py

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update README.md

* Update README.md

* Update README.md

* Update train.py

* Update train.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update greetings.yml (ultralytics#6165)

* [pre-commit.ci] pre-commit suggestions (ultralytics#6177)

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.0.1 → v4.1.0](pre-commit/pre-commit-hooks@v4.0.1...v4.1.0)
- [github.com/asottile/pyupgrade: v2.23.1 → v2.31.0](asottile/pyupgrade@v2.23.1...v2.31.0)
- [github.com/PyCQA/isort: 5.9.3 → 5.10.1](PyCQA/isort@5.9.3...5.10.1)
- [github.com/PyCQA/flake8: 3.9.2 → 4.0.1](PyCQA/flake8@3.9.2...4.0.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update NMS `max_wh=7680` for 8k images (ultralytics#6178)

* Add OpenVINO inference (ultralytics#6179)

* Ignore `*_openvino_model/` dir (ultralytics#6180)

* Global export format sort (ultralytics#6182)

* Global export sort

* Cleanup

* Fix TorchScript on mobile export (ultralytics#6183)

* fix export of TorchScript on mobile

* Cleanup

Co-authored-by: yinrong <yinrong@xiaomi.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* TensorRT 7 `anchor_grid` compatibility fix (ultralytics#6185)

* fix: TensorRT 7 incompatible

* Add comment

* Add if: else and comment

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add `tensorrt>=7.0.0` checks (ultralytics#6193)

* Add `tensorrt>=7.0.0` checks

* Update export.py

* Update common.py

* Update export.py

* Add CoreML inference (ultralytics#6195)

* Add Apple CoreML inference

* Cleanup

* Fix `nan`-robust stream FPS (ultralytics#6198)

Fix for webcam suddenly stopping working (Issue ultralytics#6197)
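
A minimal sketch of what a `nan`-robust frame-rate check could look like. The helper name `sanitize_fps` and the 30 FPS fallback are assumptions made for illustration; the idea is only that a stream reporting `nan`, `inf`, or a non-positive rate should not propagate that value into later timing arithmetic.

```python
import math


def sanitize_fps(fps: float, fallback: float = 30.0) -> float:
    """Return a usable frame rate, falling back when the reported value is unusable."""
    # nan and inf fail isfinite(); zero or negative rates are equally unusable
    if not math.isfinite(fps) or fps <= 0:
        return fallback
    return fps


print(sanitize_fps(float('nan')))  # 30.0
print(sanitize_fps(25.0))          # 25.0
```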

* Edge TPU compiler comment (ultralytics#6196)

* Edge TPU compiler comment

* 7 to 8 fix

* TFLite `--int8` 'flatbuffers==1.12' fix (ultralytics#6216)

* TFLite `--int8` 'flatbuffers==1.12' fix

Temporary workaround for TFLite INT8 export.

* Update export.py

* Update export.py

* TFLite `--int8` 'flatbuffers==1.12' fix 2 (ultralytics#6217)

* TFLite `--int8` 'flatbuffers==1.12' fix 2

Reorganizes ultralytics#6216 fix to update before `tensorflow` import so no restart required.

* Update export.py

* Add `edgetpu_compiler` checks (ultralytics#6218)

* Add `edgetpu_compiler` checks

* Update export.py

* Update export.py

* Update export.py

* Update export.py

* Update export.py

* Update export.py

* Attempt `edgetpu-compiler` autoinstall (ultralytics#6223)

* Attempt `edgetpu-compiler` autoinstall

Attempt to install edgetpu-compiler dependency if missing on Linux.

* Update export.py

* Update export.py
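
A minimal sketch of the kind of pre-flight check described above, using only the standard library. The function names are hypothetical and the actual install commands are deliberately left out; the sketch only decides whether the compiler executable is on `PATH` and whether the platform is one where an automatic install could be attempted.

```python
import platform
import shutil


def compiler_available(executable: str = 'edgetpu_compiler') -> bool:
    # shutil.which() returns None when the executable is not found on PATH
    return shutil.which(executable) is not None


def can_attempt_autoinstall() -> bool:
    # the commit message limits the autoinstall attempt to Linux
    return platform.system() == 'Linux'


if not compiler_available():
    if can_attempt_autoinstall():
        print('compiler missing: an automatic install could be attempted here')
    else:
        print('compiler missing: install it manually on this platform')
```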

* Update README speed reproduction command (ultralytics#6228)

* Update P2-P7 `models/hub` variants (ultralytics#6230)

* Update p2-p7 `models/hub` variants

* Update common.py

* AutoAnchor camelcase corrections

* TensorRT 7 export fix (ultralytics#6235)

* Fix `cmd` string on `tfjs` export (ultralytics#6243)

* Fix cmd string on tfjs export

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* TensorRT pip install

* Enable ONNX `--half` FP16 inference (ultralytics#6268)

* Enable ONNX `--half` FP16 inference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update `export.py` with Detect, Validate usages (ultralytics#6280)

* Add `is_kaggle()` function (ultralytics#6285)

* Add `is_kaggle()` function

Return True if environment is Kaggle Notebook.

* Remove root loggers only if is_kaggle() == True

* Update general.py
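
A minimal sketch of an environment check of this kind. The specific environment variables and values below (`PWD` and `KAGGLE_URL_BASE`) are assumptions about how a Kaggle Notebook can be recognised, and the optional `env` parameter is added here only so the function can be exercised outside Kaggle.

```python
import os


def is_kaggle(env=None) -> bool:
    """Return True if the environment looks like a Kaggle Notebook."""
    env = os.environ if env is None else env
    # both markers are assumed; either one missing means "not Kaggle"
    return (env.get('PWD') == '/kaggle/working'
            and env.get('KAGGLE_URL_BASE') == 'https://www.kaggle.com')


print(is_kaggle({'PWD': '/home/user'}))  # False
```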

* Fix `device` count check (ultralytics#6290)

* Fix device count check()

* Update torch_utils.py

* Update torch_utils.py

* Update hubconf.py

* Fixing bug multi-gpu training (ultralytics#6299)

* Fixing bug multi-gpu training

This solves this issue: ultralytics#6297 (comment)

* Update torch_utils.py for pep8

* `select_device()` cleanup (ultralytics#6302)

* `select_device()` cleanup

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Fix `train.py` parameter groups desc error (ultralytics#6318)

* Fix `train.py` parameter groups desc error

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Remove `dataset_stats()` autodownload capability (ultralytics#6303)

* Remove `dataset_stats()` autodownload capability

@kalenmike security update per Slack convo

* Update datasets.py

* Console corrupted -> corrupt (ultralytics#6338)

* Console corrupted -> corrupt 

Minor style changes.

* Update export.py

* TensorRT `assert im.device.type != 'cpu'` on export (ultralytics#6340)

* TensorRT `assert im.device.type != 'cpu'` on export

* Update export.py

* `export.py` return exported files/dirs (ultralytics#6343)

* `export.py` return exported files/dirs

* Path to str

* Created using Colaboratory

* `export.py` automatic `forward_export` (ultralytics#6352)

* `export.py` automatic `forward_export`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* New environment variable `VERBOSE` (ultralytics#6353)

New environment variable `VERBOSE`

* Reuse `de_parallel()` rather than `is_parallel()` (ultralytics#6354)

* `DEVICE_COUNT` instead of `WORLD_SIZE` to calculate `nw` (ultralytics#6324)

* Flush callbacks when on `--evolve` (ultralytics#6374)

* log best.pt metrics at train end

* update

* Update __init__.py

* flush callbacks when using evolve

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* FROM nvcr.io/nvidia/pytorch:21.12-py3 (ultralytics#6377)

* FROM nvcr.io/nvidia/pytorch:21.10-py3 (ultralytics#6379)

21.12 generates dockerhub errors so rolling back to 21.10 with latest pytorch install. Not sure if this torch install will work on non-GPU dockerhub autobuild so this is an experiment.

* Add `albumentations` to Dockerfile (ultralytics#6392)

* Add `stop_training=False` flag to callbacks (ultralytics#6365)

* New flag 'stop_training' in util.callbacks.Callbacks class to prematurely stop training from callback handler

* Removed most of the new checks, leaving only the one after calling 'on_train_batch_end'

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
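
A minimal sketch of how such a flag can work, using a toy `Callbacks` class written for this example (it is not the class from the repository): a handler sets `stop_training`, and the loop checks the flag once, right after the `on_train_batch_end` hook.

```python
class Callbacks:
    """Toy callback registry with a flag a handler can set to end training early."""

    def __init__(self):
        self._hooks = {'on_train_batch_end': []}
        self.stop_training = False

    def register(self, hook, fn):
        self._hooks[hook].append(fn)

    def run(self, hook, *args):
        for fn in self._hooks[hook]:
            fn(*args)


def train(callbacks, n_batches):
    done = 0
    for i in range(n_batches):
        done += 1  # stand-in for one training step
        callbacks.run('on_train_batch_end', i)
        if callbacks.stop_training:  # single check, right after the batch-end hook
            break
    return done


cb = Callbacks()


def stop_after_third(i):
    if i == 2:
        cb.stop_training = True


cb.register('on_train_batch_end', stop_after_third)
print(train(cb, 10))  # 3
```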

* Add `detect.py` GIF video inference (ultralytics#6410)

* Add detect.py GIF video inference

* Cleanup

* Update `greetings.yaml` email address (ultralytics#6412)

* Update `greetings.yaml` email address

* Update greetings.yml

* Rename logger from 'utils.logger' to 'yolov5' (ultralytics#6421)

* Gave a more explicit name to the logger

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Prefer `tflite_runtime` for TFLite inference if installed (ultralytics#6406)

* import tflite_runtime if tensorflow not installed

* rename tflite to tfli

* Attempt tflite_runtime for all TFLite workflows

Also rename tfli to tfl

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
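
A minimal sketch of a "prefer the lightweight package, fall back to the full one" import. The helper `import_first_available` and the candidate tuple are illustrative; with `tensorflow` as the fallback, the caller would reach the interpreter through `tf.lite`.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that is installed."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
    raise ImportError(f'none of {list(module_names)} could be imported')


# preference order assumed from the commit title: lightweight runtime first
TFLITE_CANDIDATES = ('tflite_runtime.interpreter', 'tensorflow')

# demonstration with standard-library modules so the sketch runs anywhere
module = import_first_available(('no_such_module_xyz', 'json'))
print(module.__name__)  # json
```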

* Update workflows (ultralytics#6427)

* Workflow updates

* quotes fix

* best to weights fix

* Namespace `VERBOSE` env variable to `YOLOv5_VERBOSE` (ultralytics#6428)

* Verbose updates

* Verbose updates
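
A minimal sketch of reading a namespaced verbosity switch from the environment. The accepted values (`true`/`false`, case-insensitive) and the default of `true` are assumptions for illustration; only the variable name `YOLOv5_VERBOSE` comes from the commit title.

```python
import os


def verbose_enabled(env=None, name='YOLOv5_VERBOSE', default='true') -> bool:
    """Return True unless the environment variable is set to something other than 'true'."""
    env = os.environ if env is None else env
    return str(env.get(name, default)).lower() == 'true'


print(verbose_enabled({'YOLOv5_VERBOSE': 'False'}))  # False
print(verbose_enabled({}))                           # True
```

Prefixing the variable with the project name avoids collisions with an unrelated `VERBOSE` variable that other tools may already set.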

* Add `*.asf` video support (ultralytics#6436)

* Revert "Remove `dataset_stats()` autodownload capability (ultralytics#6303)" (ultralytics#6442)

This reverts commit 3119b2f.

* Fix `select_device()` for Multi-GPU (ultralytics#6434)

* Fix `select_device()` for Multi-GPU

Possible fix for ultralytics#6431

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Update torch_utils.py

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Fix2 `select_device()` for Multi-GPU (ultralytics#6461)

* Fix2 select_device() for Multi-GPU

* Cleanup

* Cleanup

* Simplify error message

* Improve assert

* Update torch_utils.py

* Add Product Hunt social media icon (ultralytics#6464)

* Social media icons update

* fix URL

* Update README.md

* Resolve dataset paths (ultralytics#6489)

* Simplify TF normalized to pixels (ultralytics#6494)

* Improved `export.py` usage examples (ultralytics#6495)

* Improved `export.py` usage examples

* Cleanup

* CoreML inference fix `list()` -> `sorted()` (ultralytics#6496)

* Suppress `torch.jit.TracerWarning` on export (ultralytics#6498)

* Suppress torch.jit.TracerWarning

TracerWarnings can be safely ignored.

* Cleanup

* Suppress export.run() TracerWarnings (ultralytics#6499)

Suppresses warnings when calling export.run() directly, not just CLI python export.py.

Also adds Requirements examples for CPU and GPU backends

* W&B: Remember batchsize on resuming (ultralytics#6512)

* log best.pt metrics at train end

* update

* Update __init__.py

* flush callbacks when using evolve

* remember batch size on resuming

* Update train.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Update hyp.scratch-high.yaml (ultralytics#6525)

Update `lrf: 0.1`, tested on YOLOv5x6 to 55.0 mAP@0.5:0.95, slightly higher than current.

* TODO issues exempt from stale action (ultralytics#6530)

* Update val_batch*.jpg for Chinese fonts (ultralytics#6526)

* Update plots for Chinese fonts

* make is_chinese() non-str safe

* Add global FONT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update general.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Social icons after text (ultralytics#6473)

* Social icons after text

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update README.md

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Edge TPU compiler `sudo` fix (ultralytics#6531)

* Edge TPU compiler sudo fix

Allows for auto-install of Edge TPU compiler on non-sudo systems like the YOLOv5 Docker image.

@kalenmike

* Update export.py

* Update export.py

* Update export.py

* Edge TPU export 'list index out of range' fix (ultralytics#6533)

* Edge TPU `tf.lite.experimental.load_delegate` fix (ultralytics#6536)

* Edge TPU `tf.lite.experimental.load_delegate` fix

Fix attempt for ultralytics#6535

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixing minor multi-streaming issues with TensorRT engine (ultralytics#6504)

* Update batch-size in model.warmup() + indentation for logging inference results

* These changes are in response to PR comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Load checkpoint on CPU instead of on GPU (ultralytics#6516)

* Load checkpoint on CPU instead of on GPU

* refactor: simplify code

* Cleanup

* Update train.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* flake8: code meanings (ultralytics#6481)

* Fix 6 Flake8 issues (ultralytics#6541)

* F541

* F821

* F841

* E741

* E302

* E722

* Apply suggestions from code review

* Update general.py

* Update datasets.py

* Update export.py

* Update plots.py

* Update plots.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Edge TPU TF imports fix (ultralytics#6542)

* Edge TPU TF imports fix

Fix for ultralytics#6535 (comment)

* Update common.py

* Move trainloader functions to class methods (ultralytics#6559)

* Move trainloader functions to class methods

* results = ThreadPool(NUM_THREADS).imap(self.load_image, range(n))

* Cleanup

* Improved AutoBatch DDP error message (ultralytics#6568)

* Improved AutoBatch DDP error message

* Cleanup

* Fix zero-export handling with `if any(f):` (ultralytics#6569)

* Fix zero-export handling with `if any(f):`

Partial fix for ultralytics#6563

* Cleanup

* Fix `plot_labels()` colored histogram bug (ultralytics#6574)

* Fix `plot_labels()` colored histogram bug

* Cleanup

* Allow custom `--evolve` project names (ultralytics#6567)

* Update train.py

As seen in ultralytics#6463, this modifies training in the evolve process to allow a custom save directory.

* fix val

* PEP8

whitespace around operator

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add `DATASETS_DIR` global in general.py (ultralytics#6578)

* return `opt` from `train.run()` (ultralytics#6581)

* Fix YouTube dislike button bug in `pafy` package (ultralytics#6603)

Per ultralytics#6583 (comment) by @alicera

* Update train.py

* Fix `hyp_evolve.yaml` indexing bug (ultralytics#6604)

* Fix `hyp_evolve.yaml` indexing bug

Bug caused hyp_evolve.yaml to display latest generation result rather than best generation result.

* Update plots.py

* Update general.py

* Update general.py

* Update general.py
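
A minimal sketch of the distinction between "latest" and "best" generation, assuming each generation is stored as one row and a fitness function scores a row. The helper name, the row layout, and the numbers are made up purely for illustration.

```python
def best_generation(rows, fitness):
    """rows: one result per generation; fitness: callable scoring a row."""
    scores = [fitness(r) for r in rows]
    # index of the highest score, not simply the last row
    best_index = max(range(len(rows)), key=scores.__getitem__)
    return best_index, rows[best_index]


# illustrative values only
generations = [
    {'lr0': 0.010, 'map': 0.41},
    {'lr0': 0.012, 'map': 0.47},  # best
    {'lr0': 0.009, 'map': 0.44},  # latest
]
idx, row = best_generation(generations, fitness=lambda r: r['map'])
print(idx, row['lr0'])  # 1 0.012
```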

* Fix `ROOT / data` when running W&B `log_dataset()` (ultralytics#6606)

* Fix missing data folder when running log_dataset

* Use ROOT/'data'

* PEP8 whitespace

* YouTube dependency fix `youtube_dl==2020.12.2` (ultralytics#6612)

Per ultralytics#5860 (comment) by @hdnh2006

* Add YOLOv5n to Reproduce section (ultralytics#6619)

* W&B: Improve resume stability (ultralytics#6611)

* log best.pt metrics at train end

* update

* Update __init__.py

* flush callbacks when using evolve

* remember batch size on resuming

* Update train.py

* improve stability of resume

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* W&B: don't log media in evolve (ultralytics#6617)

* YOLOv5 Export Benchmarks (ultralytics#6613)

* Add benchmarks.py

* Update

* Add requirements

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* dataset autodownload from root

* Update

* Redirect to /dev/null

* sudo --help

* Cleanup

* Add exports pd df

* Updates

* Updates

* Updates

* Cleanup

* dir handling fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

* Cleanup2

* Cleanup3

* Cleanup model_type

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix ConfusionMatrix scale `vmin=0.0` (ultralytics#6638)

Fix attempt for ultralytics#6626

* Fixed wandb logger KeyError (ultralytics#6637)

* Fix yolov3.yaml remove list (ultralytics#6655)

Per ultralytics/yolov3#1887 (comment)

* Validate with 2x `--workers` (ultralytics#6658)

* Validate with 2x `--workers` single-GPU/CPU fix (ultralytics#6659)

Fix for ultralytics#6658 for single-GPU and CPU training use cases

* Add `--cache val` (ultralytics#6663)

New `--cache val` argument will cache validation set only into RAM. Should help multi-GPU training speeds without consuming as much RAM as full `--cache ram`.

* Robust `scipy.cluster.vq.kmeans` too few points (ultralytics#6668)

* Handle `scipy.cluster.vq.kmeans` too few points

Resolves ultralytics#6664

* Update autoanchor.py

* Cleanup

* Update Dockerfile `torch==1.10.2+cu113` (ultralytics#6669)

* FROM nvcr.io/nvidia/pytorch:22.01-py3 (ultralytics#6670)

* FROM nvcr.io/nvidia/pytorch:21.10-py3 (ultralytics#6671)

22.01 returns 'no space left on device' error message.

Seems like a bug in Docker. Raised issue in docker/hub-feedback#2209

* Update Dockerfile reorder installs (ultralytics#6672)

Also `nvidia-tensorboard-plugin-dlprof`, `nvidia-tensorboard` are no longer installed in NVCR base.

* FROM nvcr.io/nvidia/pytorch:21.10-py3 (ultralytics#6673)

Reordered installation may help reduce resource usage in autobuild

* FROM nvcr.io/nvidia/pytorch:21.10-py3 (ultralytics#6677)

Revert to 21.10 on autobuild fail

* Fix TF exports >= 2GB (ultralytics#6292)

* Fix exporting saved_model: pb exceeds 2GB

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Replace TF v1.x API with TF v2.x API for saved_model export

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Clean up

* Remove lambda in tf.function()

* Revert "Remove lambda in tf.function()" to be compatible with TF v2.4

This reverts commit 46c7931f11dfdea6ae340c77287c35c30b9e0779.

* Fix for pre-commit.ci

* Cleanup1

* Cleanup2

* Backwards compatibility update

* Update common.py

* Update common.py

* Cleanup3

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Fix `--evolve --bucket gs://...` (ultralytics#6698)

* Fix CoreML P6 inference (ultralytics#6700)

* Fix CoreML P6 inference

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix floating point in number of workers `nw` (ultralytics#6701)

Integer division by a float yields a (rounded) float. This causes
the dataloader to crash when creating a range.
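
The behaviour is easy to reproduce in plain Python. In this sketch the variable names and values are illustrative: floor division with a float operand returns a float, and `range()` refuses it.

```python
cpu_count = 8
devices = 2.0  # a float sneaks in, e.g. from an earlier true division

nw = cpu_count // devices
print(nw)  # 4.0, floor-divided but still a float

try:
    range(nw)
except TypeError as err:
    print('range() rejected it:', err)

nw = int(cpu_count // max(devices, 1))  # cast once, and guard against a zero divisor
print(list(range(nw)))  # [0, 1, 2, 3]
```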

* Edge TPU inference fix (ultralytics#6686)

* refactor: use edgetpu flag

* fix: remove bitwise AND assignment to tflite

* Cleanup and fix tflite

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Use `export_formats()` in export.py (ultralytics#6705)

* Use `export_formats()` in export.py

* list fix

* Suppress `torch` AMP-CPU warnings (ultralytics#6706)

This is a torch bug, but they seem unable or unwilling to fix it so I'm creating a suppression in YOLOv5. 

Resolves ultralytics#6692
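
A minimal sketch of a targeted suppression with the standard `warnings` module. The warning text and the `noisy_call` function are stand-ins invented for this example, not the actual message emitted by `torch`; the point is that filtering on a message prefix and category silences one known warning while leaving others visible.

```python
import warnings


def noisy_call():
    # stand-in for a library call that emits a harmless warning
    warnings.warn('example: feature unavailable on CPU', UserWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # start from a known filter state
    # the message argument is a regex matched against the start of the warning text
    warnings.filterwarnings('ignore', message='example: feature unavailable', category=UserWarning)
    result = noisy_call()

print(result, len(caught))  # 42 0
```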

* Update `nw` to `max(nd, 1)` (ultralytics#6714)

* GH: add PR template (ultralytics#6482)

* GH: add PR template

* Update CONTRIBUTING.md

* Update PULL_REQUEST_TEMPLATE.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update PULL_REQUEST_TEMPLATE.md

* Update PULL_REQUEST_TEMPLATE.md

* Update PULL_REQUEST_TEMPLATE.md

* Update PULL_REQUEST_TEMPLATE.md

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Switch default LR scheduler from cos to linear (ultralytics#6729)

* Switch default LR scheduler from cos to linear

Based on empirical results of training both ways on all YOLOv5 models.

* linear bug fix
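
For readers comparing the two schedule shapes, here is a rough sketch. The function names, `epochs`, and `lrf` (the final learning-rate fraction) are illustrative assumptions, and the formulas are the common textbook forms rather than a copy of the repository's code:

```python
import math

def linear_lf(epoch, epochs, lrf):
    """Linear decay of the LR multiplier from 1.0 (epoch 0) to lrf (last epoch)."""
    return (1 - epoch / epochs) * (1.0 - lrf) + lrf

def cosine_lf(epoch, epochs, lrf):
    """Half-cosine decay of the LR multiplier from 1.0 to lrf."""
    return ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1.0) + 1.0

epochs, lrf = 100, 0.1  # illustrative values
for e in (0, 25, 50, 75, 100):
    print(e, round(linear_lf(e, epochs, lrf), 4), round(cosine_lf(e, epochs, lrf), 4))
```

Both schedules share the same start and end points. The cosine form stays higher early in training and drops lower late. A multiplier function of this kind is what `torch.optim.lr_scheduler.LambdaLR` accepts as `lr_lambda`.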

* Updated VOC hyperparameters (ultralytics#6732)

* Update hyps

* Update hyp.VOC.yaml

* Update pathlib

* Update hyps

* Update hyps

* Update hyps

* Update hyps

* YOLOv5 v6.1 release (ultralytics#6739)

* Pre-commit table fix (ultralytics#6744)

* Update tutorial.ipynb (2 CPUs, 12.7 GB RAM, 42.2/166.8 GB disk) (ultralytics#6767)

* Update min warmup iterations from 1k to 100 (ultralytics#6768)

* Default `OMP_NUM_THREADS=8` (ultralytics#6770)

* Update tutorial.ipynb (ultralytics#6771)

* Update hyp.VOC.yaml (ultralytics#6772)

* Fix export for 1-channel images (ultralytics#6780)

Export failed for 1-channel input shape, 1-liner fix

* Update EMA decay `tau` (ultralytics#6769)

* Update EMA

* Update EMA

* ratio invert

* fix ratio invert

* fix2 ratio invert

* warmup iterations to 100

* ema_k

* implement tau

* implement tau
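
One way a `tau` parameter can shape an EMA decay is an exponential ramp, sketched below. The function names and the default values for `decay` and `tau` are assumptions made for illustration, not confirmed by the commit messages above:

```python
import math

def ema_decay(updates, decay=0.9999, tau=2000):
    """Effective decay after `updates` steps: ramps from 0 toward `decay`.

    `tau` sets how many updates the ramp takes; both defaults are assumed.
    """
    return decay * (1 - math.exp(-updates / tau))

def ema_update(ema_value, new_value, updates, decay=0.9999, tau=2000):
    """Blend a new value into the running average using the ramped decay."""
    d = ema_decay(updates, decay, tau)
    return d * ema_value + (1 - d) * new_value
```

With a ramp like this, early updates track the raw weights closely (decay near 0), and the average becomes progressively smoother as training proceeds.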

* YOLOv5s6 params FLOPs fix (ultralytics#6782)

* Update PULL_REQUEST_TEMPLATE.md (ultralytics#6783)

* Update autoanchor.py (ultralytics#6794)

* Update autoanchor.py

* Update autoanchor.py

* Update sweep.yaml (ultralytics#6825)

* Update sweep.yaml

Changed focal loss gamma search range between 1 and 4

* Update sweep.yaml

lowered the min value to match default

* AutoAnchor improved initialization robustness (ultralytics#6854)

* Update AutoAnchor

* Update AutoAnchor

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add `*.ts` to `VID_FORMATS` (ultralytics#6859)

* Update `--cache disk` deprecate `*_npy/` dirs (ultralytics#6876)

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Updates

* Cleanup

* Cleanup

* Update yolov5s.yaml (ultralytics#6865)

* Update yolov5s.yaml

* Update yolov5s.yaml

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Default FP16 TensorRT export (ultralytics#6798)

* Assert engine precision ultralytics#6777

* Default to FP32 inputs for TensorRT engines

* Default to FP16 TensorRT exports ultralytics#6777

* Remove wrong line ultralytics#6777

* Automatically adjust detect.py input precision ultralytics#6777

* Automatically adjust val.py input precision ultralytics#6777

* Add missing colon

* Cleanup

* Cleanup

* Remove default trt_fp16_input definition

* Experiment

* Reorder detect.py if statement to after half checks

* Update common.py

* Update export.py

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Bump actions/setup-python from 2 to 3 (ultralytics#6880)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 3.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v2...v3)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/checkout from 2 to 3 (ultralytics#6881)

Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix TRT `max_workspace_size` deprecation notice (ultralytics#6856)

* Fix TRT `max_workspace_size` deprecation notice

* Update export.py

* Update export.py

* Update bytes to GB with bitshift (ultralytics#6886)
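
The bit-shift idiom named in this commit title can be sketched as follows. The helper name `bytes_to_gb` is hypothetical, and "GB" here means 2**30 bytes (a gibibyte):

```python
GB = 1 << 30  # shifting 1 left by 30 bits gives 2**30 = 1073741824 bytes

def bytes_to_gb(n_bytes):
    """Convert a byte count to (binary) gigabytes as a float."""
    return n_bytes / GB

total = 8 * GB            # illustrative byte count
whole_gb = total >> 30    # integer-only variant: discards any fractional part
```

Writing `1 << 30` avoids spelling out `1024 ** 3` or `1E9`, which are easy to confuse with each other.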

* Move `git_describe()` to general.py (ultralytics#6918)

* Move `git_describe()` to general.py

* Move `git_describe()` to general.py

* PyTorch 1.11.0 compatibility updates (ultralytics#6932)

Resolves `AttributeError: 'Upsample' object has no attribute 'recompute_scale_factor'` first raised in ultralytics#5499

* Optimize PyTorch 1.11.0 compatibility update (ultralytics#6933)

* Allow 3-point segments (ultralytics#6938)

May resolve ultralytics#6931

* Fix PyTorch Hub export inference shapes (ultralytics#6949)

May resolve ultralytics#6947

* DetectMultiBackend() `--half` handling (ultralytics#6945)

* DetectMultiBackend() `--half` handling

* CI fixes

* rename .half to .fp16 to avoid conflict

* warmup fix

* val update

* engine update

* engine update

* Update Dockerfile `torch==1.11.0+cu113` (ultralytics#6954)

* New val.py `cuda` variable (ultralytics#6957)

* New val.py `cuda` variable

Fix for ONNX GPU val.

* Update val.py

* DetectMultiBackend() return `device` update (ultralytics#6958)

Fixes ONNX validation that returns outputs on CPU.

* Tensor initialization on device improvements (ultralytics#6959)

* Update common.py speed improvements

Eliminate .to() ops where possible for reduced data transfer overhead. Primarily affects warmup and PyTorch Hub inference.

* Updates

* Updates

* Update detect.py

* Update val.py

* EdgeTPU optimizations (ultralytics#6808)

* removed transpose op for better edgetpu support

* fix for training case

* enabled experimental new quantizer flag

* precalculate add and mul ops at compile time

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Model `ema` key backward compatibility fix (ultralytics#6972)

Fix for older model loading issue in ultralytics@d3d9cbc#commitcomment-68622388

* pt model to cpu on TF export

* YOLOv5 Export Benchmarks for GPU (ultralytics#6963)

* Add benchmarks.py GPU support

* Updates

* Updates

* Updates

* Updates

* Add --half

* Add TRT requirements

* Cleanup

* Add TF to warmup types

* Update export.py

* Update export.py

* Update benchmarks.py

* Update TQDM bar format (ultralytics#6988)

* Conditional `Timeout()` by OS (disable on Windows) (ultralytics#7013)

* Conditional `Timeout()` by OS (disable on Windows)

* Update general.py

* fix: add default PIL font as fallback  (ultralytics#7010)

* fix: add default font as fallback

Add default font as fallback if the downloading of the Arial.ttf font
fails for some reason, e.g. no access to public internet.

* Update plots.py

Co-authored-by: Maximilian Strobel <Maximilian.Strobel@infineon.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Consistent saved_model output format (ultralytics#7032)

* `ComputeLoss()` indexing/speed improvements (ultralytics#7048)

* device as class attribute

* Update loss.py

* Update loss.py

* improve zeros

* tensor split

* Update Dockerfile to `git clone` instead of `COPY` (ultralytics#7053)

Resolves git command errors that currently happen in image, i.e.:

```bash
root@382ae64aeca2:/usr/src/app# git pull
Warning: Permanently added the ECDSA host key for IP address '140.82.113.3' to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
```

* Create SECURITY.md (ultralytics#7054)

* Create SECURITY.md

Resolves ultralytics#7052

* Move into ./github

* Update SECURITY.md

* Fix incomplete URL substring sanitation (ultralytics#7056)

Resolves code scanning alert in ultralytics#7055

* Use PIL to eliminate chroma subsampling in crops (ultralytics#7008)

* use pillow to save higher-quality jpg (w/o color subsampling)

* Cleanup and doc issue

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Fix `check_anchor_order()` in pixel-space not grid-space (ultralytics#7060)

* Update `check_anchor_order()`

Use mean area per output layer for added stability.

* Check in pixel-space not grid-space fix
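
The idea of comparing mean anchor area per output layer against stride order can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the repository's function: anchors are given in pixels as lists of `(w, h)` pairs per layer, and a reordered list is returned instead of modifying a model in place.

```python
def mean_area(layer_anchors):
    """Mean w*h over the anchors of one output layer (pixel units)."""
    return sum(w * h for w, h in layer_anchors) / len(layer_anchors)

def check_anchor_order(anchors, strides):
    """Return anchors ordered consistently with strides.

    If anchor areas grow in the opposite direction to the strides,
    the layer order is reversed; otherwise the input is returned as-is.
    """
    areas = [mean_area(layer) for layer in anchors]
    da = areas[-1] - areas[0]      # change in mean anchor area
    ds = strides[-1] - strides[0]  # change in stride
    if da and (da > 0) != (ds > 0):  # skip when areas are equal (da == 0)
        return anchors[::-1]
    return anchors

anchors = [
    [(10, 13), (16, 30), (33, 23)],       # small objects
    [(30, 61), (62, 45), (59, 119)],      # medium objects
    [(116, 90), (156, 198), (373, 326)],  # large objects
]
strides = [8, 16, 32]
fixed = check_anchor_order(anchors[::-1], strides)  # reversed input is flipped back
```

Using the mean area of each layer, rather than a single anchor, makes the comparison less sensitive to one unusually shaped anchor.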

* Update detect.py non-inplace with `y.tensor_split()` (ultralytics#7062)

* Update common.py lists for tuples (ultralytics#7063)

Improved profiling.

* Update W&B message to `LOGGER.info()` (ultralytics#7064)

* Update __init__.py (ultralytics#7065)

* Add non-zero `da` `check_anchor_order()` condition (ultralytics#7066)

* Fix2 `check_anchor_order()` in pixel-space not grid-space (ultralytics#7067)

Follows ultralytics#7060, which provided only a partial solution to this issue. ultralytics#7060 resolved occurrences in yolo.py; this applies the same fix in autoanchor.py.

* Revert "Update detect.py non-inplace with `y.tensor_split()` (ultralytics#7062)" (ultralytics#7074)

This reverts commit d5e363f.

* Update loss.py with `if self.gr < 1:` (ultralytics#7087)

* Update loss.py with `if self.gr < 1:`

* Update loss.py

* Update loss for FP16 `tobj` (ultralytics#7088)

* Update model summary to display model name (ultralytics#7101)

* `torch.split()` 1.7.0 compatibility fix (ultralytics#7102)

* Update loss.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update loss.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update benchmarks significant digits (ultralytics#7103)

* Model summary `pathlib` fix (ultralytics#7104)

Stems not working correctly for YOLOv5l with current .rstrip() implementation. After fix:
```
YOLOv5l summary: 468 layers, 46563709 parameters, 46563709 gradients, 109.3 GFLOPs
```

* Remove named arguments where possible (ultralytics#7105)

* Remove named arguments where possible

Speed improvements.

* Update yolo.py

* Update yolo.py

* Update yolo.py

* Multi-threaded VisDrone and VOC downloads (ultralytics#7108)

* Multi-threaded VOC download

* Update VOC.yaml

* Update

* Update general.py

* Update general.py

* `np.fromfile()` Chinese image paths fix (ultralytics#6979)

* 🎉 🆕 Chinese image paths can now be read.

Use "cv2.imdecode(np.fromfile(f, np.uint8), cv2.IMREAD_COLOR)" instead of "cv2.imread(f)" so that image paths containing Chinese characters load correctly.

* Update datasets.py

* Update __init__.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>

* Add PyTorch Hub `results.save(labels=False)` option (ultralytics#7129)

Resolves ultralytics#388 (comment)

* SparseML integration

* Add SparseML dependency

* Update: add missing files

* Update requirements.txt

* Update: sparseml-nightly support

* Update: remove model versioning

* Partial update for multi-stage recipes

* Update: multi-stage recipe support

* Update: remove sparseml dep

* Fix: multi-stage recipe handling

* Fix: multi stage support

* Fix: non-recipe runs

* Add: legacy hyperparam files

* Fix: add copy-paste to hyps

* Fix: nit

* apply structure fixes

* Squashed rebase to v6.1 upstream

* Update SparseML Integration to V6.1 (#26)

* SparseML integration

* Add SparseML dependency

* Update: add missing files

* Update requirements.txt

* Update: sparseml-nightly support

* Update: remove model versioning

* Partial update for multi-stage recipes

* Update: multi-stage recipe support

* Update: remove sparseml dep

* Fix: multi-stage recipe handling

* Fix: multi stage support

* Fix: non-recipe runs

* Add: legacy hyperparam files

* Fix: add copy-paste to hyps

* Fix: nit

* apply structure fixes

* manager fixes

* Update function name

Co-authored-by: Konstantin <konstantin@neuralmagic.com>
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>
@0chandansharma

0chandansharma commented May 17, 2022

I exported the model yolo5n.yaml, which has a size of 2.1 MB.

When I run this model on the EDGE device, it takes 9.3 MB of space. Why?

How can I reduce this memory usage, since my EDGE device is restricted to 7 MB?

@zldrobit
Contributor Author

@0chandansharma You could try exporting an Edge TPU model with a smaller input size (e.g. 320) using `python export.py --weights yolov5n.pt --include edgetpu --img 320`. This should reduce the memory footprint of model inference.
By EDGE device, do you mean an Edge TPU?

@0chandansharma

@zldrobit Thanks for the response.
That result was obtained after running `python export.py --weights yolov5n.pt --include edgetpu --img 320`. I have tried different image sizes as well, and it still takes 9 MB.

@zldrobit
Contributor Author

@0chandansharma When I use Edge TPU compiler v15, the exported yolov5n model (`python export.py --weights yolov5n.pt --include edgetpu --img 320`) uses only 1.11 MB for caching parameters.
I cannot tell where the 9 MB of memory usage comes from. Would you share the output message that reports 9 MB?

@glenn-jocher
Member

Good news here, next release later in June should have significantly smaller models.

BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
* Add models/tf.py for TensorFlow and TFLite export

* Set auto=False for int8 calibration

* Update requirements.txt for TensorFlow and TFLite export

* Read anchors directly from PyTorch weights

* Add --tf-nms to append NMS in TensorFlow SavedModel and GraphDef export

* Remove check_anchor_order, check_file, set_logging from import

* Reformat code and optimize imports

* Autodownload model and check cfg

* update --source path, img-size to 320, single output

* Adjust representative_dataset

* Put representative dataset in tfl_int8 block

* detect.py TF inference

* weights to string

* weights to string

* cleanup tf.py

* Add --dynamic-batch-size

* Add xywh normalization to reduce calibration error

* Update requirements.txt

TensorFlow 2.3.1 -> 2.4.0 to avoid int8 quantization error

* Fix imports

Move C3 from models.experimental to models.common

* Add models/tf.py for TensorFlow and TFLite export

* Set auto=False for int8 calibration

* Update requirements.txt for TensorFlow and TFLite export

* Read anchors directly from PyTorch weights

* Add --tf-nms to append NMS in TensorFlow SavedModel and GraphDef export

* Remove check_anchor_order, check_file, set_logging from import

* Reformat code and optimize imports

* Autodownload model and check cfg

* update --source path, img-size to 320, single output

* Adjust representative_dataset

* detect.py TF inference

* Put representative dataset in tfl_int8 block

* weights to string

* weights to string

* cleanup tf.py

* Add --dynamic-batch-size

* Add xywh normalization to reduce calibration error

* Update requirements.txt

TensorFlow 2.3.1 -> 2.4.0 to avoid int8 quantization error

* Fix imports

Move C3 from models.experimental to models.common

* implement C3() and SiLU()

* Add TensorFlow and TFLite Detection

* Add --tfl-detect for TFLite Detection

* Add int8 quantized TFLite inference in detect.py

* Add --edgetpu for Edge TPU detection

* Fix --img-size to add rectangle TensorFlow and TFLite input

* Add --no-tf-nms to detect objects using models combined with TensorFlow NMS

* Fix --img-size list type input

* Update README.md

* Add Android project for TFLite inference

* Upgrade TensorFlow v2.3.1 -> v2.4.0

* Disable normalization of xywh

* Rewrite names init in detect.py

* Change input resolution 640 -> 320 on Android

* Disable NNAPI

* Update README.me --img 640 -> 320

* Update README.me for Edge TPU

* Update README.md

* Fix reshape dim to support dynamic batching

* Fix reshape dim to support dynamic batching

* Add epsilon argument in tf_BN, which is different between TF and PT

* Set stride to None if not using PyTorch, and do not warmup without PyTorch

* Add list support in check_img_size()

* Add list input support in detect.py

* sys.path.append('./') to run from yolov5/

* Add int8 quantization support for TensorFlow 2.5

* Add get_coco128.sh

* Remove --no-tfl-detect in models/tf.py (Use tf-android-tfl-detect branch for EdgeTPU)

* Update requirements.txt

* Replace torch.load() with attempt_load()

* Update requirements.txt

* Add --tf-raw-resize to set half_pixel_centers=False

* Remove android directory

* Update README.md

* Update README.md

* Add multiple OS support for EdgeTPU detection

* Fix export and detect

* Export 3 YOLO heads with Edge TPU models

* Remove xywh denormalization with Edge TPU models in detect.py

* Fix saved_model and pb detect error

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix pre-commit.ci failure

* Add edgetpu in export.py docstring

* Fix Edge TPU model detection exported by TF 2.7

* Add class names for TF/TFLite in DetectMultibackend

* Fix assignment with nl in TFLite Detection

* Add check when getting Edge TPU compiler version

* Add UTF-8 encoding in opening --data file for Windows

* Remove redundant TensorFlow import

* Add Edge TPU in export.py's docstring

* Add the detect layer in Edge TPU model conversion

* Default `dnn=False`

* Cleanup data.yaml loading

* Update detect.py

* Update val.py

* Comments and generalize data.yaml names

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: unknown <fangjiacong@ut.cn>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Labels
enhancement New feature or request