[RFC] Support YOLOX detection model #6341
@zhiqwang Thank you very much for the comprehensive proposal. :) Your implementation at yolov5-rt-stack is indeed of very high quality. Having a modern implementation of YOLO was on our bucket list, but I just want to be mindful and not cannibalise your project. After all, PyTorch's unique value proposition is its rich ecosystem. Having said that, if you are happy to upstream parts of your repo to TorchVision then we would absolutely love to have it. Rest assured that if we do add it, we will make sure to provide all the necessary credit to the OSS contributors who made that possible. I know that your coding style and practices are very aligned with the ones used in TorchVision, so I agree v5 would probably be the easiest step forward. I have a couple of questions for you:
Concerning the training engine, I completely agree we should refactor a large part of our reference scripts to inherit and reuse components. My recommendation, though, is not to link this work to the addition of YOLO, as this is already a very big project. There are also various potential solutions that we might want to leverage (for example TorchRecipes), but this will require additional chats. I would suggest the next step is to clarify the above and decide how to progress. Possibly we will need to split the project into subprojects and potentially invite more contributors to help out. We could tackle this as part of #6323 and leverage the community.
Hi @datumbox
I have only verified on a few images with the ported weights, because there are some differences in our preprocessing compared to the original YOLOv5 version; I have not been able to do a complete verification on the COCO dataset, and I can add some more detailed comparison data in the next few days. I don't have a server at hand. We can divide the task into smaller parts and eventually put all the modules together; being able to train with your help would be the best option.
As you said, the most important part of the data augmentation is the mosaic technique. It was first introduced in ultralytics/yolov3#310 (comment), and there is a similar discussion in AlexeyAB/darknet#3114 (comment). The mosaic technique is helpful for detecting smaller objects. (And I think this is the key technique that allows the YOLO series to be trained from scratch.) I quote Jocher's conclusions below.
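For illustration, here is a minimal sketch of the mosaic idea: four images are pasted around a random center and their boxes shifted and clipped into each tile. This is a plain-PyTorch sketch, not YOLOv5's actual implementation, and it assumes the source images were already resized to the output size:

```python
import random
import torch

def mosaic(images, boxes_list, out_size=640):
    """Combine four images into one mosaic canvas (simplified sketch).

    images: four CHW float tensors, already resized to out_size x out_size.
    boxes_list: four N x 4 tensors of (x1, y1, x2, y2) pixel boxes.
    """
    s = out_size
    # Random mosaic center: the point where the four tiles meet.
    xc = random.randint(s // 4, 3 * s // 4)
    yc = random.randint(s // 4, 3 * s // 4)
    canvas = torch.zeros(3, s, s)
    # Tile regions: top-left, top-right, bottom-left, bottom-right.
    regions = [(0, 0, xc, yc), (xc, 0, s, yc), (0, yc, xc, s), (xc, yc, s, s)]
    merged = []
    for (x1, y1, x2, y2), img, boxes in zip(regions, images, boxes_list):
        h, w = y2 - y1, x2 - x1
        # Paste the top-left crop of the source image into its tile.
        canvas[:, y1:y2, x1:x2] = img[:, :h, :w]
        # Shift the boxes into the tile and clip them to its borders.
        shifted = boxes.clone()
        shifted[:, [0, 2]] = (shifted[:, [0, 2]] + x1).clamp(x1, x2)
        shifted[:, [1, 3]] = (shifted[:, [1, 3]] + y1).clamp(y1, y2)
        # Drop boxes that were clipped away entirely.
        keep = (shifted[:, 2] > shifted[:, 0]) & (shifted[:, 3] > shifted[:, 1])
        merged.append(shifted[keep])
    return canvas, torch.cat(merged)
```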
In addition, v5 also uses several further enhancements; it seems that `vision/references/detection/transforms.py` (line 30 at b30fa5c) already provides a related building block.
YOLO{v5/v7/X} train their detection models from scratch, and they now use backbones that differ from the DarkNet presented in the original paper. It would be nice to have a model pre-trained on ImageNet to help accelerate our training, but it is up for debate whether it is necessary to implement the original version of CSPDarknet.
The code currently in https://github.com/zhiqwang/yolov5-rt-stack/tree/main/yolort/models was written from scratch; I only called some common functions of YOLOv5. Those parts have been rewritten by YOLOX, so we can call YOLOX's common functions instead (or rewrite them ourselves) to get rid of this dependency. The main reason I used YOLOv5's common functions was to be able to load checkpoints trained with YOLOv5. Concretely, I restructured YOLOv5's YAML-parsing mechanism into three sub-modules following the layout of TorchVision.
@zhiqwang I had the chance to investigate a bit further the references. The biggest concern about YOLOv5 is that there is still no paper to accompany the architecture (see ultralytics/yolov5#1333); I remember that it first came out as a repo and the owners said that the paper will be coming out shortly but I don't think there is currently one. Though it's a very popular architecture which achieves good results, the lack of paper is a problem as we usually focus on canonical implementations and expansions that have been studied in research. YOLOX seems like a viable alternative. Perhaps that's the way forward to avoid licensing issues, wdyt?
Sounds good, I think it's worth confirming that the implementation yields the expected accuracy prior to deciding to adopt it.
Sounds good. We can follow a similar approach as with FCOS but omitting the original training. We have the capacity to train such a network internally, so you don't have to have your own infra.
We should implement the mosaic augmentation and add it to the references first. Then, once @pmeier and @vfdev-5 are back, we can examine implementing them as transforms on the new API.
I'm a bit surprised to see it in the list (I haven't checked the references). Do they use mixup for detection? Do they adjust the probabilities of the labels? How about the boxes, are they also multiplied by the weights?
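For context: in the detection flavour of mixup, as far as I can tell from the YOLOX code, the pixels are blended but both images' boxes and labels are simply concatenated at full strength rather than re-weighted (some papers instead weight each image's loss contribution). A minimal sketch of that variant, assuming same-sized image tensors:

```python
import torch

def detection_mixup(img1, boxes1, labels1, img2, boxes2, labels2, alpha=1.5):
    # Sample the blending ratio from a Beta distribution
    # (YOLOX itself uses a fixed 0.5/0.5 blend).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Blend the pixels; assumes both images have the same shape.
    mixed = lam * img1 + (1.0 - lam) * img2
    # Keep all boxes and labels from both images at full strength,
    # without adjusting label probabilities.
    boxes = torch.cat([boxes1, boxes2])
    labels = torch.cat([labels1, labels2])
    return mixed, boxes, labels
```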
I can confirm that training from scratch usually yields better results. I'm OK not adding a Darknet arch in TorchVision; it's kind of old.
Agreed, but we will need DarkNet to implement YOLOX anyway, so it might just be better to provide it too. Although it's old, it's still being used and is relevant. It's quite a fundamental model, like AlexNet, so somehow I feel it might be good to add.
@zhiqwang Sounds good. I'll need to dig a bit into the YOLOX paper and familiarize myself. I'll try to do this by EOW. In the meantime, if you have in your mind a clear plan with intermediary milestones for adding YOLOX, please add it here (aka addition of X arch, Y transforms, Z operators etc). This will hopefully let us coordinate among contributors.
@zhiqwang I'm late by 1 week. Sorry I got caught up on other pieces of work. I've gone through the bibliography around YOLOX and here are some thoughts:
Mosaic and MixUp are worth implementing. I've added tasks for them on the #6323 issue. Whether we will go ahead with the rest depends on your bandwidth. Is this something you would like to pick up and lead? If yes, we can find a POC on our side that would assist with the model validation and training resources. Let me know, thanks!
Hi @datumbox ,
I agree with you here. YOLOX has a good balance in terms of copyright and code quality, and it's enough to have a YOLOX implementation from the community's perspective.
Sorry for not having enough bandwidth to work on this recently :( But I can help to review the code and support deployment if there is such a need.
@zhiqwang Thanks for getting back to me. I completely understand. Unfortunately we are very constrained in terms of headcount and bandwidth at the moment; I don't think any of the maintainers can pick this up. Originally the idea of you picking up and leading this initiative was very promising, as you have extensive experience with the YOLO architecture due to your earlier work. But I understand that since we are interested in porting YOLOX and not v5, that would increase your work significantly. I'm happy to leave this open in case your situation changes in the future. Since we are here, let me do the cheeky move and check if any of the original authors of YOLOX would be interested in contributing an implementation to TorchVision? @FateScript @Joker316701882 @GOATmessi7
@FateScript thanks for responding. We would love to have a modern YOLO iteration in TorchVision. Currently we don't offer any variant of this architecture, which means that researchers can't do off-the-shelf comparisons. I don't know how familiar you are with the TorchVision code-base. As with every library, it has its own idioms and quirks, so this is an exercise of porting your original code to follow those idioms. I've listed a few thoughts on what needs to be done in the following comment, let me know your thoughts: #6341 (comment) To summarize, we would need to implement specific backbones that are not supported, plus the architecture of YOLOX, along with any utilities that are not already available in TorchVision. We should already support many such utilities: bbox ops and IoU estimation, bbox encoding & matching, and anchor utils (I'm aware YOLOX is anchor-free). We can provide assistance in the form of PR reviews and model training (using our own compute). I'll leave you to check some of the references and let me know your thoughts. It would be really awesome to work with you. Being one of the original authors of YOLOX means it should be easier for you to adapt the implementation and faster for us to review it.
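As a quick illustration of what is already available, several of the building blocks mentioned above live in `torchvision.ops` today (the box values below are made up):

```python
import torch
from torchvision.ops import box_convert, box_iou, generalized_box_iou, nms

# Two sets of boxes in (x1, y1, x2, y2) format.
preds = torch.tensor([[10.0, 10.0, 50.0, 50.0], [30.0, 30.0, 80.0, 80.0]])
targets = torch.tensor([[12.0, 12.0, 48.0, 48.0]])

iou = box_iou(preds, targets)               # pairwise IoU matrix, shape (2, 1)
giou = generalized_box_iou(preds, targets)  # GIoU, usable in a regression loss

# YOLO heads usually predict (cx, cy, w, h); convert before encoding/matching.
cxcywh = box_convert(preds, in_fmt="xyxy", out_fmt="cxcywh")

# Standard NMS for post-processing.
scores = torch.tensor([0.9, 0.8])
keep = nms(preds, scores, iou_threshold=0.65)
```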
@datumbox Sorry for my late reply — I'm on vacation these days. I checked the references you mentioned above, and I think implementing a YOLOX model in torchvision is not too hard. The main effort here is the data transforms and the model architecture. I've decided to set aside one day per week to complete this. BTW, is there any DDL for me?
@FateScript That's awesome, thanks a lot for doing this!
Sorry what do you mean by DDL?
@zhiqwang Just wanted to check if you still want to be involved in supporting Feng during the PRs, or if I should find a POC on our side for this. Totally depends on your bandwidth.
I mean, deadline
@FateScript No deadlines from our side. We appreciate that you are dedicating your time to an open-source project and we are thankful. :) Just a date to keep in mind in case we aim to make the model available for v0.14: all PRs for that release need to be merged by the beginning of October. Anything merged after that will be released with v0.15.
Hi @datumbox and @FateScript , I believe Feng will implement a superior version of YOLOX here, and I will contact him offline to see if there is anything I can do to help :)
@FateScript I just wanted to follow up and see if you faced any blockers with the implementation. Let me know if we can help or if there is a change of plans. Thanks! :)
Hi @datumbox , I haven't faced any blockers with the implementation. The only bad news for us is that I transferred to a new work group, and my new leader only allows me to spend half a day per week on this job. So it might take me more time than expected.
@FateScript Thanks for the heads up. No worries at all. You are donating your time and we are grateful for this. Just checking that you are not blocked by something or have abandoned it due to circumstances. Slow and steady wins the race; let me know if you need anything.
I don't know if this is something that you'd like to consider, but I submitted an implementation of YOLOv3 and YOLOv4 to Lightning Bolts, and later submitted a pull request for features from YOLOv5, Scaled-YOLOv4, and YOLOX. It's very flexible: you can use networks defined in PyTorch, such as YOLOv5, or networks defined in Darknet configuration files, and you can use different IoU functions from Torchvision and different algorithms (e.g. SimOTA) for matching targets to anchors, to construct the different YOLO variants. I haven't checked that I can reproduce the numbers from the papers, though. There are so many differences in the details between the implementations that I don't think it makes sense to try to implement all of the variants exactly. Anyway, I submitted the pull request a year ago and it has been accepted by the reviewers, but it still hasn't been merged. It seems like the Bolts project has gone pretty inactive. So if you're interested, I'd be happy to work on porting it to Torchvision and perhaps merging it with the code from @FateScript ? It's clean code and well documented. You can have a look: https://github.com/groke-technologies/pytorch-lightning-bolts/tree/yolo-update/pl_bolts/models/detection/yolo
@senarvi Thank you for sharing your clean code with me :)
@FateScript what exactly do you mean by data providing logic? I would think that all the models in Torchvision would share it. I'd just like to clarify that, in my opinion, it doesn't make sense to implement a module that's strictly YOLOX, because every year there's a new and improved YOLO version. I recently started looking into adding features from YOLOv7. I think it's better to have a generic YOLO module and reusable components that can be used to train a YOLOX model, but also used in the future with new YOLO versions, and ideally also with other model families. The most important components are the loss calculation, matching targets to anchors, and the network architectures. Torchvision already supports all the different IoU functions, so we should reuse those in the loss calculation. The network backbones could also be reused between other Torchvision models, although I think that the existing backbones such as ResNet currently only return features from the last layer. YOLO uses the FPN/PAN network with multiple detection heads, which needs features from three or four backbone layers. If you agree that that's the direction we want to head in, then I can create a pull request for you to have a look at and comment on.
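On the multi-level features point: torchvision's feature-extraction utility can already expose intermediate layers of an existing backbone, so an FPN/PAN neck could consume them without a bespoke backbone. A minimal sketch with ResNet-50 (the node names follow torchvision's layer naming and can be listed with `get_graph_node_names()`):

```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Expose the C3, C4 and C5 feature maps of a ResNet-50 so a multi-head
# neck can consume features from several levels, not just the last layer.
backbone = resnet50(weights=None)
extractor = create_feature_extractor(
    backbone,
    return_nodes={"layer2": "c3", "layer3": "c4", "layer4": "c5"},
)

feats = extractor(torch.rand(1, 3, 640, 640))
for name, f in feats.items():
    # c3 (1, 512, 80, 80), c4 (1, 1024, 40, 40), c5 (1, 2048, 20, 20)
    print(name, tuple(f.shape))
```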
@senarvi Data providing logic here means data-related code such as data augmentation, dataset caching, and so on.
So it seems that adding a YOLOX model to torchvision is meaningless, but other code like data augmentation is useful for torchvision?
@FateScript Definitely not meaningless. Maybe I explained myself poorly. I just mean that I can see two approaches. From a benchmarking perspective, it can be useful to have a model that's identical to a standard YOLO implementation such as YOLOX. Then you also need identical data augmentations etc. The downside is that if you want to add other YOLO versions in the future, it will be more difficult to reuse the components (if you want every version to be 1-to-1 identical to the original code). Personally, I would find it more useful to have a generic YOLO class, where it's easy to reuse features from different YOLO versions, because as much as possible is abstracted into separate classes. It would also be nice to have augmentations such as mosaic, but in my opinion those can be implemented separately. In my opinion, it's most important that the augmentations can be reused in different models and are not YOLO-specific. By the way, I'm not any kind of authority here. :) I guess it's a matter of the "philosophy" of the Torchvision project, which way to go.
@senarvi Yes, you are right. I misunderstood your meaning here.
I added a YOLOv7 architecture in the Bolts pull request, so now it supports the YOLO variants listed by @zhiqwang in the initial post. The biggest new architectural change was deep supervision, i.e. auxiliary detection heads, which required some thinking. I've tried to make the components reusable so that they can easily be used for building new models. For example, instead of a huge monolithic function that expects some complex data structures and supports only one algorithm for matching the predictions to the targets, there are generic functions that take the predictions and targets. The details of the SimOTA algorithm I got from the YOLOX code, and they have changed considerably in YOLOv7.
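To make the matching discussion concrete, here is a simplified sketch of SimOTA-style dynamic-k matching written as a generic function of pairwise costs and IoUs. It follows the YOLOX recipe in spirit (dynamic k from the top-10 IoU sum, conflicts resolved by lowest cost), but the function name and simplifications are mine, not code from either repository:

```python
import torch

def simota_match(cost, ious, top_candidates=10):
    """Simplified SimOTA dynamic-k matching (a sketch, not YOLOX's exact code).

    cost: (num_targets, num_preds) pairwise assignment cost,
          e.g. classification cost plus a weighted IoU cost.
    ious: (num_targets, num_preds) pairwise IoU between targets and predictions.
    Returns a (num_targets, num_preds) boolean matching matrix.
    """
    num_targets, num_preds = cost.shape
    matching = torch.zeros_like(cost, dtype=torch.bool)

    # Dynamic k: each target gets roughly as many positives as the sum
    # of its top-10 candidate IoUs suggests, but at least one.
    n = min(top_candidates, num_preds)
    topk_ious, _ = ious.topk(n, dim=1)
    ks = topk_ious.sum(dim=1).int().clamp(min=1)

    # Assign each target its k lowest-cost predictions.
    for t in range(num_targets):
        _, idx = cost[t].topk(int(ks[t]), largest=False)
        matching[t, idx] = True

    # Resolve conflicts: a prediction claimed by several targets is kept
    # only for the target with the lowest cost.
    multi = matching.sum(dim=0) > 1
    if multi.any():
        best = cost[:, multi].argmin(dim=0)
        matching[:, multi] = False
        matching[best, multi] = True
    return matching
```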
@FateScript what do you think about using this codebase as the basis for the YOLO model? It shouldn't be too difficult to fit it into the Torchvision framework. If you think that it might be a good idea, I can try to rewrite the model class for Torchvision. If there are features that we're missing in the data processing or the SimOTA algorithm, or some other details that are not correct, and you have time, you could help there. Do you think that this code architecture would be suitable?
@senarvi Thanks for your code :) IMO, your code architecture is suitable here. If any help is needed, please feel free to contact me.
I prepared the contribution. I just need official approval from my employer.
Just a quick update that some of my managers are not at the office at the moment. Hopefully I will get the approval next week.
I got permission from my employer, Groke Technologies, to contribute the YOLO model, and created this pull request: #7496 The model is quite well tried and tested, but I would appreciate any help with some things related to Torchvision integration. For example, I'm not sure if the unit tests work, and I don't know how the pretrained weights are created.
Amazing! This is really great.
It would be great to get some feedback, especially on the model factory functions. According to this issue, I should add a factory function for each model variant. There are dozens of YOLO variants: there have been something like ten notable YOLO versions, and each had several variants (s, m, l, x, nano, tiny, etc.). As discussed above, faithfully implementing all of them is not feasible. We should decide whether we want to add as many variants as possible, or just the most important ones (and which ones).

Before adding more variants, I'd also like to know if I've understood correctly what is wanted. I created an example for yolov4. I understood that I have to train weights for the variants; I train the model on the COCO dataset.

Construction of the network is a bit different from the other detection models, because YOLO adds several detection layers at different levels of the network. The backbone is the network only up to the FPN (or the extension of FPN called PAN). The detection layers are placed within the PAN. Take a look at the YOLOV7Network class for an example. It's not really possible to separate the FPN/PAN from the detection layers so that we could have the FPN/PAN as part of the backbone. We can switch the backbone (up to the FPN) though, if the backbone provides outputs from different levels, like here.

Also, I use the method validate_batch() to validate that the input is in the correct format. I wonder if we should use the same function to validate the input in all detection models; then we would know that all detection models use the same data format. That said, this pull request might blow up if we start to include too much of this kind of refactoring in the same pull request.
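For reference, this is roughly what a per-variant factory looks like under torchvision's current weights API. Everything specific here is a hypothetical placeholder: the `YOLO` stub stands in for the model class from the pull request, and the URL and metadata are invented; only the `Weights`/`WeightsEnum`/`register_model` pattern itself is real torchvision machinery (imported the way in-tree model files do):

```python
from typing import Any, Optional

from torch import nn
# These imports mirror what in-tree torchvision model files use today.
from torchvision.models._api import Weights, WeightsEnum, register_model
from torchvision.transforms._presets import ObjectDetection


class YOLO(nn.Module):
    """Stub standing in for the real YOLO model class in the PR."""
    def __init__(self, network: str = "yolov4") -> None:
        super().__init__()
        self.network = network


class YOLOv4_Weights(WeightsEnum):
    # Placeholder entry: the URL and metadata would be filled in once the
    # COCO-trained weights for this variant are published.
    COCO_V1 = Weights(
        url="https://download.pytorch.org/models/yolov4_coco.pth",  # hypothetical
        transforms=ObjectDetection,
        meta={"categories": [], "_docs": "Trained on COCO with the reference scripts."},
    )
    DEFAULT = COCO_V1


@register_model()
def yolov4(*, weights: Optional[YOLOv4_Weights] = None, progress: bool = True,
           **kwargs: Any) -> nn.Module:
    """Builder for one YOLO variant, following the torchvision factory pattern."""
    weights = YOLOv4_Weights.verify(weights)
    model = YOLO(network="yolov4", **kwargs)
    if weights is not None:
        model.load_state_dict(weights.get_state_dict(progress=progress))
    return model
```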
🚀 The feature
YOLO, a.k.a. You Only Look Once, is a vibrant series of object detection models that began with the release of Joseph Redmon's You Only Look Once: Unified, Real-Time Object Detection.
So far, the more notable implementations are as follows (all in PyTorch): ultralytics/yolov3 [1], MMDetection's YOLO configs [2], ultralytics/yolov5 [3], YOLOX [4], and YOLOv7 [5].
Motivation, pitch
Until now, one of the most successful of these is probably YOLOv5. YOLOv5 is great, and they have also built up a very friendly community and ecosystem. We don't intend to copy YOLOv5 into TorchVision; our main goal here is to make training SoTA models easier and to share reusable subcomponents for building the next SoTA models in the same/proxy family [6].
YOLOX is a high-performance anchor-free YOLO, and it has a good balance in terms of copyright and code quality; from the community's perspective, it's enough to have a YOLOX implementation.
The License
YOLO{v5/v7} are built under the GPL-3.0 license, and YOLOX is built under the Apache-2.0 license.
More context
I have previously rewritten the code used in the inference part of YOLOv5 according to the style and specification of torchvision [7], and I can relicense that part under the BSD-3-Clause license. The model-inference part will not involve much work, with the help of the YOLOX base code.
Data augmentation and a new trainer engine will be the core of what we will do here.
The data augmentation section is on the planning list (#6224), and we have already merged some augmentation methods (e.g. #5825). I think it would help us to build the next SoTA models with new primitives, like the classification models [8].
As TorchVision adds more and more models, it may be time to abstract out a simple trainer engine for sharing reusable subcomponents. It might be more appropriate to open a new thread about the necessity of and specific steps for this part.
cc @datumbox @YosuaMichael @oke-aditya
Footnotes
1. https://github.com/ultralytics/yolov3/tree/v9.1
2. https://github.com/open-mmlab/mmdetection/tree/master/configs/yolo
3. https://github.com/ultralytics/yolov5
4. https://github.com/Megvii-BaseDetection/YOLOX
5. https://github.com/WongKinYiu/yolov7
6. https://github.com/keras-team/keras-cv/issues/622#issuecomment-1198063712
7. https://github.com/zhiqwang/yolov5-rt-stack/tree/main/yolort/models
8. https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/