Batch Detection

### Search before asking

- [X] I have searched the YOLOv5 [issues](https://github.com/ultralytics/yolov5/issues) and found no similar bug report.


### YOLOv5 Component

_No response_

### Bug

Hello everyone,
I am experimenting with tiling: I take an image, split it into patches, and run batched detection on those patches. Instead of a speedup, I get much more delay. I don't know what I am doing wrong, but my timings are nowhere near the batched inference times mentioned in the documentation. With a custom-trained yolov5s model I get about 20 FPS on a single 1080p image, but only 5 FPS when the image is split into 15 patches.
Things I tried:
- Using PyTorch Hub, both with `ultralytics/yolov5` and with a local repo.
- Using the code from `detect.py` in YOLOv5.
- Stacking the images into a tensor, instead of a tuple, before passing them to `model(imgs)`. The model then returns a tensor of size [16128, 9] for each image instead of pandas results. To get the actual results for each image I have to call the NMS function on the [batch, 16128, 9] tensor, which adds a huge delay.

All of these give the same FPS, and yes, I am running on a GPU (RTX 2070).
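
For reference, the tiling step described above can be sketched as follows. This is only a minimal illustration, not code from YOLOv5: `tile_boxes`, `tile`, and `stride` are hypothetical names, and clamping the last row and column to the image edge (so every patch keeps the full tile size) is an assumed design choice. It only computes patch coordinates; cropping and inference are left out.

```python
def tile_boxes(width, height, tile, stride=None):
    """Return (x0, y0, x1, y1) patch boxes covering a width x height image.

    Hypothetical helper: tiles are laid out on a regular grid, and the last
    tile in each direction is shifted back so it ends at the image edge.
    """
    stride = stride or tile  # no overlap unless a smaller stride is given

    def starts(size):
        if size <= tile:
            return [0]
        offsets = list(range(0, size - tile + 1, stride))
        if offsets[-1] != size - tile:
            offsets.append(size - tile)  # clamp the final tile to the edge
        return offsets

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# Example: a 3840 x 2160 frame with 512 x 512 tiles and no extra overlap.
boxes = tile_boxes(3840, 2160, 512)
print(len(boxes), boxes[0], boxes[-1])
```

With a NumPy image array `img` in height-width-channel layout, each patch would then be `img[y0:y1, x0:x1]`. The number of patches depends on the chosen stride, so this sketch does not necessarily reproduce the patch counts quoted in this report.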

In another test, I split a 4K image into 60 patches of 512 x 512 and ran detection on them with the PyTorch Hub example, passing the patches as a tuple. The reported performance was:
`Speed: 5.5ms pre-process, 56.4ms inference, 0.8ms NMS per image at shape (60, 3, 640, 640)`
but the call actually took 3.7 seconds. So the reported speeds are misleading: they are per-image figures, which makes no sense to me since I wanted batched inference.
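
As a rough check of the numbers above, assuming the three figures in the `Speed:` line are per-image averages over the batch (as the "per image" wording suggests), multiplying their sum by the batch size gives the approximate total time. The function name `batch_wall_time_ms` is made up for this sketch.

```python
def batch_wall_time_ms(pre_ms, inference_ms, nms_ms, batch_size):
    """Estimate total time for a batch from per-image stage timings (in ms)."""
    per_image_ms = pre_ms + inference_ms + nms_ms
    return per_image_ms * batch_size


# Figures taken from the Speed line above, batch of 60 patches.
total_ms = batch_wall_time_ms(5.5, 56.4, 0.8, 60)
print(f"estimated total: {total_ms / 1000:.2f} s")
```

Under that assumption the estimate is about 3.76 s, which is in line with the 3.7 seconds observed for the whole call.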

Please help me figure out whether I am doing something wrong, or whether these results are normal and I should stop looking for a solution.
Thank you in advance.

### Environment

I am using a custom-built Docker image that includes:
- nvcr.io/nvidia/tensorrt:21.05-py3
- OpenCV v4.5.3 build with Cuda
- torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
- yolov5 requirements

### Minimal Reproducible Example

_No response_

### Additional

_No response_

### Are you willing to submit a PR?

- [ ] Yes I'd like to help by submitting a PR!

Batch Detection #7683
