This repo provides an easy way to convert YOLOv5 models from Ultralytics to TensorRT, plus a fast inference wrapper.
The code has minimal dependencies - PyCUDA and TensorRT for model inference and NumPy for NMS (no PyTorch code!).
TensorRT is a toolset that contains a model optimizer and a high-performance inference runtime.
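Since the intro mentions that NMS is implemented in NumPy, here is a minimal greedy NMS sketch to illustrate what such an implementation looks like (a rough illustration, not the repo's exact code):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Pure-NumPy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns indices of the kept boxes, highest score first.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # process highest-score boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the kept box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop remaining boxes that overlap the kept box too much
        order = order[1:][iou <= iou_thresh]
    return keep
```

Doing this in NumPy instead of PyTorch keeps the runtime dependencies small, which matters on a memory-constrained device like the Jetson Nano.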
- Nvidia Jetson platform (tested on a Jetson Nano 2GB)
- Nvidia JetPack (tested on JetPack 4.5.1)
- USB webcamera for the webcam demo
The table below shows frames per second on a Jetson Nano 2GB.
model / image_size | 256 | 320 | 640 |
---|---|---|---|
yolov5s + fp16 | - | 25 fps | 9 fps |
yolov5m + fp16 | - | - | - |
yolov5l + fp16 | - | - | - |
yolov5x + fp16 | - | - | - |
yolov5s | - | 20 fps | 7 fps |
yolov5m | - | - | - |
yolov5l | - | - | - |
yolov5x | - | - | - |
Model conversion to TensorRT looks like this: PyTorch -> ONNX -> TensorRT.
The Ultralytics repo already provides a tool for converting YOLOv5 to ONNX; please follow its export recipe.
After that you need the trtexec
tool; my docker container includes a prebuilt trtexec, so you can use it just by pulling the container.
JetPack already includes nvidia-docker, so you do not need to install additional software to run the examples.
- Pull the docker container or build it (see the build section):
docker pull alxmamaev/jetson_yolov5_trt:latest
- Run
docker run --runtime nvidia -v /path/to/dir/with/model/:/models/ --rm alxmamaev/jetson_yolov5_trt:latest trtexec --onnx=/models/model_name.onnx --saveEngine=/models/model_name.plan --fp16
- Provide the directory with your model after the `-v` option; this dir will be shared between the container and the host.
- Also replace `model_name` with the name of your model file.
- The TensorRT model will be saved at the path set by the `--saveEngine` option.
- If you want to know more conversion options, call trtexec with the `--help` option.
Note: trtexec has an `--int8`
option that lets you quantize the model to 8-bit integers. Usually this speeds up inference (Read More). But the Nvidia Jetson Nano does not support int8 inference, so this option will actually slow inference down. If you have an Nvidia Xavier, you can try it, because Xavier supports int8.
- Pull the docker container or build it (see the build section):
docker pull alxmamaev/jetson_yolov5_trt:latest
(if you have not pulled it yet)
- Allow docker to use the X server to draw the window with detections:
xhost +
- Find your webcamera device index with
find /dev -name video\*
and look for files like `/dev/video0`.
- Run webcam demo:
docker run --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix --device=/dev/video0 -v /path/to/data:/data alxmamaev/jetson_yolov5_trt:latest yolov5_detect.py /data/model_name.plan --source 0
- `--rm` removes the container after exit
- `--device` specifies which device to expose inside the docker container; in this case `/dev/video0` is the webcamera device
- `--runtime nvidia` uses the NVIDIA container runtime
- `/path/to/data` is the path to the directory with the model file
- If you want to check model detection on images or a video, use `--source=/path/to/video.mp4` or `--source=/path/to/image1.jpg,/path/to/image2.jpg`
- Run `yolov5_detect.py --help` to get more options
import cv2
from yolov5_trt import Yolov5TRTWrapper
labels = [...] # List of class names for your model
conf_th = 0.25 # Confidence threshold
wrapper = Yolov5TRTWrapper("/path/to/model.plan", labels=labels, conf_thresh=conf_th) # See additional options in trt/examples/yolov5_detect.py
for image, bboxes in wrapper.detect_from_webcam(0): # Gets detection and image from the usb camera with id 0
image = wrapper.draw_detections(image, bboxes) # Drawing bboxes on the image
# Show detections in the window with name "demo"
cv2.imshow("demo", image)
if cv2.waitKey(1) == 27:
break
import cv2
from yolov5_trt import Yolov5TRTWrapper
labels = [...] # List of class names for your model
conf_th = 0.25 # Confidence threshold
wrapper = Yolov5TRTWrapper("/path/to/model.plan", labels=labels, conf_thresh=conf_th) # See additional options in trt/examples/yolov5_detect.py
video = cv2.VideoCapture("video.mp4") # Opening video file
for image, bboxes in wrapper.detect_from_video(video): # Gets detection and image from the video
image = wrapper.draw_detections(image, bboxes) # Drawing bboxes on the image
# Show detections in the window with name "demo"
cv2.imshow("demo", image)
if cv2.waitKey(1) == 27:
break
import cv2
from yolov5_trt import Yolov5TRTWrapper
labels = [...] # List of class names for your model
conf_th = 0.25 # Confidence threshold
wrapper = Yolov5TRTWrapper("/path/to/model.plan", labels=labels, conf_thresh=conf_th) # See additional options in trt/examples/yolov5_detect.py
images_paths = [...] # Path to images
images = (cv2.imread(image_path) for image_path in images_paths)
for image, bboxes in wrapper.detect_from_itterator(images): # Gets detections and images from the iterator
image = wrapper.draw_detections(image, bboxes) # Drawing bboxes on the image
# Show detections in the window with name "demo"
cv2.imshow("demo", image)
cv2.waitKey()
Note: in streaming tasks you do not need to process batches larger than 1, because in streaming latency is more important than throughput.
Note 2: be careful - the size of the input batch must be less than or equal to the maximum batch size specified during conversion.
import cv2
from yolov5_trt import Yolov5TRTWrapper
labels = [...] # List of class names for your model
conf_th = 0.25 # Confidence threshold
wrapper = Yolov5TRTWrapper("/path/to/model.plan", labels=labels, conf_thresh=conf_th) # See additional options in trt/examples/yolov5_detect.py
images_paths = [...] # Path to images
images_batch = [cv2.imread(image_path) for image_path in images_paths] # read images batch
session = wrapper.create_session()
with session:
    bboxes = wrapper.detect_from_batch_images(images_batch, session) # Gets detections from the images batch
    for b, img in zip(bboxes, images_batch):
        image = wrapper.draw_detections(img, b) # Drawing bboxes on the image
        # Show detections in the window with name "demo"
        cv2.imshow("demo", image)
        cv2.waitKey()
Creating a session takes some time, so for high performance process all batches inside the `with session`
block (before the session is closed).
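If you have more images than the maximum batch size specified during conversion, one simple approach (a sketch, not part of the wrapper's API) is to split the image list into chunks no larger than that limit and feed them to the batch API one by one:

```python
def chunk_batches(items, max_batch_size):
    """Split a list into consecutive batches of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    return [items[i:i + max_batch_size] for i in range(0, len(items), max_batch_size)]

# Each chunk can then be passed to wrapper.detect_from_batch_images(...)
# inside the same `with session` block, so the session is created only once.
```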
Building is possible only with the nvidia runtime. To set the nvidia runtime as the default, edit the /etc/docker/daemon.json
file.
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
}
Then restart docker with `sudo systemctl restart docker`. After that, build the docker container: `docker build . -t yolo5_trt:latest`