C++ object detection inference from video or image input source

M4D-AI/object-detection-inference

Object Detection Inference

License: MIT C++20

C++ framework for real-time object detection, supporting multiple deep learning backends and input sources. Run state-of-the-art object detection models (YOLOv4-11, RT-DETR) on video streams, video files, or images with configurable hardware acceleration.

🚀 Key Features

  • Multiple model support (YOLO series from YOLOv4 to YOLO11, RT-DETR)
  • Switchable inference backends (OpenCV DNN, ONNX Runtime, TensorRT, Libtorch, OpenVINO, Libtensorflow)
  • Real-time video processing with GStreamer integration
  • GPU acceleration support
  • Docker deployment ready
  • Benchmarking tools included

🔧 Requirements

Core Dependencies

  • CMake (≥ 3.15)
  • C++17 compiler (GCC ≥ 8.0)
  • OpenCV (≥ 4.6)
    apt install libopencv-dev
  • Google Logging (glog)
    apt install libgoogle-glog-dev

Fetched Dependencies

The project automatically fetches and builds the following dependencies using CMake's FetchContent:

VideoCapture Library (Only for the App module, not the library)

FetchContent_Declare(
    VideoCapture
    GIT_REPOSITORY https://github.com/olibartfast/videocapture
    GIT_TAG main
)
  • Handles video input processing
  • Provides unified interface for various video sources
  • Optional GStreamer integration

InferenceEngines Library

FetchContent_Declare(
    InferenceEngines
    GIT_REPOSITORY https://github.com/olibartfast/inference-engines
    GIT_TAG main
)
  • Provides abstraction layer for multiple inference backends
  • Supported backends:
    • OpenCV DNN Module (default)
    • ONNX Runtime
    • LibTorch
    • TensorRT
    • OpenVINO
    • LibTensorflow
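
To illustrate what an abstraction layer over several backends can look like, here is a minimal sketch. It is not the InferenceEngines API: `InferenceBackend`, `EchoBackend`, and `makeBackend` are hypothetical names invented for this example, and the stand-in backend simply echoes its input rather than running a model. The identifier strings are the `DEFAULT_BACKEND` values from the build instructions.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; the real inference-engines library may look different.
class InferenceBackend {
public:
    virtual ~InferenceBackend() = default;
    virtual std::string name() const = 0;
    // Takes a flattened input tensor and returns a flattened output tensor.
    virtual std::vector<float> infer(const std::vector<float>& input) = 0;
};

// Stand-in backend that returns its input unchanged.
// A real backend would forward the call to OpenCV DNN, ONNX Runtime, and so on.
class EchoBackend : public InferenceBackend {
public:
    explicit EchoBackend(std::string name) : name_(std::move(name)) {}
    std::string name() const override { return name_; }
    std::vector<float> infer(const std::vector<float>& input) override { return input; }

private:
    std::string name_;
};

// Factory keyed on the backend identifiers accepted by DEFAULT_BACKEND.
// Throws std::invalid_argument for an unknown identifier.
std::unique_ptr<InferenceBackend> makeBackend(const std::string& id) {
    static const std::vector<std::string> known = {
        "OPENCV_DNN", "ONNX_RUNTIME", "LIBTORCH",
        "TENSORRT",   "OPENVINO",     "LIBTENSORFLOW"};
    for (const std::string& candidate : known) {
        if (candidate == id) {
            return std::make_unique<EchoBackend>(id);
        }
    }
    throw std::invalid_argument("unknown backend: " + id);
}
```

Calling code only sees the `InferenceBackend` interface, which is what lets a detector stay unchanged when the backend is swapped.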

🏗 Building

Complete Build (Shared Lib + App)

mkdir build && cd build
cmake -DDEFAULT_BACKEND=<backend> -DBUILD_ONLY_LIB=OFF -DCMAKE_BUILD_TYPE=Release ..
cmake --build .

# With GStreamer support
cmake -DDEFAULT_BACKEND=<backend> -DBUILD_ONLY_LIB=OFF -DUSE_GSTREAMER=ON -DCMAKE_BUILD_TYPE=Release ..

Library-Only Build

mkdir build && cd build
cmake -DBUILD_ONLY_LIB=ON -DDEFAULT_BACKEND=<backend> -DCMAKE_BUILD_TYPE=Release ..
cmake --build .
  • Replace <backend> with one of the following:
    • OPENCV_DNN (default)
    • ONNX_RUNTIME
    • LIBTORCH
    • TENSORRT
    • OPENVINO
    • LIBTENSORFLOW
  • Note: If the backend package is not installed system-wide, set its path manually in the backend's CMake module. For example, for LibTorch, modify Libtorch.cmake or pass the Torch_DIR argument; for ONNX Runtime, modify ONNXRuntime.cmake or pass the ORT_VERSION argument. The same applies to the other locally installed backend packages.

Test Builds

# App tests
cmake -DENABLE_APP_TESTS=ON ..

# Library tests
cmake -DENABLE_DETECTORS_TESTS=ON ..

💻 App Usage

Command Line Options

./object-detection-inference \
    --type=<model_type> \
    --source=<input_source> \
    --labels=<labels_file> \
    --weights=<model_weights> \
    [--config=<model_config>] \
    [--min_confidence=<threshold>] \
    [--use-gpu] \
    [--warmup] \
    [--benchmark]

Required Parameters

  • --type=<model type>: Specifies the type of object detection model to use. Possible values are yolov4, yolov5, yolov6, yolov7, yolov8, yolov9, yolov10, yolo11, rtdetr, and rtdetrul (the Ultralytics implementation of RT-DETR). Choose the value that matches the model you exported.

  • --source=<source>: Defines the input source for the object detection. It can be:

    • A live feed URL, e.g., rtsp://cameraip:port/somelivefeed
    • A path to a video file, e.g., path/to/video.format
    • A path to an image file, e.g., path/to/image.format
  • --labels=<path/to/labels/file>: Specifies the path to the file containing the class labels. This file should list the labels used by the model, each label on a new line.

  • --weights=<path/to/model/weights>: Defines the path to the file containing the model weights.

Optional Parameters

  • [--config=<path/to/model/config>]: (Optional) Specifies the path to the model configuration file, which describes the model architecture and other settings needed to set up inference. It is primarily needed when the model comes from the OpenVINO backend.

  • [--min_confidence=<confidence value>]: (Optional) Sets the minimum confidence threshold for detections. Detections with a confidence score below this value will be discarded. The default value is 0.25.

  • [--use-gpu]: (Optional) Activates GPU support for inference. This can significantly speed up the inference process if a compatible GPU is available.

  • [--warmup]: (Optional) Enables GPU warmup. Warming up the GPU before performing actual inference can help achieve more consistent and optimized performance. This parameter is relevant only if the inference is being performed on an image source.

  • [--benchmark]: (Optional) Enables benchmarking mode. In this mode, the application will run multiple iterations of inference to measure and report the average inference time. This is useful for evaluating the performance of the model and the inference setup. This parameter is relevant only if the inference is being performed on an image source.
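
The warmup and benchmark options describe a common measurement pattern: run a few untimed iterations first, then average the time of the following ones. The sketch below shows that pattern in isolation. `averageMillis` and its parameters are hypothetical names, and the iteration counts used by the application are not stated in this document.

```cpp
#include <chrono>
#include <functional>

// Runs `infer` warmupRuns times without timing, then timedRuns times with
// timing, and returns the mean wall-clock time per timed call in milliseconds.
// Returns 0.0 when timedRuns is not positive.
double averageMillis(const std::function<void()>& infer, int warmupRuns, int timedRuns) {
    for (int i = 0; i < warmupRuns; ++i) {
        infer();  // untimed: lets caches, allocators, and GPU kernels settle
    }
    if (timedRuns <= 0) {
        return 0.0;
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < timedRuns; ++i) {
        infer();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / timedRuns;
}
```

`std::chrono::steady_clock` is used because it is monotonic, so the measurement is not affected by system clock adjustments.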

To check all available options:

./object-detection-inference --help
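
For readers who want to see how flags of the form `--key=value` and bare switches such as `--use-gpu` can be turned into a settings structure, here is a minimal stand-alone sketch. It is not the application's parser, which this document does not describe; `Options` and `parseArgs` are hypothetical names. The only value taken from the text above is the 0.25 default for `--min_confidence`.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Options {
    std::map<std::string, std::string> values;  // remaining --key=value pairs
    float minConfidence = 0.25f;                // documented default
    bool useGpu = false;
};

// Parses arguments such as "--type=yolov8" or "--use-gpu".
// Arguments that do not start with "--" are ignored.
// std::stof throws if the --min_confidence value is not a number.
Options parseArgs(const std::vector<std::string>& args) {
    Options opts;
    for (const std::string& arg : args) {
        if (arg.rfind("--", 0) != 0) {
            continue;
        }
        const std::size_t eq = arg.find('=');
        if (eq == std::string::npos) {
            // Bare switch without a value.
            const std::string flag = arg.substr(2);
            if (flag == "use-gpu") {
                opts.useGpu = true;
            } else {
                opts.values[flag] = "";
            }
            continue;
        }
        const std::string key = arg.substr(2, eq - 2);
        const std::string value = arg.substr(eq + 1);
        if (key == "min_confidence") {
            opts.minConfidence = std::stof(value);
        } else {
            opts.values[key] = value;
        }
    }
    return opts;
}
```

In a `main(int argc, char** argv)` function, the vector can be built with `std::vector<std::string>(argv + 1, argv + argc)`.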

Common Use Case Examples

# YOLOv8 ONNX Runtime image processing
./object-detection-inference \
    --type=yolov8 \
    --source=image.png \
    --weights=models/yolov8s.onnx \
    --labels=data/coco.names

# YOLOv8s TensorRT video processing
./object-detection-inference \
    --type=yolov8 \
    --source=video.mp4 \
    --weights=models/yolov8s.engine \
    --labels=data/coco.names \
    --min_confidence=0.4

# RTSP stream processing using rtdetr ultralytics implementation
./object-detection-inference \
    --type=rtdetrul \
    --source="rtsp://camera:554/stream" \
    --weights=models/rtdetr-l.onnx \
    --labels=data/coco.names \
    --use-gpu
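
The `--min_confidence` value used in the examples above acts as a cutoff on detection scores. The following sketch shows that filtering step on its own. The `Detection` struct and `filterByConfidence` are hypothetical and do not reflect the library's actual types. Detections scoring exactly at the threshold are kept, matching the "below this value will be discarded" wording.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical detection record: class index, confidence score, and a box.
struct Detection {
    int classId;
    float score;
    float x, y, width, height;
};

// Returns the detections whose score is at least minConfidence.
std::vector<Detection> filterByConfidence(std::vector<Detection> detections,
                                          float minConfidence) {
    detections.erase(
        std::remove_if(detections.begin(), detections.end(),
                       [minConfidence](const Detection& d) {
                           return d.score < minConfidence;
                       }),
        detections.end());
    return detections;
}
```

The input vector is taken by value so the caller's list stays untouched; pass it with `std::move` when the original is no longer needed.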

🐳 Docker Deployment

Building Images

The project's docker folder contains a Dockerfile for each inference backend (currently onnxruntime, libtorch, tensorrt, openvino).

# Build for specific backend
docker build --rm -t object-detection-inference:<backend_tag>  \
    -f docker/Dockerfile.backend .

Running Containers

Replace the placeholders in angle brackets with your own options and paths:

docker run --rm \
    -v<path_host_data_folder>:/app/data \
    -v<path_host_weights_folder>:/weights \
    -v<path_host_labels_folder>:/labels \
    object-detection-inference:<backend_tag> \
    --type=<model_type> \
    --weights=<weight_according_your_backend> \
    --source=/app/data/<image_or_video> \
    --labels=/labels/<labels_file>

For GPU support, add --gpus all to the docker run command.

🗺 Project Structure

.
├── app/            # Main application
├── detectors/      # Detection library
├── cmake/          # CMake modules
└── docker/         # Dockerfiles

📚 Additional Resources

⚠️ Known Limitations

  • Models with dynamic axes not fully supported
  • Windows builds not currently supported
  • Some model/backend combinations may require specific export configurations

🙏 Acknowledgments

📫 Support

  • Open an issue for bug reports or feature requests
  • Check existing issues for solutions to common problems
