The Video Object Detection Project is designed for detecting persons in video sequences. The current implementation focuses exclusively on person detection, with the models trained and calibrated to recognize and track human figures. The project integrates multiple models, including YOLOv8, Faster R-CNN, and SSD, providing flexibility and robustness in detecting persons within video frames, and it offers a user-friendly graphical interface, efficient video processing capabilities, detailed evaluation metrics, and intuitive visualizations of detection results.
Here is an example of the output generated by the Faster R-CNN model, showcasing person detection:
The project is structured into several modular components to ensure scalability, maintainability, and ease of integration. The core components include:
- Person Detection Focus: The current version of the project is designed to detect persons specifically, with future updates planned to add additional object categories.
- Graphical User Interface (GUI): Allows users to select video sequences and choose the desired detection models.
- Main Application: Orchestrates the workflow by handling video processing, model integration, evaluation, and visualization.
- Video Processing Module: Utilizes OpenCV for reading and writing video frames.
- Model Integration Module: Integrates various object detection models such as YOLOv8, Faster R-CNN, and SSD.
- Evaluation Module: Computes performance metrics using TorchMetrics and manages data with Pandas.
- Visualization Module: Generates plots of metrics using Matplotlib and displays progress bars with tqdm.
- Output Directory: Stores annotated videos, metrics tables, and visualization images for further analysis.
Here is the high-level architecture of the project, represented by a PlantUML diagram:
- Multi-Model Support: Seamlessly integrates YOLOv8, Faster R-CNN, and SSD for versatile object detection.
- User-Friendly GUI: Intuitive interface for selecting video sequences and models.
- Efficient Video Processing: Fast frame extraction and annotation using OpenCV.
- Detailed Evaluation Metrics: Computes mAP, Precision, Recall, F1 Score, IoU, and more.
- Comprehensive Visualizations: Generates tables and confusion matrices to visualize performance metrics.
- Automatic Calibration: Option to calibrate confidence thresholds using linear or binary search methods.
- Progress Monitoring: Real-time progress bars to track processing status.
The YOLOv8 model (`yolo.py`) leverages the Ultralytics YOLOv8 framework for real-time object detection. It is optimized for speed and accuracy, making it suitable for applications requiring rapid inference.
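As a rough illustration of the underlying Ultralytics API (not necessarily the exact code in `yolo.py`; the frame path and confidence value are example placeholders):

```python
from ultralytics import YOLO

# Load the small YOLOv8 variant; Ultralytics downloads the weights file
# automatically if it is not present locally.
model = YOLO("yolov8s.pt")

# Run inference on a single image/frame (placeholder path), keeping only the
# "person" class (COCO class id 0) above an example confidence of 0.5.
results = model.predict("frame.jpg", classes=[0], conf=0.5)

for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding-box corners
        score = float(box.conf[0])             # detection confidence
        print(f"person at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}), score={score:.2f}")
```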
The Faster R-CNN model (`custom_faster_rcnn.py`) is implemented using PyTorch. This model excels in detecting objects with high precision by employing a region proposal network to identify potential object locations before classification.
The SSD300 model (`ssd.py`) utilizes the Single Shot MultiBox Detector (SSD) architecture with a VGG16 backbone. SSD balances speed and accuracy, making it effective for detecting objects at various scales within video frames.
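Both torchvision detectors can be loaded with pre-trained COCO weights along the lines sketched below; the wrappers in `custom_faster_rcnn.py` and `ssd.py` may differ, and the image path and threshold are example placeholders (for torchvision detection models the person class corresponds to COCO label 1):

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pre-trained detectors from torchvision's model zoo (downloaded on first use).
faster_rcnn = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
ssd300 = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT").eval()

# Placeholder frame; in the project the frames come from the video sequences.
image = to_tensor(Image.open("frame.jpg").convert("RGB"))

with torch.no_grad():
    for name, model in (("Faster R-CNN", faster_rcnn), ("SSD300", ssd300)):
        # Each model returns one dict per image with "boxes", "labels", "scores".
        output = model([image])[0]
        keep = (output["labels"] == 1) & (output["scores"] >= 0.5)  # label 1 = person in COCO
        print(f"{name}: {int(keep.sum())} person detections")
```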
Important: Follow these instructions to set up and run the video object detection project.
- Python 3.10
- Git
git clone https://github.com/Danielkis97/video-object-detection.git
cd video-object-detection
It's recommended to use a virtual environment to manage dependencies.
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
Install the required Python packages using pip.
pip install -r requirements.txt
If prompted, upgrade pip:
python -m pip install --upgrade pip
Note: The "Assets" and "BestVideoResult" folders included in the clone can safely be ignored.
Ensure you have access to the YOLOv8 weights. You can download pre-trained weights from the Ultralytics repository.
The Faster R-CNN and SSD models utilize pre-trained weights available through PyTorch's model zoo. Ensure you have an active internet connection for the initial download.
Download the MOT17 dataset and extract it. Place the dataset in the `data` folder located in the project's root directory.
Here's an example of how your directory structure should look:
mkdir -p data/MOT17
# Place the extracted MOT17 dataset files here
Ensure the directory structure is as follows:
videodetectionproject/
│
├── custom_models/
│   ├── __init__.py
│   ├── custom_faster_rcnn.py
│   ├── ssd.py
│   ├── yolo.py
│   └── yolov8s.pt
├── utils/
│   ├── __init__.py
│   ├── video_processing.py
│   └── visualization.py
├── data/
│   └── MOT17/
│       ├── train/
│       └── test/
│
├── requirements.txt
├── main.py
...
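For reference, each sequence's ground truth lives in its `gt/gt.txt` file, with one box per line in the MOT format (frame, id, bb_left, bb_top, bb_width, bb_height, conf, class, visibility). The sketch below shows one way to parse it; it is not necessarily how the project's evaluation module reads it:

```python
import csv
from collections import defaultdict

def load_mot17_ground_truth(gt_path):
    """Parse a MOT17 gt.txt into {frame_number: [(x1, y1, x2, y2), ...]}.

    Each line holds: frame, id, bb_left, bb_top, bb_width, bb_height, conf,
    class, visibility. Boxes with conf == 0 are flagged to be ignored; the
    project's evaluation may apply additional class/visibility filtering.
    """
    boxes_per_frame = defaultdict(list)
    with open(gt_path, newline="") as f:
        for row in csv.reader(f):
            frame, _, left, top, width, height, conf = (float(v) for v in row[:7])
            if conf == 0:  # skip boxes marked as "do not consider"
                continue
            boxes_per_frame[int(frame)].append((left, top, left + width, top + height))
    return boxes_per_frame

# Example call with a hypothetical sequence path:
# gt = load_mot17_ground_truth("data/MOT17/train/MOT17-02-DPM/gt/gt.txt")
```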
The application features a user-friendly Graphical User Interface (GUI) built with `tkinter`. The GUI allows you to select video sequences, configure models, and monitor the progress of processing.
- Sequence Selection:
  - Add Sequences: Browse and select directories containing your video sequences.
  - Remove Selected: Remove selected sequences from the list.
  - Clear List: Clear all sequences from the list.
- Model Selection:
  - Select one or more models for object detection (e.g., YOLOv8s, Faster R-CNN, SSD).
- Parameters:
  - Automatic Calibration: Enable or disable the automatic calibration of confidence thresholds.
  - Confidence Threshold: Manually set the confidence threshold if automatic calibration is disabled.
  - Calibration Method: Choose between Linear Search (more accurate) and Binary Search (faster).
- Control Buttons:
  - Start Processing: Begin processing the selected sequences with the chosen models.
  - Stop Processing: Stop the processing at any time.
- Progress Bar:
  - Monitor the progress of processing in real-time.
- Status Display:
  - View status messages and updates during processing.
- Evaluation Results:
  - After processing, detailed performance metrics are displayed in this section.
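A stripped-down sketch of how a `tkinter` layout with these controls can be assembled is shown below; the widget names and callbacks are illustrative only, and the actual GUI in `main.py` wires these controls to the processing pipeline:

```python
import tkinter as tk
from tkinter import filedialog, ttk

root = tk.Tk()
root.title("Video Object Detection")

# Sequence selection: a listbox plus a button that adds a chosen directory.
sequence_list = tk.Listbox(root, height=5)
sequence_list.pack(fill="x", padx=10, pady=5)
tk.Button(
    root, text="Add Sequences",
    command=lambda: sequence_list.insert("end", filedialog.askdirectory()),
).pack()

# Model selection via checkboxes.
model_vars = {name: tk.BooleanVar(value=(name == "YOLOv8s"))
              for name in ("YOLOv8s", "Faster R-CNN", "SSD")}
for name, var in model_vars.items():
    tk.Checkbutton(root, text=name, variable=var).pack(anchor="w", padx=10)

# Progress bar and status label that the processing loop would update.
progress = ttk.Progressbar(root, maximum=100)
progress.pack(fill="x", padx=10, pady=5)
status = tk.Label(root, text="Idle")
status.pack()

# Start/Stop buttons; the real application launches the detection pipeline here.
tk.Button(root, text="Start Processing",
          command=lambda: status.config(text="Processing...")).pack(pady=2)
tk.Button(root, text="Stop Processing",
          command=lambda: status.config(text="Stopped")).pack(pady=2)

root.mainloop()
```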
- Select Video Sequences:
  - Click "Add Sequences" to browse and select directories containing your video sequences.
  - The supported input format is the folder structure of the MOT17 dataset sequences.
- Choose Detection Models:
  - Check the boxes next to the desired models (e.g., YOLOv8s, Faster R-CNN, SSD).
- Configure Parameters:
  - Enable "Automatic Calibration" to let the application determine optimal confidence thresholds.
  - Choose the calibration method: Linear Search (more accurate) or Binary Search (faster).
  - If automatic calibration is disabled, manually set the desired confidence threshold.
- Start Processing:
  - Click "Start Processing" to begin object detection on the selected sequences.
  - Monitor progress through the progress bar and status display.
- Stop Processing:
  - Click "Stop Processing" to terminate the process at any time.
- View Results:
  - Upon completion, annotated videos, metrics tables, and confusion matrix images are saved in the output directory.
  - Evaluation results are displayed within the application under the "Evaluation Results" section.
The application computes a range of performance metrics to evaluate the effectiveness of the object detection models.
- Mean Average Precision (mAP): Measures the accuracy of the model in predicting bounding boxes and classifying objects.
- Precision: The ratio of correctly predicted positive observations to the total predicted positives.
- Recall: The ratio of correctly predicted positive observations to all actual positives.
- F1 Score: The harmonic mean of Precision and Recall.
- Intersection over Union (IoU): Measures the overlap between the predicted bounding box and the ground truth.
- Processing Time: Total time taken to process the video sequences.
- Average Confidence: The average confidence score of the detections.
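As a sketch of how such metrics can be computed, the snippet below uses TorchMetrics' `MeanAveragePrecision` together with a simple IoU-based TP/FP/FN count; the matching rule and the 0.5 IoU threshold are assumptions, not necessarily the project's exact implementation:

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision
from torchvision.ops import box_iou

# One frame's predictions and ground truth (person-only, so a single label id).
preds = [{"boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0]]),
          "scores": torch.tensor([0.87]),
          "labels": torch.tensor([0])}]
target = [{"boxes": torch.tensor([[12.0, 22.0, 108.0, 215.0]]),
           "labels": torch.tensor([0])}]

# Mean Average Precision accumulated over all processed frames.
map_metric = MeanAveragePrecision()
map_metric.update(preds, target)
scores = map_metric.compute()
print(f"mAP={float(scores['map']):.3f}  mAP@0.5={float(scores['map_50']):.3f}")

# TP/FP/FN at an example IoU threshold of 0.5, from which Precision, Recall,
# F1, and mean IoU follow.
iou = box_iou(preds[0]["boxes"], target[0]["boxes"])
tp = int((iou.max(dim=1).values >= 0.5).sum())
fp = preds[0]["boxes"].shape[0] - tp
fn = target[0]["boxes"].shape[0] - int((iou.max(dim=0).values >= 0.5).sum())
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} mean IoU={float(iou.mean()):.2f}")
```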
- Calibration (Optional): If enabled, the application calibrates the confidence threshold to optimize mAP (a search sketch follows this list).
- Detection: Each frame is processed to detect objects using the selected models.
- Metrics Calculation: Performance metrics are computed based on the detections and ground truth annotations.
- Visualization: Metrics are visualized through tables and confusion matrices for easy interpretation.
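The calibration step can be viewed as a search over candidate confidence thresholds for the value that maximizes mAP. The sketch below illustrates both search styles against a hypothetical `evaluate_map(threshold)` callback; the interval-narrowing variant assumes the mAP curve is roughly unimodal in the threshold, which is why it is faster but potentially less exact than the linear scan. The project's actual calibration code may differ:

```python
def calibrate_linear(evaluate_map, start=0.05, stop=0.95, step=0.05):
    """Linear search: evaluate every candidate threshold and keep the best."""
    best_thr, best_map = start, float("-inf")
    thr = start
    while thr <= stop + 1e-9:
        score = evaluate_map(thr)
        if score > best_map:
            best_thr, best_map = thr, score
        thr += step
    return best_thr, best_map


def calibrate_interval(evaluate_map, low=0.05, high=0.95, iterations=8):
    """Interval-narrowing search (ternary-style): assumes mAP is roughly
    unimodal in the threshold -- faster, but potentially less exact than
    the exhaustive linear scan."""
    for _ in range(iterations):
        mid1 = low + (high - low) / 3
        mid2 = high - (high - low) / 3
        if evaluate_map(mid1) < evaluate_map(mid2):
            low = mid1
        else:
            high = mid2
    best_thr = (low + high) / 2
    return best_thr, evaluate_map(best_thr)


# Toy objective standing in for "run detection and compute mAP at this threshold".
toy_map = lambda thr: -(thr - 0.4) ** 2
print(calibrate_linear(toy_map))    # exhaustive scan
print(calibrate_interval(toy_map))  # narrows toward the peak near 0.4
```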
The project provides comprehensive visualizations to aid in understanding the detection performance.
A table summarizing all the computed metrics is generated and saved as an image. This table includes mAP, Precision, Recall, F1 Score, IoU, Processing Time, Average Confidence, TP, FP, and FN.
A confusion matrix image displays the counts of True Positives (TP), False Positives (FP), and False Negatives (FN), providing insight into the model's detection capabilities.
Processed videos with bounding boxes and labels drawn around detected objects are saved for visual inspection.
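Annotation essentially amounts to drawing each detection on its frame and appending the frame to a `cv2.VideoWriter`; the snippet below is a simplified sketch in which the codec, frame rate, output path, and detection format are example choices:

```python
import cv2

def write_annotated_video(frames, detections_per_frame, out_path="output/annotated.mp4", fps=30):
    """Draw person detections on each frame and encode the frames as an MP4.

    `frames` are BGR numpy arrays (as read by OpenCV); `detections_per_frame`
    holds one list of (x1, y1, x2, y2, score) tuples per frame.
    """
    height, width = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    for frame, detections in zip(frames, detections_per_frame):
        for x1, y1, x2, y2, score in detections:
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
            cv2.putText(frame, f"person {score:.2f}", (int(x1), int(y1) - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        writer.write(frame)
    writer.release()
```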
In this section, the evaluation results obtained with the two calibration methods (Automatic and Manual) are presented. These methods were applied to three videos from the MOT17 dataset, specifically videos 02, 04, and 05, and the evaluated models are Faster R-CNN, SSD300, and YOLOv8.
These tables and confusion matrices provide a detailed overview of the performance of different object detection models under various calibration methods.
Bug: If the Stop Sequence button is used during processing and the program is not restarted, it can cause errors in subsequent runs.
Solution: After stopping a sequence or finishing a sequence process, ensure that the program is restarted before running a new sequence. This prevents any lingering state issues.
Bug: If unsupported video formats are used, the application may fail to process the videos.
Solution: Make sure to use only MP4 format, as this is the only format currently supported.
This project is licensed under the MIT License.
- Ultralytics YOLOv8: For providing an efficient and accurate object detection framework.
- PyTorch: For its robust deep learning library that facilitates model training and inference.
- OpenCV: For powerful video processing capabilities.
- TorchMetrics: For comprehensive evaluation metrics.
- Matplotlib & tqdm: For effective data visualization and progress tracking.
- MOT17 Dataset: For providing ground truth annotations used in evaluation.
The code for this project was developed using PyCharm, which offers a powerful IDE for Python development.
Happy Detecting! 🚀