The Video Object Detection Project is designed for detecting persons in video sequences. The current implementation focuses exclusively on person detection, with the models trained and calibrated to recognize and track human figures. The project integrates multiple models, including YOLOv8, Faster R-CNN, and SSD, providing flexibility and robustness in detecting persons within video frames, and it offers a user-friendly graphical interface, efficient video processing capabilities, detailed evaluation metrics, and intuitive visualizations of detection results.
Here is an example of the output generated by the Faster R-CNN model, showcasing person detection:
The project is structured into several modular components to ensure scalability, maintainability, and ease of integration. The core components include:
- Person Detection Focus: The current version of the project is designed to detect persons specifically, with future updates planned to add additional object categories.
- Graphical User Interface (GUI): Allows users to select video sequences and choose the desired detection models.
- Main Application: Orchestrates the workflow by handling video processing, model integration, evaluation, and visualization.
- Video Processing Module: Utilizes OpenCV for reading and writing video frames.
- Model Integration Module: Integrates various object detection models such as YOLOv8, Faster R-CNN, and SSD.
- Evaluation Module: Computes performance metrics using TorchMetrics and manages data with Pandas.
- Visualization Module: Generates plots of metrics using Matplotlib and displays progress bars with tqdm.
- Output Directory: Stores annotated videos, metrics tables, and visualization images for further analysis.
Here is the high-level architecture of the project, represented by a PlantUML diagram:
- Multi-Model Support: Seamlessly integrates YOLOv8, Faster R-CNN, and SSD for versatile object detection.
- User-Friendly GUI: Intuitive interface for selecting video sequences and models.
- Efficient Video Processing: Fast frame extraction and annotation using OpenCV.
- Detailed Evaluation Metrics: Computes mAP, Precision, Recall, F1 Score, IoU, and more.
- Comprehensive Visualizations: Generates tables and confusion matrices to visualize performance metrics.
- Automatic Calibration: Option to calibrate confidence thresholds using linear or binary search methods.
- Progress Monitoring: Real-time progress bars to track processing status.
The YOLOv8 model (`yolo.py`) leverages the Ultralytics YOLOv8 framework for real-time object detection. It is optimized for speed and accuracy, making it suitable for applications requiring rapid inference.
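As a rough illustration of the underlying Ultralytics API (not necessarily the exact code in `yolo.py`; the frame path and confidence value are example placeholders):

```python
from ultralytics import YOLO

# Load the small YOLOv8 variant; Ultralytics downloads the weights file
# automatically if it is not present locally.
model = YOLO("yolov8s.pt")

# Run inference on a single image/frame (placeholder path), keeping only the
# "person" class (COCO class id 0) above an example confidence of 0.5.
results = model.predict("frame.jpg", classes=[0], conf=0.5)

for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding-box corners
        score = float(box.conf[0])             # detection confidence
        print(f"person at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}), score={score:.2f}")
```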
The Faster R-CNN model (`custom_faster_rcnn.py`) is implemented using PyTorch. This model excels in detecting objects with high precision by employing a region proposal network to identify potential object locations before classification.
The SSD300 model (`ssd.py`) utilizes the Single Shot MultiBox Detector (SSD) architecture with a VGG16 backbone. SSD balances speed and accuracy, making it effective for detecting objects at various scales within video frames.
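Both torchvision detectors can be loaded with pre-trained COCO weights along the lines sketched below; the wrappers in `custom_faster_rcnn.py` and `ssd.py` may differ, and the image path and threshold are example placeholders (for torchvision detection models the person class corresponds to COCO label 1):

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pre-trained detectors from torchvision's model zoo (downloaded on first use).
faster_rcnn = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
ssd300 = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT").eval()

# Placeholder frame; in the project the frames come from the video sequences.
image = to_tensor(Image.open("frame.jpg").convert("RGB"))

with torch.no_grad():
    for name, model in (("Faster R-CNN", faster_rcnn), ("SSD300", ssd300)):
        # Each model returns one dict per image with "boxes", "labels", "scores".
        output = model([image])[0]
        keep = (output["labels"] == 1) & (output["scores"] >= 0.5)  # label 1 = person in COCO
        print(f"{name}: {int(keep.sum())} person detections")
```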
Important: Follow these instructions to set up and run the video object detection project.
- Python 3.10
- Git
git clone https://github.com/Danielkis97/video-object-detection.git
cd video-object-detection
It's recommended to use a virtual environment to manage dependencies.
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
Install the required Python packages using pip.
pip install -r requirements.txt
If prompted, upgrade pip:
python -m pip install --upgrade pip
Note: The "Assets" and "BestVideoResult" folders included in the clone can safely be ignored.
Ensure you have access to the YOLOv8 weights. You can download pre-trained weights from the Ultralytics repository.
The Faster R-CNN and SSD models utilize pre-trained weights available through PyTorch's model zoo. Ensure you have an active internet connection for the initial download.
Download the MOT17 dataset and extract it. Place the dataset in the `data` folder located in the project's root directory.
Here's an example of how your directory structure should look:
mkdir -p data/MOT17
# Place the extracted MOT17 dataset files here
Ensure the directory structure is as follows:
videodetectionproject/
│
├── custom_models/
│   ├── __init__.py
│   ├── custom_faster_rcnn.py
│   ├── ssd.py
│   ├── yolo.py
│   └── yolov8s.pt
├── utils/
│   ├── __init__.py
│   ├── video_processing.py
│   └── visualization.py
├── data/
│   └── MOT17/
│       ├── train/
│       └── test/
│
├── requirements.txt
├── main.py
...
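For reference, each sequence's ground truth lives in its `gt/gt.txt` file, with one box per line in the MOT format (frame, id, bb_left, bb_top, bb_width, bb_height, conf, class, visibility). The sketch below shows one way to parse it; it is not necessarily how the project's evaluation module reads it:

```python
import csv
from collections import defaultdict

def load_mot17_ground_truth(gt_path):
    """Parse a MOT17 gt.txt into {frame_number: [(x1, y1, x2, y2), ...]}.

    Each line holds: frame, id, bb_left, bb_top, bb_width, bb_height, conf,
    class, visibility. Boxes with conf == 0 are flagged to be ignored; the
    project's evaluation may apply additional class/visibility filtering.
    """
    boxes_per_frame = defaultdict(list)
    with open(gt_path, newline="") as f:
        for row in csv.reader(f):
            frame, _, left, top, width, height, conf = (float(v) for v in row[:7])
            if conf == 0:  # skip boxes marked as "do not consider"
                continue
            boxes_per_frame[int(frame)].append((left, top, left + width, top + height))
    return boxes_per_frame

# Example call with a hypothetical sequence path:
# gt = load_mot17_ground_truth("data/MOT17/train/MOT17-02-DPM/gt/gt.txt")
```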
The application features a user-friendly Graphical User Interface (GUI) built with `tkinter`. The GUI allows you to select video sequences, configure models, and monitor the progress of processing.
- Sequence Selection:
  - Add Sequences: Browse and select directories containing your video sequences.
  - Remove Selected: Remove selected sequences from the list.
  - Clear List: Clear all sequences from the list.
- Model Selection:
  - Select one or more models for object detection (e.g., YOLOv8s, Faster R-CNN, SSD).
- Parameters:
  - Automatic Calibration: Enable or disable the automatic calibration of confidence thresholds.
  - Confidence Threshold: Manually set the confidence threshold if automatic calibration is disabled.
  - Calibration Method: Choose between Linear Search (more accurate) and Binary Search (faster).
- Control Buttons:
  - Start Processing: Begin processing the selected sequences with the chosen models.
  - Stop Processing: Stop the processing at any time.
- Progress Bar:
  - Monitor the progress of processing in real-time.
- Status Display:
  - View status messages and updates during processing.
- Evaluation Results:
  - After processing, detailed performance metrics are displayed in this section.
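A stripped-down sketch of how a `tkinter` layout with these controls can be assembled is shown below; the widget names and callbacks are illustrative only, and the actual GUI in `main.py` wires these controls to the processing pipeline:

```python
import tkinter as tk
from tkinter import filedialog, ttk

root = tk.Tk()
root.title("Video Object Detection")

# Sequence selection: a listbox plus a button that adds a chosen directory.
sequence_list = tk.Listbox(root, height=5)
sequence_list.pack(fill="x", padx=10, pady=5)
tk.Button(
    root, text="Add Sequences",
    command=lambda: sequence_list.insert("end", filedialog.askdirectory()),
).pack()

# Model selection via checkboxes.
model_vars = {name: tk.BooleanVar(value=(name == "YOLOv8s"))
              for name in ("YOLOv8s", "Faster R-CNN", "SSD")}
for name, var in model_vars.items():
    tk.Checkbutton(root, text=name, variable=var).pack(anchor="w", padx=10)

# Progress bar and status label that the processing loop would update.
progress = ttk.Progressbar(root, maximum=100)
progress.pack(fill="x", padx=10, pady=5)
status = tk.Label(root, text="Idle")
status.pack()

# Start/Stop buttons; the real application launches the detection pipeline here.
tk.Button(root, text="Start Processing",
          command=lambda: status.config(text="Processing...")).pack(pady=2)
tk.Button(root, text="Stop Processing",
          command=lambda: status.config(text="Stopped")).pack(pady=2)

root.mainloop()
```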
- Select Video Sequences:
  - Click "Add Sequences" to browse and select directories containing your video sequences.
  - The supported input format is the folder structure of the MOT17 dataset sequences.
- Choose Detection Models:
  - Check the boxes next to the desired models (e.g., YOLOv8s, Faster R-CNN, SSD).
- Configure Parameters:
  - Enable "Automatic Calibration" to let the application determine optimal confidence thresholds.
  - Choose the calibration method: Linear Search (more accurate) or Binary Search (faster).
  - If automatic calibration is disabled, manually set the desired confidence threshold.
- Start Processing:
  - Click "Start Processing" to begin object detection on the selected sequences.
  - Monitor progress through the progress bar and status display.
- Stop Processing:
  - Click "Stop Processing" to terminate the process at any time.
- View Results:
  - Upon completion, annotated videos, metrics tables, and confusion matrix images are saved in the output directory.
  - Evaluation results are displayed within the application under the "Evaluation Results" section.
The application computes a range of performance metrics to evaluate the effectiveness of the object detection models.
- Mean Average Precision (mAP): Measures the accuracy of the model in predicting bounding boxes and classifying objects.
- Precision: The ratio of correctly predicted positive observations to the total predicted positives.
- Recall: The ratio of correctly predicted positive observations to all actual positives.
- F1 Score: The harmonic mean of Precision and Recall.
- Intersection over Union (IoU): Measures the overlap between the predicted bounding box and the ground truth.
- Processing Time: Total time taken to process the video sequences.
- Average Confidence: The average confidence score of the detections.
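As a sketch of how such metrics can be computed, the snippet below uses TorchMetrics' `MeanAveragePrecision` together with a simple IoU-based TP/FP/FN count; the matching rule and the 0.5 IoU threshold are assumptions, not necessarily the project's exact implementation:

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision
from torchvision.ops import box_iou

# One frame's predictions and ground truth (person-only, so a single label id).
preds = [{"boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0]]),
          "scores": torch.tensor([0.87]),
          "labels": torch.tensor([0])}]
target = [{"boxes": torch.tensor([[12.0, 22.0, 108.0, 215.0]]),
           "labels": torch.tensor([0])}]

# Mean Average Precision accumulated over all processed frames.
map_metric = MeanAveragePrecision()
map_metric.update(preds, target)
scores = map_metric.compute()
print(f"mAP={float(scores['map']):.3f}  mAP@0.5={float(scores['map_50']):.3f}")

# TP/FP/FN at an example IoU threshold of 0.5, from which Precision, Recall,
# F1, and mean IoU follow.
iou = box_iou(preds[0]["boxes"], target[0]["boxes"])
tp = int((iou.max(dim=1).values >= 0.5).sum())
fp = preds[0]["boxes"].shape[0] - tp
fn = target[0]["boxes"].shape[0] - int((iou.max(dim=0).values >= 0.5).sum())
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} mean IoU={float(iou.mean()):.2f}")
```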
- Calibration (Optional): If enabled, the application calibrates the confidence threshold to optimize mAP (a search sketch follows this list).
- Detection: Each frame is processed to detect objects using the selected models.
- Metrics Calculation: Performance metrics are computed based on the detections and ground truth annotations.
- Visualization: Metrics are visualized through tables and confusion matrices for easy interpretation.
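The calibration step can be viewed as a search over candidate confidence thresholds for the value that maximizes mAP. The sketch below illustrates both search styles against a hypothetical `evaluate_map(threshold)` callback; the interval-narrowing variant assumes the mAP curve is roughly unimodal in the threshold, which is why it is faster but potentially less exact than the linear scan. The project's actual calibration code may differ:

```python
def calibrate_linear(evaluate_map, start=0.05, stop=0.95, step=0.05):
    """Linear search: evaluate every candidate threshold and keep the best."""
    best_thr, best_map = start, float("-inf")
    thr = start
    while thr <= stop + 1e-9:
        score = evaluate_map(thr)
        if score > best_map:
            best_thr, best_map = thr, score
        thr += step
    return best_thr, best_map


def calibrate_interval(evaluate_map, low=0.05, high=0.95, iterations=8):
    """Interval-narrowing search (ternary-style): assumes mAP is roughly
    unimodal in the threshold -- faster, but potentially less exact than
    the exhaustive linear scan."""
    for _ in range(iterations):
        mid1 = low + (high - low) / 3
        mid2 = high - (high - low) / 3
        if evaluate_map(mid1) < evaluate_map(mid2):
            low = mid1
        else:
            high = mid2
    best_thr = (low + high) / 2
    return best_thr, evaluate_map(best_thr)


# Toy objective standing in for "run detection and compute mAP at this threshold".
toy_map = lambda thr: -(thr - 0.4) ** 2
print(calibrate_linear(toy_map))    # exhaustive scan
print(calibrate_interval(toy_map))  # narrows toward the peak near 0.4
```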
The project provides comprehensive visualizations to aid in understanding the detection performance.
A table summarizing all the computed metrics is generated and saved as an image. This table includes mAP, Precision, Recall, F1 Score, IoU, Processing Time, Average Confidence, TP, FP, and FN.
A confusion matrix image displays the counts of True Positives (TP), False Positives (FP), and False Negatives (FN), providing insight into the model's detection capabilities.
Processed videos with bounding boxes and labels drawn around detected objects are saved for visual inspection.
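Annotation essentially amounts to drawing each detection on its frame and appending the frame to a `cv2.VideoWriter`; the snippet below is a simplified sketch in which the codec, frame rate, output path, and detection format are example choices:

```python
import cv2

def write_annotated_video(frames, detections_per_frame, out_path="output/annotated.mp4", fps=30):
    """Draw person detections on each frame and encode the frames as an MP4.

    `frames` are BGR numpy arrays (as read by OpenCV); `detections_per_frame`
    holds one list of (x1, y1, x2, y2, score) tuples per frame.
    """
    height, width = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    for frame, detections in zip(frames, detections_per_frame):
        for x1, y1, x2, y2, score in detections:
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
            cv2.putText(frame, f"person {score:.2f}", (int(x1), int(y1) - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        writer.write(frame)
    writer.release()
```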
In this section, the evaluation results obtained with the two calibration methods (Automatic and Manual) are presented. These methods were applied to three videos from the MOT17 dataset, specifically videos 02, 04, and 05, and the evaluated models are Faster R-CNN, SSD300, and YOLOv8.
These tables and confusion matrices provide a detailed overview of the performance of different object detection models under various calibration methods.
Bug: If the Stop Sequence button is used during processing and the program is not restarted, it can cause errors in subsequent runs.
Solution: After stopping a sequence or finishing a sequence process, ensure that the program is restarted before running a new sequence. This prevents any lingering state issues.
Bug: If unsupported video formats are used, the application may fail to process the videos.
Solution: Make sure to use only MP4 format, as this is the only format currently supported.
This project is licensed under the MIT License.
- Ultralytics YOLOv8: For providing an efficient and accurate object detection framework.
- PyTorch: For its robust deep learning library that facilitates model training and inference.
- OpenCV: For powerful video processing capabilities.
- TorchMetrics: For comprehensive evaluation metrics.
- Matplotlib & tqdm: For effective data visualization and progress tracking.
- MOT17 Dataset: For providing ground truth annotations used in evaluation.
The code for this project was developed using PyCharm, which offers a powerful IDE for Python development.
Happy Detecting! 🚀