Authors: Jingwei Guo, Yitai Cheng, Meihui Wang, Ilya Ilyankou, Natchapon Jongwiriyanurak, Xiaowei Gao, Nicola Christie, James Haworth
This package is used for object detection, object tracking, and overtaking behaviour detection on panoramic (360°) equirectangular videos, initially developed as part of Jingwei Guo's MSc thesis.
The approach improves detection by projecting each equirectangular frame into four overlapping perspective sub-images, applying a detector to each, and then reprojecting and merging the bounding boxes, which handles projection distortions and long objects that span sub-image borders. YOLOv5 models pre-trained on the COCO dataset are used as the default detectors, with Faster R-CNN (via Detectron2) as an alternative. Tracking is based on StrongSORT (see https://github.com/yitai-cheng/StrongSORT), modified to incorporate object category information and boundary continuity, which reduces false positives and ID switches in panoramic views. The overtaking detection module builds on these tracking results, identifying and classifying overtaking manoeuvres by vehicles around cyclists.
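For intuition, here is a minimal sketch of the projection step. It is not the package's own code: `equirect_to_perspective`, its sign conventions, and the pinhole model are illustrative assumptions, but FOV, THETA (yaw), and PHI (pitch) follow the command-line arguments documented below.

```python
import cv2
import numpy as np

def equirect_to_perspective(frame, fov=120, theta=0, phi=-10, size=640):
    """Extract one square perspective sub-image from an equirectangular frame."""
    h_eq, w_eq = frame.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov) / 2)  # pinhole focal length (px)

    # A viewing ray through every pixel; optical axis along +z.
    xs = np.arange(size) - (size - 1) / 2
    xx, yy = np.meshgrid(xs, xs)
    rays = np.stack([xx, -yy, np.full_like(xx, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Tilt by phi (pitch, about x), then pan by theta (yaw, about y).
    r_pitch, _ = cv2.Rodrigues(np.radians(float(phi)) * np.array([1.0, 0.0, 0.0]))
    r_yaw, _ = cv2.Rodrigues(np.radians(float(theta)) * np.array([0.0, 1.0, 0.0]))
    rays = rays @ r_pitch.T @ r_yaw.T

    # Ray direction -> longitude/latitude -> equirectangular pixel coords.
    lon = np.arctan2(rays[..., 0], rays[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))   # [-pi/2, pi/2]
    map_x = ((lon / (2 * np.pi) + 0.5) * (w_eq - 1)).astype(np.float32)
    map_y = ((0.5 - lat / np.pi) * (h_eq - 1)).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_WRAP)
```

Calling this four times with theta in {0, 90, 180, 270} and a 120° FOV yields four sub-images with 30° of overlap between neighbours, which is what allows boxes to be merged after reprojection.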
The package requires Python 3.8+ with the dependencies listed in requirements.txt, plus Detectron2 (only a version released before Aug 5, 2022; see the pinned install below).
- First, clone the repository:

```
git clone https://github.com/SpaceTimeLab/360_object_tracking
```
- To install all the dependencies (except Detectron2), create and activate a new conda environment (called, for example, `360`) and run:

```
conda create --name 360 -c conda-forge python=3.8
conda activate 360
conda install pip
pip install -r requirements.txt
```
- Newer versions of Detectron2 (released after Aug 5, 2022) changed some of the APIs this package relies on, so install this older pinned commit:

```
pip install -e git+https://github.com/facebookresearch/detectron2.git@5aeb252b194b93dc2879b4ac34bc51a31b5aee13#egg=detectron2
pip install pillow==9.5.0  # see https://github.com/facebookresearch/detectron2/issues/5010#issuecomment-1752284625
```
- Download the pre-trained ReID network used in DeepSORT:

```
pip install gdown
cd deep_sort/deep/checkpoint
gdown 'https://drive.google.com/uc?export=download&id=1_qwTWdzT9dWNudpusgKavj_4elGgbkUN'
cd ../../../
```
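A quick optional sanity check that the pinned Detectron2 build and the ReID weights are in place (a sketch; the checkpoint filename `ckpt.t7` is assumed from the original DeepSORT release):

```python
# Hypothetical post-install check; adjust the checkpoint name if it differs.
import os

import detectron2

print("Detectron2 version:", detectron2.__version__)
print("ReID checkpoint found:",
      os.path.exists("deep_sort/deep/checkpoint/ckpt.t7"))
```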
The implementation of each functionality (object detection, object tracking, and overtaking detection) is explained in detail in `Code Explanation.ipynb`.
To run object detection on panoramic videos in equirectangular projection, execute Object_Detection.py from the terminal:

```
python Object_Detection.py [--input_video_path INPUT_VIDEO_PATH] [--output_video_path OUTPUT_VIDEO_PATH] [--classes_to_detect CLASSES_TO_DETECT] [--FOV FOV] [--THETAs THETAS] [--PHIs PHIS] [--sub_image_width SUB_IMAGE_WIDTH] [--short_edge_size SHORT_EDGE_SIZE] [--model_type MODEL_TYPE] [--score_threshold SCORE_THRESHOLD] [--nms_threshold NMS_THRESHOLD] [--use_mymodel USE_MYMODEL]
```
The following arguments are provided:
| Argument | Description | Required? | Defaults |
|---|---|---|---|
| INPUT_VIDEO_PATH | Path of the input video | ✔️ | |
| OUTPUT_VIDEO_PATH | Path of the output video | ✔️ | |
| CLASSES_TO_DETECT | Index numbers of the COCO categories to detect | | [0, 1, 2, 3, 5, 7, 9] |
| FOV | Field of view of the sub-images, in degrees | | 120 |
| THETAS | List of theta (yaw) angles, one per sub-image | | [0, 90, 180, 270] |
| PHIS | List of phi (pitch) angles, one per sub-image | | [-10, -10, -10, -10] |
| SUB_IMAGE_WIDTH | Width (and height) of the square sub-images, in pixels | | 640 |
| MODEL_TYPE | Which detector to use: "YOLO" or "Faster RCNN" | | "YOLO" |
| SCORE_THRESHOLD | Confidence score threshold | | 0.4 |
| NMS_THRESHOLD | Non-Maximum Suppression (NMS) threshold | | 0.45 |
| USE_MYMODEL | Whether to use the improved detection pipeline; if False, the frame is detected as a whole instead of being split into 4 sub-images | | True |
| SHORT_EDGE_SIZE | Length of the short edge of the detector input | | 0 |
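Note that the list-valued arguments (classes_to_detect, THETAs, PHIs) are passed as space-separated values, as in the examples at the end of this README. For instance (the paths here are placeholders):

```
python Object_Detection.py --input_video_path test.mp4 --output_video_path out.mp4 --classes_to_detect 0 1 2 --THETAs 0 90 180 270 --PHIs 0 0 0 0
```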
To run object tracking on panoramic videos in equirectangular projection, execute Object_Tracking.py from the terminal:

```
python Object_Tracking.py [--input_video_path INPUT_VIDEO_PATH] [--output_video_path OUTPUT_VIDEO_PATH] [--MOT_text_path MOT_TEXT_PATH] [--prevent_different_classes_match PREVENT_DIFFERENT_CLASSES_MATCH] [--match_across_boundary MATCH_ACROSS_BOUNDARY] [--classes_to_detect CLASSES_TO_DETECT] [--FOV FOV] [--THETAs THETAS] [--PHIs PHIS] [--short_edge_size SHORT_EDGE_SIZE] [--model_type MODEL_TYPE] [--score_threshold SCORE_THRESHOLD] [--nms_threshold NMS_THRESHOLD] [--use_mymodel USE_MYMODEL]
```
The following arguments are provided:
| Argument | Description | Required? | Defaults |
|---|---|---|---|
| INPUT_VIDEO_PATH | Path of the input video | ✔️ | |
| OUTPUT_VIDEO_PATH | Path of the output video | ✔️ | |
| MOT_TEXT_PATH | Path of the output tracking results in MOT text format | | |
| PREVENT_DIFFERENT_CLASSES_MATCH | Whether to use the multi-category support in DeepSORT (prevents matches between different classes) | | True |
| MATCH_ACROSS_BOUNDARY | Whether to use the boundary-continuity support in DeepSORT (allows matches across the left/right frame boundary) | | True |
| CLASSES_TO_DETECT | Index numbers of the COCO categories to detect | | [0, 1, 2, 3, 5, 7, 9] |
| FOV | Field of view of the sub-images, in degrees | | 120 |
| THETAS | List of theta (yaw) angles, one per sub-image | | [0, 90, 180, 270] |
| PHIS | List of phi (pitch) angles, one per sub-image | | [-10, -10, -10, -10] |
| SHORT_EDGE_SIZE | Width (and height) of the square sub-images, in pixels | | 1280 |
| MODEL_TYPE | Which detector to use: "YOLO" or "Faster RCNN" | | "YOLO" |
| SCORE_THRESHOLD | Confidence score threshold | | 0.4 |
| NMS_THRESHOLD | Non-Maximum Suppression (NMS) threshold | | 0.45 |
| USE_MYMODEL | Whether to use the improved detection pipeline; if False, the frame is detected as a whole instead of being split into 4 sub-images | | True |
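To illustrate the boundary-continuity option: in an equirectangular frame the left and right edges meet, so horizontal distances between track centres should wrap around the seam. A minimal sketch of the assumed logic (not the package's own code):

```python
def wrapped_dx(cx1: float, cx2: float, frame_width: float) -> float:
    """Horizontal distance between two box centres on a 360-degree frame."""
    dx = abs(cx1 - cx2)
    return min(dx, frame_width - dx)  # take the shorter way around the seam

# Boxes at x=10 and x=1910 in a 1920-px-wide frame are only 20 px apart
# once the wrap-around is taken into account.
print(wrapped_dx(10, 1910, 1920))  # -> 20
```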
To run overtaking behaviour detection on panoramic videos in equirectangular projection, execute Overtaking_Detection.py from the terminal:

```
python Overtaking_Detection.py [--input_video_path INPUT_VIDEO_PATH] [--output_video_path OUTPUT_VIDEO_PATH] [--mode MODE] [--prevent_different_classes_match PREVENT_DIFFERENT_CLASSES_MATCH] [--match_across_boundary MATCH_ACROSS_BOUNDARY] [--classes_to_detect CLASSES_TO_DETECT] [--classes_to_detect_movement CLASSES_TO_DETECT_MOVEMENT] [--size_thresholds SIZE_THRESHOLDS] [--FOV FOV] [--THETAs THETAS] [--PHIs PHIS] [--sub_image_width SUB_IMAGE_WIDTH] [--model_type MODEL_TYPE] [--score_threshold SCORE_THRESHOLD] [--nms_threshold NMS_THRESHOLD] [--use_mymodel USE_MYMODEL]
```
The following arguments are provided:
| Argument | Description | Required? | Defaults |
|---|---|---|---|
| INPUT_VIDEO_PATH | Path of the input video | ✔️ | |
| OUTPUT_VIDEO_PATH | Path of the output video | ✔️ | |
| MODE | Which kind of overtaking behaviour to detect: "Confirmed" or "Unconfirmed" | | "Confirmed" |
| PREVENT_DIFFERENT_CLASSES_MATCH | Whether to use the multi-category support in DeepSORT (prevents matches between different classes) | | True |
| MATCH_ACROSS_BOUNDARY | Whether to use the boundary-continuity support in DeepSORT (allows matches across the left/right frame boundary) | | True |
| CLASSES_TO_DETECT | Index numbers of the COCO categories to detect | | [0, 1, 2, 3, 5, 7, 9] |
| CLASSES_TO_DETECT_MOVEMENT | Index numbers of the COCO categories for movement detection; must be a subset of CLASSES_TO_DETECT | | [2, 5, 7] |
| SIZE_THRESHOLDS | List of bounding-box area thresholds (in pixels), one per entry in CLASSES_TO_DETECT_MOVEMENT; a track whose box area exceeds the corresponding threshold is considered close to the user | | [500 * 500, 900 * 900, 600 * 600] |
| FOV | Field of view of the sub-images, in degrees | | 120 |
| THETAS | List of theta (yaw) angles, one per sub-image | | [0, 90, 180, 270] |
| PHIS | List of phi (pitch) angles, one per sub-image | | [-10, -10, -10, -10] |
| SUB_IMAGE_WIDTH | Width (and height) of the square sub-images, in pixels | | 1280 |
| MODEL_TYPE | Which detector to use: "YOLO" or "Faster RCNN" | | "YOLO" |
| SCORE_THRESHOLD | Confidence score threshold | | 0.4 |
| NMS_THRESHOLD | Non-Maximum Suppression (NMS) threshold | | 0.45 |
| USE_MYMODEL | Whether to use the improved detection pipeline; if False, the frame is detected as a whole instead of being split into 4 sub-images | | True |
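As a concrete reading of the SIZE_THRESHOLDS rule above, here is a hypothetical helper (names are illustrative, not the package's API) showing how a per-class area threshold marks a tracked vehicle as close:

```python
# Defaults from the table above: car, bus, truck (COCO ids 2, 5, 7).
CLASSES_TO_DETECT_MOVEMENT = [2, 5, 7]
SIZE_THRESHOLDS = [500 * 500, 900 * 900, 600 * 600]

def is_close(class_id: int, box_w: float, box_h: float) -> bool:
    """A track counts as close once its box area exceeds its class threshold."""
    i = CLASSES_TO_DETECT_MOVEMENT.index(class_id)
    return box_w * box_h > SIZE_THRESHOLDS[i]

print(is_close(2, 600, 500))  # car with a 600x500 box -> True (> 250000)
```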
For better understanding, several examples of using this package are listed below:
- To detect bicycles, cars, and motorbikes ([1, 2, 3] in COCO) in a video called test.mp4 with the original (whole-frame) Faster R-CNN and output the result video as test_object_detection.mp4, run the following command:

```
python Object_Detection.py --input_video_path test.mp4 --output_video_path test_object_detection.mp4 --classes_to_detect 1 2 3 --use_mymodel False --model_type "Faster RCNN"
```
- To track people and cars ([0, 2] in COCO) in a video called test.mp4 with the improved YOLOv5 at an input resolution of 1280, and output the result video and MOT text as test_object_tracking.mp4 and test_object_tracking.txt, run the following command:

```
python Object_Tracking.py --input_video_path test.mp4 --output_video_path test_object_tracking.mp4 --MOT_text_path test_object_tracking.txt --classes_to_detect 0 2 --short_edge_size 1280
```
- To track people, bicycles, cars, motorbikes, buses, trucks, and traffic lights ([0, 1, 2, 3, 5, 7, 9] in COCO) in a video called test.mp4, detect close unconfirmed overtakes by cars only (area threshold 160000, i.e. 400 × 400) with the improved YOLOv5 at an input resolution of 640, and output the result video as test_overtaking_detection.mp4, run the following command:

```
python Overtaking_Detection.py --input_video_path test.mp4 --output_video_path test_overtaking_detection.mp4 --mode 'Unconfirmed' --classes_to_detect_movement 2 --size_thresholds 160000 --sub_image_width 640
```
See evaluation_code/ for evaluation scripts. They:
- Load ground truth annotations and compare against predictions
- Compute standard COCO metrics (AP, AR) using cocoeval
- Evaluate tracking performance using py-motmetrics
- Run overtaking detection on the video dataset and compare the result with ground truth
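For reference, a minimal py-motmetrics sketch (toy inputs, not the repository's evaluation code) of how tracking metrics such as MOTA and IDF1 are computed:

```python
import motmetrics as mm

acc = mm.MOTAccumulator(auto_id=True)
# One frame: ground-truth ids [1, 2], hypothesis ids [1], and the
# ground-truth-to-hypothesis distance matrix (e.g. 1 - IoU).
acc.update([1, 2], [1], [[0.2], [0.9]])

mh = mm.metrics.create()
summary = mh.compute(acc, metrics=["mota", "idf1", "num_switches"], name="seq")
print(mm.io.render_summary(summary))
```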
If you find the project useful in your research, please consider citing:
```
@misc{guo2024multipleobjectdetectiontracking,
      title={Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis},
      author={Jingwei Guo and Yitai Cheng and Meihui Wang and Ilya Ilyankou and Natchapon Jongwiriyanurak and Xiaowei Gao and Nicola Christie and James Haworth},
      year={2024},
      eprint={2407.15199},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.15199},
}
```