
Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis

arXiv preprint: https://arxiv.org/abs/2407.15199

Authors: Jingwei Guo, Yitai Cheng, Meihui Wang, Ilya Ilyankou, Natchapon Jongwiriyanurak, Xiaowei Gao, Nicola Christie, James Haworth

This package performs object detection, object tracking, and overtaking behaviour detection on panoramic (360°) equirectangular videos. It was initially developed as part of Jingwei Guo's MSc thesis.

The approach improves detection by projecting each equirectangular frame into four overlapping perspective sub-images, applying the detector to each, and then reprojecting and merging the resulting bounding boxes, which handles both projection distortion and long objects that span sub-image or frame boundaries. YOLOv5 models pre-trained on the COCO dataset are used as detectors (Faster R-CNN is also supported). Tracking is based on DeepSORT, modified to incorporate object category information and boundary continuity, which reduces false positives and ID switches in panoramic views. The overtaking detection module builds on the tracking results, identifying and classifying overtaking manoeuvres by vehicles around the cyclist.
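For intuition, here is a minimal sketch of the projection step (illustrative only, not the repository's implementation; the function name and angle conventions are assumptions). It samples one perspective sub-image from an equirectangular frame for a given FOV and viewing angles (theta, phi), which is how the four overlapping views with THETAs [0, 90, 180, 270] can be obtained:

```python
import cv2
import numpy as np

def equirect_to_perspective(equi, fov_deg, theta_deg, phi_deg, out_size):
    """Sample one perspective sub-image from an equirectangular frame.

    theta_deg is the horizontal viewing angle (yaw), phi_deg the vertical
    one (pitch); fov_deg is the horizontal field of view of the virtual
    pinhole camera. Conventions here are illustrative.
    """
    h_eq, w_eq = equi.shape[:2]
    w = h = out_size
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)  # pinhole focal length

    # A viewing ray for every output pixel, in camera coordinates.
    xs, ys = np.meshgrid(np.arange(w) - w / 2, np.arange(h) - h / 2)
    dirs = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays by pitch (about x), then yaw (about y).
    phi, theta = np.radians(phi_deg), np.radians(theta_deg)
    rx = np.array([[1, 0, 0],
                   [0, np.cos(phi), -np.sin(phi)],
                   [0, np.sin(phi), np.cos(phi)]])
    ry = np.array([[np.cos(theta), 0, np.sin(theta)],
                   [0, 1, 0],
                   [-np.sin(theta), 0, np.cos(theta)]])
    dirs = dirs @ (ry @ rx).T

    # Rays -> longitude/latitude -> equirectangular pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])   # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))  # [-pi/2, pi/2]
    map_x = ((lon / np.pi + 1) / 2 * w_eq).astype(np.float32)
    map_y = ((lat / (np.pi / 2) + 1) / 2 * h_eq).astype(np.float32)
    return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_WRAP)

# Four overlapping views, 90 degrees apart, matching the default arguments:
# views = [equirect_to_perspective(frame, 120, t, -10, 640)
#          for t in (0, 90, 180, 270)]
```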

Dependencies and Installation

The library should be run under Python 3.8+ with the following libraries installed:

  • detectron2 (a version released before Aug 5, 2022; see step 3 below)
  • torch
  • torchvision
  • numpy
  • matplotlib
  • scipy
  • opencv-python
  • pillow
  • pandas
  • seaborn

  1. First, clone the repository:
git clone https://github.com/SpaceTimeLab/360_object_tracking
  2. To install all the dependencies (except Detectron2), run the following commands in a new conda environment called, for example, 360:
conda create --name 360 -c conda-forge python=3.8
conda activate 360
conda install pip
pip install -r requirements.txt
  3. Since some APIs were modified in newer versions of Detectron2 (updated after Aug 5, 2022), install an older pinned version:
pip install -e git+https://github.com/facebookresearch/detectron2.git@5aeb252b194b93dc2879b4ac34bc51a31b5aee13#egg=detectron2

pip install pillow==9.5.0 # see https://github.com/facebookresearch/detectron2/issues/5010#issuecomment-1752284625
  4. Download the pre-trained ReID network used in DeepSORT:
cd deep_sort/deep/checkpoint
pip install gdown
gdown 'https://drive.google.com/uc?export=download&id=1_qwTWdzT9dWNudpusgKavj_4elGgbkUN'
cd ../../../
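As an optional sanity check (not part of the original instructions), the following should run without errors from the repository root and show the downloaded ReID weights:

```python
# Optional sanity check: the pinned installs import cleanly and the
# ReID checkpoint directory is no longer empty after step 4.
import os
import detectron2
import torch

print("detectron2", detectron2.__version__, "| torch", torch.__version__)
print(os.listdir("deep_sort/deep/checkpoint"))  # downloaded weights should appear
```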

Main functionality

The implementation of each functionality (object detection, object tracking, and overtaking detection) is explained in detail in Code Explanation.ipynb.

360 Object Detection

To run object detection on panoramic videos in equirectangular projection, execute Object_Detection.py from the terminal as below:

python Object_Detection.py [--input_video_path INPUT_VIDEO_PATH] [--output_video_path OUTPUT_VIDEO_PATH] [--classes_to_detect CLASSES_TO_DETECT] [--FOV FOV] [--THETAs THETAS] [--PHIs PHIS] [--sub_image_width SUB_IMAGE_WIDTH] [--model_type MODEL_TYPE] [--score_threshold SCORE_THRESHOLD] [--nms_threshold NMS_THRESHOLD] [--use_mymodel USE_MYMODEL]

The following arguments are provided:

| Argument | Description | Required? | Default |
| --- | --- | --- | --- |
| INPUT_VIDEO_PATH | Path of the input video | ✔️ | |
| OUTPUT_VIDEO_PATH | Path of the output video | ✔️ | |
| CLASSES_TO_DETECT | Index numbers of the COCO categories to detect | | [0, 1, 2, 3, 5, 7, 9] |
| FOV | Field of view of the sub-images | | 120 |
| THETAS | A list containing the theta of each sub-image; its length must equal the number of sub-images | | [0, 90, 180, 270] |
| PHIS | A list containing the phi of each sub-image; its length must equal the number of sub-images | | [-10, -10, -10, -10] |
| SUB_IMAGE_WIDTH | Width (and height) of the sub-images | | 640 |
| MODEL_TYPE | Which detector to use: "YOLO" or "Faster RCNN" | | "YOLO" |
| SCORE_THRESHOLD | Confidence score threshold | | 0.4 |
| NMS_THRESHOLD | Non-Maximum Suppression threshold | | 0.45 |
| USE_MYMODEL | Whether to use the improved object detection model; if False, the frame is detected as a whole instead of being split into four sub-images | | True |
| SHORT_EDGE_SIZE | Length of the short edge | | 0 |
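To make the roles of SCORE_THRESHOLD and NMS_THRESHOLD concrete, here is a minimal sketch of merging sub-image detections after they have been reprojected into equirectangular coordinates (illustrative only; the function name is an assumption and the repository's own merging logic may differ). It uses torchvision's class-aware batched_nms, so boxes of different categories never suppress each other:

```python
import torch
from torchvision.ops import batched_nms

def merge_subimage_detections(boxes, scores, classes,
                              score_threshold=0.4, nms_threshold=0.45):
    """Merge detections reprojected from the four overlapping sub-images.

    boxes: (N, 4) tensor in equirectangular pixels (x1, y1, x2, y2)
    scores, classes: (N,) tensors of confidences and COCO class indices
    """
    keep = scores >= score_threshold  # drop low-confidence boxes first
    boxes, scores, classes = boxes[keep], scores[keep], classes[keep]
    # Class-aware NMS removes the duplicates the overlapping views produce.
    idx = batched_nms(boxes, scores, classes, nms_threshold)
    return boxes[idx], scores[idx], classes[idx]
```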

360 Object Tracking

To run object tracking on panoramic videos in equirectangular projection, execute Object_Tracking.py from the terminal as below:

python Object_Tracking.py [--input_video_path INPUT_VIDEO_PATH] [--output_video_path OUTPUT_VIDEO_PATH] [--MOT_text_path MOT_TEXT_PATH] [--prevent_different_classes_match PREVENT_DIFFERENT_CLASSES_MATCH] [--match_across_boundary MATCH_ACROSS_BOUNDARY] [--classes_to_detect CLASSES_TO_DETECT] [--FOV FOV] [--THETAs THETAS] [--PHIs PHIS] [--sub_image_width SUB_IMAGE_WIDTH] [--model_type MODEL_TYPE] [--score_threshold SCORE_THRESHOLD] [--nms_threshold NMS_THRESHOLD] [--use_mymodel USE_MYMODEL]

The following arguments are provided:

| Argument | Description | Required? | Default |
| --- | --- | --- | --- |
| INPUT_VIDEO_PATH | Path of the input video | ✔️ | |
| OUTPUT_VIDEO_PATH | Path of the output video | ✔️ | |
| MOT_TEXT_PATH | Path of the output MOT-format tracking results text file | | |
| PREVENT_DIFFERENT_CLASSES_MATCH | Whether to use the multi-category support added to DeepSORT (tracks and detections of different classes are never matched) | | True |
| MATCH_ACROSS_BOUNDARY | Whether to use the boundary-continuity support added to DeepSORT (objects can be matched across the left/right frame boundary) | | True |
| CLASSES_TO_DETECT | Index numbers of the COCO categories to detect | | [0, 1, 2, 3, 5, 7, 9] |
| FOV | Field of view of the sub-images | | 120 |
| THETAS | A list containing the theta of each sub-image; its length must equal the number of sub-images | | [0, 90, 180, 270] |
| PHIS | A list containing the phi of each sub-image; its length must equal the number of sub-images | | [-10, -10, -10, -10] |
| SUB_IMAGE_WIDTH | Width (and height) of the sub-images | | 1280 |
| MODEL_TYPE | Which detector to use: "YOLO" or "Faster RCNN" | | "YOLO" |
| SCORE_THRESHOLD | Confidence score threshold | | 0.4 |
| NMS_THRESHOLD | Non-Maximum Suppression threshold | | 0.45 |
| USE_MYMODEL | Whether to use the improved object detection model; if False, the frame is detected as a whole instead of being split into four sub-images | | True |
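Conceptually, MATCH_ACROSS_BOUNDARY makes the matcher treat the left and right edges of the equirectangular frame as adjacent. A minimal sketch of such a wrap-aware IoU (illustrative only; the repository's association step works inside DeepSORT and may differ):

```python
def wrapped_iou(box_a, box_b, frame_width):
    """IoU that treats the left and right edges of the equirectangular
    frame as adjacent, so a track leaving one side can still match a
    detection entering the other. Boxes are (x1, y1, x2, y2)."""
    def iou(a, b):
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0
    # Try box_b as-is and shifted a full frame width left/right; keep the best.
    return max(iou(box_a, (box_b[0] + dx, box_b[1], box_b[2] + dx, box_b[3]))
               for dx in (-frame_width, 0, frame_width))
```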

360 Overtaking Behaviour Detection

To run overtaking behaviour detection on panoramic videos in equirectangular projection, execute Overtaking_Detection.py from the terminal as below:

python Overtaking_Detection.py [--input_video_path INPUT_VIDEO_PATH] [--output_video_path OUTPUT_VIDEO_PATH] [--mode MODE] [--prevent_different_classes_match PREVENT_DIFFERENT_CLASSES_MATCH] [--match_across_boundary MATCH_ACROSS_BOUNDARY] [--classes_to_detect CLASSES_TO_DETECT] [--classes_to_detect_movement CLASSES_TO_DETECT_MOVEMENT] [--size_thresholds SIZE_THRESHOLDS] [--FOV FOV] [--THETAs THETAS] [--PHIs PHIS] [--sub_image_width SUB_IMAGE_WIDTH] [--model_type MODEL_TYPE] [--score_threshold SCORE_THRESHOLD] [--nms_threshold NMS_THRESHOLD] [--use_mymodel USE_MYMODEL]

The following arguments are provided:

| Argument | Description | Required? | Default |
| --- | --- | --- | --- |
| INPUT_VIDEO_PATH | Path of the input video | ✔️ | |
| OUTPUT_VIDEO_PATH | Path of the output video | ✔️ | |
| MODE | Which kind of overtaking behaviour to detect: "Confirmed" or "Unconfirmed" | | "Confirmed" |
| PREVENT_DIFFERENT_CLASSES_MATCH | Whether to use the multi-category support added to DeepSORT (tracks and detections of different classes are never matched) | | True |
| MATCH_ACROSS_BOUNDARY | Whether to use the boundary-continuity support added to DeepSORT (objects can be matched across the left/right frame boundary) | | True |
| CLASSES_TO_DETECT | Index numbers of the COCO categories to detect | | [0, 1, 2, 3, 5, 7, 9] |
| CLASSES_TO_DETECT_MOVEMENT | Index numbers of the COCO categories for movement detection; must be a subset of CLASSES_TO_DETECT | | [2, 5, 7] |
| SIZE_THRESHOLDS | A list of size thresholds, of the same length as CLASSES_TO_DETECT_MOVEMENT; a track whose size exceeds the threshold for its class is considered close to the user | | [500 * 500, 900 * 900, 600 * 600] |
| FOV | Field of view of the sub-images | | 120 |
| THETAS | A list containing the theta of each sub-image; its length must equal the number of sub-images | | [0, 90, 180, 270] |
| PHIS | A list containing the phi of each sub-image; its length must equal the number of sub-images | | [-10, -10, -10, -10] |
| SUB_IMAGE_WIDTH | Width (and height) of the sub-images | | 1280 |
| MODEL_TYPE | Which detector to use: "YOLO" or "Faster RCNN" | | "YOLO" |
| SCORE_THRESHOLD | Confidence score threshold | | 0.4 |
| NMS_THRESHOLD | Non-Maximum Suppression threshold | | 0.45 |
| USE_MYMODEL | Whether to use the improved object detection model; if False, the frame is detected as a whole instead of being split into four sub-images | | True |
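To make the SIZE_THRESHOLDS semantics concrete, the following sketch (using the default thresholds from the table; names are illustrative, not the repository's API) flags a tracked vehicle as close once its bounding-box area exceeds the threshold for its class:

```python
# Default thresholds from the table: car (2), bus (5), truck (7).
SIZE_THRESHOLDS = {2: 500 * 500, 5: 900 * 900, 7: 600 * 600}

def is_close(class_id, box):
    """True if the track's bounding-box area exceeds its class threshold."""
    x1, y1, x2, y2 = box
    threshold = SIZE_THRESHOLDS.get(class_id)
    return threshold is not None and (x2 - x1) * (y2 - y1) > threshold

# e.g. a 600x900 car box counts as close: 540000 > 250000
assert is_close(2, (100, 200, 700, 1100))
```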

Examples

For a better understanding, several examples of using this package are listed below:

  1. To detect the bicycles, cars and motorbikes ([1, 2, 3] in COCO) in a video called test.mp4 with the original Faster RCNN (i.e. without the sub-image splitting) and output the result video as test_object_detection.mp4, run the following command:
python Object_Detection.py --input_video_path test.mp4 --output_video_path test_object_detection.mp4 --classes_to_detect 1 2 3 --use_mymodel False --model_type "Faster RCNN"
  2. To track the people and cars ([0, 2] in COCO) in a video called test.mp4 with the improved YOLOv5 at an input resolution of 1280, and to output the result video and MOT texts as test_object_tracking.mp4 and test_object_tracking.txt, run the following command:
python Object_Tracking.py --input_video_path test.mp4 --output_video_path test_object_tracking.mp4 --MOT_text_path test_object_tracking.txt --classes_to_detect 0 2 --sub_image_width 1280
  3. To track people, bicycles, cars, motorbikes, buses, trucks and traffic lights ([0, 1, 2, 3, 5, 7, 9] in COCO) in a video called test.mp4, detect the close (bounding-box area > 160000) unconfirmed overtakes by cars only with the improved YOLOv5 at an input resolution of 640, and output the result video as test_overtaking_detection.mp4, run the following command:
python Overtaking_Detection.py --input_video_path test.mp4 --output_video_path test_overtaking_detection.mp4 --mode 'Unconfirmed' --classes_to_detect_movement 2 --size_thresholds 160000 --sub_image_width 640

Model evaluation

See evaluation_code/ for evaluation scripts. They:

  • Load ground truth annotations and compare against predictions
  • Compute standard COCO metrics (AP, AR) using cocoeval
  • Evaluate tracking performance using py-motmetrics
  • Run overtaking detection on the video dataset and compare the results with the ground truth
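For the detection metrics, the standard pycocotools workflow looks like this (file names are placeholders; see evaluation_code/ for the actual scripts):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground_truth.json")            # COCO-format annotations
coco_dt = coco_gt.loadRes("predictions.json")  # detections to evaluate

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP / AR at the standard COCO settings
```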

To cite

If you find the project useful in your research, please consider citing:

@misc{guo2024multipleobjectdetectiontracking,
      title={Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis},
      author={Jingwei Guo and Yitai Cheng and Meihui Wang and Ilya Ilyankou and Natchapon Jongwiriyanurak and Xiaowei Gao and Nicola Christie and James Haworth},
      year={2024},
      eprint={2407.15199},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.15199},
}
