This project implements a complete computer vision pipeline to process underwater video footage. The system is designed to perform three core tasks:
- Detection: Identify and locate fish within each frame of a video.
- Tracking: Assign a unique, persistent ID to each detected fish and follow it across multiple frames.
- Classification: Determine the specific species of each detected fish.
The pipeline is built to be flexible, offering multiple processing modes ranging from fast, real-time analysis for live camera feeds to more computationally intensive, high-accuracy offline processing for pre-recorded videos.
The project is organized into the following directories to ensure a clean and scalable workflow:
```
Deep-Learning/
├── Datasets/
│   └── ...               (Contains raw datasets like F4K and DeepFish)
├── models/
│   └── ...               (Contains the models, both pre-trained and fine-tuned for fish)
├── outputs/
│   └── ...               (Generated videos and data files are saved here)
├── presentation/
│   └── ...               (Contains files for the presentation of the project)
├── src/
│   ├── main.py           # The main entry point for the application
│   ├── pipeline.py
│   ├── object_detector/
│   │   ├── train_yolo.py
│   │   └── ...
│   └── tracker/
│       ├── bytetrack.yaml
│       └── evaluate_f4k.py
├── theory/
│   └── ...               (Contains the theory on the YOLO models and their history)
└── README.md
```
- `Datasets/`: Holds all the datasets used for training and evaluation.
- `models/`: Contains the final, trained model files (`.pt`).
- `outputs/`: The default location where all generated videos and data files are saved.
- `src/`: Contains all the Python source code.
> [!IMPORTANT]
> The `Datasets/` folder is not included in the repository due to its size.
> You can download the DeepFish dataset here (small version) or here (original version) and the Fish4Knowledge dataset here.
> Save them in the `Datasets/` folder and run the converter script in `src/object_detector` to convert them to the correct format.
> Alternatively, you can download the already preprocessed datasets here.
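For reference, the YOLO label format stores one object per line as `class x_center y_center width height`, with all coordinates normalized to [0, 1] by the image dimensions. A minimal sketch of that conversion (illustrative only; the actual converter script may differ):

```python
def to_yolo_label(class_id: int, x_min: float, y_min: float,
                  x_max: float, y_max: float,
                  img_w: int, img_h: int) -> str:
    """Convert a pixel-coordinate bounding box to one YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a 200x100 px box at (50, 80) in a 640x480 frame, species class 0
print(to_yolo_label(0, 50, 80, 250, 180, 640, 480))
# -> "0 0.234375 0.270833 0.312500 0.208333"
```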
The pipeline relies on a single, multi-class YOLOv8 model that must be trained beforehand.
Training the model (`deepfish_multi_m.pt`): this model is trained to detect and classify multiple fish species simultaneously on the DeepFish dataset. Train it with the `src/object_detector/train_yolo.py` script on a dataset whose bounding boxes are labeled with their specific species. The corresponding model for F4K can be obtained by fine-tuning the YOLOv8 model on the F4K dataset.
Ensure your final, trained `best.pt` model is placed in the `models/` directory.
You can run the `train_yolo.py` script using the commands in this file.
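For orientation, training with the Ultralytics Python API looks roughly like the sketch below. The dataset YAML name (`deepfish.yaml`) and the hyperparameters are placeholders, not necessarily what `train_yolo.py` uses:

```python
from ultralytics import YOLO

# Start from the pre-trained YOLOv8-medium weights and fine-tune on DeepFish.
# "deepfish.yaml" (hypothetical name) lists the train/val paths and species names.
model = YOLO("yolov8m.pt")
model.train(data="deepfish.yaml", epochs=100, imgsz=640)

# The F4K model can be obtained analogously, starting from the DeepFish weights:
# YOLO("models/deepfish_multi_m.pt").train(data="f4k.yaml", epochs=50, imgsz=640)
```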
The main entry point for all operations is src/main.py. You must run all commands from the project's root directory. The pipeline can process both pre-recorded video files and live camera footage.
The --source argument determines the input type:
- For a video file, provide the full path to the file (e.g., `"Datasets/f4k/gt_113.mp4"`).
- For a live camera, provide its numerical ID (e.g., `"0"` for the default system webcam). A sketch of how this distinction can be made follows the list.
The pipeline has two modes, selected with the --mode flag:
- `standard` (default): Optimized for speed. It displays the most confident species prediction for each track in real time. Ideal for live camera feeds.
- `buffered`: A real-time-capable mode that introduces a short delay. It only displays a track after it has been stable for a certain number of frames (`--min-duration`), reducing visual noise from fleeting detections (see the sketch after this list).
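A minimal sketch of the stability filter that `buffered` mode describes, assuming the tracker yields a list of track IDs per frame (illustrative only, not the pipeline's actual code):

```python
from collections import defaultdict

track_age = defaultdict(int)  # frames each track ID has been seen so far

def visible_tracks(current_ids, min_duration=10):
    """Return only the track IDs stable enough to display.

    A track is drawn only once it has been seen in min_duration frames,
    which suppresses visual noise from fleeting false detections.
    """
    for tid in current_ids:
        track_age[tid] += 1
    return [tid for tid in current_ids if track_age[tid] >= min_duration]
```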
Live Camera (Buffered Mode):

```bash
# Use camera 0 and wait for 10 frames of stability
py src/main.py --source 0 --mode buffered --min-duration 10
```

Pre-recorded Video (Standard Mode):

```bash
py src/main.py --source "Datasets/Deepfish/7398/train/output.mp4" --mode standard --yolo-model "models/deepfish_multi_m.pt"
```

> [!NOTE]
> This project was developed using Python 3.10.
You can customize the pipeline's behavior using the following arguments:
| Argument | Description | Default Value |
|---|---|---|
| `--source` | **[Required]** Path to the input video file or the camera ID (e.g., `"0"`). | `None` |
| `--yolo-model` | Path to the multi-class YOLO model for detection and classification. | `models/deepfish_multi_m.pt` |
| `--output-dir` | Directory where the annotated output videos will be saved. | `outputs` |
| `--mode` | The processing mode to use (`standard` or `buffered`). | `standard` |
| `--tracker-config` | Path to the tracker's `.yaml` configuration file for tuning. | `bytetrack.yaml` |
| `--min-duration` | Minimum number of frames a track must exist to be considered stable (used in `buffered` mode). | `2` |
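For context, a tracker configuration file plugs into the Ultralytics tracking API as in the sketch below, which runs detection and ByteTrack-based tracking in one call (paths assume the defaults above; the pipeline's actual wiring presumably lives in `src/pipeline.py`):

```python
from ultralytics import YOLO

model = YOLO("models/deepfish_multi_m.pt")
# persist=True carries the tracker state across successive track() calls
results = model.track(source="Datasets/f4k/gt_113.mp4",
                      tracker="bytetrack.yaml", persist=True)
```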
- Fisher, R. B., Chen-Burger, Y.-H., Giordano, D., Hardman, L., & Lin, F.-P. (2016). *Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data*. Springer.
- Saleh, A., Laradji, I. H., Konovalov, D. A., Bradley, M. D., Vazquez, D., & Sheaves, M. (2020). A realistic fish-habitat dataset to evaluate algorithms for underwater visual analysis. *Scientific Reports*, 10(1). https://doi.org/10.1038/s41598-020-71639-x
- Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., Luo, P., Liu, W., & Wang, X. (2022). ByteTrack: Multi-Object Tracking by Associating Every Detection Box. *arXiv:2110.06864 [cs]*. https://arxiv.org/abs/2110.06864
- Yaseen, M. (2024). What is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector. *arXiv*. https://arxiv.org/abs/2408.15857
- Cui, M., et al. (2024). Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review. *arXiv*. https://arxiv.org/abs/2406.17800