An annotated dataset for automated detection and counting of estuarine fish in poor visibility conditions
Here we provide an open-access, annotated baited underwater video dataset of poor- and fair-visibility footage for the development of fish detection models and the benchmarking of image pre-processing tools. We provide the annotated training images with their annotations, and a 12-hour testing dataset with groundtruth MaxN abundance for four target species.
This dataset can be used:
- As a computer vision training dataset for monitoring estuarine fish on the eastern coast of Australia.
- As a benchmark dataset to test image pre-processing techniques (e.g. colour correction).
- As a benchmark dataset to test image post-processing techniques (e.g. fish occlusion filters).
- To supplement global fish detection models (e.g. see MegaDetector by Microsoft).
- To increase accessibility of underwater computer vision tools for aquatic monitoring and environmental science (e.g. see lilascience).
We provide access to two datasets: a training dataset and a testing dataset.
The training dataset is a fully annotated dataset that contains images, annotations and labels of various fish species. It includes videos recorded from 2017 to 2021 in Moreton Bay, Australia, across poor-visibility Secchi depths (2-5 m), from a standard baited underwater video rig with GoPro cameras recording at 1080p.
The testing dataset includes several non-annotated videos from the same location, visibility scenarios and period as the training dataset. The testing dataset can be used to evaluate computer vision fish detection models. The groundtruth is a CSV file containing manual maximum abundance (MaxN) counts of each fish species in each video. The maximum number of individuals per video was manually determined by researchers at the Moreton Bay Environmental Education Centre.
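MaxN is the maximum number of individuals of a species visible in any single frame of a video. A minimal sketch of how model output could be reduced to MaxN for comparison against the groundtruth CSV (the detection tuples and their field order are illustrative, not the dataset's actual schema):

```python
from collections import Counter

# Hypothetical per-frame detections as (video, frame, species) tuples;
# in practice these would come from a detector run over each video.
detections = [
    ("v1", 10, "Australasian Snapper"),
    ("v1", 10, "Australasian Snapper"),
    ("v1", 11, "Australasian Snapper"),
    ("v1", 12, "Australasian Snapper"),
    ("v2", 5,  "Smallmouth Scad"),
]

# Count individuals of each species in each frame.
per_frame = Counter((video, species, frame)
                    for video, frame, species in detections)

# MaxN: for each (video, species), the maximum per-frame count.
maxn = {}
for (video, species, frame), count in per_frame.items():
    key = (video, species)
    maxn[key] = max(maxn.get(key, 0), count)

print(maxn)
# e.g. {('v1', 'Australasian Snapper'): 2, ('v2', 'Smallmouth Scad'): 1}
```

The resulting per-video, per-species maxima can then be joined against the groundtruth counts to score a model.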
The training and testing dataset were collected by the Moreton Bay Environmental Education Centre.
The training dataset contains >65,000 segmentation mask annotations of 19 different estuarine fish species from Moreton Bay, Australia. We targeted four species for studies conducted at the Global Wetlands Project; these species therefore have a larger number of annotations. We suggest caution when using annotations of the non-targeted species, as these were variably annotated across the dataset. Please contact Sebastian Lopez-Marcano for more information.
Species | Num Annotations | Targeted species |
---|---|---|
Australasian Snapper | 9,489 | YES |
Bengal Sergeant | 277 | NO |
Black-Banded Trevally | 89 | NO |
Blue Catfish | 2,411 | NO |
Blue Swimmer Crab | 847 | NO |
Eastern Striped Grunter | 14,631 | NO |
Eastern Stripey | 307 | NO |
Echinoderm | 14 | NO |
Fanbelly Leatherjacket | 190 | NO |
Gunthers Wrasse | 603 | NO |
Mackerel spp | 139 | NO |
Moses Snapper | 53 | NO |
Paradise Threadfin Bream | 10,658 | YES |
Pinkbanded Grubfish | 502 | NO |
Pomacentrid spp | 27 | NO |
Remora spp | 41 | NO |
Smallmouth Scad | 7,067 | YES |
Smooth Golden Toadfish | 5,014 | YES |
Yellowfin Bream and Tarwhine | 11,872 | NO |
Dataset | Raw Videos | Raw Images | Version | Num Annotations | Annotations (CSV/JSON) |
---|---|---|---|---|---|
Training dataset | NA | Download 555 MB | 7 | 8,696 | Download 19 MB |
Testing dataset | Download 6.5 GB | NA | 1 | NA | Groundtruth |
Each annotation is an object instance annotation consisting of the following key fields:
- Labels, provided as a common name (e.g. YellowfinBream for Acanthopagrus australis).
- Bounding boxes that enclose the species in each frame, provided in "[x, y, width, height]" format, in pixel units.
- Segmentation masks that outline the species as a polygon, provided as a list of pixel coordinates in the format "[x, y, x, y, ...]".
The corresponding image is identified by an image filename. All image coordinates (bounding boxes and segmentation masks) are measured from the top-left image corner and are 0-indexed.
Annotations are provided in both CSV and COCO JSON formats; COCO JSON is a commonly used data format for integration with object detection frameworks including PyTorch and TensorFlow. For more information on annotation files in COCO JSON and/or CSV formats, go here.
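A minimal sketch of reading the annotation fields described above from COCO JSON with only the standard library. The inline dictionary below mimics the COCO layout; the file name in the comment and the specific values are illustrative assumptions, not taken from the dataset:

```python
import json

# In practice, load the downloaded annotations file, e.g.:
#   with open("annotations.json") as f:
#       coco = json.load(f)
# Here we inline a tiny example in the same COCO structure.
coco = {
    "images": [{"id": 1, "file_name": "frame_0001.jpg"}],
    "categories": [{"id": 1, "name": "YellowfinBream"}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        "bbox": [120, 80, 60, 40],  # [x, y, width, height], pixels
        "segmentation": [[120, 80, 180, 80, 180, 120, 120, 120]],
    }],
}

# Map category ids to the common-name labels.
cat_names = {c["id"]: c["name"] for c in coco["categories"]}

# Convert COCO's [x, y, width, height] (0-indexed, top-left origin)
# to corner format [x1, y1, x2, y2], which many frameworks expect.
def bbox_to_corners(bbox):
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

ann = coco["annotations"][0]
print(cat_names[ann["category_id"]], bbox_to_corners(ann["bbox"]))
# YellowfinBream [120, 80, 180, 120]
```

The same structure round-trips through `json.dumps`/`json.loads`, so the snippet behaves identically on a real annotations file with these fields.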
Please use 'CITATION.cff' to cite this dataset.
We kindly request that the following text be included in an acknowledgements section at the end of your publications:
"We would like to thank the Moreton Bay Environmental Education Centre for freely supplying us with the fish dataset for our research. The fish dataset was supported by an AI for Earth grant from Microsoft."