
A fully annotated baited underwater dataset of poor and fair visibility videos for the development of fish detection models and image pre-processing tools.


slopezmarcano/dataset-fish-detection-low-visibility


An annotated dataset for automated detection and counting of estuarine fish in poor visibility conditions


Table of Contents

  1. Overview
  2. Uses
  3. Datasets
  4. Species
  5. Dataset Links
  6. Attributions

Overview

Here we provide an open-access, annotated baited underwater dataset of poor- and fair-visibility videos for the development of fish detection models and the benchmarking of image pre-processing tools. We provide training annotations and images, and a 12-hour testing dataset with groundtruth MaxN abundance for four target species.


Uses

This dataset can be used:

  1. As a computer vision training dataset to monitor estuarine fish on the eastern coast of Australia.
  2. As a benchmark dataset to test image pre-processing techniques (e.g. colour correction).
  3. As a benchmark dataset to test image post-processing techniques (e.g. fish occlusion filters).
  4. To supplement global fish detection models (e.g. see MegaDetector by Microsoft).
  5. To increase the accessibility of underwater computer vision tools for aquatic monitoring and environmental science (e.g. see lilascience).
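As an illustration of use case 2, the sketch below applies gray-world colour correction, one simple pre-processing technique that could be benchmarked on this dataset. The function name and array conventions are our own assumptions, not part of the dataset's tooling.

```python
import numpy as np

def gray_world(img):
    """Gray-world colour correction: rescale each channel so the
    channel means become equal. `img` is an HxWx3 float array in
    [0, 1]. A hypothetical example, not the dataset's own tooling."""
    means = img.reshape(-1, 3).mean(axis=0)   # per-channel mean
    gain = means.mean() / means               # gain to equalise channels
    return np.clip(img * gain, 0.0, 1.0)
```

Corrected frames can then be fed to a detector to measure whether pre-processing improves detection in poor-visibility footage.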

Datasets

We provide access to two datasets: a training dataset and a testing dataset.

The training dataset is a fully annotated dataset that contains images, annotations and labels of various fish species. It includes videos of Moreton Bay, Australia, recorded from 2017-2021 across poor-visibility Secchi depths (2-5 m) with a standard baited underwater video rig using GoPro cameras recording at 1080p.

The testing dataset includes several non-annotated videos from the same location, visibility scenarios and period as the training dataset, and can be used to evaluate computer vision fish detection models. The groundtruth is a CSV of manual maximum abundance (MaxN) counts for each fish species in each video. The maximum number of individuals per video was manually determined by researchers at the Moreton Bay Environmental Education Centre.

Both the training and testing datasets were collected by the Moreton Bay Environmental Education Centre.
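To evaluate a detection model against the groundtruth, per-frame detections must be reduced to MaxN, the maximum number of individuals of each species visible in any single frame of a video. The sketch below shows that reduction; the flat `(frame_id, species)` input format is a hypothetical convenience, not the dataset's actual schema.

```python
from collections import defaultdict

def max_n(detections):
    """Compute MaxN per species from per-frame detections.
    `detections` is an iterable of (frame_id, species) tuples --
    a hypothetical flat format for illustration only."""
    per_frame = defaultdict(lambda: defaultdict(int))
    for frame_id, species in detections:
        per_frame[frame_id][species] += 1   # count individuals per frame
    result = defaultdict(int)
    for counts in per_frame.values():
        for species, n in counts.items():
            result[species] = max(result[species], n)  # keep the max over frames
    return dict(result)
```

The resulting per-species counts can then be compared directly against the groundtruth CSV.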

Species

The training dataset contains >65,000 segmentation mask annotations of 19 different estuarine fish species from Moreton Bay, Australia. We targeted four species for studies conducted at the Global Wetlands Project, so these species have a larger number of annotations. We suggest caution when using annotations of the non-targeted species, as these were variably annotated across the dataset. Please contact Sebastian Lopez-Marcano for more information.

| Species | Num Annotations | Targeted species |
| --- | --- | --- |
| Australasian Snapper | 9,489 | YES |
| Bengal Sergeant | 277 | NO |
| Black-Banded Trevally | 89 | NO |
| Blue Catfish | 2,411 | NO |
| Blue Swimmer Crab | 847 | NO |
| Eastern Striped Grunter | 14,631 | NO |
| Eastern Stripey | 307 | NO |
| Echinoderm | 14 | NO |
| Fanbelly Leatherjacket | 190 | NO |
| Gunthers Wrasse | 603 | NO |
| Mackerel spp | 139 | NO |
| Moses Snapper | 53 | NO |
| Paradise Threadfin Bream | 10,658 | YES |
| Pinkbanded Grubfish | 502 | NO |
| Pomacentrid spp | 27 | NO |
| Remora spp | 41 | NO |
| Smallmouth Scad | 7,067 | YES |
| Smooth Golden Toadfish | 5,014 | YES |
| Yellowfin Bream and Tarwhine | 11,872 | NO |

Dataset Links

| Dataset | Raw Videos | Raw Images | Version | Num Annotations | Annotations (CSV/JSON) |
| --- | --- | --- | --- | --- | --- |
| Training dataset | NA | Download (555 MB) | 7 | 8,696 | Download (19 MB) |
| Testing dataset | Download (6.5 GB) | NA | 1 | NA | Groundtruth |

Annotations

Each annotation is an object instance annotation consisting of the following key fields: labels, provided as a common name (e.g. YellowfinBream for Acanthopagrus australis); bounding boxes enclosing the species in each frame, provided in [x, y, width, height] format, in pixel units; and segmentation masks outlining the species as a polygon, provided as a list of pixel coordinates in the format [x, y, x, y, ...].

The corresponding image is identified by its filename. All image coordinates (bounding boxes and segmentation masks) are measured from the top-left image corner and are 0-indexed.

Annotations are provided in both CSV format and COCO JSON format, a commonly used data format for integration with object detection frameworks including PyTorch and TensorFlow. For more information on annotation files in COCO JSON and/or CSV formats go here.
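The fields above can be read from a COCO-style JSON file with the standard library alone. This is a minimal sketch assuming the file follows the standard COCO schema (`images`, `annotations`, `categories` sections); the file path is a placeholder, not the dataset's actual file name.

```python
import json

def load_coco(path):
    """Yield one dict per object instance from a COCO-style JSON file.
    Assumes the standard COCO top-level keys; a sketch, not the
    dataset's official loader."""
    with open(path) as f:
        coco = json.load(f)
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    images = {i["id"]: i["file_name"] for i in coco["images"]}
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]        # [x, y, width, height], pixels, top-left origin
        polygon = ann["segmentation"]   # [[x, y, x, y, ...]] pixel coordinates
        yield {
            "image": images[ann["image_id"]],
            "label": categories[ann["category_id"]],
            "bbox": (x, y, w, h),
            "polygon": polygon,
        }
```

For training pipelines, frameworks such as PyTorch (via torchvision's `CocoDetection`) can also consume the COCO JSON directly.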

Attributions

Please use 'CITATION.cff' to cite this dataset.

We kindly request that the following text be included in an acknowledgements section at the end of your publications:

"We would like to thank the Moreton Bay Environmental Education Centre for freely supplying us with the fish dataset for our research. The fish dataset was supported by an AI for Earth grant from Microsoft."

