Manually implemented ResNet-50 layer by layer to classify 5 bird species with 85% accuracy using Mel-spectrograms from 1,250+ audio samples.

aneetalr/BirdSoundClassification

Bird Sound Classification Using ResNet50

Developed a deep learning-based system to classify bird species from audio recordings using the ResNet-50 architecture. Audio data was pre-processed into Mel-spectrograms and fed into a 50-layer convolutional neural network to classify five distinct bird species: Blue Jay, Black-capped Chickadee, Mallard, Common Tern, and American Redstart. The model was trained on a subset of the Xeno-Canto bird sound dataset from Kaggle, assisting researchers in identifying species efficiently through acoustic analysis.

INTRODUCTION

Ornithology, a branch of zoology, is the scientific study of birds, and the researchers who practise it are known as ornithologists. Birds play a large part in shaping the plant life we see around us. By recognizing bird songs, ornithologists can monitor wildlife and study bird behaviour. Every bird species has its own distinctive sounds. Identifying a bird by sight, however, is difficult: birds tend to fly away or hide from humans, so spotting and photographing them for identification is hard. Recording a bird's sound is much easier, and the recordings remain useful for future research.

Bird sounds can be heard up to a certain distance, and with readily available technology (smartphones, audio recorders, etc.) we can record them. These recordings can be used to identify bird species, because different species produce different sounds. Because recordings of bird sounds are easy to obtain, we can convert them into spectrograms and train a model with a CNN architecture. People who are curious about wildlife can also use this system: all they have to do is record the bird sound and upload it to the system.

The proposed system is a web application called Bird Sound Classification. It identifies the species of a bird from its recorded sound, which helps researchers and scientists in ornithology. The system uses a model built on the ResNet50 architecture. ResNet50 is a convolutional neural network architecture: ResNet stands for Residual Network, and 50 means it has 50 layers. The dataset used to train and evaluate the model is taken from part of an online bird sound repository called xeno-canto, which is available as a dataset on Kaggle. It covers many bird species, but here we use only 5 of them: Blue Jay, Black-capped Chickadee, Mallard, Common Tern and American Redstart. The resulting subset is unbalanced and contains 1250 audio files in total. Once the species has been predicted from its sound, researchers can go forward with their research on that bird.
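To make the "50 layers" figure concrete, the short sketch below counts the weighted layers in the standard ResNet-50 layout: one stem convolution, four stages of bottleneck blocks (3, 4, 6 and 3 blocks, each with three convolutions), and one fully connected classifier. The function and constant names are illustrative and are not taken from this repository; the count follows the usual convention of ignoring the projection convolutions on the shortcut paths.

```python
# Bottleneck blocks per stage in the standard ResNet-50 layout
# (stages conv2_x, conv3_x, conv4_x, conv5_x).
BLOCKS_PER_STAGE = [3, 4, 6, 3]
CONVS_PER_BLOCK = 3  # 1x1 reduce -> 3x3 -> 1x1 expand


def count_weighted_layers(blocks_per_stage=BLOCKS_PER_STAGE):
    """Count conv + fully connected layers, excluding shortcut projections."""
    stem_conv = 1  # the initial 7x7 convolution
    block_convs = sum(blocks_per_stage) * CONVS_PER_BLOCK
    classifier = 1  # final fully connected layer (5 outputs for 5 species)
    return stem_conv + block_convs + classifier


def stage_output_channels(base=64, expansion=4, stages=4):
    """Output channels of each stage: base width doubles, then expands 4x."""
    return [base * (2 ** i) * expansion for i in range(stages)]


if __name__ == "__main__":
    print(count_weighted_layers())
    print(stage_output_channels())
```

The same block counts are what a layer-by-layer implementation has to reproduce; only the size of the final classifier changes with the number of classes.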

DATASET

Dataset was collected from: https://www.kaggle.com/datasets/rohanrao/xeno-canto-bird-recordings-extended-a-m

The dataset used in this project is the xeno-canto bird recordings collection hosted on Kaggle. It contains sound recordings of 264 species and is an unbalanced dataset of 14.7K audio files in mp3 format. Only 5 species were selected for this project: Blue Jay, Black-capped Chickadee, Mallard, Common Tern and American Redstart. The audio files are 5 seconds or longer, and 1250 files were selected in total. Each audio file is cut into clips of 5 to 10 seconds until the end of the recording is reached, and each clip is converted to a Mel-spectrogram. The Mel-spectrograms are saved as 432 x 432 PNG images and form the dataset used to build the model. Each class has around 900 spectrograms, except Common Tern, which has around 700. The PNG files are then pre-processed (resized to 224 x 224) before being passed to the architecture.
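The clip-cutting step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes clips are cut back to back at a maximum of 10 seconds and that a trailing remainder is kept only if it is at least 5 seconds long. The function name `segment_bounds` and both parameters are hypothetical.

```python
def segment_bounds(duration_s, max_len_s=10.0, min_len_s=5.0):
    """Return (start, end) times in seconds for consecutive clips.

    Assumption: clips are cut back to back at max_len_s, and the final
    remainder is kept only when it is at least min_len_s long.
    """
    bounds = []
    start = 0.0
    # Keep cutting while enough audio remains for a clip of minimum length.
    while duration_s - start >= min_len_s:
        end = min(start + max_len_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # A 27 s recording yields two 10 s clips and one 7 s clip.
    print(segment_bounds(27.0))
    # A 23 s recording yields two 10 s clips; the 3 s tail is dropped.
    print(segment_bounds(23.0))
```

Each (start, end) pair would then be sliced out of the decoded audio and converted to a Mel-spectrogram image. Under this scheme one recording can produce several spectrograms, which is consistent with 1250 audio files yielding several thousand images.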

CONCLUSION

The main aim of this project was to create a web application called Bird Sound Classification. Audio recordings are converted to mel-spectrograms, which can then be used as input to the ResNet50 model. The project involved pre-processing the dataset to extract mel-spectrograms, training the ResNet50 model on them, and evaluating the model on a test dataset. The model reached 88.07% training accuracy, 81.64% validation accuracy and 81.73% evaluation accuracy. I used Google Colab and Jupyter Notebook to develop this application, which made the work easy and efficient. Using a Graphics Processing Unit (GPU) accelerates training and yields results faster. The models were trained six to ten times with different numbers of epochs, and 20 epochs was finally chosen for the final training stage. Saving the model after each epoch makes it possible to choose the best model based on validation loss and validation accuracy; this is how the above-mentioned accuracy was achieved. The resulting bird sound classification system, built on ResNet50 and mel-spectrograms, can classify bird species from their vocalizations with the accuracy reported above. This system could be useful for researchers studying bird behaviour, conservationists monitoring bird populations, or bird enthusiasts interested in identifying the birds they hear in the wild.
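Choosing the best per-epoch checkpoint can be sketched as below. This is an illustrative selection rule, not the project's actual procedure: it assumes the lowest validation loss wins and that ties are broken by higher validation accuracy. The function name `pick_best_checkpoint` and the metric values in the example are made up for demonstration and are not results from this project.

```python
def pick_best_checkpoint(history):
    """Pick the epoch record with the lowest validation loss.

    `history` is a list of dicts with keys 'epoch', 'val_loss', 'val_acc'.
    Assumption: ties on val_loss are broken by the higher val_acc.
    """
    if not history:
        raise ValueError("history must contain at least one epoch")
    return min(history, key=lambda rec: (rec["val_loss"], -rec["val_acc"]))


if __name__ == "__main__":
    # Illustrative numbers only; NOT measurements from this project.
    history = [
        {"epoch": 1, "val_loss": 0.92, "val_acc": 0.66},
        {"epoch": 2, "val_loss": 0.61, "val_acc": 0.78},
        {"epoch": 3, "val_loss": 0.61, "val_acc": 0.80},
        {"epoch": 4, "val_loss": 0.70, "val_acc": 0.79},
    ]
    best = pick_best_checkpoint(history)
    print(best["epoch"])
```

Keeping one saved model per epoch means the weights for the selected epoch can be reloaded afterwards, even if later epochs start to overfit.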
