Sound Classification Using CNN on ESC-50 Dataset

Project Overview

This project utilizes a Convolutional Neural Network (CNN) to classify audio samples from the ESC-50 dataset into predefined urban sound classes. The aim is to demonstrate effective audio classification using neural networks, including training, validation, and inference phases, and integrate data visualization and performance metrics to analyze results.

Dataset

The ESC-50 dataset is a collection of 2,000 environmental audio recordings equally distributed among 50 classes. For this project, a subset of 7 urban sound classes was selected:

Airplane
Helicopter
Train
Street Music
Engine
Siren
Car Horn

Each audio sample is preprocessed into Mel spectrograms for input into the CNN model.

Project Components

Data Preparation

Dataset Loading: Utilized the ESC-50 dataset, filtering for the selected classes.
Preprocessing: Applied Mel Spectrogram transformation with normalization.
Data Splitting: Split into training (80%) and validation (20%) sets.

Model Construction

CNN Architecture: Built a deep learning model with four convolutional layers, max pooling, dropout layers, and fully connected layers for classification.
Training Parameters:
- Batch size: 32
- Epochs: 500
- Learning rate: 0.001

Training and Validation

Loss Function: Cross-Entropy Loss
Optimizer: Adam optimizer with learning rate scheduling using ReduceLROnPlateau.
Training and Validation: Monitored loss and accuracy over epochs using Weights & Biases (WandB) for visualization and experiment tracking.

Inference

Implemented inference pipeline to test the trained model on unseen data.
Added visualizations, including:
- Class Distribution Histogram
- Confusion Matrix
- Sample Spectrograms

Results

Performance

Accuracy: Achieved 78% accuracy on the validation set, demonstrating effective classification for most classes.

Visualizations

Class-Wise Prediction Distribution

Confusion Matrix

Learning Curves

Observations

Street Music: This class was not predicted at all during inference. Analysis of the confusion matrix and histograms suggests that:
- The model may have struggled to identify distinct features for "Street Music".
- Insufficient samples or low representation of this class in the training data could have contributed.
- High overlap in features between "Street Music" and other urban sounds, such as "Siren" or "Engine", might have led to misclassifications.

Conclusion

This project highlights the effectiveness of CNNs in sound classification tasks using the ESC-50 dataset. However, the absence of predictions for "Street Music" emphasizes the importance of:

Ensuring balanced and adequate representation of all classes.
Experimenting with advanced feature extraction techniques to differentiate overlapping classes.
Exploring alternative architectures or regularization techniques to improve generalization.

Future work could address these limitations by fine-tuning the dataset, exploring pre-trained models, or using domain-specific audio augmentation techniques.

How to Run

Clone the repository.

git clone https://github.com/Codeexia0/Sound_Classification

Install the required dependencies.
```
pip install -r requirements.txt
```
Preprocess the dataset and train the model.
```
python train.py
```
Run inference and generate visualizations.
```
python inference.py
```

Acknowledgments

ESC-50 Dataset: Karol J. Piczak
WandB: Weights & Biases for experiment tracking

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
ESC-50-master		ESC-50-master
__pycache__		__pycache__
images		images
wandb		wandb
README.md		README.md
cnn.py		cnn.py
inference.py		inference.py
quit		quit
requirements.txt		requirements.txt
train.py		train.py
urbansounddataset.py		urbansounddataset.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sound Classification Using CNN on ESC-50 Dataset

Project Overview

Dataset

Project Components

Data Preparation

Model Construction

Training and Validation

Inference

Results

Performance

Visualizations

Class-Wise Prediction Distribution

Confusion Matrix

Learning Curves

Observations

Conclusion

How to Run

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Codeexia0/Sound_Classification

Folders and files

Latest commit

History

Repository files navigation

Sound Classification Using CNN on ESC-50 Dataset

Project Overview

Dataset

Project Components

Data Preparation

Model Construction

Training and Validation

Inference

Results

Performance

Visualizations

Class-Wise Prediction Distribution

Confusion Matrix

Learning Curves

Observations

Conclusion

How to Run

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages