Audio processing is one of the more complex tasks in machine learning compared to image processing and other classification problems. One such application is Music Genre Classification (MGR), which aims to assign audio files to the sound category they belong to. Automating this task matters because classifying music manually means listening to each file for its full duration, which is slow and error-prone. This project is my attempt at solving the problem with machine learning techniques.
-
I utilized the open-access GTZAN dataset. It consists of 1000 song clips, 100 for each of its ten genres.
-
Each clip is a 30-second recording, which I further cropped into 3-second segments. This yields 10 parts per clip, effectively increasing the dataset to 10,000 clips of 3 seconds each, 1000 for each of the 10 genres.
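The cropping step above can be sketched as follows. This is a minimal illustration using NumPy only; in the real pipeline the waveform would come from `librosa.load`, and the sample rate of 22050 Hz (librosa's default) is an assumption.

```python
import numpy as np

SR = 22050       # assumed sample rate (librosa's default)
CLIP_SEC = 30
SEG_SEC = 3

def split_clip(y, sr=SR, seg_sec=SEG_SEC):
    """Split a 1-D waveform into equal, non-overlapping segments,
    dropping any trailing partial segment."""
    seg_len = sr * seg_sec
    n_segs = len(y) // seg_len
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]

# A synthetic 30-second "clip" stands in for audio loaded with librosa.load
clip = np.zeros(SR * CLIP_SEC)
segments = split_clip(clip)
print(len(segments))   # 10 segments of 3 s each
```

Non-overlapping slicing keeps the ten segments of each clip independent, which is what multiplies the dataset size by ten.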
-
Then I extracted various features from both the time and frequency domains using the librosa library, and saved the mean and standard deviation of each feature to a CSV file.
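The mean/standard-deviation aggregation can be sketched like this. The helper name `summarize` and the column-naming scheme are my assumptions; the random matrix stands in for output of librosa feature calls such as `librosa.feature.mfcc(y=segment, sr=sr)`.

```python
import numpy as np

def summarize(name, feat):
    """Collapse a feature matrix of shape (n_bands, n_frames), as returned
    by librosa.feature functions, into per-band mean and standard-deviation
    columns suitable for one CSV row."""
    feat = np.atleast_2d(feat)
    row = {}
    for i, band in enumerate(feat):
        row[f"{name}{i}_mean"] = float(band.mean())
        row[f"{name}{i}_std"] = float(band.std())
    return row

# Stand-in for 20 MFCC bands over 130 frames of one 3-second segment
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(20, 130))
row = summarize("mfcc", mfcc)
print(len(row))   # 40 columns: mean and std for each of 20 bands
```

Collapsing each feature to two statistics gives every segment a fixed-length row regardless of its frame count, which is what a tabular model like XGBoost needs.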
-
Some of the features I utilized:
-
Trained and tuned three classification algorithms on the dataset:
- XGBoost - 90% Val. Accuracy
- CatBoost - 90% Val. Accuracy
- Random Forest Classifier - 85% Val. Accuracy
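A minimal sketch of the train-and-validate loop for one of these models, using synthetic data in place of the feature CSV. I show only the Random Forest here; `XGBClassifier` and `CatBoostClassifier` expose the same `fit`/`predict` interface, so they drop into the same pattern. All data shapes and hyperparameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the extracted-feature table
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy binary target

X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

val_acc = model.score(X_val, y_val)   # fraction of correct predictions
print(round(val_acc, 3))
```

Holding out a validation split like this is what the per-model accuracies above refer to.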
-
Created an ensemble model combining all three trained models using a custom bagging approach with user-defined weights:
- Ensemble (XGBoost, CatBoost, Random Forest) - 97.8% Validation Accuracy
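One way to combine the three models with user-defined weights is weighted soft voting over their `predict_proba` outputs; I'm sketching that interpretation here, since the exact custom-bagging scheme isn't spelled out above. The function name and the toy probability values are assumptions.

```python
import numpy as np

def weighted_vote(prob_list, weights):
    """Blend class-probability matrices from several models with
    user-defined weights, then pick the argmax class per sample."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                    # normalise weights to sum to 1
    stacked = np.stack(prob_list)      # (n_models, n_samples, n_classes)
    blended = np.tensordot(w, stacked, axes=1)
    return blended.argmax(axis=1)

# Toy predict_proba outputs for 2 samples and 3 classes
p_xgb = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
p_cat = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
p_rf  = np.array([[0.4, 0.4, 0.2], [0.3, 0.4, 0.3]])

preds = weighted_vote([p_xgb, p_cat, p_rf], weights=[2, 2, 1])
print(preds)   # -> [0 2]
```

Giving the stronger models (XGBoost, CatBoost) larger weights lets the ensemble lean on them while the Random Forest still breaks ties, which is one plausible route to the accuracy gain reported above.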