VoiceOfTheNation

This project is made by Group 11 of course CS-671 at IIT Mandi.

We have trained deep learning models to classify an audio into one of the 9 Indian languages, namely Bengali,Tamil,Telugu,Gujarati,Marathi, Malayalam, Kannada, Urdu and Hindi.

The dataset we have used for the project is Audio Dataset with 10 Indian Languages, which is openly available on Kaggle.

Basic pipeline that we used is :-

Data Preparation -> Data Preprocessing -> Model Building -> Training -> Evaluation

-> Data Preparation = In this step, the files are renamed as "language_name'_'original_file_name'.mp3 and has been stored into a single folder instead of class folders.

-> Data Preprocessing = In this process following tasks are performed on the prepared dataset: --Silence Trimming --Normalisation --Resampling --Feature Extraction ( using librosa for cnn based models and log mel spectrogram for whisper based models).

-> Model Building = Two different models have been trained on completely different architectures- one is the cnn based approach and other is the whisper based approach.

-> Training= cnns are fed with images of mel spectrogram and whisper is fed with log mel spectrogram and the models are trained.

-> Evaluation= The following methods are used for evaluating the models: --Accuracy --Class wise and overall f1 scores --confusion matrix

Respective graphs for training+ validation loss vs epochs and embeddings visualization using t-SNE/PCA after each stage have also been plotted in both the models.

We also have made gradio apps for both the models. The cnn model is saved as .h5 file and whisper model is saved as .pt file. After that, two apps are made for both using their saved file path. The app uses microphone to capture real time audio data, and the use it for prediction of language using the respective trained models.

FUTURE WORK:-

->Instead of using mel spectrogram and related features, the models can be trained on the raw input given as a vector. This will eliminate any kind of computational noise or compression, which will help to improve efficiency of the models. ->Training on more languages ->Efficient language detection on shorter clips.

This read.me file is not ai-generated, it is written by us ourselves :) Thank you

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
DataPreprocessing		DataPreprocessing
Results/CNN		Results/CNN
app		app
models		models
README.md		README.md
dataset		dataset

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

VoiceOfTheNation

Basic pipeline that we used is :-

FUTURE WORK:-

About

Uh oh!

Releases

Packages

Languages

Uh oh!

Uh oh!

riya-chaudhary11/language_classification

Folders and files

Latest commit

History

Repository files navigation

VoiceOfTheNation

Basic pipeline that we used is :-

FUTURE WORK:-

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages