This notebook doesn't make sense: the feature extraction averages across the 13 MFCC coefficients instead of averaging each coefficient across frames. The mean should be taken over all frames, so the call needs a `.T` before `np.mean` (i.e. `np.mean(mfccs.T, axis=0)`). Fixing that changes the extracted features, and with them the model and the reported accuracy, because the current function is fundamentally wrong. I hope you change it: this repo is one of the most starred on the topic, so the error spreads misinformation to many people.
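To make the axis issue concrete, here is a minimal sketch (using a random NumPy array as a stand-in for the matrix `librosa.feature.mfcc` returns, which has shape `(n_mfcc, n_frames)`); the variable names are illustrative, not the repo's:

```python
import numpy as np

# librosa.feature.mfcc returns shape (n_mfcc, n_frames).
# Simulate one: 13 coefficients over 120 frames.
n_mfcc, n_frames = 13, 120
mfccs = np.random.default_rng(0).normal(size=(n_mfcc, n_frames))

# Wrong: axis 0 of the raw matrix runs over the 13 coefficients,
# so this collapses them and yields one value per frame -- a
# vector whose length depends on the clip duration.
wrong = np.mean(mfccs, axis=0)

# Right: transpose first so axis 0 runs over frames; the mean is
# then taken per coefficient, giving a fixed-length 13-vector
# regardless of clip length.
right = np.mean(mfccs.T, axis=0)

print(wrong.shape, right.shape)  # -> (120,) (13,)
```

`np.mean(mfccs, axis=1)` is equivalent to the transposed form; the point is that the frame axis, not the coefficient axis, must be averaged so every clip produces a feature vector of the same length.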
I was also wondering about the feature extraction. The README says 3 seconds of audio are used, which matches the provided screenshot, and the screenshot uses 25 MFCCs; the notebook, however, uses 13 MFCCs with an audio duration of 2.5 seconds. Does anyone know which feature configuration the saved model was trained with?
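The mismatch matters because a model's first layer fixes the input dimension. A hypothetical illustration (the weight matrix and feature vectors below are stand-ins, not the repo's actual model):

```python
import numpy as np

# Stand-in feature vectors for the two documented configs.
feat_13 = np.zeros(13)  # notebook: 13 MFCCs, 2.5 s clips
feat_25 = np.zeros(25)  # README screenshot: 25 MFCCs, 3 s clips

# Stand-in weight matrix of a dense layer trained on 13-d input.
W = np.zeros((13, 8))

ok = feat_13 @ W  # works: (13,) @ (13, 8) -> (8,)
try:
    feat_25 @ W   # fails: 25 does not match 13
    mismatch = False
except ValueError:
    mismatch = True

print(ok.shape, mismatch)  # -> (8,) True
```

So a saved model trained on one configuration simply cannot consume features extracted with the other, which is why knowing the training-time settings is essential before reusing the checkpoint.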