Excuse me, I have looked into this and found that the pretrained audio model used in this project is Speech Commands, which was trained on over 105,000 WAVE audio files of people saying thirty different words.
So the base model can recognize many different spoken words well, and the low-level features it has learned should be associated with speech only.
My question is: why does the transfer learning model, when trained on very different, non-speech audio samples such as table claps, water sounds, whistles, etc., also (almost magically) perform very well?
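For reference, here is a minimal Keras-style sketch of the kind of transfer setup being asked about, assuming a frozen pretrained base and a small new classification head; the file path, class labels, and layer choices are hypothetical illustrations, not the project's actual code.

```python
# Minimal sketch (not the project's actual code): transfer learning from a
# Speech Commands base model to a small set of non-speech sound classes.
import tensorflow as tf

NUM_NEW_CLASSES = 4  # e.g. clap, water, whistle, background (hypothetical labels)

# Assume `base_model` is the pretrained Speech Commands classifier,
# taking spectrogram inputs and ending in a 30-way softmax.
base_model = tf.keras.models.load_model("speech_commands_base.h5")  # hypothetical path

# Reuse everything except the final classification layer as a frozen
# feature extractor; its filters encode generic spectro-temporal patterns.
feature_extractor = tf.keras.Model(
    inputs=base_model.input,
    outputs=base_model.layers[-2].output,
)
feature_extractor.trainable = False

# Only this small head is trained on the new (non-speech) examples.
transfer_model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dense(NUM_NEW_CLASSES, activation="softmax"),
])
transfer_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# train_spectrograms / train_labels would hold a handful of spectrograms
# per new class, preprocessed the same way as the base model's inputs.
# transfer_model.fit(train_spectrograms, train_labels, epochs=20)
```

In a setup like this, only the small head on top is trained on the new sounds, which is why it is surprising that speech-derived features transfer so well to claps, water, and whistles.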