Show Whisper in Gradio WebUI: convert audio data directly from numpy arrays instead of a secondary conversion to files #1998
Closed
Natmat626 started this conversation in Show and tell
- Directly pass the waveform, at 16 kHz, as a numpy array into https://github.com/openai/whisper/blob/main/whisper/transcribe.py#L64
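That works because `model.transcribe()` accepts a raw waveform as well as a file path. A minimal sketch of preparing Gradio's `(sample_rate, samples)` tuple for that call, hedged: the helper name is mine, and the linear `np.interp` resampling is a crude stand-in for the FFmpeg resampling Whisper's own loader performs.

```python
import numpy as np

WHISPER_SR = 16_000  # the sample rate Whisper expects (whisper.audio.SAMPLE_RATE)

def gradio_audio_to_whisper(sr: int, data: np.ndarray) -> np.ndarray:
    """Turn a Gradio (sample_rate, samples) pair into the mono float32
    16 kHz array that model.transcribe() can consume directly."""
    samples = data.astype(np.float32)
    if samples.ndim == 2:                       # stereo -> mono downmix
        samples = samples.mean(axis=1)
    if np.issubdtype(data.dtype, np.integer):   # scale int PCM into [-1, 1]
        samples /= np.iinfo(data.dtype).max
    if sr != WHISPER_SR:                        # naive linear resampling
        duration = samples.shape[0] / sr
        n_out = int(duration * WHISPER_SR)
        old_t = np.linspace(0.0, duration, samples.shape[0], endpoint=False)
        new_t = np.linspace(0.0, duration, n_out, endpoint=False)
        samples = np.interp(new_t, old_t, samples).astype(np.float32)
    return samples

# With a loaded model (not shown here):
# result = model.transcribe(gradio_audio_to_whisper(sr, data))
```

For production you would likely swap the `np.interp` step for a proper resampler (e.g. from `librosa` or `torchaudio`), but the shape/dtype contract is the same.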
I am trying to build a digital-human conversation feature, and Whisper has helped me a lot.
Gradio also makes it easy to put a UI on the web.
But one thing has troubled me for a long time: it seems that the type in Gradio.Audio must be set to 'filepath' for Whisper to read the audio correctly via whisper.load_audio.
The figure above shows the logic of Gradio.Audio when processing audio data: if type == 'filepath', the audio data is converted back into a file again, whereas type == 'numpy' skips that step.
The next figure shows the logic of whisper.load_audio: it is a very low-level use of FFmpeg, and what it returns directly is a numpy array representing the audio data.
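As a hedged paraphrase of that loader: the real function shells out to FFmpeg to decode any input to mono 16-bit PCM at 16 kHz on stdout, then reinterprets those raw bytes as int16 samples and rescales them to float32 in [-1, 1). Only that final, FFmpeg-free step is reproduced runnably here:

```python
import numpy as np

# Stand-in for FFmpeg's stdout: four raw little-endian int16 PCM samples.
raw_pcm = np.array([0, 16384, -16384, 32767], dtype=np.int16).tobytes()

# Reinterpret the bytes as int16 and rescale, as whisper.load_audio does.
samples = np.frombuffer(raw_pcm, np.int16).astype(np.float32) / 32768.0
```

So by the time load_audio returns, the data is exactly the kind of numpy array Gradio's type == 'numpy' already hands you, just at a fixed 16 kHz.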
Gradio.Audio uses 'pydub' to process audio, and 'pydub' is itself based on 'FFmpeg'.
I don't have much experience with audio, but logically it should be possible to avoid this secondary conversion of the audio data into a file. Is there any way?
(By the way, every similar demo I have seen sets the type in Gradio.Audio to 'filepath'. If I am being naive, please tell me.)
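One way to wire this up without the temp-file round trip, sketched under the assumption that type='numpy' hands the callback a (sample_rate, int_array) tuple (Gradio's documented behavior). The Gradio/Whisper wiring is left as comments so the conversion logic stays self-contained; `make_transcriber` is a name of my own invention.

```python
import numpy as np

def make_transcriber(transcribe_fn):
    """Build a Gradio callback that feeds the numpy audio straight to
    Whisper, skipping the file round trip. `transcribe_fn` is the bound
    method of a loaded model, e.g. model.transcribe."""
    def callback(audio):
        if audio is None:                       # nothing recorded yet
            return ""
        sr, data = audio                        # Gradio type="numpy" tuple
        samples = data.astype(np.float32)
        if samples.ndim == 2:                   # downmix stereo to mono
            samples = samples.mean(axis=1)
        if np.issubdtype(data.dtype, np.integer):
            samples /= np.iinfo(data.dtype).max # scale PCM into [-1, 1]
        # Note: the array must already be at 16 kHz; resample first if the
        # browser recorded at a different rate.
        result = transcribe_fn(samples)
        return result["text"]
    return callback

# Hypothetical wiring (requires gradio and openai-whisper installed):
# import gradio as gr, whisper
# model = whisper.load_model("base")
# gr.Interface(fn=make_transcriber(model.transcribe),
#              inputs=gr.Audio(sources=["microphone"], type="numpy"),
#              outputs="text").launch()
```

Injecting `transcribe_fn` keeps the callback testable without loading a model.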