Show Whisper in Gradio WebUI: convert audio data directly from numpy arrays instead of a secondary conversion to files #1998
Closed
Natmat626 started this conversation in Show and tell
- Directly pass the waveform, at 16 kHz, as a numpy array into https://github.com/openai/whisper/blob/main/whisper/transcribe.py#L64
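That works because `model.transcribe()` accepts a raw waveform as well as a file path. A minimal sketch of preparing Gradio's `(sample_rate, samples)` tuple for that call, hedged: the helper name is mine, and the linear `np.interp` resampling is a crude stand-in for the FFmpeg resampling Whisper's own loader performs.

```python
import numpy as np

WHISPER_SR = 16_000  # the sample rate Whisper expects (whisper.audio.SAMPLE_RATE)

def gradio_audio_to_whisper(sr: int, data: np.ndarray) -> np.ndarray:
    """Turn a Gradio (sample_rate, samples) pair into the mono float32
    16 kHz array that model.transcribe() can consume directly."""
    samples = data.astype(np.float32)
    if samples.ndim == 2:                       # stereo -> mono downmix
        samples = samples.mean(axis=1)
    if np.issubdtype(data.dtype, np.integer):   # scale int PCM into [-1, 1]
        samples /= np.iinfo(data.dtype).max
    if sr != WHISPER_SR:                        # naive linear resampling
        duration = samples.shape[0] / sr
        n_out = int(duration * WHISPER_SR)
        old_t = np.linspace(0.0, duration, samples.shape[0], endpoint=False)
        new_t = np.linspace(0.0, duration, n_out, endpoint=False)
        samples = np.interp(new_t, old_t, samples).astype(np.float32)
    return samples

# With a loaded model (not shown here):
# result = model.transcribe(gradio_audio_to_whisper(sr, data))
```

For production you would likely swap the `np.interp` step for a proper resampler (e.g. from `librosa` or `torchaudio`), but the shape/dtype contract is the same.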
I am trying to build a digital-human conversation feature, and Whisper has helped me a lot.
Gradio also makes it easy to put a UI on the web.
But one thing has troubled me for a long time: it seems that the type in Gradio.Audio must be set to 'filepath' for Whisper to read the audio correctly via whisper.load_audio.
The figure above shows the logic of Gradio.Audio when processing audio data: if type == 'filepath', the audio data is converted back into a file again, whereas type == 'numpy' skips that step.
The next figure shows the logic of whisper.load_audio: it is a very low-level use of FFmpeg, and what it returns directly is a numpy array representing the audio data.
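As a hedged paraphrase of that loader: the real function shells out to FFmpeg to decode any input to mono 16-bit PCM at 16 kHz on stdout, then reinterprets those raw bytes as int16 samples and rescales them to float32 in [-1, 1). Only that final, FFmpeg-free step is reproduced runnably here:

```python
import numpy as np

# Stand-in for FFmpeg's stdout: four raw little-endian int16 PCM samples.
raw_pcm = np.array([0, 16384, -16384, 32767], dtype=np.int16).tobytes()

# Reinterpret the bytes as int16 and rescale, as whisper.load_audio does.
samples = np.frombuffer(raw_pcm, np.int16).astype(np.float32) / 32768.0
```

So by the time load_audio returns, the data is exactly the kind of numpy array Gradio's type == 'numpy' already hands you, just at a fixed 16 kHz.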
Gradio.Audio uses 'pydub' to process audio, and 'pydub' is itself based on 'FFmpeg'.
I don't have much experience with audio, but logically it should be possible to avoid this secondary conversion of the audio data into a file. Is there any way?
(By the way, every similar demo I have seen sets the type in Gradio.Audio to 'filepath'. If I am being naive, please tell me.)
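One way to wire this up without the temp-file round trip, sketched under the assumption that type='numpy' hands the callback a (sample_rate, int_array) tuple (Gradio's documented behavior). The Gradio/Whisper wiring is left as comments so the conversion logic stays self-contained; `make_transcriber` is a name of my own invention.

```python
import numpy as np

def make_transcriber(transcribe_fn):
    """Build a Gradio callback that feeds the numpy audio straight to
    Whisper, skipping the file round trip. `transcribe_fn` is the bound
    method of a loaded model, e.g. model.transcribe."""
    def callback(audio):
        if audio is None:                       # nothing recorded yet
            return ""
        sr, data = audio                        # Gradio type="numpy" tuple
        samples = data.astype(np.float32)
        if samples.ndim == 2:                   # downmix stereo to mono
            samples = samples.mean(axis=1)
        if np.issubdtype(data.dtype, np.integer):
            samples /= np.iinfo(data.dtype).max # scale PCM into [-1, 1]
        # Note: the array must already be at 16 kHz; resample first if the
        # browser recorded at a different rate.
        result = transcribe_fn(samples)
        return result["text"]
    return callback

# Hypothetical wiring (requires gradio and openai-whisper installed):
# import gradio as gr, whisper
# model = whisper.load_model("base")
# gr.Interface(fn=make_transcriber(model.transcribe),
#              inputs=gr.Audio(sources=["microphone"], type="numpy"),
#              outputs="text").launch()
```

Injecting `transcribe_fn` keeps the callback testable without loading a model.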