Fine-tuning Whisper #759
Replies: 17 comments 50 replies
-
I would suggest splitting your audio into smaller chunks, because Whisper operates on 30-second input windows and cannot take a longer clip in a single pass.
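A minimal sketch of the chunking suggested above, assuming the audio is already loaded as a flat array of samples at 16 kHz (the rate Whisper expects). The function name `chunk_bounds` and its parameters are hypothetical, and it cuts at fixed positions, so it may split a word in half; cutting at silences is usually preferable when the data allows it.

```python
def chunk_bounds(num_samples, sample_rate=16000, max_seconds=30.0, overlap_seconds=0.0):
    """Return (start, end) sample indices for chunks of at most `max_seconds`.

    `overlap_seconds` optionally repeats the tail of one chunk at the head
    of the next, which softens the effect of cutting mid-word.
    """
    if overlap_seconds < 0 or overlap_seconds >= max_seconds:
        raise ValueError("overlap_seconds must be in [0, max_seconds)")

    size = int(max_seconds * sample_rate)          # samples per chunk
    step = size - int(overlap_seconds * sample_rate)  # advance between chunk starts

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + size, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # last chunk reached; avoid emitting a redundant tail
        start += step
    return bounds


# Usage sketch: `samples` is any indexable sequence of audio samples.
# chunks = [samples[a:b] for a, b in chunk_bounds(len(samples))]
```

Each chunk then needs its own transcript segment if it is used as a fine-tuning example; the sketch only handles the audio side.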
-
Hey @ehsantaati! Cool to see that you're fine-tuning Whisper for telephone recordings. This Colab nicely explains how you can use Transformer's You can play around a bit with the
-
@sanchit-gandhi
-
Whisper (large model) quite often gets drug names wrong when transcribing medical audio files. Is there some way to add, say, a medical dictionary (in text format), or any other way to improve the accuracy of drug names?
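One option that is sometimes tried before fine-tuning is to pass domain terms through the `initial_prompt` argument of openai-whisper's `transcribe`, which conditions the decoder on that text. The sketch below only builds such a prompt from a term list. The helper name `build_initial_prompt`, the prefix wording, and the word budget are assumptions for illustration; Whisper keeps only a limited number of prompt tokens, so a word count is a crude stand-in for the real token limit, and prompting biases spelling rather than guaranteeing it.

```python
def build_initial_prompt(terms, max_words=150, prefix="Medications mentioned:"):
    """Join de-duplicated terms into one prompt string within a rough word budget.

    The budget is counted in whitespace-separated words (an assumption;
    the model's actual limit is in tokens).
    """
    budget = max_words - len(prefix.split())
    seen = set()
    kept = []
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        n_words = len(term.split())
        if n_words > budget:
            break  # stop once the budget is exhausted
        seen.add(key)
        kept.append(term)
        budget -= n_words
    return f"{prefix} {', '.join(kept)}." if kept else ""


# Usage sketch with openai-whisper (not executed here):
# import whisper
# model = whisper.load_model("large")
# prompt = build_initial_prompt(["metoprolol", "apixaban"])
# result = model.transcribe("visit.wav", initial_prompt=prompt)
```

Putting the most frequently missed names first makes sense under this scheme, since terms past the budget are dropped.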
-
Thanks. Will give it a try.
On Fri, Feb 17, 2023 at 7:50 PM Sanchit Gandhi wrote:
Does this help?
https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311/2?u=nbroad
-
Might I ask for scripts or instructions for fine-tuning?
-
@sanchit-gandhi, @DavraYoung I am planning to collect custom data to fine-tune the Whisper model. What do I need to keep in mind while collecting the audio? Can you help with the setup configuration, such as:
Please let me know what else to keep in mind before collecting a large amount of data.
-
Hello all, I'm trying to use my own dataset (available here: https://huggingface.co/datasets/pedramaa/arabic-llm-egyption) to fine-tune Whisper for transcription. After considerable trouble choosing an environment and formatting my data to match the structure of the common_voice datasets, I finally have a local notebook running with GPU support. However, I've hit a roadblock at the final step, specifically within the line containing Despite my efforts to follow the guidance in the article https://huggingface.co/blog/fine-tune-whisper, I can't get a custom fine-tuning run to execute. In my most recent attempt in Google Colab, I got an error that seemed to require either "updating accelerator" or installing If any of you have ideas that could help me get past this, I would greatly appreciate your input. Thank you
-
My issue is that I'm trying to fine-tune for someone who breathes on a
ventilator. Their utterances are short phrases of several words at most. Is
Whisper going to be suitable for them at all?
On Mon, Aug 14, 2023, 4:38 AM Gagandeep Singh wrote:
@rampedro <https://github.com/rampedro> I faced a similar issue.
Here is what you can try:
!pip install -U accelerate
and then restart runtime.
You don't need to install the packages after that in that runtime session.
-
@sanchit-gandhi I have been having real trouble following https://huggingface.co/blog/fine-tune-whisper with my own dataset. See my dataset here: https://huggingface.co/datasets/MathiasFoster/whisper-data
-
Hello everyone, I want to know whether I can fine-tune on audio that mixes multiple languages, for example a recording in which someone speaks partly in English and partly in Chinese. Can such data be used, or should I avoid it? I suspect that mixed-language text is hard for the tokenizer to handle.
-
Is there any way to use Whisper for real-time speech recognition?
-
Hi @sanchit-gandhi, @DavraYoung, I'm currently working on fine-tuning the Whisper model to transcribe Spanish rural dialects, focusing on phonological aspects like elision and concatenation. My goal is to preserve the spoken disfluencies for linguistic analysis. I've been using @sanchit-gandhi's Colab notebook for fine-tuning but encountered an issue during the training parameter setup. Error Encountered: ImportError: Using the Could you provide any insights on resolving this ImportError?
-
Is there any way to fine-tune the original openai-whisper model without using the Hugging Face abstractions?
-
I'm fine-tuning for a patient whose voice is whispery and who breathes on a ventilator. That is, the voice is airy and sounds like white noise. I'm wondering whether I should include the natural noises in the training data (like the patient coughing, murmuring, etc.) so the model can learn that these are not speech?
-
Based on this guide https://huggingface.co/blog/fine-tune-whisper, I tried to fine-tune "small" and "large-v3" models:
-
I am trying to fine-tune Whisper to improve the WER on simulated telephone recordings in English. I am using the "small" model and a dataset of around 32 hours of English audio, with an average clip duration of 8 seconds.
I unfreeze the attention layers in the decoder's last blocks. The fine-tuned model performs well on the validation and test sets (just from fine-tuning the last blocks), but I get poor WER on longer speech (for example, a 1-minute recording). I use the transcribe method to transcribe longer audio.
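The selective unfreezing described above can be sketched as a filter over parameter names. This assumes the openai-whisper naming scheme, where decoder attention parameters look like `decoder.blocks.<i>.attn.*` and `decoder.blocks.<i>.cross_attn.*`; the Hugging Face port uses different names, so the pattern would need adjusting there. The helper name `last_block_attention_names` and the `last_k` parameter are hypothetical.

```python
import re

# Matches self-attention and cross-attention parameters of a decoder block,
# but not the layer norms (attn_ln, cross_attn_ln) or the MLP.
_DECODER_ATTN = re.compile(r"^decoder\.blocks\.(\d+)\.(attn|cross_attn)\.")


def last_block_attention_names(names, n_decoder_blocks, last_k=1):
    """Return the parameter names in the attention layers of the last `last_k` decoder blocks."""
    first_trainable = n_decoder_blocks - last_k
    selected = []
    for name in names:
        match = _DECODER_ATTN.match(name)
        if match and int(match.group(1)) >= first_trainable:
            selected.append(name)
    return selected


# Usage sketch with a loaded openai-whisper model (not executed here):
# names = [n for n, _ in model.named_parameters()]
# keep = set(last_block_attention_names(names, model.dims.n_text_layer))
# for name, param in model.named_parameters():
#     param.requires_grad = name in keep
```

Printing the selected names before training is a cheap way to confirm that the intended layers, and only those, are trainable.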
Any suggestions for improving the WER on longer speech?
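When comparing short clips against longer recordings, it helps to compute WER the same way for both. A minimal sketch, assuming WER is defined as word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The only normalization here is lowercasing and whitespace splitting, which is an assumption; punctuation is not stripped, so scores will differ from tools that normalize more aggressively.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Usage sketch: score each recording separately, then group by duration
# to see where the errors concentrate.
# wer = word_error_rate(ground_truth_text, model_output_text)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference, which is worth checking for on long recordings.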