Error in the "large-v3" model #1778
-
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context |
Beta Was this translation helpful? Give feedback.
-
Hello, then you use
-
Hello.
-
I had the same problem, but it seems to happen because of a mismatch between versions. The two models (large-v2 and large-v3) have different input shapes: v2 uses n_mels=80 and v3 uses n_mels=128. Go to the location where the whisper package is installed and check the timestamps of the files. If they belong to a previous version (before 20231117), large-v3 will not work and you will have to update the package in that location.
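For illustration, a small sketch (assuming the standard openai-whisper Python API, in particular whisper.load_model and the model.dims.n_mels attribute) that prints the mel dimension each checkpoint expects, which is why a spectrogram built for one cannot be fed to the other:

```python
import whisper

# compare the mel-channel count the two checkpoints expect
for name in ("large-v2", "large-v3"):
    model = whisper.load_model(name)  # downloads the weights on first use
    print(name, "expects n_mels =", model.dims.n_mels)  # 80 for v2, 128 for v3
```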
-
OpenAI's transcribe code currently gets the number of mel channels from the model and passes it to log_mel_spectrogram() to prevent this kind of mel-channel mismatch.
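A minimal sketch of that pattern for the standalone decode() example (assuming the current openai-whisper API: whisper.load_audio, whisper.pad_or_trim, log_mel_spectrogram's n_mels argument, and model.dims.n_mels):

```python
import whisper

model = whisper.load_model("large-v3")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# take the mel-channel count from the loaded model (80 for large-v2, 128 for large-v3)
# so the spectrogram always matches what the encoder expects
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)
```

With large-v3 this produces a (128, 3000) mel tensor, which matches the encoder's first convolution instead of triggering the 80-vs-128 channel error above.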
Someone should probably update the decode() example on OpenAI's whisper home page with this change so people stop tripping over this error.
-
Can you help?

> On 30 Oct 2024 at 14:53, vsuryacharan wrote:
> yes, there should be a change on the home page
-
It's worked, many thanks!
-
I am writing to bring to your attention an error I encountered while using the "large" model with openai-whisper==20231106. The error message is as follows:

ERROR:root:Given groups=1, weight of size [1280, 128, 3], expected input[1, 80, 3000] to have 128 channels, but got 80 channels instead

This error appears to be related to an inconsistency between the model's expected input shape and the actual input provided: the model expected 128 mel channels but received only 80.

I would appreciate it if you could investigate this issue and provide guidance on how to resolve it. If there are any specific steps or workarounds I should follow, please let me know.
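For anyone hitting this, a quick sanity check (a sketch assuming a standard pip install of openai-whisper) of whether the installed build predates the 20231117 release mentioned above:

```python
from importlib.metadata import version

# large-v3 needs 128 mel channels, which older builds (such as 20231106 above)
# do not produce by default; releases from 20231117 onward handle it
installed = version("openai-whisper")
print("installed openai-whisper:", installed)
if installed < "20231117":  # date-style versions of equal length compare fine as strings
    print("please upgrade, e.g.: pip install -U openai-whisper")
```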