Whisper encoder + No 30 second padding #5

Merged: farzadab merged 9 commits into main from farzad-whisper-pr on Jun 4, 2024

@farzadab farzadab commented Jun 4, 2024

This PR enables the Whisper encoder for both training and inference without padding every audio input to the full 30 seconds.
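To give a sense of what skipping the 30 second padding saves, here is a rough sketch of the sequence-length arithmetic. It assumes the standard Whisper front end (16 kHz audio, a hop of 160 samples, and a stride-2 convolution in the encoder) and assumes that clips are padded to the longest clip in the batch, as the `longest_pad` suffix in the model name suggests. The helper names are hypothetical and are not taken from this PR.

```python
# Standard Whisper front-end constants.
SAMPLE_RATE = 16_000  # Hz
HOP_LENGTH = 160      # samples between mel frames, i.e. 100 frames per second
MAX_SECONDS = 30      # window that Whisper normally pads or trims to
CONV_STRIDE = 2       # the encoder's second conv layer halves the frame count


def mel_frames(duration_s: float) -> int:
    """Number of log-mel frames for a clip of the given duration."""
    return int(duration_s * SAMPLE_RATE) // HOP_LENGTH


def encoder_positions(num_frames: int) -> int:
    """Encoder sequence length after the stride-2 convolution (ceil division)."""
    return (num_frames + CONV_STRIDE - 1) // CONV_STRIDE


def full_pad_positions() -> int:
    """Encoder length when every clip is padded to 30 seconds."""
    return encoder_positions(mel_frames(MAX_SECONDS))


def longest_pad_positions(durations_s: list[float]) -> int:
    """Encoder length when clips are only padded to the longest one in the batch.

    Assumption: this mirrors the 'longest' padding strategy; the PR's actual
    batching logic may differ.
    """
    return encoder_positions(mel_frames(max(durations_s)))


if __name__ == "__main__":
    batch = [2.5, 4.0, 3.2]  # hypothetical clip durations in seconds
    print(full_pad_positions())          # 1500
    print(longest_pad_positions(batch))  # 200
```

For a batch of short questions like the BoolQ samples below, the encoder would process roughly 200 positions per clip instead of 1500 under these assumptions.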

I verified that inference works with:

$ python -m ultravox.tools.infer_tool -T 32 -n 10 -d boolq_in --asr \
    -m wandb://fixie/ultravox/model-llama3_whisper_s__gs__cont_lora_gs_ai_constant__longest_pad:v1

--- Sample 0 ---
Q: Transcribe <|audio|> ["do iran and afghanistan speak the same language?"]
A: Do Iran and Afghanistan speak the same language?
X: do iran and afghanistan speak the same language? [wer: 0.00, avg: 0.00]
--- Sample 1 ---
Q: Transcribe <|audio|> ["do good samaritan laws protect those who help at an accident?"]
A: Do good Samaritan laws protect those who help at an accident?
X: do good samaritan laws protect those who help at an accident? [wer: 0.00, avg: 0.00]
--- Sample 2 ---
Q: Transcribe <|audio|> ["is windows movie maker part of windows essentials?"]
A: Is Windows Movie Maker part of windows essentials?
X: is windows movie maker part of windows essentials? [wer: 0.00, avg: 0.00]
--- Sample 3 ---
Q: Transcribe <|audio|> ["is confectionary sugar the same as powdered sugar?"]
A: Is confectionary sugar the same as powdered sugar?
X: is confectionary sugar the same as powdered sugar? [wer: 0.00, avg: 0.00]
--- Sample 4 ---
Q: Transcribe <|audio|> ["is elder scrolls online the same as skyrim?"]
A: Is elder scrolls Online the same as Skyrim?
X: is elder scrolls online the same as skyrim? [wer: 0.00, avg: 0.00]
--- Sample 5 ---
Q: Transcribe <|audio|> ["can you use oyster card at epsom station?"]
A: Can you use Oyster card at Upminster station?
X: can you use oyster card at epsom station? [wer: 0.12, avg: 0.02]
--- Sample 6 ---
Q: Transcribe <|audio|> ["will there be a season 4 of da vinci's demons?"]
A: Will there be a season four of Downton Abbey?
X: will there be a season 4 of da vinci's demons? [wer: 0.45, avg: 0.08]
...
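For readers who want to see how the `wer` and `avg` columns above could arise, here is a minimal sketch of word error rate as word-level edit distance divided by the reference length. The normalization (lowercase, with every non-alphanumeric character treated as a separator) is an assumption chosen because it reproduces the numbers in the log; the normalization that `infer_tool` actually applies is not shown in this PR.

```python
import re


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase and split on anything non-alphanumeric."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Samples 5 and 6 from the log above.
    s5 = wer("can you use oyster card at epsom station?",
             "Can you use Oyster card at Upminster station?")
    s6 = wer("will there be a season 4 of da vinci's demons?",
             "Will there be a season four of Downton Abbey?")
    print(s5)                         # 0.125 (one substitution in 8 words)
    print(round(s6, 2))               # 0.45 (5 edits over 11 reference words)
    # Samples 0-4 score 0.0, so the running average after sample 6 is:
    print(round((s5 + s6) / 7, 2))    # 0.08
```

Under this normalization, "vinci's" splits into two reference words and "four" is not mapped to "4", which is why sample 6 comes out at 5/11 rather than 4/10.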

@farzadab farzadab marked this pull request as ready for review June 4, 2024 18:20
@farzadab farzadab requested a review from juberti June 4, 2024 18:23
Review threads:
ultravox/model/modified_whisper.py (outdated, resolved)
ultravox/data/datasets.py (outdated, resolved)
ultravox/inference/ultravox_infer.py (resolved)
ultravox/inference/ultravox_infer.py (outdated, resolved)
ultravox/data/datasets.py (outdated, resolved)
ultravox/data/datasets.py (resolved)
ultravox/inference/ultravox_infer.py (outdated, resolved)
ultravox/model/modified_whisper.py (outdated, resolved)
ultravox/model/modified_whisper.py (outdated, resolved)
@farzadab farzadab merged commit edc3797 into main Jun 4, 2024
1 check passed
@farzadab farzadab deleted the farzad-whisper-pr branch June 4, 2024 21:51