
chore(whisper-cpp): Convert to Purego and add VAD #6087


Draft: wants to merge 7 commits into master from chore/whisper-purego


richiejp (Collaborator) commented Aug 18, 2025

Description

Converts the Whisper backend to use Purego, similar to the stablediffusion backend, and adds some features that are not available in the upstream CGO bindings.

Notes for Reviewers

  • We could upstream the Purego bindings, but I'm not sure what that would look like, so I will try them here first.
  • Initially I have added only a new VAD backend for testing; I will convert the rest afterwards.

Signed commits

  • Yes, I signed my commits.

TODO:

  • Fix VAD end time: speech segments are detected, but the realtime (RT) API does not submit them for transcription after a period of silence. Possibly the time units on the segments are wrong.
  • Convert the rest of the Whisper backend to Purego.
  • Fix the "transcription failed" bug.
  • Use the built-in VAD mode of transcription.


netlify bot commented Aug 18, 2025

Deploy Preview for localai ready!

  • 🔨 Latest commit: 0345dfb
  • 🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/68a83eb0f179e30008a6da3b
  • 😎 Deploy Preview: https://deploy-preview-6087--localai.netlify.app

richiejp force-pushed the chore/whisper-purego branch from 206a71d to 0345dfb on August 22, 2025 09:55
richiejp (Collaborator, Author) commented:

Ah, now I realise that the VAD model can be combined with the transcription model. We can just call transcribe: it runs VAD first and short-circuits if no speech is detected. This changes a lot.
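The control flow this comment describes (VAD first, then transcription only if speech was found) can be modelled in plain Go as below. This is an illustrative sketch under stated assumptions, not the PR's code: the `VAD` and `Transcriber` interfaces, `transcribeWithVAD`, and the toy `energyVAD`/`countingTranscriber` types are hypothetical stand-ins. In the combined mode the comment refers to, this check would happen inside the single transcribe call rather than in the backend.

```go
package main

import "fmt"

// VAD is a hypothetical voice-activity detector.
type VAD interface {
	HasSpeech(samples []float32) (bool, error)
}

// Transcriber is a hypothetical speech-to-text model.
type Transcriber interface {
	Transcribe(samples []float32) (string, error)
}

// transcribeWithVAD runs VAD first and skips the expensive transcription
// when no speech is detected. The bool result reports whether speech was found.
func transcribeWithVAD(v VAD, t Transcriber, samples []float32) (string, bool, error) {
	speech, err := v.HasSpeech(samples)
	if err != nil {
		return "", false, fmt.Errorf("vad: %w", err)
	}
	if !speech {
		return "", false, nil // short circuit: no speech, no transcription
	}
	text, err := t.Transcribe(samples)
	if err != nil {
		return "", true, fmt.Errorf("transcribe: %w", err)
	}
	return text, true, nil
}

// energyVAD is a toy detector: speech if any sample's magnitude exceeds Threshold.
type energyVAD struct{ Threshold float32 }

func (e energyVAD) HasSpeech(s []float32) (bool, error) {
	for _, x := range s {
		if x > e.Threshold || -x > e.Threshold {
			return true, nil
		}
	}
	return false, nil
}

// countingTranscriber records how often it is called and returns a placeholder.
type countingTranscriber struct{ calls int }

func (c *countingTranscriber) Transcribe(s []float32) (string, error) {
	c.calls++
	return fmt.Sprintf("<%d samples>", len(s)), nil
}

func main() {
	tr := &countingTranscriber{}
	vad := energyVAD{Threshold: 0.1}

	silence := make([]float32, 16000) // one second of silence at 16 kHz
	_, spoke, _ := transcribeWithVAD(vad, tr, silence)
	fmt.Println(spoke, tr.calls) // false 0

	speech := []float32{0, 0.5, -0.4, 0}
	text, spoke, _ := transcribeWithVAD(vad, tr, speech)
	fmt.Println(text, spoke, tr.calls) // <4 samples> true 1
}
```

Keeping VAD and transcription behind one call means the caller no longer has to map VAD segment timestamps onto its own buffer, which is why this could sidestep the end-time issue in the TODO list.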
