
quinten-kamphuis/forced-alignment-model

Forced Audio-Text Alignment

Generate precise word-level timings from an audio file and its transcript. Built on torchaudio's MMS forced-alignment model.
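
As a rough illustration of what forced alignment does (this is not the repository's code), the sketch below runs a CTC Viterbi alignment in plain Python. Given frame-level log-probabilities from a CTC acoustic model and the transcript's token ids, it finds the most likely monotonic path and returns the frame span each token occupies. torchaudio provides an optimized implementation of this step (torchaudio.functional.forced_align). The toy emission matrix, the blank id of 0 and the 20 ms frame duration (typical of wav2vec2-style encoders at 16 kHz) are assumptions made for the example.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")

# Assumption: ~20 ms per emission frame (wav2vec2-style encoder at 16 kHz).
FRAME_SEC = 0.02


def ctc_forced_align(
    log_probs: List[List[float]], tokens: List[int], blank: int = 0
) -> List[Tuple[int, int]]:
    """Viterbi CTC alignment: return (start, end) frame spans, end exclusive, per token.

    log_probs: T x C frame-level log-probabilities from a CTC acoustic model.
    tokens: transcript token ids, without blanks.
    """
    if not log_probs:
        raise ValueError("log_probs is empty")
    num_frames = len(log_probs)
    # Interleave blanks: [blank, t1, blank, t2, ..., tN, blank].
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    num_states = len(ext)

    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[-1] * num_states for _ in range(num_frames)]
    # A path may start on the leading blank or on the first token.
    score[0][0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Moves: stay, advance one state, or skip a blank between two
            # different tokens (repeated tokens must be separated by a blank).
            prev = [p for p in (s, s - 1) if p >= 0]
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                prev.append(s - 2)
            best = max(prev, key=lambda p: score[t - 1][p])
            if score[t - 1][best] > NEG_INF:
                score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
                back[t][s] = best

    # A path must end on the last token or the trailing blank.
    ends = [num_states - 1] + ([num_states - 2] if num_states > 1 else [])
    state = max(ends, key=lambda s: score[-1][s])
    if score[-1][state] == NEG_INF:
        raise ValueError("too few frames to align this transcript")

    # Backtrack to recover the state occupied at every frame.
    path = [0] * num_frames
    for t in range(num_frames - 1, -1, -1):
        path[t] = state
        state = back[t][state]

    # Token k occupies state 2k + 1; collect the frames spent there.
    spans = []
    for k in range(len(tokens)):
        frames = [t for t, s in enumerate(path) if s == 2 * k + 1]
        spans.append((frames[0], frames[-1] + 1))
    return spans


if __name__ == "__main__":
    # Toy emissions over the vocabulary {0: blank, 1: "a", 2: "b"}.
    probs = [
        [0.05, 0.90, 0.05],
        [0.05, 0.90, 0.05],
        [0.90, 0.05, 0.05],
        [0.05, 0.05, 0.90],
        [0.05, 0.05, 0.90],
    ]
    log_probs = [[math.log(p) for p in row] for row in probs]
    for (start, end), tok in zip(ctc_forced_align(log_probs, [1, 2]), "ab"):
        print(f"{tok}: {start * FRAME_SEC:.2f}s - {end * FRAME_SEC:.2f}s")
```

In a real pipeline the emissions would come from the MMS model rather than a hand-written matrix, and a word's timing would span from the start of its first token to the end of its last token.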

Overview

This model aligns audio with text to generate word-level timings, useful for:

  • Generating accurate subtitles/captions
  • Segmenting audio at word boundaries
  • Synchronizing text with audio
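
Word timings map directly onto subtitle cues. The sketch below assumes a hypothetical list of {"word", "start", "end"} dicts with times in seconds (the model's actual output schema is not documented here) and groups consecutive words into SubRip (SRT) cues of at most max_words words each.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group word timings into numbered SRT cues of at most max_words words each.

    Assumption: each item is a hypothetical {"word": str, "start": float,
    "end": float} dict with times in seconds; adapt to the model's real output.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    words = [
        {"word": "Hello", "start": 0.0, "end": 0.42},
        {"word": "world", "start": 0.5, "end": 1.1},
    ]
    print(words_to_srt(words, max_words=1))
```

Grouping by a fixed word count is the simplest policy; subtitle tools usually also cap line length and cue duration.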

Try it out on Replicate!

Development

  1. Install Cog:
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
  2. Run a prediction locally, passing your audio file and its transcript:
cog predict -i audio=@audio.mp3 -i script="Your transcript here"
  3. Push to Replicate (authenticate with cog login first, and replace username with your Replicate username):
cog push r8.im/username/forced-alignment

License

MIT License
