Forced Audio-Text Alignment

Generate precise word-level timings from an audio file and its transcript. Built on torchaudio's MMS model.
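As a rough illustration of how alignment output becomes word timings: a CTC-style aligner typically reports, for each word, a list of token spans measured in emission frames, and those frame indices are scaled to seconds using the ratio of audio samples to frames. The sketch below assumes that shape. The function name `word_timings`, the tuple layout, and the output keys are hypothetical and are not taken from this repository's predict.py.

```python
from typing import Dict, List, Tuple

# Assumed layout: (start_frame, end_frame) for one aligned token.
FrameSpan = Tuple[int, int]


def word_timings(
    words: List[str],
    word_spans: List[List[FrameSpan]],
    num_samples: int,
    num_frames: int,
    sample_rate: int,
) -> List[Dict]:
    """Convert per-word frame spans into start/end times in seconds."""
    if len(words) != len(word_spans):
        raise ValueError("words and word_spans must have the same length")

    # Each emission frame covers num_samples / num_frames audio samples.
    seconds_per_frame = num_samples / num_frames / sample_rate

    timings = []
    for word, spans in zip(words, word_spans):
        # A word starts at its first token and ends at its last token.
        start = spans[0][0] * seconds_per_frame
        end = spans[-1][1] * seconds_per_frame
        timings.append(
            {"word": word, "start": round(start, 3), "end": round(end, 3)}
        )
    return timings
```

For one second of 16 kHz audio that yields 50 frames, each frame covers 0.02 seconds, so a word spanning frames 5 to 10 would be reported as 0.1 s to 0.2 s.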

Overview

This model aligns audio with text to generate word-level timings, useful for:

Generating accurate subtitles/captions
Creating word-level audio segmentation
Synchronizing text with audio
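
To make the subtitle use case concrete, here is a minimal sketch that groups word timings into SRT cues. It assumes each word is a dict with `word`, `start`, and `end` keys (times in seconds). That shape is an assumption for illustration and the model's actual output format may differ. The helper names `format_timestamp` and `words_to_srt` are hypothetical.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group word timings into numbered cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        text = " ".join(w["word"] for w in chunk)
        # A cue runs from its first word's start to its last word's end.
        start = format_timestamp(chunk[0]["start"])
        end = format_timestamp(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(cues)
```

Splitting by a fixed word count is the simplest grouping rule. A real captioning pipeline might instead break on punctuation or on pauses between words.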

Try it out on Replicate!

Development

Install Cog:

curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
chmod +x /usr/local/bin/cog

Run predictions:

cog predict -i audio=@audio.mp3 -i script="Your transcript here"
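
The same prediction can be scripted from Python. The sketch below only wraps the command shown above with `subprocess`. The helper names `build_predict_command` and `run_prediction` are hypothetical, and the structure of the returned output is not assumed, so it is passed back as raw text.

```python
import shlex
import subprocess
from typing import List


def build_predict_command(audio_path: str, script: str) -> List[str]:
    """Build the argument list for `cog predict` with the audio and script inputs."""
    return [
        "cog", "predict",
        "-i", f"audio=@{audio_path}",
        "-i", f"script={script}",
    ]


def run_prediction(audio_path: str, script: str) -> str:
    """Run `cog predict` and return its raw stdout (requires Cog and Docker)."""
    result = subprocess.run(
        build_predict_command(audio_path, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Cog installed.
    cmd = build_predict_command("audio.mp3", "Your transcript here")
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the transcript contains quotes or other special characters.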

Push to Replicate:

cog push r8.im/username/forced-alignment

License

MIT License

quinten-kamphuis/forced-alignment-model
