[Request/Suggestion] Support unpredictable frame drops and unmatching speed/pitch (drift correction) #54

Open
alopatindev opened this issue Dec 29, 2023 · 1 comment

alopatindev commented Dec 29, 2023

I'm looking for a way to synchronize (potentially destructively) audio tracks from old versions of movies (dubbed in a different language) with their remastered versions.

In my scenario, applying a single audio shift is not enough: sooner or later the tracks drift out of sync, at least due to

  • unpredictable frame drops in both tracks
  • mismatched overall average speed (often with a higher pitch for the faster audio)

Any interest in supporting such a scenario?

Any existing projects that try to solve this problem?

Any ideas on the best way to implement it?


Naive idea for implementation:

  • perform an initial synchronization
  • until the old dubbed audio ends
    • detect whether the segment likely contains voice (with something like silero-vad) or something non-silent and non-voiced (ideally, a music segment)
    • measure the tempo difference between the old and new audio segments
      • if it's voice: recognize it (with something like whisper.cpp) and compare, between the old and new segments, the time elapsed from the first word to the last word
      • if it's something else: probably just compare the distance between the two loudest points in the old and new segments
    • shrink/stretch (speed up/slow down) the old audio (dubbed in the other language): the analyzed segment and the next N segments
    • repeat
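
The two "measure tempo difference" branches above could look roughly like the sketch below. It is only an illustration under assumptions: word timestamps are taken to be `(start, end)` pairs in seconds that some recognizer has already produced, samples are a plain sequence of floats, and the names `ratio_from_words`, `two_loudest_peaks`, `ratio_from_peaks` and `min_gap` are made up for this example. No VAD or speech recognition library is called here.

```python
from typing import Sequence, Tuple

Word = Tuple[float, float]  # (start_seconds, end_seconds), assumed recognizer output


def ratio_from_words(old_words: Sequence[Word], new_words: Sequence[Word]) -> float:
    """Old span / new span, measured from first word start to last word end."""
    old_span = old_words[-1][1] - old_words[0][0]
    new_span = new_words[-1][1] - new_words[0][0]
    if old_span <= 0 or new_span <= 0:
        raise ValueError("segment spans must be positive")
    return old_span / new_span


def two_loudest_peaks(samples: Sequence[float], min_gap: int) -> Tuple[int, int]:
    """Indices of the two loudest samples that are at least min_gap apart."""
    if not samples:
        raise ValueError("segment is empty")
    # Sample indices ordered from loudest to quietest (by absolute amplitude).
    order = sorted(range(len(samples)), key=lambda i: abs(samples[i]), reverse=True)
    first = order[0]
    for i in order[1:]:
        if abs(i - first) >= min_gap:
            return (min(first, i), max(first, i))
    raise ValueError("no second peak far enough from the first")


def ratio_from_peaks(old: Sequence[float], new: Sequence[float], min_gap: int) -> float:
    """Old peak distance / new peak distance, in samples."""
    o1, o2 = two_loudest_peaks(old, min_gap)
    n1, n2 = two_loudest_peaks(new, min_gap)
    return (o2 - o1) / (n2 - n1)


# Old segment spans 10.4 s, new segment spans 10.0 s.
old_words = [(0.0, 0.5), (9.9, 10.4)]
new_words = [(1.0, 1.4), (10.5, 11.0)]
ratio = ratio_from_words(old_words, new_words)
```

A ratio above 1 means the old segment is longer than the new one, so the old audio would have to be sped up by that factor to match. Both functions assume the two segments really cover the same stretch of the movie; the peak variant also assumes both tracks use the same sample rate and that the two loudest points are the same events in both tracks, which may not hold for differently mixed dubs.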

Thanks!

alopatindev changed the title from "[Request/Suggestion] Support unpredictable frame drops and unmatching speed/pitch" to "[Request/Suggestion] Support unpredictable frame drops and unmatching speed/pitch (drift correction)" on Dec 29, 2023
@benfmiller
Owner

Sorry for the late reply, and thanks for the suggestion!

Audalign currently has a "locality" feature, which breaks audio files into segments and aligns them based on the strength of the match between segments (more info in the wiki). This could be used relatively easily to stretch the audio files, but it wouldn't handle frame drops.
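
To make the distinction concrete, here is a rough sketch of how per-segment offsets could separate steady drift from frame drops. It does not use Audalign's API: the input is a hypothetical list of `(time_in_old_track, offset_seconds)` pairs, sorted by time, such as a segment-wise alignment might produce, and `drift_rate` and `find_jumps` are names invented for this example. The assumption is that a constant speed mismatch shows up as a steadily growing offset, while a frame drop shows up as a sudden step.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (time in old track, offset to new track), both in seconds


def drift_rate(points: Sequence[Point]) -> float:
    """Median offset change per second between consecutive segments.

    The median keeps a few sudden steps (frame drops) from skewing the estimate.
    Needs at least two points with strictly increasing times.
    """
    slopes = [(o1 - o0) / (t1 - t0) for (t0, o0), (t1, o1) in zip(points, points[1:])]
    return median(slopes)


def find_jumps(points: Sequence[Point], rate: float, threshold: float) -> List[float]:
    """Times at which the offset steps by more than threshold beyond the expected drift."""
    jumps = []
    for (t0, o0), (t1, o1) in zip(points, points[1:]):
        expected = rate * (t1 - t0)
        if abs((o1 - o0) - expected) > threshold:
            jumps.append(t1)
    return jumps


# Hypothetical offsets: 1 ms of drift per second, plus a 0.5 s step before t=300.
points = [(0, 2.0), (100, 2.1), (200, 2.2), (300, 2.8), (400, 2.9)]
rate = drift_rate(points)
stretch = 1.0 + rate  # factor by which old durations would be scaled
jumps = find_jumps(points, rate, threshold=0.1)
```

With this convention, a position `t` in the old track maps to `t + offset(t)` in the new one, so `1 + rate` is the overall stretch factor, and each detected jump marks a spot where a fixed amount of audio would have to be cut or padded rather than stretched.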

It looks like AudioAlign's graph/feature is purely based on correlation? I don't have much time to work on this in the near future, but if it's an easy change I'd be happy to work on it. Or, I'd gladly accept pull requests!

silero-vad and whisper look like a neat idea for a new recognizer! For this case, would translated audio segments necessarily line up with word starts and ends? Would translated segments be viable as time markers, or would the shrinking/stretching have to be based on the background audio?
