Skip to content

Daniel-McLarty/Python-Autodub

Repository files navigation

Python Autodub

An automated, AI-powered video dubbing pipeline that extracts audio, separates vocals from background noise (using Demucs), diarizes speakers (using DiariZen), and generates translated voice clones (using F5-TTS). It then re-assembles the audio using fast, frame-accurate numpy math and muxes it back into a final MKV video file.

Copyright (C) Daniel McLarty 2026

Features

  • Vocal Separation: Isolates background noise and music from dialogue using 2-stem htdemucs, utilizing smart 10-minute chunking to prevent System RAM OOM crashes on full-length movies.
  • Speaker Diarization (DiariZen): Identifies multiple speakers automatically using state-of-the-art EEND/VBx clustering. (Automatically bypassed when targeting a single speaker to save VRAM).
  • Human-in-the-Loop Clone Editor: Optional interactive UI that pauses the pipeline, plays back the extracted voice clones synced to the video, and allows you to manually inject custom reference audio before generation begins.
  • Voice Cloning: Automatically extracts clean samples for each identified speaker and uses F5-TTS to generate high-fidelity, translated lines.
  • Anti-Hallucination & Speech Leakage: Features a "Language ID Killswitch" that automatically detects and trashes generations that leak the original language, and an ASR check that physically slices pre-speech babbling off the generated audio files.
  • Smart Audio Assembly: Uses phase vocoder time-stretching to strictly confine AI-generated audio to its exact subtitle window, preventing dialogue overlap.
  • Graphical Interface: Easy-to-use Tkinter GUI to configure thresholds, workers, file paths, and output settings (with native Windows Light/Dark theming).
  • Smart Resume & Caching: Automatically hashes input files to detect changes, allowing the pipeline to seamlessly resume interrupted jobs and skip heavy processing steps, saving hours of GPU time.

Prerequisites

  1. NVIDIA GPU: A CUDA-compatible GPU is required (Minimum 8GB VRAM recommended).
  2. No Accounts Required: Unlike previous versions, this pipeline utilizes fully open-source, ungated models. No Hugging Face tokens or API keys are needed.

FFmpeg Dependency & Licensing

To handle high-performance audio normalization, lossless chunk concatenation, and video muxing, this project utilizes FFmpeg.

Installation

  • Windows: A custom-built, optimized FFmpeg binary is included in the bin/ folder. No additional installation is required.
  • Linux: Please install FFmpeg via your system's package manager:
    • Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
    • Fedora: sudo dnf install ffmpeg
    • Arch: sudo pacman -S ffmpeg

Licensing & LGPL Compliance

This software uses a custom build of FFmpeg licensed under the GNU Lesser General Public License (LGPL) version 2.1.

  • No Changes: We have not modified the FFmpeg source code.
  • License Text: A copy of the LGPL v2.1 is provided in bin/FFMPEG_LGPL.
  • Build Instructions: Details on how this binary was configured and compiled can be found in bin/build_info.md.
  • Source Code: You can obtain the official FFmpeg source code at ffmpeg.org.

Installation & Usage

Python Autodub uses uv for lightning-fast, reproducible dependency management. The included launchers will automatically detect your environment, download necessary build tools, and sync the CUDA-enabled libraries before launching the UI.

Windows

  1. Double-click Python-Autodub.exe (or run Launch_UI.ps1 if running from source).
  2. The launcher will automatically configure MSVC Build Tools, sync the uv environment, and launch the GUI.

Linux

  1. Open your terminal in the project root.
  2. Run bash install_linux_shortcut.sh to automatically configure execution permissions and add the app to your desktop environment's application grid.
  3. Launch "Python Autodub Studio" from your app menu, or run src/Launch_UI.sh directly from the terminal.

Folder Structure & Artifacts

To keep the root directory clean, the project organizes files dynamically:

  • src/: Contains all python scripts, launchers, and base voice templates.
  • temp/: Generated during execution. Holds all intermediate files including separated vocals (htdemucs/), voice samples (base_clones/), individual generated lines (temp_lines/), and intermediate audio mixing tracks.
  • output/: Generated upon completion. Contains the final FINAL_DUBBED_MOVIE.mkv.
  • .uv_cache/: Localized package cache to support cross-drive hard linking and prevent phantom disk usage.

About

An automated, AI-powered video dubbing pipeline that extracts audio, separates vocals from background noise (using Demucs), diarizes speakers (using Pyannote), and generates translated voice clones (using F5-TTS). It then re-assembles the audio using fast, frame-accurate numpy math and muxes it back into a final MKV video file.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors