
Conversation

overcrash66 (Owner) commented Jan 7, 2026

Introduces support for the Tencent HY-MT1.5-7B and HY-MT1.5-1.8B local translation models and integrates Edge-TTS for high-quality neural text-to-speech. Updates the GUI to allow selection of local translation models, improves compatibility with PyTorch Nightly and CUDA 12.8, and expands the requirements to support a broader range of hardware and software. Adds Windows batch scripts for launching the GUI, Web UI, and text-to-speech utilities. Updates documentation to reflect the new features and installation instructions.

Summary by Sourcery

Add support for new local translation models and Edge-TTS, expose local model selection in the GUI, and update dependencies, docs, and Windows entry-point scripts to support the new capabilities and newer PyTorch/CUDA stacks.

New Features:

  • Introduce Tencent HY-MT1.5-7B and HY-MT1.5-1.8B local translation model support in the audio translation pipeline.
  • Integrate Edge-TTS for neural text-to-speech generation with language-specific Microsoft voices.
  • Add GUI controls to select between multiple local translation backends per translation run (see the option-menu sketch after this list).
  • Provide Windows batch scripts to launch the GUI, Web UI, and text-to-speech utilities from the project root.
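As a rough illustration of the new control, here is a minimal customtkinter sketch of a local-model picker. The variable and widget names (stringvarLocalModel, local_model_dropdown, label_local_model) follow the class diagram further down, but the layout and the set_translation_mode helper are assumptions, not the PR's actual implementation.

```python
import tkinter as tk
import customtkinter as ctk

LOCAL_MODELS = ["HY-MT1.5-1.8B", "HY-MT1.5-7B", "Llama2-13b"]

class LocalModelSelector(ctk.CTkFrame):
    """Hypothetical stand-alone version of the local-model picker."""

    def __init__(self, master):
        super().__init__(master)
        # Names mirror stringvarLocalModel / local_model_dropdown in the class diagram.
        self.stringvarLocalModel = tk.StringVar(value=LOCAL_MODELS[0])
        self.label_local_model = ctk.CTkLabel(self, text="Select Local Model")
        self.label_local_model.pack(padx=10, pady=(10, 0))
        self.local_model_dropdown = ctk.CTkOptionMenu(
            self, values=LOCAL_MODELS, variable=self.stringvarLocalModel
        )
        self.local_model_dropdown.pack(padx=10, pady=10)

    def set_translation_mode(self, mode: str) -> None:
        # Picking a local model only makes sense when the Local backend is active.
        self.local_model_dropdown.configure(
            state="normal" if mode == "Local" else "disabled"
        )
```

Disabling the menu outside Local mode mirrors the state toggling described in the file-level changes below.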

Enhancements:

  • Refactor local Whisper-based transcription to use a Hugging Face speech-recognition pipeline with fallback behavior for better robustness (a pipeline sketch follows this list).
  • Allow choosing the local translation model (Llama2-13b vs Tencent HY-MT variants) when running chunked audio translation.
  • Improve error handling during chunk processing so failures stop further work and update the UI status appropriately.
  • Expand and modernize Python and ML-related dependencies (transformers, accelerate, torch/torchaudio notes, etc.) for broader hardware and CUDA support.
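A minimal sketch of the pipeline-based transcription described above, assuming the distil-whisper/distil-large-v3 checkpoint named in the reviewer's guide; the build_asr_pipeline helper and the exact fallback behavior are illustrative, not the PR's code.

```python
import torch
from transformers import pipeline

def build_asr_pipeline(model_id: str = "distil-whisper/distil-large-v3"):
    """Build a shared speech-recognition pipeline, falling back to default
    settings if SDPA/half precision is unavailable on the current hardware."""
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    try:
        return pipeline(
            "automatic-speech-recognition",
            model=model_id,
            torch_dtype=torch_dtype,
            device=device,
            model_kwargs={"attn_implementation": "sdpa"},
        )
    except Exception as exc:
        print(f"SDPA pipeline failed ({exc}); falling back to default settings")
        return pipeline("automatic-speech-recognition", model=model_id, device=device)

# Built once and reused across chunks; Whisper's translate task is selected per call.
# pipe = build_asr_pipeline()
# result = pipe("chunk_000.wav", generate_kwargs={"language": "fr", "task": "translate"})
# print(result["text"])
```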

Build:

  • Revise requirements.txt to include new core, ML, audio, UI, and utility dependencies and to reflect the updated installation flow for different PyTorch/CUDA setups.

Documentation:

  • Update README with instructions for installing requirements, using Tencent HY-MT models and Edge-TTS, and configuring PyTorch Nightly with CUDA 12.8 and newer GPUs.
  • Document the new local model selection options and outline performance/VRAM considerations for each model.
  • Add acknowledgements and links for Edge-TTS and Tencent HY-MT models to the credits section.


sourcery-ai bot commented Jan 7, 2026

Reviewer's Guide

Adds support for Tencent HY-MT1.5 local translation models and Edge-TTS, refactors local speech-to-text to use a Whisper pipeline, wires model selection through the GUI, broadens Python package requirements for newer PyTorch/CUDA stacks, and adds Windows convenience launch scripts plus docs updates.

Sequence diagram for local translation using Tencent HY-MT models and Edge-TTS

```mermaid
sequenceDiagram
    actor User
    participant TranslatorGUI
    participant CustomTranslator
    participant WhisperPipeline as Whisper_pipeline
    participant Llama2 as Llama2_13b_MBart
    participant HY7B as HY_MT1_5_7B
    participant HY1_8B as HY_MT1_5_1_8B
    participant EdgeTTS as Edge_TTS

    User->>TranslatorGUI: Select TranslationMethod Local
    User->>TranslatorGUI: Select LocalModel (HY-MT1.5-1.8B | HY-MT1.5-7B | Llama2-13b)
    User->>TranslatorGUI: Click Translate

    TranslatorGUI->>TranslatorGUI: translate()
    TranslatorGUI->>TranslatorGUI: start thread run_translation(output_path, local_model_name)

    TranslatorGUI->>CustomTranslator: process_audio_chunk(input_path, target_language, src_lang, chunk_idx, output_path, Local, local_model_name)

    alt Whisper pipeline not loaded
        CustomTranslator->>CustomTranslator: load_models()
        CustomTranslator->>WhisperPipeline: create pipeline(model_id distil_whisper_distil_large_v3)
        WhisperPipeline-->>CustomTranslator: pipeline instance
    end

    CustomTranslator->>WhisperPipeline: pipe(input_path, generate_kwargs language, task translate)
    WhisperPipeline-->>CustomTranslator: transcription text

    alt local_model_name is Llama2_13b
        CustomTranslator->>Llama2: load MBart model and tokenizer
        CustomTranslator->>Llama2: generate(input_ids, forced_bos_token_id)
        Llama2-->>CustomTranslator: translated_text
    else local_model_name is HY_MT1_5_7B
        alt HY_MT1_5_7B not loaded
            CustomTranslator->>CustomTranslator: load_hy_model()
            CustomTranslator->>HY7B: load HY_MT1_5_7B and tokenizer
            HY7B-->>CustomTranslator: model and tokenizer
        end
        CustomTranslator->>CustomTranslator: build prompt and messages
        CustomTranslator->>HY7B: apply_chat_template(messages)
        HY7B-->>CustomTranslator: tokenized_chat
        CustomTranslator->>HY7B: generate(tokenized_chat, max_new_tokens, sampling params)
        HY7B-->>CustomTranslator: generated_ids
        CustomTranslator->>CustomTranslator: decode new_tokens to translated_text
    else local_model_name is HY_MT1_5_1_8B
        alt HY_MT1_5_1_8B not loaded
            CustomTranslator->>CustomTranslator: load_hy_small_model()
            CustomTranslator->>HY1_8B: load HY_MT1_5_1_8B and tokenizer
            HY1_8B-->>CustomTranslator: model and tokenizer
        end
        CustomTranslator->>CustomTranslator: build prompt and messages
        CustomTranslator->>HY1_8B: apply_chat_template(messages)
        HY1_8B-->>CustomTranslator: tokenized_chat
        CustomTranslator->>HY1_8B: generate(tokenized_chat, max_new_tokens, sampling params)
        HY1_8B-->>CustomTranslator: generated_ids
        CustomTranslator->>CustomTranslator: decode new_tokens to translated_text
    end

    CustomTranslator-->>TranslatorGUI: translated_text
    TranslatorGUI->>TranslatorGUI: append translated_text to UI

    loop for each merged translated chunk
        TranslatorGUI->>CustomTranslator: generate_audio(text, output_path, target_language, input_path)
        CustomTranslator->>EdgeTTS: Communicate(text, voice)
        EdgeTTS-->>CustomTranslator: mp3_output
        alt output_path is wav
            CustomTranslator->>CustomTranslator: convert MP3 to WAV via ffmpeg
        end
        CustomTranslator-->>TranslatorGUI: audio file path
    end

    TranslatorGUI-->>User: Show status and allow playback of generated speech
```

Updated class diagram for CustomTranslator and GUI integration

```mermaid
classDiagram
    class CustomTranslator {
        +StringVar target_language
        +object pipe
        +object hy_model
        +object hy_tokenizer
        +object hy_small_model
        +object hy_small_tokenizer
        +dtype torch_dtype
        +__init__()
        +load_models()
        +load_hy_model()
        +load_hy_small_model()
        +process_audio_chunk(input_path, target_language, src_lang, chunk_idx, output_path, Target_Text_Translation_Option, local_model_name)
        +generate_audio(text, output_path, target_language, input_path)
    }

    class TranslatorGUI {
        +StringVar stringvarTextTranslationOption
        +CTkOptionMenu target_TextTranslationOption_dropdown
        +StringVar stringvarLocalModel
        +CTkOptionMenu local_model_dropdown
        +CTkLabel label_local_model
        +StringVar stringvarlanguage
        +StringVar stringvarsource_AudioFileLang
        +object translator_instance
        +textwidget text_translated
        +label label_status
        +button save_button
        +button clear_button
        +filepath audio_path
        +__init__(master)
        +Update_Gui(local, online, hybrid)
        +switch_event()
        +translate()
        +run_translation(output_path, local_model_name)
        +clear_text()
    }

    TranslatorGUI --> CustomTranslator : uses for translation and TTS

    class Whisper_pipeline {
        +pipeline pipe
        +from_pretrained(model_id)
    }

    class HY_MT1_5_7B_Model {
        +AutoTokenizer hy_tokenizer
        +AutoModelForCausalLM hy_model
        +apply_chat_template(messages, tokenize, add_generation_prompt, return_tensors)
        +generate(tokenized_chat, max_new_tokens, do_sample, top_k, top_p, temperature, repetition_penalty, eos_token_id, pad_token_id)
    }

    class HY_MT1_5_1_8B_Model {
        +AutoTokenizer hy_small_tokenizer
        +AutoModelForCausalLM hy_small_model
        +apply_chat_template(messages, tokenize, add_generation_prompt, return_tensors)
        +generate(tokenized_chat, max_new_tokens, do_sample, top_k, top_p, temperature, repetition_penalty, eos_token_id, pad_token_id)
    }

    class Llama2_13b_MBart_Model {
        +MBartForConditionalGeneration tt
        +MBart50TokenizerFast tokenizer
        +generate(input_ids, forced_bos_token_id)
    }

    CustomTranslator --> Whisper_pipeline : manages
    CustomTranslator --> HY_MT1_5_7B_Model : optional local model
    CustomTranslator --> HY_MT1_5_1_8B_Model : optional local model
    CustomTranslator --> Llama2_13b_MBart_Model : legacy local model
```

File-Level Changes

Refactor local ASR to use the Hugging Face Whisper pipeline and add Tencent HY-MT1.5 translation model backends.
  • Replace manual Distil-Whisper processor/model usage with a Hugging Face pipeline for automatic-speech-recognition, including SDPA attention and a fallback path.
  • Introduce load_hy_model and load_hy_small_model helpers to lazily load the Tencent HY-MT1.5-7B and HY-MT1.5-1.8B models and tokenizers with trust_remote_code and device_map=auto.
  • Extend process_audio_chunk to accept a local_model_name selector, switching between the legacy MBart/Llama2-13b path and the two HY-MT1.5 variants with appropriate prompts, chat templates, and generation parameters (a short sketch of the legacy path follows this table).
  • Update the local translation flow to reuse a single Whisper pipeline instance instead of reloading models per chunk, and simplify transcription handling via pipeline outputs.
  Files: OpenTranslator/audio_translator.py

Integrate Edge-TTS for neural text-to-speech output, including MP3/WAV handling.
  • Replace the Coqui XTTS-based generate_audio implementation with an async Edge-TTS integration using edge_tts.Communicate.
  • Map application language codes to specific Microsoft neural voices and select a default when no mapping exists.
  • Generate MP3 output by default and, when a WAV path is requested, attempt conversion via ffmpeg, including error handling and cleanup of temporary files.
  Files: OpenTranslator/audio_translator.py

Expose local translation model selection in the GUI and propagate it to translation calls.
  • Add a "Select Local Model" label and CTkOptionMenu listing HY-MT1.5-1.8B, HY-MT1.5-7B, and Llama2-13b, with a default selection and state toggling based on translation mode.
  • Wire the selected local model into translate and run_translation, passing it through to process_audio_chunk for both chunked and non-chunked paths.
  • Adjust error handling in run_translation so that failures during a chunk abort further processing and the status is updated appropriately.
  Files: OpenTranslator/translator_gui.py, OpenTranslator/audio_translator.py

Broaden and modernize Python dependencies for ML, audio, and UI stacks, including Edge-TTS and newer PyTorch/CUDA support.
  • Replace the old minimal requirements set with a structured list covering core numerics (numpy/scipy/pandas/scikit-learn), ML tooling (transformers/accelerate/optimum/torch/torchaudio/torchvision), audio libraries, UI frameworks, and utilities.
  • Pin or set minimum versions for several key libraries and document that torch/torchaudio/torchvision should be installed via CUDA-specific commands.
  • Add the new dependencies needed for Edge-TTS, ffmpeg integration, and broader NLP support (edge-tts, gTTS, spacy, thinc, soundfile, ffmpy, six).
  Files: requirements.txt

Update documentation to describe the new models, TTS backend, compatibility guidance, and usage instructions.
  • Augment the feature list with the Tencent HY-MT1.5 models, Edge-TTS integration, and PyTorch Nightly/CUDA 12.8 compatibility notes (including RTX 50-series).
  • Add specific install instructions for the RTX Blackwell architecture using nightly cu128 wheels and simplify requirements installation to use requirements.txt.
  • Document selection of the translation method and local model, clarifying the trade-offs between Llama2-13b and both HY-MT1.5 variants, and extend the acknowledgements with new model/TTS credits.
  Files: readme.md

Add Windows batch launchers for the GUI, Web UI, and text-to-speech utilities.
  • Create OpenTranslator_GUI.bat, OpenTranslator_WebUi.bat, and textToSpeech.bat that set PYTHONPATH to the current directory, activate a local venv if present, and invoke the appropriate Python entry-point script.
  • Standardize console output to show the current directory on start, aiding debugging of path-related issues on Windows.
  Files: OpenTranslator_GUI.bat, OpenTranslator_WebUi.bat, textToSpeech.bat
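
For context on the legacy Llama2-13b path that stays selectable alongside the HY-MT backends, here is a minimal sketch of mBART-50 translation using the model ID and forced_bos_token_id pattern visible in the review's code context; the translate_legacy helper and the example language code are illustrative, not the PR's code.

```python
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "SnypzZz/Llama2-13b-Language-translate"  # model ID from the PR diff

tokenizer = MBart50TokenizerFast.from_pretrained(model_id, src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(model_id).to(device)

def translate_legacy(text: str, target_code: str = "fr_XX") -> str:
    # mBART-50 steers the output language via forced_bos_token_id.
    input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    generated = model.generate(
        input_ids=input_ids,
        forced_bos_token_id=tokenizer.lang_code_to_id[target_code],
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

# print(translate_legacy("Hello, how are you?", "fr_XX"))
```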

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 5 issues, and left some high-level feedback:

  • The HY-MT1.5-7B and HY-MT1.5-1.8B branches in process_audio_chunk duplicate a lot of logic (language mapping, prompt construction, chat template application, generation parameters); consider extracting shared helpers (e.g., a generic run_hy_mt_translation(model, tokenizer, transcription, target_language)) to keep this easier to maintain.
  • In translator_gui.py you call self.target_TextTranslationOption_dropdown.set(TextTranslationOption[0]) twice in __init__; you can drop the duplicate call to avoid confusion about the intended default behavior.
  • The new Edge-TTS generate_audio implementation uses asyncio.run, which can raise a RuntimeError if called from an existing event loop; since this may be used from GUI contexts, consider using asyncio.get_event_loop() with run_until_complete (or a dedicated thread) to make it robust in more environments; a dedicated-thread sketch follows this list.
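
On the third point, one hedged way to make the TTS call robust is to detect a running loop and fall back to a dedicated thread. The run_coroutine_blocking helper below is a sketch of that idea, not code from the PR; only edge_tts.Communicate(...).save(...) is taken from the review's own examples.

```python
import asyncio
import threading

def run_coroutine_blocking(coro):
    """Run an async coroutine to completion even if an event loop is already
    running in this thread (as can happen inside some GUI frameworks)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)

    # A loop is already running: execute the coroutine on a dedicated thread.
    result: dict = {}

    def _worker():
        try:
            result["value"] = asyncio.run(coro)
        except Exception as exc:  # propagate to the caller
            result["error"] = exc

    worker = threading.Thread(target=_worker, daemon=True)
    worker.start()
    worker.join()
    if "error" in result:
        raise result["error"]
    return result.get("value")

# Hypothetical use inside generate_audio:
# run_coroutine_blocking(edge_tts.Communicate(text, voice).save(mp3_output))
```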
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The HY-MT1.5-7B and HY-MT1.5-1.8B branches in `process_audio_chunk` duplicate a lot of logic (language mapping, prompt construction, chat template application, generation parameters); consider extracting shared helpers (e.g., a generic `run_hy_mt_translation(model, tokenizer, transcription, target_language)`) to keep this easier to maintain.
- In `translator_gui.py` you call `self.target_TextTranslationOption_dropdown.set(TextTranslationOption[0])` twice in `__init__`; you can drop the duplicate call to avoid confusion about the intended default behavior.
- The new Edge-TTS `generate_audio` implementation uses `asyncio.run`, which can raise a `RuntimeError` if called from an existing event loop; since this may be used from GUI contexts, consider using `asyncio.get_event_loop()` with `run_until_complete` (or a dedicated thread) to make it robust in more environments.

## Individual Comments

### Comment 1
<location> `OpenTranslator/translator_gui.py:402-411` </location>
<code_context>
+				try:
</code_context>

<issue_to_address>
**issue (bug_risk):** Early return on error inside the chunk loop skips cleanup and finalization logic.

Returning from inside the chunk loop prevents the remaining cleanup/finalization in `run_translation` (merging chunks, resetting UI state, marking completion, deleting temp files) from running, which can leave the GUI and filesystem inconsistent. Refactor so shared cleanup runs in a `finally` block or dedicated helper, allowing errors to propagate while still performing required finalization.
</issue_to_address>
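
A minimal sketch of the finally-based structure suggested here, using the process_audio_chunk call shape shown in the review; the chunking and cleanup helpers (_iter_chunks, _merge_chunks, _cleanup_temp_files, _reset_ui_state, _append_translated_text) are hypothetical placeholders for the existing run_translation logic.

```python
def run_translation(self, output_path, local_model_name):
    chunk_files = []
    error = None
    try:
        for chunk_idx, chunk_output_path in enumerate(self._iter_chunks()):
            translation_result = self.translator_instance.process_audio_chunk(
                chunk_output_path,
                self.languages[self.stringvarlanguage.get()],
                self.Src_lang[self.stringvarsource_AudioFileLang.get()],
                chunk_idx,
                output_path,
                self.target_TextTranslationOption_dropdown.get(),
                local_model_name,
            )
            chunk_files.append(chunk_output_path)
            self._append_translated_text(translation_result)
    except Exception as exc:
        error = exc
        self.label_status.configure(text=f"Translation failed: {exc}")
    finally:
        # Cleanup and finalization always run, even when a chunk fails part-way.
        if chunk_files and error is None:
            self._merge_chunks(chunk_files, output_path)
        self._cleanup_temp_files(chunk_files)
        self._reset_ui_state()
```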

### Comment 2
<location> `OpenTranslator/audio_translator.py:269-274` </location>
<code_context>
+                            self.load_hy_model()
+                        
+                        # Language code to name mapping
+                        code_to_name = {
+                            "en": "English", "es": "Spanish", "fr": "French", "de": "German",
+                            "ja": "Japanese", "ko": "Korean", "tr": "Turkish", "ar": "Arabic",
+                            "ru": "Russian", "hu": "Hebrew", "hi": "Hindi", "it": "Italian",
+                            "pt": "Portuguese", "zh": "Chinese", "cs": "Czech", "nl": "Dutch",
+                            "pl": "Polish"
+                        }
+                        target_lang_name = code_to_name.get(target_language, "English")
</code_context>

<issue_to_address>
**issue (bug_risk):** Language mapping for `hu` appears to map to Hebrew instead of Hungarian.

The `code_to_name` mapping uses `"hu": "Hebrew"`, but `hu` is the ISO code for Hungarian (`he` is typically Hebrew). This will produce incorrect language names when targeting Hungarian. If you intend to support both, please map `"hu"` to `"Hungarian"` and add a separate `"he": "Hebrew"` entry.
</issue_to_address>

### Comment 3
<location> `readme.md:84-85` </location>
<code_context>
+pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
+```
+
 Install mecab 
 https://github.com/ikegami-yukino/mecab/releases

</code_context>

<issue_to_address>
**suggestion (typo):** Use the official capitalization for "MeCab".

This matches the spelling used in the official MeCab documentation and release page.

```suggestion
Install MeCab 
https://github.com/ikegami-yukino/mecab/releases
```
</issue_to_address>

### Comment 4
<location> `readme.md:110` </location>
<code_context>
+  - "HY-MT1.5-1.8B" (Tencent - faster, lower VRAM requirements)
+

 Or Local mode (using a set of LLM's) for audio file translation only, using a WEB UI (Gradio)

</code_context>

<issue_to_address>
**suggestion (typo):** Fix pluralization and consider standardizing capitalization in this sentence.

Change "LLM's" to the plural "LLMs" and consider updating "WEB UI" to either "web UI" or "Web UI", for example: "Or Local mode (using a set of LLMs) for audio file translation only, using a web UI (Gradio)".

```suggestion
Or Local mode (using a set of LLMs) for audio file translation only, using a web UI (Gradio)
```
</issue_to_address>

### Comment 5
<location> `OpenTranslator/audio_translator.py:225` </location>
<code_context>

-                    # Generate tokens on the GPU
-                    generated_tokens = tt.generate(input_ids=input_ids, forced_bos_token_id=tokenizer.lang_code_to_id[model_Target_language])
+                    if local_model_name == 'Llama2-13b':
+                        tt = MBartForConditionalGeneration.from_pretrained("SnypzZz/Llama2-13b-Language-translate").to(device)
+                        tokenizer = MBart50TokenizerFast.from_pretrained("SnypzZz/Llama2-13b-Language-translate", src_lang="en_XX", device=device)
</code_context>

<issue_to_address>
**issue (complexity):** Consider extracting the duplicated HY translation logic and Edge-TTS steps into focused helper functions to simplify branches and make the main methods easier to read.

You can keep all new functionality while reducing duplication and branching with a couple of focused extra helpers.

### 1. Consolidate HY-MT translation logic

The `HY-MT1.5-7B` and `HY-MT1.5-1.8B` branches are almost identical. You can extract the shared logic into a single helper and shared mappings:

```python
# at class/module level
HY_CODE_TO_NAME = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "ja": "Japanese", "ko": "Korean", "tr": "Turkish", "ar": "Arabic",
    "ru": "Russian", "hu": "Hebrew", "hi": "Hindi", "it": "Italian",
    "pt": "Portuguese", "zh": "Chinese", "cs": "Czech", "nl": "Dutch",
    "pl": "Polish",
}

def _build_hy_prompt(self, transcription: str, target_language: str) -> str:
    target_lang_name = HY_CODE_TO_NAME.get(target_language, "English")
    if target_language == "zh":
        return f"将以下文本翻译为中文,注意只需要输出翻译后的结果,不要额外解释:\n\n{transcription}"
    return (
        f"Translate the following segment into {target_lang_name}, "
        f"without additional explanation.\n\n{transcription}"
    )

def _translate_with_hy(self, model, tokenizer, transcription: str, target_language: str) -> str:
    prompt_text = self._build_hy_prompt(transcription, target_language)
    messages = [{"role": "user", "content": prompt_text}]
    tokenized_chat = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=False,
        return_tensors="pt",
    ).to(device)
    input_length = tokenized_chat.shape[1]

    generated_ids = model.generate(
        tokenized_chat,
        max_new_tokens=512,
        do_sample=True,
        top_k=20,
        top_p=0.6,
        temperature=0.7,
        repetition_penalty=1.05,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

    new_tokens = generated_ids[:, input_length:]
    translated = tokenizer.decode(new_tokens[0], skip_special_tokens=True).strip()
    return translated
```

Then the three branches inside `process_audio_chunk` become much smaller and easier to scan:

```python
if local_model_name == "Llama2-13b":
    # existing MBart logic unchanged
    ...
elif local_model_name == "HY-MT1.5-7B":
    if self.hy_model is None:
        self.load_hy_model()
    translated_text = self._translate_with_hy(
        self.hy_model, self.hy_tokenizer, transcription, target_language
    )
elif local_model_name == "HY-MT1.5-1.8B":
    if self.hy_small_model is None:
        self.load_hy_small_model()
    translated_text = self._translate_with_hy(
        self.hy_small_model, self.hy_small_tokenizer, transcription, target_language
    )
```

This removes the duplicated mappings, prompt construction, chat template calls, and generation parameters while keeping behavior identical.

### 2. Reduce duplication in HY model loaders

`load_hy_model` and `load_hy_small_model` can share one internal loader:

```python
def _load_hy_generic(self, model_id: str):
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        torch_dtype=torch.float16,
        trust_remote_code=True,
    )
    return model, tokenizer

def load_hy_model(self):
    print("Loading Tencent HY-MT1.5-7B Model...")
    try:
        self.hy_model, self.hy_tokenizer = self._load_hy_generic("tencent/HY-MT1.5-7B")
        print("HY-MT1.5-7B loaded successfully.")
    except Exception as e:
        print(f"Failed to load HY-MT1.5-7B: {e}")
        raise

def load_hy_small_model(self):
    print("Loading Tencent HY-MT1.5-1.8B Model...")
    try:
        self.hy_small_model, self.hy_small_tokenizer = self._load_hy_generic("tencent/HY-MT1.5-1.8B")
        print("HY-MT1.5-1.8B loaded successfully.")
    except Exception as e:
        print(f"Failed to load HY-MT1.5-1.8B: {e}")
        raise
```

This keeps the public API and behavior but centralizes the repetitive loading pattern.

### 3. Split `generate_audio` responsibilities

`generate_audio` is doing voice selection, async TTS, and optional conversion. Pulling these into small helpers will make it easier to follow without changing behavior:

```python
def _pick_edge_voice(self, target_language: str) -> str:
    edge_tts_voices = {
        "en": "en-US-AriaNeural",
        "es": "es-ES-ElviraNeural",
        "fr": "fr-FR-DeniseNeural",
        # ... rest unchanged ...
        "hi": "hi-IN-SwaraNeural",
    }
    return edge_tts_voices.get(target_language, "en-US-AriaNeural")

async def _run_edge_tts(self, text: str, voice: str, mp3_output: str):
    import edge_tts
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(mp3_output)

def _maybe_convert_mp3_to_wav(self, mp3_output: str, wav_output: str):
    import subprocess, os, shutil
    try:
        subprocess.run(
            ["ffmpeg", "-y", "-i", mp3_output, "-acodec", "pcm_s16le", "-ar", "24000", wav_output],
            check=True,
            capture_output=True,
        )
        os.remove(mp3_output)
    except Exception as conv_err:
        print(f"  Note: Could not convert to WAV, using MP3: {conv_err}")
        shutil.move(mp3_output, wav_output.replace(".wav", ".mp3"))
```

Then `generate_audio` is mostly orchestration:

```python
def generate_audio(self, text, output_path, target_language, input_path):
    import asyncio
    print("Generate audio using Edge-TTS")

    start_time = time.time()
    voice = self._pick_edge_voice(target_language)

    mp3_output = output_path.replace(".wav", ".mp3") if output_path.endswith(".wav") else output_path

    try:
        asyncio.run(self._run_edge_tts(text, voice, mp3_output))
        if output_path.endswith(".wav") and mp3_output != output_path:
            self._maybe_convert_mp3_to_wav(mp3_output, output_path)
    except Exception as e:
        print(f"  Edge-TTS error: {e}")
        raise

    print(f"Generate_audio Execution time: {(time.time() - start_time) / 60:.2f} minutes")
```

This keeps all the current behavior (Edge-TTS, MP3/WAV handling, ffmpeg conversion) but makes the main method much easier to read and maintain.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

