Add Tencent HY-MT models and Edge-TTS support #8
base: main
Conversation
Introduces support for Tencent HY-MT1.5-7B and HY-MT1.5-1.8B local translation models and integrates Edge-TTS for high-quality neural text-to-speech. Updates the GUI to allow selection of local translation models, improves compatibility with PyTorch Nightly and CUDA 12.8, and enhances requirements for broader hardware and software support. Adds Windows batch scripts for launching the GUI, Web UI, and text-to-speech utilities. Updates documentation to reflect new features and installation instructions.
Reviewer's Guide

Adds support for Tencent HY-MT1.5 local translation models and Edge-TTS, refactors local speech-to-text to use a Whisper pipeline, wires model selection through the GUI, broadens Python package requirements for newer PyTorch/CUDA stacks, and adds Windows convenience launch scripts plus docs updates.
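Not the PR's actual `load_models()` code, just a minimal sketch of the kind of Whisper pipeline the guide refers to, assuming the Hugging Face `transformers` ASR pipeline and the `distil-whisper/distil-large-v3` checkpoint that the sequence diagram below abbreviates:

```python
# Hedged sketch of a Whisper-style speech-to-text/translation pipeline (not the PR's exact code).
import torch
from transformers import pipeline

# Pick GPU when available, matching the PR's CUDA-oriented requirements.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v3",  # assumed to be the checkpoint the diagram abbreviates
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe-and-translate one chunk, mirroring the pipe(input_path, generate_kwargs ...) call
# in the diagram (the diagram also passes a source-language hint).
result = pipe("chunk_0.wav", generate_kwargs={"task": "translate"})
print(result["text"])
```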
Sequence diagram for local translation using Tencent HY-MT models and Edge-TTS

```mermaid
sequenceDiagram
actor User
participant TranslatorGUI
participant CustomTranslator
participant WhisperPipeline as Whisper_pipeline
participant Llama2 as Llama2_13b_MBart
participant HY7B as HY_MT1_5_7B
participant HY1_8B as HY_MT1_5_1_8B
participant EdgeTTS as Edge_TTS
User->>TranslatorGUI: Select TranslationMethod Local
User->>TranslatorGUI: Select LocalModel (HY-MT1.5-1.8B | HY-MT1.5-7B | Llama2-13b)
User->>TranslatorGUI: Click Translate
TranslatorGUI->>TranslatorGUI: translate()
TranslatorGUI->>TranslatorGUI: start thread run_translation(output_path, local_model_name)
TranslatorGUI->>CustomTranslator: process_audio_chunk(input_path, target_language, src_lang, chunk_idx, output_path, Local, local_model_name)
alt Whisper pipeline not loaded
CustomTranslator->>CustomTranslator: load_models()
CustomTranslator->>WhisperPipeline: create pipeline(model_id distil_whisper_distil_large_v3)
WhisperPipeline-->>CustomTranslator: pipeline instance
end
CustomTranslator->>WhisperPipeline: pipe(input_path, generate_kwargs language, task translate)
WhisperPipeline-->>CustomTranslator: transcription text
alt local_model_name is Llama2_13b
CustomTranslator->>Llama2: load MBart model and tokenizer
CustomTranslator->>Llama2: generate(input_ids, forced_bos_token_id)
Llama2-->>CustomTranslator: translated_text
else local_model_name is HY_MT1_5_7B
alt HY_MT1_5_7B not loaded
CustomTranslator->>CustomTranslator: load_hy_model()
CustomTranslator->>HY7B: load HY_MT1_5_7B and tokenizer
HY7B-->>CustomTranslator: model and tokenizer
end
CustomTranslator->>CustomTranslator: build prompt and messages
CustomTranslator->>HY7B: apply_chat_template(messages)
HY7B-->>CustomTranslator: tokenized_chat
CustomTranslator->>HY7B: generate(tokenized_chat, max_new_tokens, sampling params)
HY7B-->>CustomTranslator: generated_ids
CustomTranslator->>CustomTranslator: decode new_tokens to translated_text
else local_model_name is HY_MT1_5_1_8B
alt HY_MT1_5_1_8B not loaded
CustomTranslator->>CustomTranslator: load_hy_small_model()
CustomTranslator->>HY1_8B: load HY_MT1_5_1_8B and tokenizer
HY1_8B-->>CustomTranslator: model and tokenizer
end
CustomTranslator->>CustomTranslator: build prompt and messages
CustomTranslator->>HY1_8B: apply_chat_template(messages)
HY1_8B-->>CustomTranslator: tokenized_chat
CustomTranslator->>HY1_8B: generate(tokenized_chat, max_new_tokens, sampling params)
HY1_8B-->>CustomTranslator: generated_ids
CustomTranslator->>CustomTranslator: decode new_tokens to translated_text
end
CustomTranslator-->>TranslatorGUI: translated_text
TranslatorGUI->>TranslatorGUI: append translated_text to UI
loop for each merged translated chunk
TranslatorGUI->>CustomTranslator: generate_audio(text, output_path, target_language, input_path)
CustomTranslator->>EdgeTTS: Communicate(text, voice)
EdgeTTS-->>CustomTranslator: mp3_output
alt output_path is wav
CustomTranslator->>CustomTranslator: convert MP3 to WAV via ffmpeg
end
CustomTranslator-->>TranslatorGUI: audio file path
end
TranslatorGUI-->>User: Show status and allow playback of generated speech
```

Updated class diagram for CustomTranslator and GUI integration

```mermaid
classDiagram
class CustomTranslator {
+StringVar target_language
+object pipe
+object hy_model
+object hy_tokenizer
+object hy_small_model
+object hy_small_tokenizer
+dtype torch_dtype
+__init__()
+load_models()
+load_hy_model()
+load_hy_small_model()
+process_audio_chunk(input_path, target_language, src_lang, chunk_idx, output_path, Target_Text_Translation_Option, local_model_name)
+generate_audio(text, output_path, target_language, input_path)
}
class TranslatorGUI {
+StringVar stringvarTextTranslationOption
+CTkOptionMenu target_TextTranslationOption_dropdown
+StringVar stringvarLocalModel
+CTkOptionMenu local_model_dropdown
+CTkLabel label_local_model
+StringVar stringvarlanguage
+StringVar stringvarsource_AudioFileLang
+object translator_instance
+textwidget text_translated
+label label_status
+button save_button
+button clear_button
+filepath audio_path
+__init__(master)
+Update_Gui(local, online, hybrid)
+switch_event()
+translate()
+run_translation(output_path, local_model_name)
+clear_text()
}
TranslatorGUI --> CustomTranslator : uses for translation and TTS
class Whisper_pipeline {
+pipeline pipe
+from_pretrained(model_id)
}
class HY_MT1_5_7B_Model {
+AutoTokenizer hy_tokenizer
+AutoModelForCausalLM hy_model
+apply_chat_template(messages, tokenize, add_generation_prompt, return_tensors)
+generate(tokenized_chat, max_new_tokens, do_sample, top_k, top_p, temperature, repetition_penalty, eos_token_id, pad_token_id)
}
class HY_MT1_5_1_8B_Model {
+AutoTokenizer hy_small_tokenizer
+AutoModelForCausalLM hy_small_model
+apply_chat_template(messages, tokenize, add_generation_prompt, return_tensors)
+generate(tokenized_chat, max_new_tokens, do_sample, top_k, top_p, temperature, repetition_penalty, eos_token_id, pad_token_id)
}
class Llama2_13b_MBart_Model {
+MBartForConditionalGeneration tt
+MBart50TokenizerFast tokenizer
+generate(input_ids, forced_bos_token_id)
}
CustomTranslator --> Whisper_pipeline : manages
CustomTranslator --> HY_MT1_5_7B_Model : optional local model
CustomTranslator --> HY_MT1_5_1_8B_Model : optional local model
CustomTranslator --> Llama2_13b_MBart_Model : legacy local model
```
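For orientation, a hypothetical usage sketch built only from the method signatures in the class diagram above; the import path, file names, and language codes are illustrative assumptions rather than code from the PR:

```python
# Hypothetical sketch: driving CustomTranslator directly with a selected local model.
# Signatures follow the class diagram; paths and language codes are made up for illustration.
from OpenTranslator.audio_translator import CustomTranslator  # assumed import path

translator = CustomTranslator()
translated_text = translator.process_audio_chunk(
    "chunk_0.wav",          # input_path: one audio chunk
    "fr",                   # target_language
    "en",                   # src_lang of the source audio
    0,                      # chunk_idx
    "translated_0.wav",     # output_path
    "Local",                # Target_Text_Translation_Option
    "HY-MT1.5-1.8B",        # local_model_name ("HY-MT1.5-7B" or "Llama2-13b" also valid)
)
# Synthesize speech for the translated text with Edge-TTS.
translator.generate_audio(translated_text, "translated_0.wav", "fr", "chunk_0.wav")
```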
Hey - I've found 5 issues and left some high-level feedback:
- The HY-MT1.5-7B and HY-MT1.5-1.8B branches in `process_audio_chunk` duplicate a lot of logic (language mapping, prompt construction, chat template application, generation parameters); consider extracting shared helpers (e.g., a generic `run_hy_mt_translation(model, tokenizer, transcription, target_language)`) to keep this easier to maintain.
- In `translator_gui.py` you call `self.target_TextTranslationOption_dropdown.set(TextTranslationOption[0])` twice in `__init__`; you can drop the duplicate call to avoid confusion about the intended default behavior.
- The new Edge-TTS `generate_audio` implementation uses `asyncio.run`, which can raise a `RuntimeError` if called from an existing event loop; since this may be used from GUI contexts, consider using `asyncio.get_event_loop()` with `run_until_complete` (or a dedicated thread) to make it robust in more environments (see the sketch after this list).
## Individual Comments
### Comment 1
<location> `OpenTranslator/translator_gui.py:402-411` </location>
<code_context>
+ try:
</code_context>
<issue_to_address>
**issue (bug_risk):** Early return on error inside the chunk loop skips cleanup and finalization logic.
Returning from inside the chunk loop prevents the remaining cleanup/finalization in `run_translation` (merging chunks, resetting UI state, marking completion, deleting temp files) from running, which can leave the GUI and filesystem inconsistent. Refactor so shared cleanup runs in a `finally` block or dedicated helper, allowing errors to propagate while still performing required finalization.
</issue_to_address>
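One possible shape for that refactor, sketched below; `self.chunks`, `_append_translated_text`, and `_finalize_translation` are hypothetical stand-ins for the real loop and cleanup code in `run_translation`:

```python
# Sketch only: keep finalization in a finally block so it runs even when a chunk fails.
def run_translation(self, output_path, local_model_name):
    chunk_files = []
    try:
        for chunk_idx, chunk_output_path in enumerate(self.chunks):  # self.chunks is a hypothetical list of chunk paths
            translation_result = self.translator_instance.process_audio_chunk(
                chunk_output_path,
                self.languages[self.stringvarlanguage.get()],
                self.Src_lang[self.stringvarsource_AudioFileLang.get()],
                chunk_idx,
                output_path,
                self.target_TextTranslationOption_dropdown.get(),
                local_model_name,
            )
            chunk_files.append(chunk_output_path)
            self._append_translated_text(translation_result)  # hypothetical UI helper
    finally:
        # Merging chunks, resetting UI state, and deleting temp files always run,
        # even if a chunk raised; the exception still propagates afterwards.
        self._finalize_translation(chunk_files, output_path)
```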
### Comment 2
<location> `OpenTranslator/audio_translator.py:269-274` </location>
<code_context>
+ self.load_hy_model()
+
+ # Language code to name mapping
+ code_to_name = {
+ "en": "English", "es": "Spanish", "fr": "French", "de": "German",
+ "ja": "Japanese", "ko": "Korean", "tr": "Turkish", "ar": "Arabic",
+ "ru": "Russian", "hu": "Hebrew", "hi": "Hindi", "it": "Italian",
+ "pt": "Portuguese", "zh": "Chinese", "cs": "Czech", "nl": "Dutch",
+ "pl": "Polish"
+ }
+ target_lang_name = code_to_name.get(target_language, "English")
</code_context>
<issue_to_address>
**issue (bug_risk):** Language mapping for `hu` appears to map to Hebrew instead of Hungarian.
The `code_to_name` mapping uses `"hu": "Hebrew"`, but `hu` is the ISO code for Hungarian (`he` is typically Hebrew). This will produce incorrect language names when targeting Hungarian. If you intend to support both, please map `"hu"` to `"Hungarian"` and add a separate `"he": "Hebrew"` entry.
</issue_to_address>
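Concretely, the suggested fix touches only these two entries (a sketch; the other keys stay as in the diff above):

```python
# Suggested correction: map "hu" to Hungarian and add a separate Hebrew entry.
code_to_name = {
    # ... other entries unchanged ...
    "hu": "Hungarian",
    "he": "Hebrew",
}
```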
### Comment 3
<location> `readme.md:84-85` </location>
<code_context>
+pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
+```
+
Install mecab
https://github.com/ikegami-yukino/mecab/releases
</code_context>
<issue_to_address>
**suggestion (typo):** Use the official capitalization for "MeCab".
This matches the spelling used in the official MeCab documentation and release page.
```suggestion
Install MeCab
https://github.com/ikegami-yukino/mecab/releases
```
</issue_to_address>
### Comment 4
<location> `readme.md:110` </location>
<code_context>
+ - "HY-MT1.5-1.8B" (Tencent - faster, lower VRAM requirements)
+
Or Local mode (using a set of LLM's) for audio file translation only, using a WEB UI (Gradio)
</code_context>
<issue_to_address>
**suggestion (typo):** Fix pluralization and consider standardizing capitalization in this sentence.
Change "LLM's" to the plural "LLMs" and consider updating "WEB UI" to either "web UI" or "Web UI", for example: "Or Local mode (using a set of LLMs) for audio file translation only, using a web UI (Gradio)".
```suggestion
Or Local mode (using a set of LLMs) for audio file translation only, using a web UI (Gradio)
```
</issue_to_address>
### Comment 5
<location> `OpenTranslator/audio_translator.py:225` </location>
<code_context>
- # Generate tokens on the GPU
- generated_tokens = tt.generate(input_ids=input_ids, forced_bos_token_id=tokenizer.lang_code_to_id[model_Target_language])
+ if local_model_name == 'Llama2-13b':
+ tt = MBartForConditionalGeneration.from_pretrained("SnypzZz/Llama2-13b-Language-translate").to(device)
+ tokenizer = MBart50TokenizerFast.from_pretrained("SnypzZz/Llama2-13b-Language-translate", src_lang="en_XX", device=device)
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the duplicated HY translation logic and Edge-TTS steps into focused helper functions to simplify branches and make the main methods easier to read.
You can keep all new functionality while reducing duplication and branching with a couple of focused extra helpers.
### 1. Consolidate HY-MT translation logic
The `HY-MT1.5-7B` and `HY-MT1.5-1.8B` branches are almost identical. You can extract the shared logic into a single helper and shared mappings:
```python
# at class/module level
HY_CODE_TO_NAME = {
"en": "English", "es": "Spanish", "fr": "French", "de": "German",
"ja": "Japanese", "ko": "Korean", "tr": "Turkish", "ar": "Arabic",
"ru": "Russian", "hu": "Hebrew", "hi": "Hindi", "it": "Italian",
"pt": "Portuguese", "zh": "Chinese", "cs": "Czech", "nl": "Dutch",
"pl": "Polish",
}
def _build_hy_prompt(self, transcription: str, target_language: str) -> str:
target_lang_name = HY_CODE_TO_NAME.get(target_language, "English")
if target_language == "zh":
return f"将以下文本翻译为中文,注意只需要输出翻译后的结果,不要额外解释:\n\n{transcription}"
return (
f"Translate the following segment into {target_lang_name}, "
f"without additional explanation.\n\n{transcription}"
)
def _translate_with_hy(self, model, tokenizer, transcription: str, target_language: str) -> str:
prompt_text = self._build_hy_prompt(transcription, target_language)
messages = [{"role": "user", "content": prompt_text}]
tokenized_chat = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=False,
return_tensors="pt",
).to(device)
input_length = tokenized_chat.shape[1]
generated_ids = model.generate(
tokenized_chat,
max_new_tokens=512,
do_sample=True,
top_k=20,
top_p=0.6,
temperature=0.7,
repetition_penalty=1.05,
eos_token_id=tokenizer.eos_token_id,
pad_token_id=tokenizer.pad_token_id,
)
new_tokens = generated_ids[:, input_length:]
translated = tokenizer.decode(new_tokens[0], skip_special_tokens=True).strip()
return translated
```
Then the three branches inside `process_audio_chunk` become much smaller and easier to scan:
```python
if local_model_name == "Llama2-13b":
# existing MBart logic unchanged
...
elif local_model_name == "HY-MT1.5-7B":
if self.hy_model is None:
self.load_hy_model()
translated_text = self._translate_with_hy(
self.hy_model, self.hy_tokenizer, transcription, target_language
)
elif local_model_name == "HY-MT1.5-1.8B":
if self.hy_small_model is None:
self.load_hy_small_model()
translated_text = self._translate_with_hy(
self.hy_small_model, self.hy_small_tokenizer, transcription, target_language
)
```
This removes the duplicated mappings, prompt construction, chat template calls, and generation parameters while keeping behavior identical.
### 2. Reduce duplication in HY model loaders
`load_hy_model` and `load_hy_small_model` can share one internal loader:
```python
def _load_hy_generic(self, model_id: str):
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
torch_dtype=torch.float16,
trust_remote_code=True,
)
return model, tokenizer
def load_hy_model(self):
print("Loading Tencent HY-MT1.5-7B Model...")
try:
self.hy_model, self.hy_tokenizer = self._load_hy_generic("tencent/HY-MT1.5-7B")
print("HY-MT1.5-7B loaded successfully.")
except Exception as e:
print(f"Failed to load HY-MT1.5-7B: {e}")
raise
def load_hy_small_model(self):
print("Loading Tencent HY-MT1.5-1.8B Model...")
try:
self.hy_small_model, self.hy_small_tokenizer = self._load_hy_generic("tencent/HY-MT1.5-1.8B")
print("HY-MT1.5-1.8B loaded successfully.")
except Exception as e:
print(f"Failed to load HY-MT1.5-1.8B: {e}")
raise
```
This keeps the public API and behavior but centralizes the repetitive loading pattern.
### 3. Split `generate_audio` responsibilities
`generate_audio` is doing voice selection, async TTS, and optional conversion. Pulling these into small helpers will make it easier to follow without changing behavior:
```python
def _pick_edge_voice(self, target_language: str) -> str:
edge_tts_voices = {
"en": "en-US-AriaNeural",
"es": "es-ES-ElviraNeural",
"fr": "fr-FR-DeniseNeural",
# ... rest unchanged ...
"hi": "hi-IN-SwaraNeural",
}
return edge_tts_voices.get(target_language, "en-US-AriaNeural")
async def _run_edge_tts(self, text: str, voice: str, mp3_output: str):
import edge_tts
communicate = edge_tts.Communicate(text, voice)
await communicate.save(mp3_output)
def _maybe_convert_mp3_to_wav(self, mp3_output: str, wav_output: str):
import subprocess, os, shutil
try:
subprocess.run(
["ffmpeg", "-y", "-i", mp3_output, "-acodec", "pcm_s16le", "-ar", "24000", wav_output],
check=True,
capture_output=True,
)
os.remove(mp3_output)
except Exception as conv_err:
print(f" Note: Could not convert to WAV, using MP3: {conv_err}")
shutil.move(mp3_output, wav_output.replace(".wav", ".mp3"))
```
Then `generate_audio` is mostly orchestration:
```python
def generate_audio(self, text, output_path, target_language, input_path):
import asyncio
print("Generate audio using Edge-TTS")
start_time = time.time()
voice = self._pick_edge_voice(target_language)
mp3_output = output_path.replace(".wav", ".mp3") if output_path.endswith(".wav") else output_path
try:
asyncio.run(self._run_edge_tts(text, voice, mp3_output))
if output_path.endswith(".wav") and mp3_output != output_path:
self._maybe_convert_mp3_to_wav(mp3_output, output_path)
except Exception as e:
print(f" Edge-TTS error: {e}")
raise
print(f"Generate_audio Execution time: {(time.time() - start_time) / 60:.2f} minutes")
```
This keeps all the current behavior (Edge-TTS, MP3/WAV handling, ffmpeg conversion) but makes the main method much easier to read and maintain.
</issue_to_address>
Summary by Sourcery
Add support for new local translation models and Edge-TTS, expose local model selection in the GUI, and update dependencies, docs, and Windows entry-point scripts to support the new capabilities and newer PyTorch/CUDA stacks.