Merge pull request #47 from Dadangdut33/dev
1.3.0

addresses the following #10 #27 #31 #32 #33 #34 #35 #36 #40 #41 #42 #44 #46
Dadangdut33 authored Nov 7, 2023
2 parents 59fb816 + 255fa9e commit 5f459a5
Showing 117 changed files with 14,715 additions and 6,618 deletions.
2 changes: 2 additions & 0 deletions .editorconfig
@@ -0,0 +1,2 @@
indent_style=space
indent_size=4
20 changes: 15 additions & 5 deletions .gitignore
@@ -2,24 +2,34 @@ __pycache__/

# Environments
.env
.envcpu
.envgpu
.venv
.venvcpu
.venvgpu
env/
envcpu/
envgpu/
venv/
venvcpu/
venvgpu/
ENV/
env.bak/
venv.bak/
venvtest/

# Project specific
user/
temp/
ignore/
speech_translate/_user/
speech_translate/temp/
speech_translate/debug/
speech_translate/export/
speech_translate/log/
build/
log/
dist/
output/
export/

# ignore
ignore/

# created when building
LICENSE.txt
28 changes: 26 additions & 2 deletions .vscode/settings.json
@@ -1,3 +1,27 @@
{
    "python.analysis.typeCheckingMode": "basic"
}
    "python.languageServer": "Pylance",
    "python.analysis.typeCheckingMode": "basic",
    "[python]": {
        "editor.defaultFormatter": "eeyore.yapf",
        "editor.formatOnSave": true,
        "editor.formatOnPaste": true,
        "editor.formatOnType": false,
        "editor.codeActionsOnSave": {
            "source.fixAll": false,
            "source.organizeImports": false,
            "source.organizeImports.ruff": false,
            "source.organizeImports.python": false,
        }
    },
    "yapf.args": ["--style", "{based_on_style: pep8, indent_width: 4, column_limit: 125, BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF: false, DEDENT_CLOSING_BRACKETS: true}"],
    "ruff.enable": true,
    "ruff.lint.args": [
        "--line-length",
        "125"
    ],
    "ruff.format.args": [
        "--line-length",
        "125"
    ],
    "python.analysis.autoImportCompletions": false,
}
73 changes: 49 additions & 24 deletions README.md
@@ -15,23 +15,40 @@
<a href="https://github.com/Dadangdut33/Speech-Translate/network/members"><img alt="GitHub forks" src="https://img.shields.io/github/forks/Dadangdut33/Speech-Translate?style=social"></a>
</p>

Speech Translate is a practical application that combines OpenAI's Whisper ASR model with free translation APIs. It serves as a versatile tool for both real-time / live speech-to-text and speech translation, allowing the user to seamlessly convert spoken language into written text. Additionally, it has the option to import and transcribe audio / video files effortlessly. This application aims to expand whisper ability by combining it with some translation APIs while also providing a simple and easy to use interface to create a more practical application. This application is also open source, so you can contribute to this project if you want to.

<details open>
<summary>Preview</summary>
<p align="center">
<img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/1.png" width="700" alt="Speech Translate Looks">
<img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/2.png" width="700" alt="Setting transcription">
<img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/3.png" width="700" alt="Setting textbox">
<img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/4.png" width="700" alt="About window">
<img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/5.png" alt="Detached window preview">
Detached window preview
<img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/6.png" alt="Transcribe mode on detached window (English)">
Transcribe mode on detached window (English)
<img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/7.png" alt="Translate mode on detached window (English to Indonesia)">
Translate mode on detached window (English to Indonesia)
</p>
</details>
Speech Translate is a practical application that combines OpenAI's Whisper ASR model with free translation APIs. It serves as a versatile tool for both real-time / live speech-to-text and speech translation, allowing the user to seamlessly convert spoken language into written text. Additionally, it has the option to import and transcribe audio / video files effortlessly.

Speech Translate aims to extend Whisper's capabilities by combining it with translation APIs while providing a simple, easy-to-use interface, making for a more practical application. The application is also open source, so you are welcome to contribute to the project.

<p align="center">
<img src="preview/1.png" width="700" alt="Speech Translate Preview">
</p>

<details close>
<summary>Preview - Usage</summary>
<p align="center">
<img src="preview/7.png" width="700" alt="Record">
<img src="preview/8.png" width="700" alt="File import">
<img src="preview/9.png" width="700" alt="File import in progress">
<img src="preview/10.png" width="700" alt="Align result">
<img src="preview/11.png" width="700" alt="Refine result">
<img src="preview/12.png" width="700" alt="Translate Result">
<img src="preview/13.png" width="700" alt="Transcribe mode on subtitle window (English)"><br />
Transcribe mode on detached window (English)
<img src="preview/14.png" width="700" alt="Translate mode on subtitle window (English to Indonesia)"><br />
Translate mode on detached window (English to Indonesia)
</p>
</details>

<details close>
<summary>Preview - Setting</summary>
<p align="center">
<img src="preview/2.png" width="700" alt="Setting - General">
<img src="preview/3.png" width="700" alt="Setting - Record">
<img src="preview/4.png" width="700" alt="Setting - Transcribe">
<img src="preview/5.png" width="700" alt="Setting - Translate">
<img src="preview/6.png" width="700" alt="Setting - Textbox">
</p>
</details>

<br />

@@ -74,9 +91,16 @@ Speech Translate is a practical application that combines OpenAI's Whisper ASR m

- Speaker input only works on Windows 8 and above.
- Internet connection (for translation with API)
- [FFmpeg](https://ffmpeg.org/) is required to be installed and added to the PATH environment variable. You can download it [here](https://ffmpeg.org/download.html) and add it to your path manually OR you can do it automatically using the following commands:
- [FFmpeg](https://ffmpeg.org/) must be installed and added to the PATH environment variable. You can install it when prompted by the app, or [download it](https://ffmpeg.org/download.html) and add it to your PATH manually. Alternatively, you can download it and add it to PATH automatically with the following commands:

```bash
# on Windows using powershell (Also included in the release page, and can be run by right clicking and selecting "Run with PowerShell")
# Must be run in an elevated PowerShell prompt (Run as administrator)
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser # Optional: Needed to run a remote script the first time
& ([scriptblock]::Create(
(New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/install_ffmpeg.ps1')
)) -webdl

```
# on Windows using Winget (Default package manager for Windows 10 and above)
winget install --id=Gyan.FFmpeg -e

@@ -106,20 +130,21 @@ brew install ffmpeg
| medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
| large | 1550 M | N/A | `large` | ~10 GB | 1x |

\* This information is also available in the app (hover over the model selection in the app and there will be a tooltip about the model info).
\* This information is also available in the app (hover over the model selection to see a tooltip with the model info). Note that when using faster-whisper, transcription is significantly faster and the model size is reduced depending on the usage. For more information, see the [faster-whisper repository](https://github.com/guillaumekln/faster-whisper).
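
As a rough illustration of how the VRAM figures for the Whisper models can guide model choice, here is a minimal sketch. The approximate values are the ones listed in this commit (the model table and `build/pre_install_note.txt`); the names `MODEL_VRAM_GB` and `largest_model_for` are hypothetical and are not part of the app.

```python
from typing import Optional

# Approximate VRAM needs in GB, as listed in this commit.
# Dict order runs from the smallest model to the largest.
MODEL_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the largest listed model whose approximate need fits, or None."""
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    # The last fitting entry is the largest one because of the dict order.
    return fitting[-1] if fitting else None


print(largest_model_for(6))  # medium
```

The figures are approximate, so a real check should leave some headroom; with faster-whisper the actual requirement may be lower.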


# Installation

> [!IMPORTANT]
> Make sure that you have installed [FFmpeg](https://ffmpeg.org/) and added it to the PATH environment variable. [See here](#requirements) for more info
> Please take a look at the [Requirements](#requirements) first before installing. For more information about the usage of the app, please check the [wiki](https://github.com/Dadangdut33/Speech-Translate/wiki)
## From Prebuilt Binary

1. Download the [latest release](https://github.com/Dadangdut33/Speech-Translate/releases/latest) (There are 2 versions, CPU and GPU)
2. Install/extract the downloaded file
3. Run the program
4. Enjoy!
4. Set the settings to your liking
5. Enjoy!

## As A Module

@@ -143,9 +168,9 @@ You can then run the program by typing `speech-translate` in your terminal/conso

**Notes For Installation as Module:**

- If you are u**pdating from an older version**, you need to add `--upgrade --no-deps --force-reinstall` at the end of the command.
- If you are **updating from an older version**, you need to add `--upgrade --force-reinstall` at the end of the command. If the update does not need new dependencies, you can also add `--no-deps` at the end of the command to speed up the installation process.
- If you want to **install** from a **specific branch or commit**, you can do it by adding `@branch_name` or `@commit_hash` at the end of the url. Example: `pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git@dev --extra-index-url https://download.pytorch.org/whl/cu118`
- The **--extra-index-url here might not always be up to date**, so you can check the latest version of pytorch [here](https://pytorch.org/get-started/locally/). You can also check the available version of pytorch [here](https://download.pytorch.org/whl/torch_stable.html).
- The **--extra-index-url here might not always be up to date**, so you can check the latest version of pytorch [here](https://pytorch.org/get-started/locally/). You can also check the available version of pytorch [here](https://download.pytorch.org/whl/torch_stable.html). If the newest version is not compatible then please keep using the current url shown here.
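
The notes above can be sketched as a small helper that assembles the pip command. This is only an illustration: the function name `build_install_command` and its parameters are hypothetical, and the repository URL and `cu118` index URL are taken from the example in the notes, so they may need adjusting for your setup.

```python
import sys

# Both URLs are copied from the example command in the notes above.
REPO_URL = "git+https://github.com/Dadangdut33/Speech-Translate.git"
TORCH_INDEX = "https://download.pytorch.org/whl/cu118"


def build_install_command(ref=None, updating=False, skip_deps=False):
    """Assemble the pip arguments described in the installation notes."""
    # "@branch_name" or "@commit_hash" is appended to the URL when given.
    target = f"{REPO_URL}@{ref}" if ref else REPO_URL
    cmd = [
        sys.executable, "-m", "pip", "install", "-U", target,
        "--extra-index-url", TORCH_INDEX,
    ]
    if updating:
        # Flags go at the end of the command, as the notes describe.
        cmd += ["--upgrade", "--force-reinstall"]
        if skip_deps:
            # Only sensible when the update needs no new dependencies.
            cmd.append("--no-deps")
    return cmd


# Print the command instead of running it.
print(" ".join(build_install_command(ref="dev", updating=True)))
```

The resulting list could be passed to `subprocess.run` if you want to run the installation from a script.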

# More Information

2 changes: 1 addition & 1 deletion Run.py
@@ -3,4 +3,4 @@
if __name__ == "__main__":
main()

# can run the app from this file or by running `python -m speech_translate`
# can run the app from this file or by running `python -m speech_translate`
4 changes: 0 additions & 4 deletions _pyinstaller_hooks/add_lib.py

This file was deleted.

122 changes: 122 additions & 0 deletions build.py
@@ -0,0 +1,122 @@
import sys
import os
import shutil
from cx_Freeze import setup, Executable

sys.setrecursionlimit(5000)


def get_env_name():
    return os.path.basename(sys.prefix)


def version():
    with open(os.path.join(os.path.dirname(__file__), "speech_translate/_version.py")) as f:
        return f.readline().split("=")[1].strip().strip('"').strip("'")


# If you get cuda error try to remove your cuda from your system path because cx_freeze will try to include it from there
# instead of the one in the python folder
print(">> Building SpeechTranslate version", version())
print(">> Environment:", get_env_name())


def clear_dir(dir):
    print(">> Clearing", dir)
    try:
        if not os.path.exists(dir):
            return
        if os.path.isdir(dir):
            for f in os.listdir(dir):
                os.remove(os.path.join(dir, f))

            # remove the folder
            os.rmdir(dir)
        else:
            os.remove(dir)
    except Exception as e:
        print(f">> Failed to clear {dir} reason: {e}")


print(">> Clearing code folder")
clear_dir("./speech_translate/export")
clear_dir("./speech_translate/debug")
clear_dir("./speech_translate/log")
clear_dir("./speech_translate/temp")
print(">> Done")

folder_name = f"build/SpeechTranslate {version()}"

build_exe_options = {
    "excludes": ["yapf", "ruff"],
    "packages": ["torch", "soundfile", "sounddevice", "av"],
    "build_exe": folder_name
}

base = "Win32GUI" if sys.platform == "win32" else None

setup(
    name="SpeechTranslate",
    version=version(),
    description="Speech Translate",
    options={
        "build_exe": build_exe_options,
    },
    executables=[
        Executable(
            "Run.py",
            base=base,
            icon="speech_translate/assets/icon.ico",
            target_name="SpeechTranslate.exe",
        )
    ],
)

# check if arg is build_exe
if len(sys.argv) < 2 or sys.argv[1] != "build_exe":
    sys.exit(0)

print(">> Copying some more files...")

# we need to copy av.libs to foldername/lib because cx_freeze doesn't copy it for some reason
print(">> Copying av.libs to lib folder")
shutil.copytree(f"{get_env_name()}/Lib/site-packages/av.libs", f"{folder_name}/lib/av.libs")

# copy LICENSE as license.txt to build folder
print(">> Creating license.txt to build folder")
with open("LICENSE", "r", encoding="utf-8") as f:
    with open(f"{folder_name}/license.txt", "w", encoding="utf-8") as f2:
        f2.write(f.read())

# copy build/pre_install_note.txt as README.txt to build folder
print(">> Creating README.txt to build folder")
with open("build/pre_install_note.txt", "r", encoding="utf-8") as f:
    with open(f"{folder_name}/README.txt", "w", encoding="utf-8") as f2:
        f2.write(f.read())

# create version.txt
print(">> Creating version.txt")
with open(f"{folder_name}/version.txt", "w", encoding="utf-8") as f:
    f.write(version())

# copy install_ffmpeg.ps1 to build folder
print(">> Copying install_ffmpeg.ps1 to build folder")
with open("install_ffmpeg.ps1", "r", encoding="utf-8") as f:
    with open(f"{folder_name}/install_ffmpeg.ps1", "w", encoding="utf-8") as f2:
        f2.write(f.read())

# create link to repo
print(">> Creating link to repo")
with open(f"{folder_name}/homepage.url", "w", encoding="utf-8") as f:
    f.write("[InternetShortcut]\n")
    f.write("URL=https://github.com/Dadangdut33/Speech-Translate")

print(">> Opening output folder")
output_folder = os.path.abspath(folder_name)
try:
    os.startfile(output_folder)
except Exception:
    # linux
    import subprocess

    subprocess.call(["xdg-open", output_folder])
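
Review note on `clear_dir` in `build.py`: it removes entries with `os.remove`, which raises on a nested subdirectory, so a folder that contains subfolders is reported as "Failed to clear" and left in place. A possible alternative, sketched here under the assumption that deleting the whole tree is the intent, uses `shutil.rmtree`. The name `clear_path` and the throwaway demo directory are hypothetical and not part of the commit.

```python
import os
import shutil
import tempfile


def clear_path(path):
    """Delete a file or a whole directory tree; do nothing if it is missing."""
    if not os.path.exists(path):
        return
    if os.path.isdir(path):
        # Removes nested files and subdirectories in one call.
        shutil.rmtree(path)
    else:
        os.remove(path)


# Demonstration on a temporary directory with a nested subfolder.
demo_root = tempfile.mkdtemp()
nested = os.path.join(demo_root, "log", "old")
os.makedirs(nested)
with open(os.path.join(nested, "run.log"), "w", encoding="utf-8") as f:
    f.write("demo")

clear_path(os.path.join(demo_root, "log"))
print(os.path.exists(os.path.join(demo_root, "log")))  # False
clear_path(demo_root)
```

Unlike the committed version, this sketch lets errors propagate instead of printing them, so a failed cleanup stops the build early.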
3 changes: 3 additions & 0 deletions build/post_install_note.txt
@@ -0,0 +1,3 @@
The app has been successfully installed. For more information about its usage, please visit the wiki at https://github.com/Dadangdut33/Speech-Translate/wiki.

For any questions or suggestions, feel free to open an issue or start a discussion on the repository.
20 changes: 20 additions & 0 deletions build/pre_install_note.txt
@@ -0,0 +1,20 @@
Thanks for downloading Speech Translate.

Speech Translate is a practical application that combines OpenAI's Whisper ASR model with free translation APIs. It serves as a versatile tool for both real-time / live speech-to-text and speech translation, allowing the user to seamlessly convert spoken language into written text. Additionally, it has the option to import and transcribe audio / video files effortlessly.

Requirements:
- Windows 8.1 or higher for speaker input
- FFmpeg installed in your system (the app will prompt you to install it if you don't have it)
- Internet connection (for translation with API)
- Each whisper model requires the following VRAM:
* tiny (~1 GB)
* base (~1 GB)
* small (~2 GB)
* medium (~5 GB)
* large (~10 GB)

Whisper can be used with CPU but will be very limited when doing so. It is recommended to use a cuda compatible GPU for better performance.

Please also note that when using faster-whisper, the speed will be significantly faster and the model size will be reduced depending on the usage. For more information about this please visit https://github.com/guillaumekln/faster-whisper

For more information about the app, user settings, how to use it, and more please visit the wiki at https://github.com/Dadangdut33/Speech-Translate/wiki
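
Review note: the FFmpeg requirement listed in the install note can be checked from Python with the standard library. This is a minimal sketch, not the detection logic the app actually uses (which is not shown in this diff); the function name `find_ffmpeg` is hypothetical.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable on PATH, or None."""
    return shutil.which("ffmpeg")


path = find_ffmpeg()
if path is None:
    print("FFmpeg was not found on PATH")
else:
    print(f"FFmpeg found at {path}")
```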