Merge pull request #47 from Dadangdut33/dev

1.3.0 addresses the following #10 #27 #31 #32 #33 #34 #35 #36 #40 #41 #42 #44 #46
Dadangdut33 · Nov 7, 2023 · 5f459a5
2 parents 59fb816 + 255fa9e
Showing 117 changed files with 14,715 additions and 6,618 deletions.
diff --git a/.editorconfig b/.editorconfig
@@ -0,0 +1,2 @@
+indent_style=space
+indent_size=4
diff --git a/.gitignore b/.gitignore
@@ -2,24 +2,34 @@ __pycache__/
 
 # Environments
 .env
+.envcpu
+.envgpu
 .venv
+.venvcpu
+.venvgpu
 env/
+envcpu/
+envgpu/
 venv/
 venvcpu/
+venvgpu/
 ENV/
 env.bak/
 venv.bak/
 venvtest/
 
 # Project specific
-user/
-temp/
-ignore/
+speech_translate/_user/
+speech_translate/temp/
+speech_translate/debug/
+speech_translate/export/
+speech_translate/log/
 build/
-log/
 dist/
 output/
-export/
+
+# ignore
+ignore/
 
 # created when building
 LICENSE.txt 

diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -1,3 +1,27 @@
 {
-	"python.analysis.typeCheckingMode": "basic"
-}
+	"python.languageServer": "Pylance",
+	"python.analysis.typeCheckingMode": "basic",
+	"[python]": {
+		"editor.defaultFormatter": "eeyore.yapf",
+		"editor.formatOnSave": true,
+		"editor.formatOnPaste": true,
+		"editor.formatOnType": false,
+		"editor.codeActionsOnSave": {
+			"source.fixAll": false,
+			"source.organizeImports": false,
+			"source.organizeImports.ruff": false,
+			"source.organizeImports.python": false,
+		}
+	},
+	"yapf.args": ["--style", "{based_on_style: pep8, indent_width: 4, column_limit: 125, BLANK_LINE_BEFORE_NESTED_CLASS_OR_DEF: false, DEDENT_CLOSING_BRACKETS: true}"],
+	"ruff.enable": true,
+	"ruff.lint.args": [
+		"--line-length",
+		"125"
+	],
+	"ruff.format.args": [
+		"--line-length",
+		"125"
+	],
+	"python.analysis.autoImportCompletions": false,
+}
diff --git a/README.md b/README.md
@@ -15,23 +15,40 @@
     <a href="https://github.com/Dadangdut33/Speech-Translate/network/members"><img alt="GitHub forks" src="https://img.shields.io/github/forks/Dadangdut33/Speech-Translate?style=social"></a>
 </p>
 
-Speech Translate is a practical application that combines OpenAI's Whisper ASR model with free translation APIs. It serves as a versatile tool for both real-time / live speech-to-text and speech translation, allowing the user to seamlessly convert spoken language into written text. Additionally, it has the option to import and transcribe audio / video files effortlessly. This application aims to expand whisper ability by combining it with some translation APIs while also providing a simple and easy to use interface to create a more practical application. This application is also open source, so you can contribute to this project if you want to. 
-
-<details open>
-    <summary>Preview</summary>
-    <p align="center">
-      <img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/1.png" width="700" alt="Speech Translate Looks">
-      <img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/2.png" width="700" alt="Setting transcription">
-      <img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/3.png" width="700" alt="Setting textbox">
-      <img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/4.png" width="700" alt="About window">
-      <img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/5.png" alt="Detached window preview">
-      Detached window preview
-      <img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/6.png" alt="Transcribe mode on detached window (English)">
-      Transcribe mode on detached window (English)
-      <img src="https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/speech_translate/assets/7.png" alt="Translate mode on detached window (English to Indonesia)">
-      Translate mode on detached window (English to Indonesia)
-    </p>
-  </details>
+Speech Translate is a practical application that combines OpenAI's Whisper ASR model with free translation APIs. It serves as a versatile tool for both real-time / live speech-to-text and speech translation, allowing the user to seamlessly convert spoken language into written text. Additionally, it has the option to import and transcribe audio / video files effortlessly. 
+
+Speech Translate aims to extend Whisper's capabilities by combining it with translation APIs while providing a simple, easy-to-use interface for a more practical application. It is also open source, so contributions are welcome. 
+
+<p align="center">
+  <img src="preview/1.png" width="700" alt="Speech Translate Preview">
+</p>
+
+<details>
+  <summary>Preview - Usage</summary>
+  <p align="center">
+    <img src="preview/7.png" width="700" alt="Record">
+    <img src="preview/8.png" width="700" alt="File import">
+    <img src="preview/9.png" width="700" alt="File import in progress">
+    <img src="preview/10.png" width="700" alt="Align result">
+    <img src="preview/11.png" width="700" alt="Refine result">
+    <img src="preview/12.png" width="700" alt="Translate Result">
+    <img src="preview/13.png" width="700" alt="Transcribe mode on subtitle window (English)"><br />
+    Transcribe mode on detached window (English)    
+    <img src="preview/14.png" width="700" alt="Translate mode on subtitle window (English to Indonesia)"><br />
+    Translate mode on detached window (English to Indonesia)
+  </p>
+</details>
+
+<details>
+  <summary>Preview - Setting</summary>
+  <p align="center">
+    <img src="preview/2.png" width="700" alt="Setting - General">
+    <img src="preview/3.png" width="700" alt="Setting - Record">
+    <img src="preview/4.png" width="700" alt="Setting - Transcribe">
+    <img src="preview/5.png" width="700" alt="Setting - Translate">
+    <img src="preview/6.png" width="700" alt="Setting - Textbox">
+  </p>
+</details>
 
 <br />
 
@@ -74,9 +91,16 @@ Speech Translate is a practical application that combines OpenAI's Whisper ASR m
 
 - Speaker input only work on windows 8 and above.
 - Internet connection (for translation with API)
-- [FFmpeg](https://ffmpeg.org/) is required to be installed and added to the PATH environment variable. You can download it [here](https://ffmpeg.org/download.html) and add it to your path manually OR you can do it automatically using the following commands:
+- [FFmpeg](https://ffmpeg.org/) must be installed and added to the PATH environment variable. You can install it when prompted in the app, or download it [here](https://ffmpeg.org/download.html) and add it to your PATH manually. Alternatively, you can download it and add it to PATH automatically with the following commands:
+
+```bash
+# on Windows using powershell (Also included in the release page, and can be run by right clicking and selecting "Run with PowerShell")
+# Must be run in an elevated PowerShell prompt (Run as administrator)
+Set-ExecutionPolicy RemoteSigned -Scope CurrentUser # Optional: Needed to run a remote script the first time
+& ([scriptblock]::Create(
+     (New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/install_ffmpeg.ps1')
+  )) -webdl
 
-```
 # on Windows using Winget (Default package manager for Windows 10 and above)
 winget install --id=Gyan.FFmpeg  -e
 
@@ -106,20 +130,21 @@ brew install ffmpeg
 | medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
 | large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
 
-\* This information is also available in the app (hover over the model selection in the app and there will be a tooltip about the model info). 
+\* This information is also available in the app (hover over the model selection to see a tooltip with the model info). Note that when using faster-whisper, transcription is significantly faster and the model size is reduced depending on the usage. For more information, see the [faster-whisper repository](https://github.com/guillaumekln/faster-whisper).
 
 
 # Installation
 
 > [!IMPORTANT]  
-> Make sure that you have installed [FFmpeg](https://ffmpeg.org/) and added it to the PATH environment variable. [See here](#requirements) for more info
+> Please read the [Requirements](#requirements) before installing. For more information about using the app, see the [wiki](https://github.com/Dadangdut33/Speech-Translate/wiki).
 
 ## From Prebuilt Binary
 
 1. Download the [latest release](https://github.com/Dadangdut33/Speech-Translate/releases/latest) (There are 2 versions, CPU and GPU)
 2. Install/extract the downloaded file
 3. Run the program
-4. Enjoy!
+4. Set the settings to your liking
+5. Enjoy!
 
 ## As A Module
 
@@ -143,9 +168,9 @@ You can then run the program by typing `speech-translate` in your terminal/conso
 
 **Notes For Installation as Module:**
 
-- If you are u**pdating from an older version**, you need to add `--upgrade --no-deps --force-reinstall` at the end of the command.
+- If you are **updating from an older version**, you need to add `--upgrade --force-reinstall` at the end of the command. If the update does not need new dependencies, you can also add `--no-deps` to speed up the installation.
 - If you want to **install** from a **specific branch or commit**, you can do it by adding `@branch_name` or `@commit_hash` at the end of the url. Example: `pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git@dev --extra-index-url https://download.pytorch.org/whl/cu118`
-- The **--extra-index-url here might not always be up to date**, so you can check the latest version of pytorch [here](https://pytorch.org/get-started/locally/). You can also check the available version of pytorch [here](https://download.pytorch.org/whl/torch_stable.html).
+- The **--extra-index-url here might not always be up to date**, so you can check the latest version of PyTorch [here](https://pytorch.org/get-started/locally/). You can also check the available versions of PyTorch [here](https://download.pytorch.org/whl/torch_stable.html). If the newest version is not compatible, keep using the URL shown here.
 
 # More Information
 

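The README hunk above requires FFmpeg to be reachable through the PATH environment variable. A minimal sketch of how that requirement could be checked from Python with the standard library follows. It assumes nothing about how the app itself performs the check, and the helper name `is_on_path` is hypothetical.

```python
import shutil


def is_on_path(executable: str = "ffmpeg") -> bool:
    """Return True if `executable` can be found via the PATH environment variable."""
    # shutil.which returns the full path of the executable, or None if it is not found
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if is_on_path():
        print("ffmpeg found on PATH")
    else:
        print("ffmpeg not found; install it and add it to PATH")
```

A check like this only confirms that the executable is discoverable; it does not verify the FFmpeg version or build options.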
diff --git a/Run.py b/Run.py
@@ -3,4 +3,4 @@
 if __name__ == "__main__":
     main()
 
-# can run the app from this file or by running `python -m speech_translate`
+# can run the app from this file or by running `python -m speech_translate`
diff --git a/_pyinstaller_hooks/add_lib.py b/_pyinstaller_hooks/add_lib.py
diff --git a/build.py b/build.py
@@ -0,0 +1,122 @@
+import sys
+import os
+import shutil
+from cx_Freeze import setup, Executable
+
+sys.setrecursionlimit(5000)
+
+
+def get_env_name():
+    return os.path.basename(sys.prefix)
+
+
+def version():
+    with open(os.path.join(os.path.dirname(__file__), "speech_translate/_version.py")) as f:
+        return f.readline().split("=")[1].strip().strip('"').strip("'")
+
+
+# If you get cuda error try to remove your cuda from your system path because cx_freeze will try to include it from there
+# instead of the one in the python folder
+print(">> Building SpeechTranslate version", version())
+print(">> Environment:", get_env_name())
+
+
+def clear_dir(dir):
+    print(">> Clearing", dir)
+    try:
+        if not os.path.exists(dir):
+            return
+        if os.path.isdir(dir):
+            for f in os.listdir(dir):
+                os.remove(os.path.join(dir, f))
+
+            # remove the folder
+            os.rmdir(dir)
+        else:
+            os.remove(dir)
+    except Exception as e:
+        print(f">> Failed to clear {dir} reason: {e}")
+
+
+print(">> Clearing code folder")
+clear_dir("./speech_translate/export")
+clear_dir("./speech_translate/debug")
+clear_dir("./speech_translate/log")
+clear_dir("./speech_translate/temp")
+print(">> Done")
+
+folder_name = f"build/SpeechTranslate {version()}"
+
+build_exe_options = {
+    "excludes": ["yapf", "ruff"],
+    "packages": ["torch", "soundfile", "sounddevice", "av"],
+    "build_exe": folder_name
+}
+
+base = "Win32GUI" if sys.platform == "win32" else None
+
+setup(
+    name="SpeechTranslate",
+    version=version(),
+    description="Speech Translate",
+    options={
+        "build_exe": build_exe_options,
+    },
+    executables=[
+        Executable(
+            "Run.py",
+            base=base,
+            icon="speech_translate/assets/icon.ico",
+            target_name="SpeechTranslate.exe",
+        )
+    ],
+)
+
+# check if arg is build_exe
+if len(sys.argv) < 2 or sys.argv[1] != "build_exe":
+    sys.exit(0)
+
+print(">> Copying some more files...")
+
+# we need to copy av.libs to foldername/lib because cx_freeze doesn't copy it for some reason
+print(">> Copying av.libs to lib folder")
+shutil.copytree(f"{get_env_name()}/Lib/site-packages/av.libs", f"{folder_name}/lib/av.libs")
+
+# copy LICENSE as license.txt to build folder
+print(">> Creating license.txt to build folder")
+with open("LICENSE", "r", encoding="utf-8") as f:
+    with open(f"{folder_name}/license.txt", "w", encoding="utf-8") as f2:
+        f2.write(f.read())
+
+# copy build/pre_install_note.txt as README.txt to build folder
+print(">> Creating README.txt to build folder")
+with open("build/pre_install_note.txt", "r", encoding="utf-8") as f:
+    with open(f"{folder_name}/README.txt", "w", encoding="utf-8") as f2:
+        f2.write(f.read())
+
+# create version.txt
+print(">> Creating version.txt")
+with open(f"{folder_name}/version.txt", "w", encoding="utf-8") as f:
+    f.write(version())
+
+# copy install_ffmpeg.ps1 to build folder
+print(">> Copying install_ffmpeg.ps1 to build folder")
+with open("install_ffmpeg.ps1", "r", encoding="utf-8") as f:
+    with open(f"{folder_name}/install_ffmpeg.ps1", "w", encoding="utf-8") as f2:
+        f2.write(f.read())
+
+# create link to repo
+print(">> Creating link to repo")
+with open(f"{folder_name}/homepage.url", "w", encoding="utf-8") as f:
+    f.write("[InternetShortcut]\n")
+    f.write("URL=https://github.com/Dadangdut33/Speech-Translate")
+
+print(">> Opening output folder")
+output_folder = os.path.abspath(folder_name)
+try:
+    os.startfile(output_folder)
+except Exception:
+    # linux
+    import subprocess
+
+    subprocess.call(["xdg-open", output_folder])
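As written in the `build.py` hunk above, `clear_dir` calls `os.remove` on every entry of the directory. That call raises for nested folders; the exception is caught and printed, so the folder is left in place. A minimal alternative sketch using `shutil.rmtree` from the standard library is shown below, offered as a suggestion rather than the author's method. The name `clear_path` and the boolean return value are hypothetical additions.

```python
import os
import shutil


def clear_path(path: str) -> bool:
    """Remove a file or a whole directory tree. Return True if nothing is left at `path`."""
    print(">> Clearing", path)
    try:
        if os.path.isdir(path):
            # rmtree also removes nested folders, unlike os.remove on each entry
            shutil.rmtree(path)
        elif os.path.exists(path):
            os.remove(path)
    except OSError as e:
        print(f">> Failed to clear {path} reason: {e}")
    # A missing path counts as cleared, matching the early return in clear_dir
    return not os.path.exists(path)
```

Returning a boolean lets the caller decide whether a failed cleanup should stop the build instead of only logging it.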
diff --git a/build/post_install_note.txt b/build/post_install_note.txt
@@ -0,0 +1,3 @@
+The app has been successfully installed. For more information about its usage, please visit the wiki at https://github.com/Dadangdut33/Speech-Translate/wiki.
+
+For any questions or suggestions, feel free to open an issue or start a discussion on the repository.
diff --git a/build/pre_install_note.txt b/build/pre_install_note.txt
@@ -0,0 +1,20 @@
+Thanks for downloading Speech Translate.
+
+Speech Translate is a practical application that combines OpenAI's Whisper ASR model with free translation APIs. It serves as a versatile tool for both real-time / live speech-to-text and speech translation, allowing the user to seamlessly convert spoken language into written text. Additionally, it has the option to import and transcribe audio / video files effortlessly.  
+
+Requirements:
+- Windows 8.1 or higher for speaker input
+- FFmpeg installed in your system (the app will prompt you to install it if you don't have it)
+- Internet connection (for translation with API)
+- Each whisper model requires the following VRAM:
+  * tiny (~1 GB)
+  * base (~1 GB)
+  * small (~2 GB)
+  * medium (~5 GB)
+  * large (~10 GB)
+
+Whisper can run on a CPU but will be very limited when doing so. A CUDA-compatible GPU is recommended for better performance.
+
+Please also note that when using faster-whisper, transcription will be significantly faster and the model size will be reduced depending on the usage. For more information, please visit https://github.com/guillaumekln/faster-whisper
+
+For more information about the app, user settings, how to use it, and more please visit the wiki at https://github.com/Dadangdut33/Speech-Translate/wiki
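The VRAM figures listed in the note above can be turned into a simple lookup. This is a minimal sketch that assumes the approximate values from the note; the helper name `largest_model_for` is hypothetical and is not part of the app.

```python
# Approximate VRAM needs in GB, copied from the requirements list in the note
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_for(vram_gb: float):
    """Return the largest Whisper model whose listed VRAM need fits, or None."""
    best = None
    # dicts keep insertion order, so entries are visited from smallest to largest
    for name, need in MODEL_VRAM_GB.items():
        if need <= vram_gb:
            best = name
    return best


print(largest_model_for(6))  # medium
```

Because the note says faster-whisper reduces the model size depending on usage, these numbers are best read as rough upper bounds rather than exact limits.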