A desktop application, built with Python and PyQt6, for transcribing and translating audio files to generate subtitles, designed especially for anime.
AniSubsAI provides a user-friendly interface to:
- Transcribe Audio & Video: Process both audio and video files, automatically extracting audio from videos.
- Multi-Engine Translation: Translate text using a variety of services, including Gemini, Google Translate, DeepL, and more.
- Flexible Subtitle Generation: Generate subtitles in multiple standard formats, including SRT, VTT, and ASS.
- Advanced Customization:
- Select from a wide range of source and target languages.
- Automatically detect the source language.
- Choose your processing device (CPU/GPU) for transcription.
- Easily switch between different `faster-whisper` models.
The application is designed with a modern, dark-themed UI and a modular architecture that allows for future expansion, such as adding more translation services.
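The modular design described above could be sketched as a small plugin registry, where each translation service registers itself under a name and new services are added by subclassing. This is an illustrative sketch with hypothetical class names, not the project's actual code:

```python
from abc import ABC, abstractmethod

# Hypothetical registry of translation backends (names are illustrative).
TRANSLATORS: dict[str, type] = {}

def register(name: str):
    """Class decorator that adds a translator backend to the registry."""
    def wrap(cls):
        TRANSLATORS[name] = cls
        return cls
    return wrap

class Translator(ABC):
    """Common interface every translation service implements."""
    @abstractmethod
    def translate(self, text: str, source: str, target: str) -> str: ...

@register("gemini")
class GeminiTranslator(Translator):
    def translate(self, text: str, source: str, target: str) -> str:
        # A real implementation would call the Gemini API here.
        raise NotImplementedError
```

With this pattern, adding another service (e.g. DeepL) is just another `@register(...)` subclass; the GUI can populate its engine dropdown directly from `TRANSLATORS.keys()`.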
- Video and Audio Support: Transcribe directly from `.mp3`, `.wav`, `.mp4`, and `.mkv` files.
- High-Quality Transcription: Powered by `faster-whisper` for fast and accurate speech-to-text.
- Multi-Engine Translation: Choose between Gemini, Google Translate, DeepL, Microsoft Translator, and MyMemory.
- Automatic Language Detection: Let the application automatically detect the source language of your media.
- Multiple Export Formats: Save your subtitles as `.srt`, `.vtt`, `.ass`, or plain `.txt` files.
- Drag and Drop: Easily add files to the application by dragging them onto the window.
- Device Selection: Choose between CPU and GPU for transcription to balance speed and resource usage.
- Customizable Models: Easily switch between different `faster-whisper` models (e.g., small, medium, large) via the `.env` file.
- Modern UI: A sleek, responsive interface built with PyQt6.
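For reference, serializing timed transcription segments into the SRT export format can be sketched as follows (an illustrative helper with hypothetical names, not the app's exact code):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Serialize (start, end, text) segments into an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Hello!"), (2.5, 5.0, "Welcome to AniSubsAI.")]))
```

The other formats (`.vtt`, `.ass`, `.txt`) differ only in the timestamp syntax and block layout, so they can share the same segment representation.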
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd AniSubsAI
   ```
2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```
3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
4. Configure your environment:
   - Create a `.env` file in the root of the project.
   - Add your Gemini API key to the `.env` file if you plan to use the Gemini translator: `GEMINI_API_KEY="YOUR_API_KEY_HERE"`
   - (Optional) Customize `WHISPER_MODEL` and `GEMINI_MODEL` in the `.env` file.
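A `.env` file might look like the following; the model names shown are illustrative placeholders, not verified project defaults:

```
GEMINI_API_KEY="YOUR_API_KEY_HERE"
WHISPER_MODEL=medium
GEMINI_MODEL=gemini-1.5-flash
```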
5. (Optional) GPU Support: For GPU-accelerated transcription, you will need to have the NVIDIA CUDA Toolkit installed and then install the following dependencies:

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
   ```
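GPU transcription also assumes a working NVIDIA driver. One crude way an application might default the CPU/GPU choice is to check for the driver's command-line tool; this is a heuristic sketch, not AniSubsAI's actual device-selection logic:

```python
import shutil

def pick_device() -> str:
    """Naive heuristic: prefer "cuda" if the NVIDIA driver tool is on PATH,
    otherwise fall back to "cpu". Illustrative only."""
    return "cuda" if shutil.which("nvidia-smi") else "cpu"

print(pick_device())
```

In practice the GUI's device dropdown lets you override whatever default is detected.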
Once the setup is complete, you can run the application with the following command:

```bash
python main.py
```

The project is organized into the following directories:
- `core/`: Shared utilities and configuration.
- `gui/`: All PyQt6 GUI components, stylesheets, and workers.
- `models/`: The directory where `faster-whisper` models are downloaded and stored.
- `transcriber/`: The `faster-whisper` transcription logic.
- `translator/`: The Gemini translation logic.
- `main.py`: The main entry point for the application.