A beginner's guide to using OpenAI's Whisper, a powerful, free-to-use transcription/translation model. If you find this guide helpful, please consider smashing that ⭐ button! 😎
Follow the TL;DR to get started right away!
This repository contains a practical guide designed to help users, especially those without a technical background, use OpenAI's Whisper for speech transcription and translation. We will use Google Colab's free GPU to speed up the process. The guide includes a step-by-step walkthrough of setting up and executing transcription commands with various options, and it's tailored to make speech-to-text conversion accessible and straightforward.
You may also view the accompanying supplementary tutorial video.
The tutorial assumes you have an audio file (mp3, flac, wav, etc.) ready to use for the transcription/translation demonstration. If you don't have one handy, feel free to download the sample audio file provided.
- Accessing the Notebook: Open the Whisper_Tutorial.ipynb file and look for the "Open in Colab" badge at the top of the file. You may also click here.
- Making a Copy in Colab: Once the notebook is open in Google Colab:
- Go to the 'File' menu in the Colab toolbar.
- Select 'Save a copy in Drive' from the dropdown menu. This will create a copy of the notebook in your Google Drive, allowing you to run and edit it without affecting the original version.
- Running the Notebook: Follow the instructions in the notebook to transcribe/translate your audio file!
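The Colab notebook installs Whisper and its dependencies for you. If you'd rather run Whisper on your own machine instead, the install (per OpenAI's Whisper README) looks roughly like this; the ffmpeg line assumes a Debian/Ubuntu system:

pip install -U openai-whisper # install or upgrade the Whisper package
sudo apt update && sudo apt install ffmpeg # ffmpeg is required for reading audio files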
Download the audio files for transcription and translation. Assuming you are using these files (or files with the same names):
- Open the Whisper_Tutorial in Colab.
- Enable the GPU (Runtime > Change runtime type > Hardware accelerator > GPU).
- Upload the audio files to Colab (click the folder icon on the left, then click the upload icon).
- Run all the cells in the notebook (Runtime > Run all).
- Download the zip folders with the transcription files (right-click `transcriptions.zip` and `translations.zip` in the file explorer and select "Download").
- Replace the audio file names in the commands below with your own audio file names to generate custom transcriptions/translations.
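For reference, the simplest invocation needs only the audio file: Whisper transcribes it with default settings and writes the output (by default, in several formats) to the current directory. The --model flag lets you pick a larger, slower, but more accurate model:

whisper audio_file.mp3 # transcribe with default settings
whisper audio_file.mp3 --model medium # use the larger "medium" model for better accuracy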
To create files in the SubRip format (SRT), which is frequently used in video editing software and on YouTube:
whisper audio_file.mp3 --task transcribe --output_format srt # English transcription
whisper audio_file.mp3 --task translate --output_format srt --language Mandarin # translate Mandarin audio into English
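If you want the generated SRT files collected in a specific folder rather than the current directory, the Whisper CLI's --output_dir flag does that; a quick sketch (the folder name transcriptions is just an example):

whisper audio_file.mp3 --task transcribe --output_format srt --output_dir transcriptions # write the SRT into ./transcriptions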
To translate audio in another language into English (Whisper's translate task always outputs English; --language specifies the language spoken in the audio):
whisper audio_file.mp3 --task translate --output_format srt --language es # translate Spanish audio into English
You can view Whisper's help output to get information, such as the supported languages, by running:
whisper --help
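The help output also lists the available model sizes and the supported language names/codes. For example, specifying the source language up front with --language skips Whisper's automatic language detection, and --model selects a larger, more accurate model:

whisper audio_file.mp3 --language Japanese --model medium --output_format srt # transcribe Japanese audio with the medium model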
If you want each caption segment to be at most 3 words instead of full sentences:
whisper audio_file.mp3 --task translate --language Korean --output_format srt --word_timestamps True --max_words_per_line 3 # translate Korean audio into English
whisper audio_file.mp3 --task transcribe --output_format srt --word_timestamps True --max_words_per_line 3 # for transcription
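Word timestamps also enable character-based caption limits; if your installed Whisper version supports them, the --max_line_width and --max_line_count options can be used instead of a word cap:

whisper audio_file.mp3 --task transcribe --output_format srt --word_timestamps True --max_line_width 42 --max_line_count 2 # wrap captions at roughly 42 characters, at most 2 lines each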
This tutorial follows OpenAI's official Whisper documentation. For more information, please refer to the official documentation here.