A simple command-line tool to generate transcripts for podcast episodes or other audio files containing speech.
- Download and process podcast episodes or other audio content from a given URL or file path.
- Automatically resamples audio to 16 kHz mono, since Groq would do this anyway.
- Splits large audio files into manageable chunks.
- Transcribes audio locally using whisper-cpp.
- Optionally transcribes audio locally using mlx-whisper.
- Optionally transcribes audio using the Groq API.
- Outputs transcripts in multiple formats:
- DOTe JSON
- Podlove JSON
- WebVTT (subtitle format)
- Plaintext
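The resampling step above corresponds to an ffmpeg invocation along these lines. This is a sketch only; the exact flags the tool passes to ffmpeg may differ:

```python
def ffmpeg_resample_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that converts `src` to 16 kHz mono.

    Illustrative helper, not part of the package: -ac 1 forces a single
    (mono) channel, -ar 16000 sets the sample rate to 16 kHz.
    """
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]
```

Running the returned command (e.g. via `subprocess.run`) requires ffmpeg on your PATH, which is why it is listed under the requirements.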
- Python >= 3.10 (mlx-whisper does not run on Python 3.13, but works with 3.12)
- ffmpeg installed and available in your system’s PATH.
- A Groq API key (only needed for the Groq backend).
Install the package:
pip install podcast-transcript # or pipx/uvx install podcast-transcript
Using the Groq backend requires a Groq API key. You can set the API key in one of the following ways:
- Environment Variable:
Set the GROQ_API_KEY environment variable in your shell:
export GROQ_API_KEY=your_api_key_here
# or
GROQ_API_KEY=your_api_key_here podcast-transcript ...
- .env File:
Create a .env file in the transcript directory (default is ~/.podcast-transcripts/) and add the following line:
GROQ_API_KEY=your_api_key_here
By default, the transcripts home directory is ~/.podcast-transcripts/. You can change this by setting the TRANSCRIPT_HOME environment variable:
export TRANSCRIPT_HOME=/path/to/your/transcripts_home
The transcript home directory is where you can store your .env file. The model files for the whisper-cpp backend are also stored there, in a subdirectory called whisper-cpp-models. The transcripts themselves are stored in a subdirectory called transcripts unless you specify a different directory.

By default, transcripts are stored in ~/.podcast-transcripts/transcripts/. You can change this by setting the TRANSCRIPT_DIR environment variable:
export TRANSCRIPT_DIR=/path/to/your/transcripts
You can also set the following environment variables or specify them in the .env file:
- TRANSCRIPT_MODEL_NAME: The name of the model to use for the transcript (default is "ggml-large-v3.bin" for whisper-cpp, "whisper-large-v3" for Groq and "mlx-community/whisper-large-v3-mlx" for MLX).
- TRANSCRIPT_PROMPT: The prompt to use for the transcription (default is "podcast-transcript").
- TRANSCRIPT_LANGUAGE: The language code for the transcription (default is en, you could set it to de for example).
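As a sketch of how these settings resolve, the variable names and defaults below come from the list above, but the helper function itself is hypothetical and not part of the package:

```python
import os

# Documented per-backend model defaults (see the list above).
MODEL_DEFAULTS = {
    "whisper-cpp": "ggml-large-v3.bin",
    "groq": "whisper-large-v3",
    "mlx": "mlx-community/whisper-large-v3-mlx",
}

def transcript_settings(backend: str) -> dict:
    """Resolve settings from the environment, falling back to the defaults."""
    return {
        "model": os.environ.get("TRANSCRIPT_MODEL_NAME", MODEL_DEFAULTS[backend]),
        "prompt": os.environ.get("TRANSCRIPT_PROMPT", "podcast-transcript"),
        "language": os.environ.get("TRANSCRIPT_LANGUAGE", "en"),
    }
```

With no overrides set, `transcript_settings("groq")` yields the Groq model name and the `en` language default.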
To transcribe a podcast episode, run the transcribe command followed by the URL of the MP3 file (or a local file path):
transcribe <mp3_url>
Example:
transcribe https://d2mmy4gxasde9x.cloudfront.net/cast_audio/pp_53.mp3
Or if you want to use the Groq API:
transcribe --backend=groq https://d2mmy4gxasde9x.cloudfront.net/cast_audio/pp_53.mp3
The transcription process involves the following steps:
- Download the audio file from the provided URL or copy it from the file path if one was given.
- Convert the audio to mp3 and resample to 16kHz mono for optimal transcription.
- Split the audio into chunks if it exceeds the size limit (25 MB).
- Transcribe each audio chunk using either whisper-cpp (converts mp3 to wav first), mlx-whisper, or the Groq API.
- Combine the transcribed chunks into a single transcript.
- Generate output files in DOTe JSON, Podlove JSON, WebVTT, and plaintext formats.
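The "combine" step has one subtlety: each chunk is transcribed starting at time zero, so its timestamps must be shifted by the total duration of the chunks before it. A minimal sketch, assuming segments are dicts with start/end times in seconds (the function and data shape are illustrative, not the package's actual code):

```python
def combine_chunks(chunks: list[list[dict]], chunk_durations: list[float]) -> list[dict]:
    """Merge per-chunk segment lists into one transcript.

    Each chunk's segments start at 0.0, so shift them by the cumulative
    duration of all preceding chunks before concatenating.
    """
    combined: list[dict] = []
    offset = 0.0
    for segments, duration in zip(chunks, chunk_durations):
        for seg in segments:
            combined.append({**seg, "start": seg["start"] + offset, "end": seg["end"] + offset})
        offset += duration
    return combined
```

For example, a segment at 0.0–2.0 s in the second chunk of a file whose first chunk is 10 s long ends up at 10.0–12.0 s in the combined transcript.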
The output files are saved in a directory named after the episode, within the transcript directory.
- DOTe JSON (*.dote.json): A JSON format suitable for further processing or integration with other tools.
- Podlove JSON (*.podlove.json): A JSON format compatible with Podlove transcripts.
- WebVTT (*.vtt): A subtitle format that can be used for captioning in media players.
- Plaintext: Just the plain text of the transcription.
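As an illustration of the WebVTT output, a cue is a `HH:MM:SS.mmm --> HH:MM:SS.mmm` timing line followed by its text. The following sketch renders such a file from timed segments (hypothetical helpers, not the package's own serializer):

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def to_webvtt(segments: list[dict]) -> str:
    """Render segments (start/end in seconds, plus text) as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}")
        lines.append(seg["text"])
        lines.append("")  # blank line terminates each cue
    return "\n".join(lines)
```

A file produced this way starts with the mandatory `WEBVTT` header and can be loaded as captions by most media players.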
- Support for multitrack transcripts with speaker identification.
- Add support for other transcription backends (e.g., OpenAI, Speechmatics, local Whisper via PyTorch).
- Add support for other audio formats (e.g., AAC, WAV, FLAC).
- Add more output formats (e.g., SRT, TTML).
- Clone the repository:
git clone https://github.com/yourusername/podcast-transcript.git
cd podcast-transcript
- Create a virtual environment:
uv venv
- Install the package in editable mode:
uv sync
The project uses pytest for testing. To run tests:
pytest
Show coverage:
coverage run -m pytest && coverage html && open htmlcov/index.html
Install pre-commit hooks to ensure code consistency:
pre-commit install
Check the type hints:
mypy src/
Build the distribution package:
uv build
Publish the package to PyPI:
uv publish --token your_pypi_token
This project is licensed under the MIT License.