A Python desktop application that uses AssemblyAI's transcription API to transcribe audio files with speaker identification. It supports a range of common audio formats and provides a modern GUI built with CustomTkinter.
- Modern GUI Interface: Built with CustomTkinter for a sleek, modern appearance
- Multiple File Selection: Select single files, multiple files, or entire directories
- Speaker Identification: Automatic speaker labeling in transcriptions
- Real-time Progress Tracking: Live progress updates with performance metrics
- Smart Output Management: Save transcripts alongside source files or in custom directories
- Performance Metrics: Detailed timing and throughput statistics
- API Key Management: Built-in settings menu for easy API key configuration
- Same as Input Directory: Intelligent handling of mixed source directories
- Enhanced File Display: Organized file list with full paths and counts
- Error Recovery: Robust error handling with per-file error reporting
- Cross-Platform: Works on Windows, macOS, and Linux
- Python 3.9 or higher
- AssemblyAI API key (get one free at https://www.assemblyai.com/)
- FFmpeg (for audio file handling)
- Clone the repository:
git clone <repository-url>
cd recall
- Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required packages:
pip install -r requirements.txt
- Install FFmpeg:
- Windows: Download from https://ffmpeg.org/download.html and add to PATH
- macOS:
brew install ffmpeg
- Linux:
sudo apt-get install ffmpeg
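Before launching the application, it can help to confirm that FFmpeg is actually reachable on your PATH. The snippet below is a minimal sketch using only the standard library; the helper names `find_ffmpeg` and `ffmpeg_status` are illustrative and not part of this project.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


def ffmpeg_status(executable: str = "ffmpeg") -> str:
    """Build a short human-readable message about the lookup result."""
    path = find_ffmpeg(executable)
    if path is None:
        return f"{executable} not found on PATH"
    return f"{executable} found at {path}"


if __name__ == "__main__":
    print(ffmpeg_status())
```

If the message reports that FFmpeg was not found, revisit the install step for your platform and open a new terminal so the updated PATH is picked up.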
You have two options for setting up your AssemblyAI API key.
Option 1: use the Settings menu.
- Run the application:
python run.py
- Go to Settings > API Key in the menu bar
- Enter your AssemblyAI API key in the secure dialog
- Click Save; the key is saved automatically for future use
Option 2: create a .env file in the project root:
ASSEMBLYAI_API_KEY=your_api_key_here
OUTPUT_DIRECTORY=transcriptions
To start the application, run:
python run.py
- Select Files: Choose multiple audio files from anywhere on your system
- Select Directory: Process all supported audio files in a directory
- Mixed Sources: Files from different directories are handled intelligently
- Custom Directory: Specify where to save all transcriptions
- Same as Input: Save each transcript in the same directory as its source file
- For files from the same directory: Uses that directory
- For mixed directories: Each transcript saved with its source file
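The two output modes above can be sketched as a single path-resolution function. This is an illustration rather than the application's actual code: the function name `transcript_path` and the `.txt` extension are assumptions.

```python
from pathlib import Path
from typing import Optional, Union


def transcript_path(audio_file: Union[str, Path],
                    output_dir: Optional[Union[str, Path]] = None) -> Path:
    """Decide where the transcript for one audio file should be written.

    output_dir=None models "Same as Input": the transcript goes next to
    its source file. Otherwise it goes into the custom directory.
    The ".txt" extension is an assumption made for this sketch.
    """
    source = Path(audio_file)
    target_dir = Path(output_dir) if output_dir else source.parent
    return target_dir / (source.stem + ".txt")
```

Because the decision is made per file, a selection that mixes files from several directories needs no special handling in "Same as Input" mode: each transcript simply follows its own source file.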
- Select your audio files using either selection method
- Choose output directory preference
- Click Start Transcription
- Monitor real-time progress with performance metrics
- View detailed logs and completion status
The application supports the following audio formats:
- AMR (.amr)
- MP3 (.mp3)
- WAV (.wav)
- M4A (.m4a)
- OGG (.ogg)
- FLAC (.flac)
- AAC (.aac)
- WMA (.wma)
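As a rough sketch of how directory selection could filter on the extensions listed above: the names `SUPPORTED_EXTENSIONS`, `is_supported`, and `find_audio_files` are hypothetical, and the non-recursive scan is an assumption (the application may also descend into subdirectories).

```python
from pathlib import Path
from typing import List, Union

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".amr", ".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma"}


def is_supported(path: Union[str, Path]) -> bool:
    """Case-insensitive check of the file extension."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def find_audio_files(directory: Union[str, Path]) -> List[Path]:
    """Return supported audio files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and is_supported(p)
    )
```

The check only looks at the file extension, so a mislabeled file would still be picked up and would then surface as a per-file error during transcription.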
- Automatic speaker detection and labeling
- Output format:
Speaker A: [text],Speaker B: [text]
- Uses AssemblyAI's advanced speaker diarization
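To illustrate the labeled output, the sketch below formats a list of utterances into `Speaker X: text` lines. The `Utterance` dataclass is a stand-in for whatever the transcription client returns (with speaker labels enabled, the AssemblyAI Python SDK exposes utterances carrying a speaker label and text), and the newline separator is an assumption; pass a different `separator` if your transcripts use another delimiter.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Utterance:
    """Stand-in for one diarized segment: a speaker label and its text."""
    speaker: str
    text: str


def format_transcript(utterances: Iterable[Utterance], separator: str = "\n") -> str:
    """Join utterances as 'Speaker <label>: <text>' entries."""
    return separator.join(f"Speaker {u.speaker}: {u.text}" for u in utterances)


if __name__ == "__main__":
    demo = [Utterance("A", "Hello, thanks for calling."), Utterance("B", "Hi, I have a question.")]
    print(format_transcript(demo))
```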
- Nano Model: Fast, efficient transcription engine
- Batch Processing: Handle multiple files seamlessly
- Progress Tracking: Real-time status updates
- Performance Metrics: Speed and throughput monitoring
- File Selection Buttons: Choose files or directories
- File List: Organized display with counts and full paths
- Progress Tracking: Real-time status and progress bar
- Performance Panel: Live metrics and timing information
- Output Log: Detailed transcription log with timestamps
- API Key Management: Secure API key storage and configuration
- About Dialog: Application information and supported formats
Robust error handling includes:
- Missing API Key: Clear guidance to Settings menu
- Invalid Audio Files: Per-file error reporting
- Network Issues: Retry logic and informative error messages
- File System Problems: Graceful handling of permissions/access issues
- API Errors: Detailed AssemblyAI error reporting
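The retry behavior for network issues can be pictured with a small helper like the one below. It is a generic exponential-backoff sketch, not the application's actual implementation: the name `with_retries`, the default of three attempts, and the choice of exceptions to retry on are all assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T],
                 attempts: int = 3,
                 base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying on the given exceptions with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised so the caller can report it per file.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Exceptions outside `retry_on` (for example an invalid audio file) are not retried and propagate immediately, which keeps permanent failures from wasting time on repeated attempts.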
- API Key Storage: Saved to ~/.recall/config.json
- Cross-Platform: Uses standard user config directories
- Persistent Settings: Automatically loads saved configuration
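A minimal sketch of how a JSON settings file like this could be read and written is shown below. The JSON key name `api_key`, the helper names, and the rule that the `ASSEMBLYAI_API_KEY` environment variable takes precedence over the saved file are assumptions made for illustration; the application's real schema and precedence may differ.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_PATH = Path.home() / ".recall" / "config.json"


def save_api_key(api_key: str, config_path: Path = CONFIG_PATH) -> None:
    """Write the key to the JSON config file, creating the directory if needed."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps({"api_key": api_key}, indent=2), encoding="utf-8")


def load_api_key(config_path: Path = CONFIG_PATH,
                 env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the API key from the environment or the config file, else None."""
    env_key = env.get("ASSEMBLYAI_API_KEY")
    if env_key:
        return env_key  # assumption: the environment variable wins
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None  # missing or corrupt file is treated as "no key saved"
    if isinstance(data, dict):
        return data.get("api_key") or None
    return None
```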
You can still use environment variables if preferred:
ASSEMBLYAI_API_KEY=your_api_key_here
OUTPUT_DIRECTORY=transcriptions
- GUI Framework: CustomTkinter for modern appearance
- Transcription Engine: AssemblyAI Nano model with speaker labels
- Audio Processing: PyDub with FFmpeg backend
- Threading: Non-blocking UI with background processing
- Configuration: JSON-based user settings
- Concurrent Processing: Efficient handling of multiple files
- Memory Management: Automatic cleanup of temporary files
- Progress Tracking: Real-time updates without UI blocking
- Error Recovery: Continue processing remaining files on individual failures
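The combination of concurrent processing and per-file error recovery described above might look roughly like this. It is a sketch under stated assumptions: the use of a thread pool, the function name `transcribe_batch`, and the `transcribe_one` callback (which stands in for the real transcription call) are not taken from the application's source.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def transcribe_batch(files: Iterable[str],
                     transcribe_one: Callable[[str], str],
                     max_workers: int = 4) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `transcribe_one` on every file in a thread pool.

    Returns (results, errors): transcripts keyed by file name, and error
    messages keyed by file name. A failure on one file is recorded and
    does not stop the remaining files from being processed.
    """
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transcribe_one, f): f for f in files}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # record the failure, keep going
                errors[name] = str(exc)
    return results, errors
```

In a GUI, a function like this would run on a background thread so the window stays responsive, with progress passed back to the UI thread rather than updated directly from the workers.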
MIT License
In addition to the desktop GUI, a simple Flask-based web interface is available. Run it locally with:
python -m src.webapp
You can build and run the web app in a container:
# Build the image
docker build -t audio-transcriber .
# Run with API key as environment variable
docker run -p 5000:5000 -e ASSEMBLYAI_API_KEY=your_api_key_here audio-transcriber
# Or configure API key through the web interface after starting
docker run -p 5000:5000 audio-transcriber
Note: The container will be available at http://localhost:5000. If no API key is provided via environment variable, you can configure it through the web interface.