Lightweight desktop voice-to-text transcription with OpenAI Whisper and system tray integration
Complete documentation available at: nouuu.github.io/voice-transcriber
- System Tray Integration: Click to record, visual state feedback (green=idle, red=recording, purple=processing)
- Live Configuration Management: Edit config and reload without restart; switch backends, languages, and API keys on the fly
- High-Quality Recording: Audio capture using arecord on Linux
- Multilingual Support: French, English, Spanish, German, Italian with strong language enforcement
- Text Formatting: Optional GPT-based grammar improvement
- Clipboard Integration: Automatic result copying to clipboard
- Self-Hosted Option: Run 100% offline with Speaches - same quality as OpenAI Whisper, zero cost, complete privacy
- Smart Reload: Configuration validation with automatic rollback on errors
Before installing, ensure you have:
- Bun runtime (≥1.2.0)
curl -fsSL https://bun.sh/install | bash
- System dependencies (Ubuntu/Linux)
sudo apt-get update
sudo apt-get install alsa-utils xsel
One-Command Setup (Recommended)
# Clone and setup everything
git clone https://github.com/Nouuu/voice-transcriber.git
cd voice-transcriber
make setup
This command will:
- ✅ Check all system dependencies (Bun, arecord, xsel)
- ✅ Install Bun dependencies
- ✅ Create configuration file at ~/.config/voice-transcriber/config.json
Configure OpenAI API key
nano ~/.config/voice-transcriber/config.json
Add your OpenAI API key:
{
  "language": "en",
  "formatterEnabled": true,
  "transcription": {
    "backend": "openai",
    "openai": {
      "apiKey": "sk-your-api-key-here"
    }
  }
}
Get your OpenAI API key: https://platform.openai.com/api-keys
For detailed configuration options, see the Configuration Guide
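As a rough illustration of how the configuration shape above could be validated and reloaded with rollback, here is a minimal sketch. The `AppConfig` type, `validateConfig`, `reloadConfig`, and the language codes other than `"en"` are assumptions made for this example; they are not taken from the project's source, and the real schema likely has more fields.

```typescript
// Hypothetical type mirroring the JSON example above (assumption, not the project's schema).
interface AppConfig {
  language: string;
  formatterEnabled: boolean;
  transcription: {
    backend: string;
    openai?: { apiKey: string };
  };
}

type ValidationResult = { success: boolean; error?: string };

// Assumed two-letter codes for the languages listed in the feature overview.
const SUPPORTED_LANGUAGES = ["fr", "en", "es", "de", "it"];

function validateConfig(raw: unknown): ValidationResult {
  if (typeof raw !== "object" || raw === null) {
    return { success: false, error: "config must be a JSON object" };
  }
  const cfg = raw as Partial<AppConfig>;
  if (typeof cfg.language !== "string" || !SUPPORTED_LANGUAGES.includes(cfg.language)) {
    return { success: false, error: "unsupported language" };
  }
  if (typeof cfg.formatterEnabled !== "boolean") {
    return { success: false, error: "formatterEnabled must be a boolean" };
  }
  const transcription = cfg.transcription;
  if (!transcription || typeof transcription.backend !== "string") {
    return { success: false, error: "transcription.backend is required" };
  }
  if (transcription.backend === "openai" && !transcription.openai?.apiKey) {
    return { success: false, error: "openai backend requires an apiKey" };
  }
  return { success: true };
}

// Parse and validate the new text; on any failure keep the current config (rollback).
function reloadConfig(
  current: AppConfig,
  newText: string,
): { config: AppConfig; result: ValidationResult } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(newText);
  } catch {
    return { config: current, result: { success: false, error: "invalid JSON" } };
  }
  const result = validateConfig(parsed);
  return { config: result.success ? (parsed as AppConfig) : current, result };
}
```

The caller always receives a usable config: either the newly validated one or the previous one, together with a result describing what happened.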
Install the voice-transcriber command globally:
make install-global
This allows you to run the application from anywhere.
# If installed globally
voice-transcriber
# Or from project directory
make run
# Enable debug mode for detailed logging (benchmarks, file sizes, timings)
voice-transcriber --debug
# or
make run ARGS="--debug"
Enable debug mode to see detailed information about:
- File sizes: WAV and MP3 file sizes with compression ratios
- Audio format: Sample rate, channels, conversion details
- Processing times: Breakdown of upload, processing, and response times
- Transcription details: Character count, duration metrics
Example debug output:
2025-10-11T10:30:15.123Z [DEBUG] WAV file size: 2.45 MB (2569216 bytes)
2025-10-11T10:30:15.125Z [DEBUG] WAV format: 2 channel(s), 44100 Hz sample rate
2025-10-11T10:30:15.234Z [DEBUG] MP3 file size: 0.62 MB (650240 bytes)
2025-10-11T10:30:15.234Z [DEBUG] Compression ratio: 74.7% size reduction
2025-10-11T10:30:15.234Z [DEBUG] WAV to MP3 conversion completed in 0.11 seconds
2025-10-11T10:30:16.789Z [INFO] OpenAI transcription completed in 1.55s
2025-10-11T10:30:16.789Z [DEBUG] └─ Estimated breakdown: upload ~0.47s, processing ~0.93s, receive ~0.16s
2025-10-11T10:30:16.789Z [DEBUG] └─ Transcription length: 142 characters
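The size figures in the example output can be reproduced with simple arithmetic. The sketch below uses the byte counts from the log lines above; the helper names are hypothetical, and the use of 1024 × 1024 bytes per MB is inferred from the logged values rather than confirmed from the source code.

```typescript
// Byte counts copied from the example debug output above.
const wavBytes = 2569216;
const mp3Bytes = 650240;

// Convert bytes to megabytes (1 MB = 1024 * 1024 bytes), rounded to two decimals.
function toMegabytes(bytes: number): string {
  return (bytes / (1024 * 1024)).toFixed(2);
}

// Percentage by which the compressed file is smaller than the original.
function sizeReductionPercent(originalBytes: number, compressedBytes: number): string {
  return ((1 - compressedBytes / originalBytes) * 100).toFixed(1);
}

console.log(`WAV file size: ${toMegabytes(wavBytes)} MB`); // 2.45 MB
console.log(`MP3 file size: ${toMegabytes(mp3Bytes)} MB`); // 0.62 MB
console.log(`Compression ratio: ${sizeReductionPercent(wavBytes, mp3Bytes)}% size reduction`); // 74.7%
```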
- Look for the system tray icon (green circle when idle)
- Click the tray icon or menu to start/stop recording
- Transcribed text is automatically copied to your clipboard
To remove the global command:
make uninstall-global
🟢 Idle/Ready State (click to start) → 🔴 Recording (speaking...) → 🟣 Processing (AI transcribing)
Right-click the tray icon for menu options:
Voice Transcriber
├── Open Config - Open configuration in default editor
├── Reload Config - Reload config without restart (idle only)
├── Start Recording - Begin voice capture
├── Stop Recording - End recording and transcribe
└── Exit - Exit the application
- When idle (🟢): Start/Open/Reload/Exit enabled, Stop disabled
- When recording (🔴): Stop/Open/Exit enabled, Start/Reload disabled
- When processing (🟣): Open/Exit enabled, Start/Stop/Reload disabled
New: Live Configuration Management - Edit your config file and reload without restarting the app. Perfect for testing different languages, switching backends, or updating API keys.
- When recording (🔴): "Start Recording" is disabled, "Stop Recording" is enabled
- When processing (🟣): Both recording options are disabled
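The enabled/disabled rules above amount to a small lookup table keyed by tray state. The following sketch shows one way to express that; the `TrayState` and `MenuItem` names and the `isEnabled` helper are hypothetical and are not taken from the project's system tray service.

```typescript
// Hypothetical names for the three tray states and five menu entries described above.
type TrayState = "idle" | "recording" | "processing";
type MenuItem = "open" | "reload" | "start" | "stop" | "exit";

// Menu items that are enabled in each state, following the rules listed above.
const ENABLED_ITEMS: Record<TrayState, MenuItem[]> = {
  idle: ["start", "open", "reload", "exit"],
  recording: ["stop", "open", "exit"],
  processing: ["open", "exit"],
};

function isEnabled(state: TrayState, item: MenuItem): boolean {
  return ENABLED_ITEMS[state].includes(item);
}

console.log(isEnabled("idle", "reload")); // true: reload is allowed only when idle
console.log(isEnabled("recording", "start")); // false
```

Keeping the rules in one table makes it easy to check that "Reload Config" is available only in the idle state.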
For detailed configuration, language support, backends (OpenAI vs Speaches), and benchmark mode, see the Configuration Guide
make help # Show all available commands
# Setup & Installation
make install-global # Install voice-transcriber command globally
make uninstall-global # Uninstall global voice-transcriber command
make setup # Complete setup (system deps + bun deps + config)
make check-system-deps # Check system dependencies (Bun, arecord, xsel)
make init-config # Initialize config file in ~/.config/voice-transcriber/
make install # Install bun dependencies only
# Running
make run # Run the application
make dev # Run in development mode with watch
make test-file FILE=... # Run specific test file
# Testing & Quality
make test # Run all tests
make test-watch # Run tests in watch mode
make test-file # Run specific test (usage: make test-file FILE=path/to/test.ts)
# Documentation
make docs-install # Install MkDocs and required plugins
make docs-build # Build documentation site
make docs-serve # Serve documentation locally at http://127.0.0.1:8000
make docs-deploy # Deploy documentation to GitHub Pages
make lint # Run ESLint linting
make format # Format code with Prettier
make format-check # Check code formatting and linting
# Utilities
make clean # Clean build artifacts and temporary files
make build # Build for production
make check-deps # Alias for check-system-deps (legacy)
make audit # Run security audit on dependencies
make release-patch # Create patch release (x.x.X) - Bug fixes
make release-minor # Create minor release (x.X.0) - New features
make release-major # Create major release (X.0.0) - Breaking changes
make get-version # Show current version from package.json
make pre-release # Validate code before release (linting, tests, git status)
voice-transcriber/
├── src/
│   ├── index.ts # Main application entry point
│   ├── config/
│   │   ├── config.ts # Configuration management
│   │   └── config.test.ts
│   ├── services/
│   │   ├── audio-recorder.ts # Audio recording service
│   │   ├── transcription.ts # OpenAI Whisper integration
│   │   ├── clipboard.ts # Cross-platform clipboard
│   │   └── system-tray.ts # System tray management
│   └── utils/
│       ├── logger.ts # Simple logging utility
│       └── mp3-encoder.ts # MP3 audio compression
├── documentation/ # MkDocs documentation source
│   ├── icon-recording.png # Tray icon (recording)
│   └── icon-processing.png # Tray icon (processing)
├── dist/ # Built application (generated)
│   └── index.js # Bundled application
├── Makefile # Development commands
├── config.example.json # Configuration template
└── package.json
# First-time setup (if not done already)
make setup
# Check system requirements
make check-system-deps
# Run tests (recommended before development)
make test
# Start development with auto-reload
make dev
# Run specific test file
make test-file FILE=src/services/system-tray.test.ts
# Format and lint code
make format-check
# Clean up temporary files
make clean
Keep It Simple - No Overengineering
- ✅ Basic error handling ({ success: boolean, error?: string })
- ✅ Simple configuration loading from JSON
- ✅ Direct API calls to OpenAI (Whisper + GPT)
- ✅ Basic audio recording (start/stop/save)
- ✅ Simple system tray with 3 states
- ✅ Console logging (info/error only)
❌ What We Avoid:
- Complex retry logic with exponential backoff
- Advanced statistics tracking and usage metrics
- Batch processing capabilities
- Complex validation with detailed error messages
- Advanced logging with rotation and file management
Each service has 3-5 core methods maximum, following single responsibility principle.
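To make the `{ success: boolean, error?: string }` convention concrete, here is a minimal sketch of a helper that turns a throwing operation into that result shape. The `Result` alias and the `toResult` function are hypothetical names chosen for this example, not part of the project's code.

```typescript
// The simple result shape used throughout the guidelines above.
type Result = { success: boolean; error?: string };

// Hypothetical helper: run an action and report failure as a value instead of an exception.
function toResult(action: () => void): Result {
  try {
    action();
    return { success: true };
  } catch (err) {
    // No retries or error classification, in keeping with the "keep it simple" rule.
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
}

const ok = toResult(() => {});
const failed = toResult(() => {
  throw new Error("recording device not found");
});

console.log(ok); // { success: true }
console.log(failed); // { success: false, error: "recording device not found" }
```

Callers only need to check `success` and, if it is false, log or display `error`.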
All services have comprehensive test coverage with simple mocks:
# Run all tests
make test
# Run tests in watch mode during development
make test-watch
# Run specific test file
make test-file FILE=src/services/system-tray.test.ts
Testing Philosophy:
- Test core functionality, not edge cases
- Use simple mocks, avoid complex scenarios
- Maximum 5-6 tests per service
- Focus on: success cases, basic error handling, input validation
Phase 1: Foundation ✅
- Configuration system with API key management (37 lines, simplified from 164)
- Logging system with console output (37 lines, simplified from 280)
Phase 2: Core Services ✅
- Audio recording with arecord backend (80 lines, simplified from 280)
- OpenAI Whisper transcription service (73 lines, simplified from complex)
- OpenAI GPT formatting service (70 lines, simplified from complex)
Phase 3: System Integration ✅
- Live configuration management with validation and rollback
- All 93 tests passing with comprehensive coverage (including config management tests)
- Cross-platform clipboard service (66 lines, simplified from 460)
Phase 4: Main Application ✅
- Complete workflow: Record → Transcribe → Format → Clipboard
- Graceful shutdown handling and error management
- All 49 tests passing with comprehensive coverage (including MP3 encoder tests)
- KEEP IT SIMPLE - No overengineering
- Minimal viable functionality only
- Simple interfaces: { success: boolean, error?: string }
- Test-driven development approach
- French/English auto-detection support
- ✅ Config.json deletion: FIXED - Unit tests no longer delete production config.json
- ✅ System tray icon updates: FIXED - Implemented a workaround that recreates the systray instance
- ✅ CI/CD workflows: FIXED - GitHub Actions now work properly with optimized caching and semantic versioning
- ✅ Release automation: FIXED - Automatic changelog generation for both PRs and direct commits
- ✅ Asset resolution: FIXED - Modern import.meta.dirname-based asset paths for development and npm package compatibility
- ✅ npm version workflow: FIXED - Automated release workflow with npm version, pre-release validation, and conventional commit messages
- ✅ Live Configuration Management: FIXED - Open and reload configuration from system tray menu without restart (with validation and rollback)
- ✅ Linting Migration: FIXED - Successfully migrated from Biome to ESLint + Prettier with updated CI workflows
- ✅ Mixed Language Transcription: FIXED - Enhanced Whisper prompt to better preserve French/English mixed speech
- ✅ System Tray Library: FIXED - Migrated from systray2 to node-systray-v2 for better reliability and distribution
- ✅ Config Wizard: FIXED - Improved first-run setup with better guidance for API key configuration
- ✅ French/English Language Switching: FIXED - Strong language-specific prompts prevent Whisper from switching languages during long transcriptions
- ✅ Configuration Architecture: FIXED - Centralized config system with single source of truth and clear documentation
- ✅ Audio compression: WAV to MP3 conversion implemented with lamejs (mono 16kHz at 64kbps for voice optimization)
- Long audio handling: Need proper handling for long audio files (still pending)
- ✅ Live Config Management: Open and reload configuration from system tray without restart
- ✅ Config Validation: Automatic validation and rollback on configuration errors
- ✅ User Config Directory: Config now uses ~/.config/voice-transcriber/ with first-run setup wizard
- ✅ Local Installation: Streamlined local-only Bun installation with automated setup
- ✅ Dynamic Asset Resolution: Modern import.meta.dirname-based asset resolution
- ✅ Automated Setup: Complete make setup command for one-step installation
- ✅ Multilingual Support: Spanish, German, Italian support with strong language enforcement
- ✅ Custom Prompts: User-configurable transcription and formatting prompts
- ✅ Configuration System: Centralized config with comprehensive documentation
- ✅ System Tray Library: COMPLETED - Migrated from systray2 to node-systray-v2 for better reliability and binary distribution
- ✅ Mixed Language Support: COMPLETED - Enhanced Whisper prompt for better French/English mixed speech preservation
- ✅ Audio Optimization: COMPLETED - WAV to MP3 conversion with lamejs (mono 16kHz, 64kbps voice optimization)
- Local Inference Support: Add faster-whisper integration for offline transcription (4x faster, no API costs)
- File Saving: Add option to save transcriptions to file instead of just clipboard
- Long Audio Support: Handle audio files longer than API limits
- Quick Actions Menu: Toggle features and switch modes on-the-fly without config reload
- Formatter Toggle: Enable/disable GPT formatting instantly
- Formatter Personalities: Quick switch between formatting styles (Professional, Technical, Creative)
- Backend Selector: Choose between OpenAI GPT or Speaches LLM for formatting
- CLI Interface: Command-line interface for automation and scripting
- Windows Support: Replace arecord with Windows-compatible audio recording
- macOS Support: Add macOS audio recording and system tray integration
- Keyboard Shortcuts: Global shortcuts to trigger transcription
- Graphical Interface: Desktop GUI for easier configuration and usage
- System Dependencies Elimination: Replace system dependencies (alsa-utils, xsel) with pure JS alternatives or bundled binaries for zero-dependency npm installation
- Enhanced Logging: More detailed logging with file output and rotation
- Usage Statistics: Track and display transcription stats
- Better Error Handling: More robust error recovery and user feedback
Use Conventional Commits with minimal descriptions:
git commit -m "feat: add system tray icon updates"
git commit -m "fix: resolve menu actions not working"
git commit -m "refactor: simplify clipboard service"
git commit -m "test: add system tray state tests"
git commit -m "docs: update README with roadmap"
Rules:
- Keep descriptions under 50 characters
- Use present tense ("add" not "added")
- No capitalization after colon
- No period at the end
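Most of these rules can be checked mechanically. The sketch below is one possible checker, not a tool shipped with the project: `checkCommitMessage` is a hypothetical name, the accepted header pattern is an assumption based on the examples above, and the present-tense rule is left out because it cannot be verified reliably with a pattern.

```typescript
// Hypothetical checker for the commit rules above; returns a list of problems (empty if none).
function checkCommitMessage(message: string): string[] {
  // Assumed header form: lowercase type, optional scope, colon, space, description.
  const match = /^[a-z]+(\([\w-]+\))?: (.+)$/.exec(message);
  if (!match) {
    return ["expected '<type>: <description>'"];
  }
  const description = match[2] ?? "";
  const problems: string[] = [];
  if (description.length >= 50) {
    problems.push("description must be under 50 characters");
  }
  if (/^[A-Z]/.test(description)) {
    problems.push("no capitalization after colon");
  }
  if (description.endsWith(".")) {
    problems.push("no period at the end");
  }
  return problems;
}

console.log(checkCommitMessage("feat: add system tray icon updates")); // []
console.log(checkCommitMessage("fix: Resolve menu actions."));
```

The second call reports two problems: the capitalized first word and the trailing period.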
- make install - Install dependencies
- make check-deps - Verify system requirements
- make test - Run tests before development
- Follow the simplification guidelines
- Write tests first (TDD approach)
- Keep services under 100 lines each
- Use simple interfaces and avoid overengineering
- make test - Run tests before committing
This project was created using bun init with Bun runtime.
- Runtime: Bun (≥1.2.0) with TypeScript
- Audio: node-audiorecorder (arecord backend)
- AI: OpenAI SDK (Whisper + GPT)
- System Tray: node-systray-v2 (native binary distribution)
- Clipboard: clipboardy
- Testing: Bun test runner
- Build: Makefile with development commands
- Linting: ESLint + Prettier
- CI/CD: GitHub Actions with APT and Bun dependency caching
- Distribution: Local installation with automated setup