🎤 Voice-to-Text Dictation System

A minimal, elegant voice dictation application for macOS that lets you speak into any text field using the Globe/Fn key. Hold the key, speak, and release: your words are transcribed and typed into whichever application has focus.

✨ Features

🌐 Globe/Fn Key Dictation - Hold Globe/Fn key to record, release to transcribe
📝 Universal Text Input - Works in any application (browsers, documents, chat apps, etc.)
🗣️ Real-time Speech Recognition - Powered by Google Speech Recognition API (requires internet)
🎙️ macOS Integration - Uses built-in microphone indicator (no custom animation needed)
🔒 Privacy - Audio sent to Google servers for processing; by default, Google does not retain audio recordings (check your Google Account "Voice & Audio Activity" settings)
🌐 Internet Required - Speech recognition requires active internet connection
⚡ Stable & Reliable - Designed for repeated use in one session without freezing or crashing
🎯 Simple Workflow - Click in text field → Hold Globe/Fn → Speak → Release → Text appears
📱 Background Operation - Minimal UI that stays out of your way

🚀 Quick Start

Prerequisites

macOS 10.15+ (optimized for Apple Silicon)
Python 3.9+ (Python 3.14 recommended)
Karabiner-Elements (for Globe/Fn key remapping)
Homebrew (for installing dependencies)
Internet Connection (required for speech recognition)

Step 1: Install Homebrew (if not already installed)

```
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Step 2: Install Python and Dependencies

```
# Install Python 3.14 (or latest Python 3.x)
brew install python@3.14

# Install Python Tkinter (required for UI)
brew install python-tk@3.14

# Install PortAudio (required for PyAudio)
brew install portaudio
```

Step 3: Clone and Setup

```
# Clone the repository
git clone https://github.com/yourusername/voice-to-text.git
cd voice-to-text

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt
```

Step 4: Install Karabiner-Elements

```
# Install Karabiner-Elements via Homebrew
brew install --cask karabiner-elements
```

Or download manually: Karabiner-Elements

Step 5: Configure Karabiner-Elements

Open Karabiner-Elements (from Applications or Spotlight)
Go to "Simple Modifications" tab
Select your keyboard from the dropdown
Click "Add item" button
Set the mapping:
- From key: fn (or globe if available)
- To key: f13
Enable the rule (toggle should be ON)

Alternative: If you have a Globe key instead of Fn:

From key: globe
To key: f13

Step 6: Grant macOS Permissions

The app needs several permissions to work:

A. Accessibility Permission

Open System Settings (or System Preferences on older macOS)
Go to Privacy & Security → Accessibility
Click the + button and add:
- Terminal.app (or your terminal app: iTerm2, Warp, etc.)
- Cursor (if using Cursor IDE)
- Python Launcher (if available)
- The specific Python executable being used:
  - For Homebrew Python: /opt/homebrew/opt/python@3.14/bin/python3.14
  - Or: /usr/local/bin/python3
Enable the toggle for each added item

B. Input Monitoring Permission

In System Settings → Privacy & Security → Input Monitoring
Add the same applications as above
Enable the toggle for each

C. Microphone Permission

In System Settings → Privacy & Security → Microphone
Add Terminal.app (or your terminal app)
Enable the toggle

💡 Tip: If permissions don't work, try:

Restarting your terminal/IDE
Running the app once, then checking System Settings again
Adding the specific Python executable path
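
To find the exact interpreter path for the last tip, a short script can print both the path Python was launched from and the file it resolves to. This is a minimal sketch; `interpreter_paths` is a hypothetical helper name, and which of the two paths macOS expects in the permission list is an assumption you may need to verify by trying both.

```python
import os
import sys


def interpreter_paths():
    """Return the interpreter path as launched and with symlinks resolved."""
    launched = sys.executable
    # Inside a virtual environment the launched path is usually a symlink;
    # realpath follows it to the underlying binary.
    resolved = os.path.realpath(launched)
    return launched, resolved


launched, resolved = interpreter_paths()
print("Launched as:", launched)
print("Resolves to:", resolved)
```

Run it with the virtual environment active so the output reflects the interpreter the app actually uses.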

Step 7: Run the App

```
# Activate virtual environment (if not already active)
source venv/bin/activate

# Run the app
python voice_to_text.py
```

Or use the convenience script:

```
./start.sh
```

📖 Usage

Start the app (leave it running in the background)
Click in any text field (email, document, chat, browser, etc.)
Hold the Globe/Fn key (bottom-left corner of keyboard)
- Watch for macOS microphone indicator in the menu bar (orange dot)
- This confirms the app is recording
Speak clearly while holding the key
Release the Globe/Fn key when finished speaking
Watch your words appear automatically in the text field!

Visual Feedback

🟢 Ready - App is ready for dictation
🔴 Recording - Currently recording (macOS mic indicator also shows)
🟡 Processing - Transcribing your speech
✅ Done - Text successfully injected

🛠 How It Works

Click text field → Hold Globe/Fn → Speak → Release → Text appears instantly

Technical Stack:

Karabiner-Elements - Remaps Globe/Fn key to F13 for reliable detection
pynput - Global hotkey detection and text injection
PyAudio - Audio recording from microphone
SpeechRecognition - Google Speech Recognition API for speech-to-text (requires internet)
Text Injection - Types transcribed text into active application
macOS Integration - Uses system microphone indicator
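
The hold-to-record flow, together with the Ready, Recording, Processing, and Done states listed under Visual Feedback, can be sketched as a small state holder driven by key events. This is an illustrative sketch and not the app's actual code: `Status`, `HoldToRecord`, `run_listener`, and the three callbacks are hypothetical names, and only `keyboard.Listener` comes from pynput.

```python
from enum import Enum


class Status(Enum):
    READY = "ready"
    RECORDING = "recording"
    PROCESSING = "processing"
    DONE = "done"


class HoldToRecord:
    """Track press/release of one trigger key and drive the status flow."""

    def __init__(self, trigger_key, start_recording, stop_and_transcribe, inject_text):
        # All three callables are supplied by the caller (hypothetical hooks).
        self.trigger_key = trigger_key
        self.start_recording = start_recording
        self.stop_and_transcribe = stop_and_transcribe
        self.inject_text = inject_text
        self.status = Status.READY

    def on_press(self, key):
        # A held key may deliver repeated press events; act only on the first.
        if key == self.trigger_key and self.status in (Status.READY, Status.DONE):
            self.status = Status.RECORDING
            self.start_recording()

    def on_release(self, key):
        if key == self.trigger_key and self.status is Status.RECORDING:
            self.status = Status.PROCESSING
            text = self.stop_and_transcribe()
            if text:
                self.inject_text(text)
                self.status = Status.DONE
            else:
                # Nothing recognized: go back to waiting for the next press.
                self.status = Status.READY


def run_listener(controller):
    """Wire the controller to pynput; blocks until the listener stops."""
    from pynput import keyboard  # third-party, imported lazily

    with keyboard.Listener(on_press=controller.on_press,
                           on_release=controller.on_release) as listener:
        listener.join()
```

To try it, construct `HoldToRecord(keyboard.Key.f13, ...)` with your own recording, transcription, and injection callables and pass it to `run_listener`. Because this sketch transcribes inside the release callback, the listener thread is blocked while transcription runs; a real app would likely hand that work to a separate thread.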

Note: This version requires an active internet connection for speech recognition. Audio is sent to Google's servers for processing.

Privacy: By default, Google does not retain audio recordings from the Speech Recognition API. However, if you have "Voice & Audio Activity" enabled in your Google Account settings, Google may save audio data to improve services. You can manage this in your Google Account settings. The transcribed text is returned to the app and is not stored by Google.

⚙️ Configuration

Edit config/settings.json to customize behavior:

```
{
  "hotkey": {
    "combination": "f13",
    "hold_to_record": true,
    "toggle_mode": false
  },
  "audio": {
    "sample_rate": 16000,
    "channels": 1
  },
  "system": {
    "paste_mode": true,
    "typing_speed": 0
  }
}
```

Key Settings:

hold_to_record: true - Hold key to record (recommended)
sample_rate: 16000 - Audio sample rate in Hz (16 kHz is a common choice for speech recognition)
paste_mode: true - Uses clipboard for faster injection
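
As an illustration of how a settings file like this could be consumed, the sketch below loads config/settings.json and fills in any missing keys from defaults. It is not taken from the app: `DEFAULTS`, `merge`, and `load_settings` are hypothetical names, and the default values simply mirror the example file above.

```python
import json
from pathlib import Path

# Defaults mirror the example settings shown above (assumed, not read from the app).
DEFAULTS = {
    "hotkey": {"combination": "f13", "hold_to_record": True, "toggle_mode": False},
    "audio": {"sample_rate": 16000, "channels": 1},
    "system": {"paste_mode": True, "typing_speed": 0},
}


def merge(defaults, overrides):
    """Recursively overlay overrides on defaults without mutating either."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_settings(path="config/settings.json"):
    """Return settings from path, using DEFAULTS for missing keys or a missing file."""
    file = Path(path)
    if not file.exists():
        return merge(DEFAULTS, {})
    with file.open(encoding="utf-8") as handle:
        return merge(DEFAULTS, json.load(handle))
```

With this approach a settings file that only overrides `audio.sample_rate` still yields a complete settings dictionary.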

🏗 Project Structure

```
voice-to-text/
├── voice_to_text.py          # Main application
├── start.sh                   # Convenience startup script
├── requirements.txt           # Python dependencies
├── config/
│   └── settings.json          # Configuration file
├── src/
│   ├── hotkey_listener.py     # Global hotkey detection
│   ├── audio_recorder.py      # Audio recording system
│   ├── text_injector.py       # Text injection into apps
│   └── ui/                    # UI components
├── recordings/                # Temporary audio files (auto-deleted after 24hrs)
└── README.md                  # This file
```

🔧 Troubleshooting

Globe/Fn Key Not Working

Check Karabiner-Elements:
- Is it running? (check menu bar icon)
- Is the rule enabled? (toggle should be ON)
- Try restarting Karabiner-Elements
Check Permissions:
- Verify Accessibility permission is granted
- Verify Input Monitoring permission is granted
- Try restarting terminal/app after granting permissions

Test the key:

```
python -c "from pynput import keyboard; print('Press F13...'); keyboard.Listener(on_press=lambda k: print(f'Pressed: {k}')).start(); input()"
```

Audio Not Recording

Check Microphone Permission:
- System Settings → Privacy & Security → Microphone
- Ensure Terminal (or your terminal app) is enabled

Test Microphone:

```
python -c "import pyaudio; p = pyaudio.PyAudio(); print('Input devices:'); [print(f'{i}: {d[\"name\"]}') for i in range(p.get_device_count()) for d in [p.get_device_info_by_index(i)] if d[\"maxInputChannels\"] > 0]; p.terminate()"
```

Check Audio Levels:
- Speak louder
- Check microphone isn't muted
- Try a different microphone
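
To check levels with a number instead of by ear, you can measure the RMS amplitude of a saved recording. The sketch below assumes the files in recordings/ are 16-bit PCM WAV, which this README does not confirm; `wav_rms` is a hypothetical helper and uses only the standard library.

```python
import array
import math
import sys
import wave


def wav_rms(path):
    """Return the RMS amplitude (0 to 32767) of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

A value close to zero suggests the microphone is muted or the wrong input device is selected; compare a quiet recording against one where you speak normally to get a feel for the range.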

Text Not Injecting

Click in the text field first - The app needs focus on the target field
Check Accessibility Permission - Required for text injection
Try different applications - Some apps may have restrictions
Check the app status - Look for error messages in the app window

"Could not understand audio" Errors

Check internet connection - Google API requires active internet connection
Speak more clearly - Enunciate words
Hold key longer - Recordings shorter than 0.5 seconds are rejected
Speak louder - Audio may be too quiet
Reduce background noise - Find a quieter environment
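
The causes above fall into three groups that code can tell apart: the recording was too short, the service was unreachable, or the speech was unintelligible. The sketch below shows one way to separate them, assuming the recording is a WAV file and that transcription goes through the SpeechRecognition library's `recognize_google`. `MIN_SECONDS`, `wav_duration_seconds`, and `transcribe_wav` are hypothetical names, and the app's real error handling may differ.

```python
import wave

MIN_SECONDS = 0.5  # minimum length quoted in the list above


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def transcribe_wav(path):
    """Return the recognized text, or None after printing the likely cause."""
    if wav_duration_seconds(path) < MIN_SECONDS:
        print("Recording too short; hold the key longer.")
        return None

    import speech_recognition as sr  # third-party (SpeechRecognition), imported lazily

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The service answered but found no intelligible speech.
        print("Could not understand audio; speak louder or reduce background noise.")
    except sr.RequestError as error:
        # The service could not be reached or rejected the request.
        print(f"Could not reach the recognition service: {error}")
    return None
```

In SpeechRecognition, connection problems raise `RequestError` and unintelligible audio raises `UnknownValueError`, so checking which one occurred narrows down which tip applies.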

App Freezes or Crashes

Check Python version:
```
python3 --version  # Should be 3.9+
```

Reinstall dependencies:

```
pip install --upgrade -r requirements.txt
```

Check for conflicting apps - Other apps using microphone/hotkeys
Restart the app - Close and reopen

Permission Issues

If permissions keep getting denied:

Add specific Python executable:
```
which python3  # Get the path
```
Then add that exact path to Accessibility and Input Monitoring
Try running from different terminal:
- Terminal.app
- iTerm2
- Cursor/VSCode integrated terminal
Check System Integrity Protection (SIP):
- Usually not an issue, but can block some permissions
- Check: csrutil status

🎯 Tips for Best Results

Speak clearly and at normal pace - Not too fast, not too slow
Hold the key for at least 1 second - Gives the app time to start recording
Click in text field first - Ensures text goes to the right place
Use in quiet environment - Reduces background noise
Keep app running in background - Minimize window to top-right corner
Watch macOS mic indicator - Orange dot confirms recording is active

🔄 Auto-Cleanup

The app automatically deletes audio recordings older than 24 hours to save disk space. Recordings are stored in the recordings/ directory and are temporary files used only for transcription.
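
A cleanup pass like this can be sketched with the standard library alone. The code below is an assumption about how such a pass might work, not the app's implementation: `delete_old_recordings` and `MAX_AGE_SECONDS` are hypothetical names, and file age is judged by modification time.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours


def delete_old_recordings(directory="recordings", max_age=MAX_AGE_SECONDS, now=None):
    """Delete files in directory older than max_age seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    folder = Path(directory)
    if not folder.is_dir():
        return removed
    for entry in folder.iterdir():
        # Only regular files are considered; subdirectories are left alone.
        if entry.is_file() and now - entry.stat().st_mtime > max_age:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Calling `delete_old_recordings()` from the project root removes stale files and returns the names it deleted, which is handy for logging.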

📄 License

Apache 2.0 License - see LICENSE for details.

🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.

🙏 Acknowledgments

Google Speech Recognition API - Speech-to-text conversion
Karabiner-Elements - Key remapping functionality
SpeechRecognition - Python speech recognition library
pynput - Global hotkey and input control

📝 Changelog

Version 1.0.0

Initial release
Globe/Fn key hold-to-record functionality
Google Speech Recognition integration
Minimal UI with macOS integration
Auto-cleanup of old recordings
Universal text injection

Built for seamless voice dictation into any application on macOS 🚀

Enjoy dictating! 🎤✨
