🎙️ Voice-Text Keyboard

A lightweight background utility that lets you dictate text anywhere using a hotkey. Hold F8, speak, and release: within a second or two, your words are typed into whatever app is in focus.

Supports English spoken in any accent (Indian, American, Australian, British, French, German, Dutch, and more).

✨ Features

🔴 Push-to-talk — Hold F8 to record, release to transcribe
🌍 Accent-aware — Works with all English accents out of the box
🔇 Smart noise filtering — WebRTC VAD removes silence and background noise before sending audio to the API
🔊 Audio normalization — FFmpeg lightly normalizes volume for optimal AI clarity
🤖 Auto-types the result — Transcribed text is typed directly into your active window
🔔 Audio feedback — Beeps signal recording start, stop, and success
🚀 Runs on startup — Automatically launches invisibly in the background on boot

📋 Requirements

Python 3.8+
FFmpeg — must be placed in the project root as ffmpeg.exe (or added to system PATH)
OpenAI API Key
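The FFmpeg requirement above allows two locations for the binary. As a minimal sketch of how that lookup could work, the project root can be checked first and the system PATH second. The helper name `find_ffmpeg` is hypothetical and not taken from main.py.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(project_root: Path) -> Optional[str]:
    """Return a path to an FFmpeg binary, or None if none is found.

    Hypothetical helper: prefers ffmpeg.exe in the project root,
    then falls back to whatever "ffmpeg" resolves to on PATH.
    """
    local = project_root / "ffmpeg.exe"
    if local.is_file():
        return str(local)
    # shutil.which returns None when the command is not on PATH.
    return shutil.which("ffmpeg")
```

Checking for `None` at startup gives a clear error before any audio is recorded, rather than a failure in the middle of a dictation.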

Python Dependencies

pyaudio
pynput
webrtcvad-wheels
openai
python-dotenv
pyinstaller

⚙️ Setup

1. Clone / Download the project

git clone https://github.com/your-username/Voice-text-keyboard.git
cd Voice-text-keyboard

2. Add your OpenAI API Key

Create a .env file in the project root:

OPENAI_API_KEY=sk-your-key-here
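The project lists python-dotenv as a dependency, so the app presumably loads this file with it. The sketch below is a standard-library stand-in that shows the same idea without third-party packages. The function names are hypothetical, and the parser only handles plain KEY=VALUE lines.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_path: Path) -> str:
    """Return OPENAI_API_KEY from the environment, else from the .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and env_path.is_file():
        key = parse_env_file(env_path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```

A variable already set in the environment takes precedence over the file here, which matches the default behavior of python-dotenv's `load_dotenv`.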

3. Run Setup

Double-click setup.bat or run it from a terminal:

setup.bat

This will:

Install all Python dependencies via pip
Add the app to your Windows Startup folder so it launches automatically each time you sign in
Launch the app immediately in the background (no console window)
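The contents of setup.bat are not shown here, so the following is only a sketch of the two Windows conventions such a script would rely on: the per-user Startup folder lives under `%APPDATA%`, and `pythonw.exe` is the interpreter variant that runs without a console window. Both function names are hypothetical.

```python
from pathlib import PureWindowsPath


def startup_folder(appdata: str) -> PureWindowsPath:
    """Build the per-user Startup folder path from the %APPDATA% value."""
    return (PureWindowsPath(appdata) / "Microsoft" / "Windows"
            / "Start Menu" / "Programs" / "Startup")


def pythonw_for(python_exe: str) -> PureWindowsPath:
    """Return the console-less interpreter that sits next to python.exe."""
    return PureWindowsPath(python_exe).with_name("pythonw.exe")
```

On a real system, the `appdata` argument would come from `os.environ["APPDATA"]` and `python_exe` from `sys.executable`.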

🎮 Usage

Once running:

Action	Result
Hold `F8`	Starts recording (you'll hear a high beep 🔔)
Speak	Talk naturally in English — any accent
Release `F8`	Stops recording (low beep 🔕), transcription begins
Wait ~1–2 sec	Text is auto-typed into your active window ✅

The app runs invisibly in the background. You can use it in any app — browsers, Word, Notepad, chat apps, etc.
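The hold-and-release behavior in the table above can be sketched as a small state tracker wired to a pynput keyboard listener. This is an illustration under assumptions, not the code in main.py: the class `PushToTalk` and the function `run` are hypothetical names, and the callbacks stand in for whatever starts and stops recording.

```python
class PushToTalk:
    """Tracks whether the hotkey is held and fires each callback once."""

    def __init__(self, hotkey, on_start, on_stop):
        self.hotkey = hotkey
        self.on_start = on_start
        self.on_stop = on_stop
        self.recording = False

    def on_press(self, key):
        # Holding a key makes the OS send repeated press events,
        # so only the first one starts a recording.
        if key == self.hotkey and not self.recording:
            self.recording = True
            self.on_start()

    def on_release(self, key):
        if key == self.hotkey and self.recording:
            self.recording = False
            self.on_stop()


def run(on_start, on_stop, hotkey_name="f8"):
    """Block while listening for the hotkey (requires pynput)."""
    from pynput import keyboard  # third-party, imported lazily

    ptt = PushToTalk(getattr(keyboard.Key, hotkey_name), on_start, on_stop)
    with keyboard.Listener(on_press=ptt.on_press,
                           on_release=ptt.on_release) as listener:
        listener.join()
```

Keeping the state logic separate from the listener makes it easy to exercise without a keyboard, for example by calling `on_press` and `on_release` directly with a placeholder key value.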

🛠️ Configuration

All settings are at the top of main.py:

Constant	Default	Description
`HOTKEY`	`'f8'`	Push-to-talk key
`RATE`	`16000`	Audio sample rate in Hz (WebRTC VAD accepts 8000, 16000, 32000, or 48000)
`CHUNK`	`480`	Audio frame size in samples (30 ms at 16 kHz; WebRTC VAD accepts 10, 20, or 30 ms frames)
`RAW_FILE`	`temp_raw.wav`	Temp file for raw recorded audio
`NORM_FILE`	`temp_norm.wav`	Temp file for normalized audio

To change the hotkey to e.g. F9, edit:

HOTKEY = 'f9'
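The RATE and CHUNK defaults in the table are linked: CHUNK is the number of samples in one 30 ms frame at the chosen rate, so changing RATE means recomputing CHUNK. The arithmetic is sketched below. The helper names are hypothetical, and the 2-byte sample width assumes 16-bit mono PCM, which is the format WebRTC VAD expects.

```python
RATE = 16000       # samples per second
FRAME_MS = 30      # frame duration in milliseconds
SAMPLE_WIDTH = 2   # bytes per sample, assuming 16-bit mono PCM


def frame_samples(rate: int, frame_ms: int) -> int:
    """Number of samples in one frame of the given duration."""
    return rate * frame_ms // 1000


def frame_bytes(rate: int, frame_ms: int, width: int = SAMPLE_WIDTH) -> int:
    """Size in bytes of one frame of mono PCM audio."""
    return frame_samples(rate, frame_ms) * width


CHUNK = frame_samples(RATE, FRAME_MS)  # 16000 * 30 / 1000 = 480
```

At 8000 Hz, for example, the same 30 ms frame would be 240 samples instead of 480.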

🧠 How It Works

Hold F8
   ↓
PyAudio captures mic input in 30ms chunks
   ↓
WebRTC VAD filters out silence and noise (only speech frames kept)
   ↓
FFmpeg normalizes audio volume (loudnorm filter)
   ↓
Audio sent to OpenAI gpt-4o-transcribe (audio transcriptions API)
   with accent-aware prompt for best accuracy
   ↓
Transcribed text auto-typed at cursor via pynput
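The two middle steps of the pipeline above can be sketched as follows. This is a simplified illustration, not the code in main.py: `keep_speech` and `loudnorm_command` are hypothetical names, the speech detector is passed in as a callable so the logic can be tried without a microphone, and the exact FFmpeg arguments the app uses are an assumption beyond the `loudnorm` filter itself.

```python
from typing import Callable, List

RATE = 16000
FRAME_BYTES = 480 * 2  # 30 ms of 16-bit mono PCM at 16 kHz


def keep_speech(pcm: bytes, is_speech: Callable[[bytes, int], bool],
                rate: int = RATE, frame_bytes: int = FRAME_BYTES) -> bytes:
    """Return only the frames that the detector classifies as speech.

    A trailing partial frame is dropped because WebRTC VAD only
    accepts whole 10, 20, or 30 ms frames.
    """
    kept = []
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if is_speech(frame, rate):
            kept.append(frame)
    return b"".join(kept)


def loudnorm_command(ffmpeg: str, src: str, dst: str,
                     rate: int = RATE) -> List[str]:
    """Build an FFmpeg argument list that applies the loudnorm filter.

    -y overwrites the output file; -ar and -ac keep the result at the
    capture sample rate and in mono.
    """
    return [ffmpeg, "-y", "-i", src, "-af", "loudnorm",
            "-ar", str(rate), "-ac", "1", dst]
```

With the webrtcvad package installed, `webrtcvad.Vad(0).is_speech` has the `(frame, sample_rate)` signature that `keep_speech` expects, and the returned argument list can be executed with `subprocess.run(cmd, check=True)`.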

Accent Support

The transcription API call includes a prompt telling the model that the speaker may have an Indian, American, Australian, British, French, German, Dutch, or other English accent. The app also uses webrtcvad.Vad(0), the least aggressive VAD mode, so that speech with a different cadence is less likely to be clipped. Together, these two settings are intended to keep accuracy consistent across accents.
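A request of this kind might look like the sketch below, which uses the `prompt` parameter of the OpenAI Python SDK's transcription call. The prompt wording, the `ACCENTS` list, and the function names are illustrative assumptions; the actual prompt text in main.py is not shown in this README.

```python
ACCENTS = ["Indian", "American", "Australian", "British",
           "French", "German", "Dutch"]


def accent_prompt(accents=ACCENTS) -> str:
    """Build a short hint describing the expected speech (wording is illustrative)."""
    return ("The speaker is talking in English and may have one of these "
            "accents: " + ", ".join(accents) + ", or another accent.")


def transcribe(path: str, model: str = "gpt-4o-transcribe") -> str:
    """Send one audio file to the OpenAI transcriptions endpoint."""
    from openai import OpenAI  # third-party; reads OPENAI_API_KEY from the environment

    client = OpenAI()
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model=model, file=audio, prompt=accent_prompt())
    return result.text
```

Calling `transcribe("temp_norm.wav")` would return the recognized text as a string, which the app can then type at the cursor.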

📁 Project Structure

Voice-text-keyboard/
├── main.py           # Main app logic
├── setup.bat         # One-click setup & launcher
├── requirements.txt  # Python dependencies
├── .env              # Your OpenAI API key (not committed)
├── ffmpeg.exe        # FFmpeg binary (not committed)
└── README.md         # This file

🔑 API Key Distribution

If you share the .exe with others, each user must provide their own OPENAI_API_KEY. On first launch, the app reads the key from the .env file located in the same directory as main.py or the .exe.
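Finding "the same directory" differs between a plain script and a bundled executable. A minimal sketch, assuming the .exe is built with PyInstaller (which the dependency list suggests): PyInstaller sets `sys.frozen` on bundled apps, and `sys.executable` then points at the .exe. The helper name `app_dir` is hypothetical.

```python
import sys
from pathlib import Path


def app_dir(script_file: str) -> Path:
    """Return the folder where the .env file is expected.

    For a PyInstaller bundle, use the folder containing the .exe;
    otherwise use the folder containing the script.
    """
    if getattr(sys, "frozen", False):
        return Path(sys.executable).parent
    return Path(script_file).parent
```

In a script this would typically be called as `app_dir(__file__)`, and the key file would then be `app_dir(__file__) / ".env"`.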

📄 License

MIT — free to use and modify.
