Experience the future of voice-to-text with Groq DEV Tier - Ultra-fast transcription that leaves OpenAI's free tier in the dust!
┌─────────────────┐
│ 🎤 maVoice │ ← Tiny floating widget (72x20px)
│ ▶ ■ ▪ ▪ ▪ ▪ │ Always on top of your screen
└─────────────────┘ Double-click to start!
- ✨ Features
- 🎯 What is maVoice?
- 🚀 Quick Start
- 🎮 How to Use
- 🔧 Troubleshooting
- 🧑💻 Developer Guide
- ❓ FAQ
- 🏎️ Performance
- 🤝 Contributing
- ⚡ Blazing Fast: Powered by Groq's Whisper Large v3 Turbo model - the fastest inference in the game
- 🎯 Native Performance: Built with Rust and Tauri for minimal resource usage
- 🎨 Floating Widget: Tiny, draggable overlay that stays out of your way
- 🔒 Privacy First: Your API key stays local, and your audio is sent only to Groq's API for transcription
- 🌐 Cross-Platform: Works on Linux, and on Windows through WSL2 + WSLg (native Windows and macOS coming soon!)
- 🎤 Smart Recording: Real-time audio visualization and voice detection
- 📋 Instant Copy: Automatic clipboard integration for seamless workflow
- ⚙️ Advanced Settings: Comprehensive configuration panel with model selection
- 🎛️ Intuitive Controls: Double-click to start, single-click to stop
- 🌍 Multi-Language: Support for 100+ languages with custom prompts
maVoice is a floating voice dictation widget that lives on your desktop. Unlike traditional apps with windows and menus, maVoice is a tiny, always-accessible button that floats above your other applications.
Normal State Recording Processing Success
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ 🎤 maVoice │ → │ 🔴 ▶▶▶▶ │ → │ 🟠 ◈◈◈◈◈ │ → │ ✅ Done! │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
(Blue) (Red) (Orange) (Green)
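The four states above, and the clicks that move the widget between them, form a small state machine. The Rust sketch below is illustrative and is not maVoice's actual code. The `WidgetState` and `WidgetEvent` names, and the `Reset` event that returns the widget to blue after a success, are assumptions.

```rust
/// Hypothetical model of the widget states shown in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WidgetState {
    Ready,      // blue: idle, waiting for a double-click
    Recording,  // red: microphone is capturing audio
    Processing, // orange: audio sent to Groq, waiting for the transcript
    Success,    // green: transcript copied to the clipboard
}

/// Inputs that drive the widget (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WidgetEvent {
    DoubleClick,
    SingleClick,
    TranscriptReady,
    Reset, // assumed: e.g. a timer that returns the widget to blue
}

impl WidgetState {
    /// Returns the next state. Events that don't apply leave the state unchanged.
    pub fn next(self, event: WidgetEvent) -> WidgetState {
        match (self, event) {
            // Double-click starts a new recording from idle or after a success.
            (WidgetState::Ready, WidgetEvent::DoubleClick)
            | (WidgetState::Success, WidgetEvent::DoubleClick) => WidgetState::Recording,
            // Single-click while recording stops it and starts transcription.
            (WidgetState::Recording, WidgetEvent::SingleClick) => WidgetState::Processing,
            (WidgetState::Processing, WidgetEvent::TranscriptReady) => WidgetState::Success,
            (WidgetState::Success, WidgetEvent::Reset) => WidgetState::Ready,
            // Any other combination is ignored.
            (state, _) => state,
        }
    }

    /// Colour named in the diagram for each state.
    pub fn color(self) -> &'static str {
        match self {
            WidgetState::Ready => "blue",
            WidgetState::Recording => "red",
            WidgetState::Processing => "orange",
            WidgetState::Success => "green",
        }
    }
}
```

Because unmatched events fall through unchanged, a stray single-click on the idle widget does nothing. This matches the "double-click to start, single-click to stop" rule.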
- Size: 72x20 pixels (about the size of a small button)
- Position: Placed at x:300, y:800 by default
- Behavior: Always on top, transparent background, no window borders
- Dragging: Right-click or Ctrl+Left-click to drag to a new position
✨ BREAKTHROUGH: WSL2 + WSLg provides PERFECT voice dictation with zero audio issues!
🪟 WSL2 Setup - THE BEST WAY (Recommended for Windows)
Why WSL2 is SUPERIOR to native Windows:
- ✅ ZERO audio configuration issues (no WASAPI problems)
- ✅ Native Linux performance with Windows GUI
- ✅ Perfect microphone integration through WSLg
- ✅ Flawless clipboard integration (no Windows API headaches)
- ✅ Instant setup - no Visual Studio Build Tools needed!
- Update WSL2 (from Windows PowerShell as Administrator):
wsl --update
wsl --version # Ensure version 2 with WSLg
- Install Debian/Ubuntu if you don't have it:
wsl --install -d Debian
💡 COPY-PASTE EACH BLOCK SEPARATELY FOR SUCCESS:
Step 1: Install Rust (Required)
# In your WSL2 terminal - paste and press Enter
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# When prompted, type 1 then Enter to proceed with the default installation
# Restart your shell or run this:
source $HOME/.cargo/env
Step 2: Install System Dependencies (One Command)
# Copy-paste this ENTIRE block as one command:
sudo apt update && sudo apt install -y \
build-essential pkg-config libgtk-3-dev libwebkit2gtk-4.1-dev \
libsoup-3.0-dev libjavascriptcoregtk-4.1-dev libdbus-1-dev \
libappindicator3-dev librsvg2-dev libasound2-dev \
xdotool wl-clipboard wtype
Step 3: Clone & Install (AUTOMATED)
git clone https://github.com/Cwilliams333/maVoice-Enhanced.git
cd maVoice-Enhanced
./install.sh
✅ This installs ALL npm dependencies automatically!
Step 4: Add Your Groq API Key
# Replace "your_groq_api_key_here" with your ACTUAL key from console.groq.com
echo "VITE_GROQ_API_KEY=your_groq_api_key_here" > src-tauri/aquavoice-frontend/.env
Step 5: Launch maVoice!
npm run dev
🎉 Look for a tiny floating widget at position (300, 800) on your screen!
❌ "I don't see the widget!"
- It's TINY (72x20px) - look carefully around coordinates (300, 800)
- Check whether npm run dev shows errors; if so, one of the steps above failed
❌ "Permission denied during sudo apt install"
- You need to enter your WSL2 password (it won't show characters as you type)
❌ "Rust installation failed"
- Restart your terminal after installing Rust, or run:
source $HOME/.cargo/env
❌ "My API key doesn't work"
- Get it from console.groq.com (it must start with gsk_)
- Replace your_groq_api_key_here with your ACTUAL key
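Both checks can be automated. The Rust sketch below assumes the .env format written in Step 4, with `VITE_GROQ_API_KEY=...` on a single line. `groq_key_from_env_file` and `check_groq_key` are hypothetical helpers and are not part of maVoice.

```rust
/// Hypothetical helper: pull VITE_GROQ_API_KEY out of .env file contents.
pub fn groq_key_from_env_file(contents: &str) -> Option<String> {
    contents
        .lines()
        .map(str::trim)
        // Skip blank lines and comments.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .find(|(name, _)| name.trim() == "VITE_GROQ_API_KEY")
        // Tolerate optional surrounding quotes around the value.
        .map(|(_, value)| value.trim().trim_matches('"').to_string())
}

#[derive(Debug, PartialEq, Eq)]
pub enum KeyProblem {
    Missing,
    Placeholder,
    WrongPrefix,
}

/// Applies the two troubleshooting rules: not the placeholder, starts with "gsk_".
pub fn check_groq_key(key: Option<&str>) -> Result<(), KeyProblem> {
    match key {
        None | Some("") => Err(KeyProblem::Missing),
        // Checked before the prefix rule, since the placeholder also lacks "gsk_".
        Some("your_groq_api_key_here") => Err(KeyProblem::Placeholder),
        Some(k) if !k.starts_with("gsk_") => Err(KeyProblem::WrongPrefix),
        Some(_) => Ok(()),
    }
}
```

Passing the file contents as a string, instead of opening the file inside the helper, keeps both functions easy to test.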
❌ "No audio recording"
- Make sure your microphone isn't being used by another app
- WSL2 + WSLg handles audio automatically - no extra config needed!
🎯 What you'll see when it works:
- A tiny floating widget (72x20px) appears on your Windows desktop
- Right-click + drag to move it anywhere
- Double-click to start recording
- Single-click to stop and get instant transcription
- Text automatically copied to Windows clipboard!
After running npm run dev, verify these work:
- Widget appears ✓
  - Look for the tiny floating button at (300, 800)
  - It says "🎤 TALK" in blue
- Recording works ✓
  - Double-click the widget → it turns red with moving bars
  - Speak for 2-3 seconds
  - Single-click to stop → it turns orange, then green
- Transcription works ✓
  - Your speech appears as text
  - The text is automatically copied to the clipboard
  - Paste (Ctrl+V) in any Windows app to verify
- Dragging works ✓
  - Right-click + drag moves the widget anywhere
🎉 If all 4 work, you're GOOD TO GO!
- WSLg magic: Linux app appears as native Windows app
- Audio perfection: WSLg's built-in PulseAudio server routes your Windows microphone to Linux apps
- Zero configuration: No Windows audio driver headaches
- Best performance: Native Linux speed with Windows convenience
🐧 Native Linux Setup
# Check if you have all requirements
command -v node >/dev/null 2>&1 && echo "✅ Node.js installed" || echo "❌ Node.js missing"
command -v rustc >/dev/null 2>&1 && echo "✅ Rust installed" || echo "❌ Rust missing"
command -v cargo >/dev/null 2>&1 && echo "✅ Cargo installed" || echo "❌ Cargo missing"
# Check display server
echo "Display Server: ${XDG_SESSION_TYPE:-$([[ -n $WAYLAND_DISPLAY ]] && echo wayland || echo x11)}"
Debian/Ubuntu:
# libasound2-dev is for audio; xdotool, wl-clipboard and wtype are for text injection
sudo apt update
sudo apt install -y \
build-essential \
pkg-config \
libgtk-3-dev \
libwebkit2gtk-4.1-dev \
libsoup-3.0-dev \
libjavascriptcoregtk-4.1-dev \
libdbus-1-dev \
libappindicator3-dev \
librsvg2-dev \
libasound2-dev \
xdotool \
wl-clipboard \
wtype
Fedora:
sudo dnf install -y \
gcc \
pkg-config \
gtk3-devel \
webkit2gtk4.1-devel \
libsoup3-devel \
javascriptcoregtk4.1-devel \
dbus-devel \
libappindicator-gtk3-devel \
librsvg2-devel \
alsa-lib-devel \
xdotool \
wl-clipboard \
wtype
Arch Linux:
sudo pacman -S --needed \
base-devel \
gtk3 \
webkit2gtk-4.1 \
libsoup3 \
dbus \
libappindicator-gtk3 \
librsvg \
alsa-lib \
xdotool \
wl-clipboard \
wtype
# 1. Clone the repository
git clone https://github.com/lliWcWill/maVoice-Linux.git
cd maVoice-Linux
# 2. Run automated installer
./install.sh
# 3. Configure API key
echo "VITE_GROQ_API_KEY=your_groq_api_key_here" > src-tauri/aquavoice-frontend/.env
# 4. Run in development
npm run dev
# 5. Or build for production
npm run build
# Find .deb package in: src-tauri/target/release/bundle/deb/
⚠️ Native Windows Setup (Not Recommended - Use WSL2 Instead)
🚨 WARNING: Native Windows has known issues:
- ❌ WASAPI audio problems (microphone conflicts)
- ❌ Complex setup (Visual Studio Build Tools required)
- ❌ Audio driver conflicts with other applications
- ❌ Clipboard integration issues
- ❌ Performance overhead
👆 USE WSL2 SETUP ABOVE INSTEAD! It solves all these problems.
🪟 WSL2 Setup (Legacy - Superseded by Enhanced WSL2 Above)
- Update WSL2 (from Windows PowerShell as Administrator):
wsl --update
wsl --version # Ensure version 2
- Verify WSLg support:
# In WSL2 terminal
echo $DISPLAY # Should show :0
echo $WAYLAND_DISPLAY # Should show wayland-0
- Install dependencies:
# First, install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
# Then install system dependencies
sudo apt update
sudo apt install -y \
build-essential \
pkg-config \
libgtk-3-dev \
libwebkit2gtk-4.1-dev \
libsoup-3.0-dev \
libjavascriptcoregtk-4.1-dev \
libdbus-1-dev \
libappindicator3-dev \
librsvg2-dev \
libasound2-dev \
xdotool \
wl-clipboard \
wtype
# Test GUI support
sudo apt install -y x11-apps
xclock # Should display a clock window
- No GUI appears: Ensure WSLg is enabled and GPU drivers are updated on Windows
- Audio not working: WSL2 audio passthrough may need configuration
- Widget hard to find: The 72x20px window is tiny; look carefully or temporarily increase its size in tauri.conf.json
🐳 Docker Setup (Experimental)
# Build and run with X11 forwarding
docker-compose -f docker-compose.dev.yml up
# Or manually
docker build -t mavoice .
docker run -it \
-e DISPLAY=$DISPLAY \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v ~/.Xauthority:/root/.Xauthority \
--device /dev/snd \
mavoice
Note: Audio and clipboard integration may be limited in Docker.
| Component | Minimum | Recommended |
|---|---|---|
| OS | Linux (X11/Wayland) | Ubuntu 22.04+, Fedora 38+ |
| Node.js | 18.0.0 | 20.0.0+ |
| Rust | 1.70.0 | Latest stable |
| RAM | 2GB | 4GB+ |
| Display | Any | 1920x1080+ |
| Audio | PulseAudio/ALSA | PulseAudio |
- Visit console.groq.com
- Sign up for a free account
- Navigate to API Keys section
- Create a new API key
- Copy and save it securely
When you first launch maVoice, look for a tiny floating widget:
Your Desktop:
┌─────────────────────────────────────────┐
│ File Edit View Help │
│ │
│ ┌─────────────┐ ← Look here! │
│ │ 🎤 maVoice │ (x:300, y:800) │
│ └─────────────┘ │
│ │
└─────────────────────────────────────────┘
- Start Recording: Double-click the widget
  - Widget turns red with animated bars
  - Microphone activates immediately
- Stop Recording: Single-click while recording
  - Widget turns orange (processing)
  - Then green (success) with transcribed text
  - Text automatically copied to clipboard
- Move the Widget: Right-click and drag (or Ctrl+Left-click)
- Access Settings: Click the gear icon in the web interface
| Shortcut | Action |
|---|---|
| Ctrl+Shift+, | Start/stop recording (global) |
| Alt+Space | Toggle recording |
| Double Alt | Quick record |
| Spacebar | Stop recording (while active) |
Access settings at http://localhost:5173 or click the gear icon:
- API Key: Securely store your Groq API key
- Model Selection: Choose Whisper model variant
- Temperature: Sampling temperature (0.0-1.0); lower values give more deterministic transcripts
- Language: Select from 100+ languages
- Custom Prompt: Add technical terms, names, or style preferences
- Max Tokens: Control response length
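The Model, Temperature, Language and Custom Prompt settings map onto text fields of Groq's OpenAI-compatible transcription endpoint (`model`, `temperature`, `language`, `prompt`). The Rust sketch below turns settings into those fields and rejects an out-of-range temperature. The `TranscriptionSettings` struct is hypothetical. The audio itself would travel as a separate `file` part of the multipart request, which this sketch does not build.

```rust
/// Hypothetical settings struct mirroring the settings panel.
pub struct TranscriptionSettings {
    pub model: String,
    pub temperature: f32,
    pub language: Option<String>,
    pub prompt: Option<String>,
}

impl TranscriptionSettings {
    /// Converts settings into (name, value) text fields for a multipart form.
    pub fn form_fields(&self) -> Result<Vec<(&'static str, String)>, String> {
        // Also rejects NaN, since NaN is not contained in any range.
        if !(0.0..=1.0).contains(&self.temperature) {
            return Err(format!("temperature {} is outside 0.0-1.0", self.temperature));
        }
        let mut fields = vec![
            ("model", self.model.clone()),
            ("temperature", self.temperature.to_string()),
        ];
        if let Some(lang) = &self.language {
            fields.push(("language", lang.clone()));
        }
        // Only send a custom prompt if it contains something other than whitespace.
        if let Some(prompt) = self.prompt.as_ref().filter(|p| !p.trim().is_empty()) {
            fields.push(("prompt", prompt.clone()));
        }
        Ok(fields)
    }
}
```

Validating before the request is sent means a bad setting produces a clear local error instead of an API failure.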
Create and run this script to check your setup:
#!/bin/bash
# Save as check-mavoice.sh and run with: bash check-mavoice.sh
echo "🔍 maVoice Diagnostic Check"
echo "=========================="
# Check Node.js
if command -v node >/dev/null 2>&1; then
echo "✅ Node.js: $(node --version)"
else
echo "❌ Node.js: Not installed"
fi
# Check Rust
if command -v rustc >/dev/null 2>&1; then
echo "✅ Rust: $(rustc --version | cut -d' ' -f2)"
else
echo "❌ Rust: Not installed"
fi
# Check display server
if [[ -n $WAYLAND_DISPLAY ]]; then
echo "✅ Display: Wayland ($WAYLAND_DISPLAY)"
elif [[ -n $DISPLAY ]]; then
echo "✅ Display: X11 ($DISPLAY)"
else
echo "❌ Display: No display server detected"
fi
# Check audio
if command -v pactl >/dev/null 2>&1; then
echo "✅ Audio: PulseAudio available"
elif [[ -d /dev/snd ]]; then
echo "⚠️ Audio: ALSA only (may have issues)"
else
echo "❌ Audio: No audio system detected"
fi
# Check critical dependencies
deps=("pkg-config" "xdotool" "wl-copy")
for dep in "${deps[@]}"; do
if command -v "$dep" >/dev/null 2>&1; then
echo "✅ $dep: Installed"
else
echo "❌ $dep: Missing"
fi
done
🚫 "Widget doesn't appear"
- Check whether the process is running:
ps aux | grep mavoice
- Look in the correct location (x:300, y:800):
  - Left side of the screen, toward the bottom on a 1920x1080 display
  - It's only 72x20 pixels!
- Temporarily increase the widget size by editing the window settings in src-tauri/tauri.conf.json (defaults: width 72, height 20; turning transparency off makes the widget easier to spot):
"width": 200,
"height": 100,
"transparent": false
- Check logs:
# Run with console output
npm run dev 2>&1 | tee mavoice.log
🎤 "No audio recording"
- Check audio devices and permissions:
# List audio devices
pactl list sources short
# Test the microphone: record 5 seconds, then play it back
arecord -d 5 test.wav && aplay test.wav
- Select the correct audio device:
  - The app uses the system default
  - Change the default in your system audio settings
- For WSL2 users:
  - If audio passthrough doesn't work, run wsl --update, then wsl --shutdown from PowerShell, and reopen your distro
  - If it still fails, consider running native Linux or dual-booting
📋 "Clipboard not working"
- Install clipboard utilities:
# For X11
sudo apt install xclip xsel
# For Wayland
sudo apt install wl-clipboard
- Test the clipboard:
echo "test" | xclip -selection clipboard
# or
echo "test" | wl-copy
- Manual copy fallback:
  - If auto-copy fails, the text appears in the widget
  - You can manually select and copy it
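Which utility to use depends on the display server. The Rust sketch below uses only `std::process` and assumes `wl-copy` on Wayland and `xclip -selection clipboard` on X11, matching the test commands above. It is not necessarily how maVoice's own clipboard code works.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Picks the clipboard tool and its arguments for the current session type.
pub fn clipboard_command(wayland: bool) -> (&'static str, Vec<&'static str>) {
    if wayland {
        ("wl-copy", vec![])
    } else {
        ("xclip", vec!["-selection", "clipboard"])
    }
}

/// Copies `text` by piping it into the chosen tool's stdin.
pub fn copy_to_clipboard(text: &str, wayland: bool) -> std::io::Result<()> {
    let (program, args) = clipboard_command(wayland);
    let mut child = Command::new(program)
        .args(&args)
        .stdin(Stdio::piped())
        .spawn()?;
    // The temporary ChildStdin is dropped at the end of this statement,
    // closing the pipe so the tool sees end-of-input.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(text.as_bytes())?;
    let status = child.wait()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("{program} exited with {status}"),
        ))
    }
}
```

If the program is missing, `spawn` returns a "not found" error. That error points straight at the install step above.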
🌐 "Groq API errors"
- Verify the API key:
# Check that the env file exists
cat src-tauri/aquavoice-frontend/.env
- Test the API directly:
curl https://api.groq.com/openai/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"
- Common API issues:
  - Rate limit (e.g. 400 requests/minute; the exact limit depends on your Groq plan)
  - Invalid API key format
  - Network connectivity
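When a client hits a rate limit, the usual approach is to retry with exponential backoff. The Rust sketch below treats HTTP 429 (Too Many Requests) and 5xx responses as retryable. It honours a Retry-After value in seconds when the caller has one, and uses illustrative 500 ms and 8 s bounds. None of this is taken from maVoice's source.

```rust
use std::time::Duration;

/// 429 means "slow down"; 5xx errors are often transient.
pub fn is_retryable(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}

/// Delay before retry number `attempt` (0-based): 500 ms * 2^attempt, capped at 8 s.
/// A server-provided Retry-After value (in seconds) takes precedence.
pub fn retry_delay(attempt: u32, retry_after_secs: Option<u64>) -> Duration {
    const BASE_MS: u64 = 500; // illustrative values, tune as needed
    const MAX_MS: u64 = 8_000;
    if let Some(secs) = retry_after_secs {
        return Duration::from_secs(secs);
    }
    // checked_shl returns None for shifts of 64 or more; saturate instead of panicking.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_millis(BASE_MS.saturating_mul(factor).min(MAX_MS))
}
```

An invalid key (401) is deliberately not retryable, because repeating the request cannot fix it.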
| Error | Meaning | Solution |
|---|---|---|
| "Failed to start audio recording" | Microphone access issue | Check audio permissions and devices |
| "Groq API request failed" | API key or network issue | Verify API key and internet connection |
| "Clipboard operation failed" | Missing clipboard utility | Install xclip/wl-clipboard |
| "Window manager not detected" | No X11/Wayland | Ensure GUI environment is running |
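The "Window manager not detected" error depends on finding a display server. The Rust sketch below mirrors the shell checks used earlier: it tries XDG_SESSION_TYPE first, then WAYLAND_DISPLAY, then DISPLAY. `detect_display_server` is a hypothetical helper. It takes the variable values as arguments, so it can be tested without changing the real environment.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DisplayServer {
    Wayland,
    X11,
    Unknown,
}

/// Decides the display server from the three environment values.
pub fn detect_display_server(
    session_type: Option<&str>,
    wayland_display: Option<&str>,
    display: Option<&str>,
) -> DisplayServer {
    // Treat unset and empty variables the same way.
    fn is_set(v: Option<&str>) -> bool {
        v.map_or(false, |s| !s.is_empty())
    }
    match session_type {
        Some("wayland") => DisplayServer::Wayland,
        Some("x11") => DisplayServer::X11,
        // e.g. "tty" or unset: fall back to the display variables.
        _ if is_set(wayland_display) => DisplayServer::Wayland,
        _ if is_set(display) => DisplayServer::X11,
        _ => DisplayServer::Unknown,
    }
}

/// Reads the real environment and delegates to the pure function.
pub fn detect_from_env() -> DisplayServer {
    let get = |name: &str| std::env::var(name).ok();
    let (s, w, d) = (get("XDG_SESSION_TYPE"), get("WAYLAND_DISPLAY"), get("DISPLAY"));
    detect_display_server(s.as_deref(), w.as_deref(), d.as_deref())
}
```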
maVoice Architecture
┌─────────────────────────────────────────────┐
│ Frontend (React) │
│ ┌─────────────────┬──────────────────┐ │
│ │ FloatingOverlay │ Settings UI │ │
│ │ Component │ Component │ │
│ └────────┬────────┴────────┬─────────┘ │
│ │ Tauri IPC │ │
└───────────┼─────────────────┼───────────────┘
│ │
┌───────────┼─────────────────┼───────────────┐
│ ▼ ▼ │
│ ┌──────────────┬─────────────────┐ │
│ │ Audio Module │ System Module │ │
│ │ (Recording) │ (Text Injection)│ │
│ └──────┬───────┴────────┬────────┘ │
│ │ │ │
│ ┌──────▼─────────────────▼───────┐ │
│ │ Groq API Module │ │
│ │ (Whisper Transcription) │ │
│ └────────────────────────────────┘ │
│ Backend (Rust/Tauri) │
└─────────────────────────────────────────────┘
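The diagram's data flow (record audio, send it to Groq, hand the text to the system module) can be expressed with one trait per module. The trait names and the `dictate` function below are hypothetical. They show one way to decouple the modules for testing, not maVoice's real API.

```rust
/// Stand-in for the Audio Module: returns recorded 16 kHz mono samples.
pub trait Recorder {
    fn record(&mut self) -> Result<Vec<i16>, String>;
}

/// Stand-in for the Groq API Module.
pub trait Transcriber {
    fn transcribe(&self, samples: &[i16]) -> Result<String, String>;
}

/// Stand-in for the System Module (clipboard / text injection).
pub trait TextSink {
    fn deliver(&mut self, text: &str) -> Result<(), String>;
}

/// Runs one dictation: record, transcribe, deliver. Returns the transcript.
pub fn dictate(
    recorder: &mut dyn Recorder,
    transcriber: &dyn Transcriber,
    sink: &mut dyn TextSink,
) -> Result<String, String> {
    let samples = recorder.record()?;
    if samples.is_empty() {
        return Err("no audio captured".to_string());
    }
    let text = transcriber.transcribe(&samples)?;
    sink.deliver(&text)?;
    Ok(text)
}
```

In a test, each trait can be implemented by a fake, such as a recorder that returns silence or a transcriber that echoes the sample count. `dictate` then runs without a microphone or network access.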
- main.rs: Tauri application entry point and command handlers
- audio/groq_recorder.rs: Audio recording optimized for Groq (16kHz mono)
- api/groq.rs: Groq API client for Whisper transcription
- system/text_inject.rs: Cross-platform text injection (X11/Wayland)
- App.tsx: Main application logic and global shortcuts
- components/FloatingOverlay.tsx: The floating widget UI
- Real-time Features:
- Audio level visualization
- Status transitions
- Drag-and-drop positioning
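Audio level visualization needs one number per chunk of audio. A common choice is the root-mean-square (RMS) level. The Rust sketch below assumes 16-bit PCM samples and a fixed number of bars, and is not taken from maVoice's FloatingOverlay code.

```rust
/// RMS level of 16-bit PCM samples, normalised to 0.0..=1.0.
pub fn rms_level(samples: &[i16]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples
        .iter()
        .map(|&s| {
            let x = s as f64 / 32768.0; // scale to -1.0..1.0
            x * x
        })
        .sum();
    (sum_sq / samples.len() as f64).sqrt() as f32
}

/// Maps a level to how many of `max_bars` bars to light up.
pub fn bars_for_level(level: f32, max_bars: usize) -> usize {
    let scaled = (level.clamp(0.0, 1.0) * max_bars as f32).round() as usize;
    scaled.min(max_bars)
}
```

Normal speech often sits well below full scale, so a real UI may need some gain or a logarithmic (dB) scale before mapping the level to bars.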
# 1. Watch for Rust changes (needs cargo-watch: cargo install cargo-watch)
cd src-tauri
cargo watch -x check
# 2. Run frontend dev server (separate terminal)
cd src-tauri/aquavoice-frontend
npm run dev
# 3. Run Tauri in dev mode (separate terminal)
npm run dev # From project root
# 4. Build and test
npm run build
# Test the built app
./src-tauri/target/release/mavoice
// src-tauri/tauri.conf.json: window entry inside the "windows" array (defaults: width 72, height 20, x 300, y 800)
{
"width": 100,
"height": 30,
"x": 500,
"y": 100,
"transparent": true,
"alwaysOnTop": true
}
// src-tauri/aquavoice-frontend/src/components/FloatingOverlay.tsx
const styles = {
ready: "bg-blue-500", // Change colors
recording: "bg-red-500",
processing: "bg-orange-500",
completed: "bg-green-500"
};
# Run Rust tests
cd src-tauri
cargo test
# Lint frontend
cd src-tauri/aquavoice-frontend
npm run lint
# Print Tauri environment info
npm run tauri info
Q: Why is the window so small? A: maVoice is designed as a minimal floating widget that stays out of your way. It's intentionally tiny (72x20px) to be unobtrusive while remaining always accessible.
Q: Can I resize the widget? A: The widget size is fixed by design, but developers can modify the dimensions in tauri.conf.json.
Q: Does it work with Wayland? A: Yes! maVoice supports both X11 and Wayland with automatic detection and appropriate fallbacks.
Q: Can I use it without Groq? A: Currently, maVoice is built specifically for Groq's Whisper API. Other providers may be added in future versions.
Q: Why Tauri instead of Electron? A: Tauri provides native performance with minimal resource usage - perfect for an always-on widget. Our app uses <50MB RAM compared to Electron's 150MB+.
Q: How secure is my API key? A: Your API key is stored locally (in the app webview's localStorage, or in the .env file you created during setup) and is only ever sent directly to Groq's API.
Q: Can I contribute? A: Absolutely! Check our Contributing Guide and feel free to submit PRs, report bugs, or suggest features.
Q: Will Windows/macOS be supported? A: Yes! The codebase is already cross-platform ready. Official support is coming soon.
Q: Widget appears but recording doesn't start? A: Check audio permissions and ensure your microphone is not in use by another application.
Q: Transcription is slow? A: While Groq is fast, network latency can affect speed. Ensure stable internet connection.
Q: Text injection not working? A: Install the required tools (xdotool for X11, wtype for Wayland) and check system permissions.
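As the answer above says, text injection relies on xdotool under X11 and wtype under Wayland. The Rust sketch below shows one way to choose the command. Passing `--clearmodifiers` to `xdotool type` (so that held Ctrl/Alt keys don't change the typed text) is an assumption about how you might call it, not necessarily what maVoice does.

```rust
use std::process::Command;

/// Builds the program and arguments that type `text` into the focused window.
pub fn injection_command(wayland: bool, text: &str) -> (&'static str, Vec<String>) {
    if wayland {
        ("wtype", vec![text.to_string()])
    } else {
        (
            "xdotool",
            vec!["type".to_string(), "--clearmodifiers".to_string(), text.to_string()],
        )
    }
}

/// Runs the command; returns Ok(false) if the tool ran but reported failure.
pub fn inject_text(wayland: bool, text: &str) -> std::io::Result<bool> {
    let (program, args) = injection_command(wayland, text);
    Ok(Command::new(program).args(&args).status()?.success())
}
```

Passing the text as a separate argument, rather than building a shell string, avoids quoting problems with transcripts that contain quotes or `$`.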
maVoice leverages Groq's incredible inference speed:
| Metric | Value | Notes |
|---|---|---|
| Transcription Speed | < 500ms | For 30-second audio |
| Memory Usage (Idle) | < 50MB | Rust efficiency |
| Memory Usage (Active) | < 100MB | During recording |
| CPU Usage | < 5% | During transcription |
| Widget Render | 60 FPS | Smooth animations |
| Audio Format | 16kHz mono | Optimized for Groq |
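The 16 kHz mono format also determines how much data each clip carries. The table doesn't state a bit depth, so assume 16-bit PCM. One second is then 16,000 × 2 = 32,000 bytes, and a 30-second clip is 960,000 bytes of audio plus a 44-byte WAV header. The Rust sketch below builds that standard RIFF/WAVE header, for example to produce a test file like the test.wav used in the timing command.

```rust
/// Builds a canonical 44-byte WAV header for 16-bit PCM audio.
/// `num_frames` is the number of samples per channel.
pub fn wav_header(sample_rate: u32, channels: u16, num_frames: u32) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align = channels * bits_per_sample / 8; // bytes per frame
    let byte_rate = sample_rate * block_align as u32; // bytes per second
    let data_len = num_frames * block_align as u32;

    let mut h = Vec::with_capacity(44);
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    h.extend_from_slice(b"WAVE");
    h.extend_from_slice(b"fmt ");
    h.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size for PCM
    h.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    h.extend_from_slice(&channels.to_le_bytes());
    h.extend_from_slice(&sample_rate.to_le_bytes());
    h.extend_from_slice(&byte_rate.to_le_bytes());
    h.extend_from_slice(&block_align.to_le_bytes());
    h.extend_from_slice(&bits_per_sample.to_le_bytes());
    h.extend_from_slice(b"data");
    h.extend_from_slice(&data_len.to_le_bytes());
    h
}
```

For a 30-second clip, `wav_header(16_000, 1, 16_000 * 30)` records a data length of 960,000 bytes and a byte rate of 32,000. The raw samples would follow the header as little-endian `i16` values.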
# Test transcription speed
time curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
-H "Authorization: Bearer $GROQ_API_KEY" \
-F "file=@test.wav" \
-F "model=whisper-large-v3-turbo"
We love contributions! Whether it's:
- 🐛 Bug reports
- 💡 Feature requests
- 🔧 Pull requests
- 📖 Documentation improvements
- 🌐 Translations
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Make your changes
- Run tests (cargo test && npm run lint)
- Commit (git commit -m 'Add some AmazingFeature')
- Push (git push origin feature/AmazingFeature)
- Open a Pull Request
Check out our detailed Contributing Guide for more information.
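- Local First: Recording, clipboard handling and text injection happen on your machine; only the recorded audio is sent to Groq's API for transcription
- No Telemetry: We don't track anything
- Secure API: Your Groq API key is stored locally and only ever sent to Groq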
- Open Source: Audit the code yourself!
maVoice is MIT licensed. See LICENSE for details.
- Groq - For providing insanely fast inference
- Whisper - OpenAI's amazing speech recognition model
- Tauri - For making native apps actually enjoyable to build
- Community - For feedback, contributions, and support
- You - For choosing open-source!