🎙️ maVoice

🚀 Open-Source Voice Dictation Powered by Groq's Lightning-Fast Inference

Experience the future of voice-to-text with Groq DEV Tier - Ultra-fast transcription that leaves OpenAI's free tier in the dust!

```
┌─────────────────┐
│  🎤 maVoice     │  ← Tiny floating widget (72x20px)
│ ▶ ■ ▪ ▪ ▪ ▪    │    Always on top of your screen
└─────────────────┘    Double-click to start!
```

📋 Table of Contents

✨ Features
🎯 What is maVoice?
🚀 Quick Start
- Platform Setup Guides
- System Requirements
🎮 How to Use
🔧 Troubleshooting
🧑‍💻 Developer Guide
❓ FAQ
🏎️ Performance
🤝 Contributing

✨ Features

⚡ Blazing Fast: Powered by Groq's hosted Whisper Large v3 Turbo model, transcribing a 30-second clip in under 500 ms (see Performance)
🎯 Native Performance: Built with Rust and Tauri for minimal resource usage
🎨 Floating Widget: Tiny, draggable overlay that stays out of your way
🔒 Privacy First: Your API key stays on your machine, and your audio is sent only to Groq's API for transcription
🌐 Cross-Platform: Works on Linux (Windows and macOS coming soon!)
🎤 Smart Recording: Real-time audio visualization and voice detection
📋 Instant Copy: Automatic clipboard integration for seamless workflow
⚙️ Advanced Settings: Comprehensive configuration panel with model selection
🎛️ Intuitive Controls: Double-click to start, single-click to stop
🌍 Multi-Language: Support for 100+ languages with custom prompts

🎯 What is maVoice?

maVoice is a floating voice dictation widget that lives on your desktop. Unlike traditional apps with windows and menus, maVoice is a tiny, always-accessible button that floats above your other applications.

The Floating Widget Design

```
Normal State           Recording            Processing           Success
┌─────────────┐       ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ 🎤 maVoice  │  →    │ 🔴 ▶▶▶▶     │  →  │ 🟠 ◈◈◈◈◈    │  →  │ ✅ Done!    │
└─────────────┘       └─────────────┘     └─────────────┘     └─────────────┘
   (Blue)                 (Red)              (Orange)            (Green)
```
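The four states above, and the clicks that move between them (double-click to start, single-click to stop), can be modeled as a small state machine. This is a minimal TypeScript sketch: the names `WidgetState`, `WidgetEvent`, `nextState` and `stateColor` are hypothetical rather than taken from FloatingOverlay.tsx, and the `failed` and `reset` events (an error while processing, and the return from Success to the normal state) are assumptions about behavior this README does not spell out.

```typescript
// Hypothetical model of the widget's four visual states (names are illustrative).
type WidgetState = "ready" | "recording" | "processing" | "success";

// User gestures plus two assumed app events: "failed" (transcription error)
// and "reset" (return to the normal state after showing success).
type WidgetEvent = "doubleClick" | "singleClick" | "transcribed" | "failed" | "reset";

// Colors from the state diagram.
const stateColor: Record<WidgetState, string> = {
  ready: "blue",
  recording: "red",
  processing: "orange",
  success: "green",
};

// Pure transition function: events that mean nothing in the current state
// leave it unchanged (e.g. a single click while ready).
function nextState(state: WidgetState, event: WidgetEvent): WidgetState {
  switch (state) {
    case "ready":
      return event === "doubleClick" ? "recording" : state;
    case "recording":
      return event === "singleClick" ? "processing" : state;
    case "processing":
      if (event === "transcribed") return "success";
      return event === "failed" ? "ready" : state;
    case "success":
      return event === "reset" ? "ready" : state;
  }
}

// Walk through one dictation cycle.
let current: WidgetState = "ready";
for (const e of ["doubleClick", "singleClick", "transcribed"] as const) {
  current = nextState(current, e);
}
console.log(current, stateColor[current]); // prints: success green
```

Keeping the transitions in a pure function like this makes them easy to unit-test separately from the React component that renders each state.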

Size: 72x20 pixels (about the size of a small button)
Position: Starts at screen coordinates x:300, y:800 by default
Behavior: Always on top, transparent background, no window borders
Dragging: Right-click or Ctrl+Left-click to drag to a new position

🚀 Quick Start

🌟 RECOMMENDED: WSL2 Setup (Windows Users)

✨ WSL2 + WSLg runs maVoice as a Linux app on your Windows desktop, and in most setups audio and clipboard work with no extra configuration!

🪟 WSL2 Setup - THE BEST WAY (Recommended for Windows)

Why WSL2 beats native Windows:

✅ No WASAPI audio configuration problems
✅ Native Linux performance with Windows GUI
✅ Microphone access handled by WSLg's audio forwarding
✅ Clipboard integration without Windows API workarounds
✅ Quick setup - no Visual Studio Build Tools needed!

Prerequisites

Update WSL2 (from Windows PowerShell as Administrator):

```
wsl --update
wsl --version  # Ensure version 2 with WSLg
```

Install Debian/Ubuntu if you don't have it:
```
wsl --install -d Debian
```

🎯 FOOLPROOF Setup (5 minutes!)

💡 COPY-PASTE EACH BLOCK SEPARATELY FOR SUCCESS:

Step 1: Install Rust (Required)

```
# In your WSL2 terminal - paste and press Enter
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

⚠️ When prompted, press 1 then Enter to proceed with default installation

```
# Restart your shell or run this:
source $HOME/.cargo/env
```

Step 2: Install System Dependencies (One Command)

```
# Copy-paste this ENTIRE block as one command:
sudo apt update && sudo apt install -y \
    build-essential pkg-config libgtk-3-dev libwebkit2gtk-4.1-dev \
    libsoup-3.0-dev libjavascriptcoregtk-4.1-dev libdbus-1-dev \
    libappindicator3-dev librsvg2-dev libasound2-dev \
    xdotool wl-clipboard wtype
```

⚠️ Enter your password when prompted

Step 3: Clone & Install (AUTOMATED)

```
git clone https://github.com/Cwilliams333/maVoice-Enhanced.git
cd maVoice-Enhanced
./install.sh
```

✅ This installs ALL npm dependencies automatically!

Step 4: Add Your Groq API Key

```
# Replace "your_groq_api_key_here" with your ACTUAL key from console.groq.com
echo "VITE_GROQ_API_KEY=your_groq_api_key_here" > src-tauri/aquavoice-frontend/.env
```

Step 5: Launch maVoice!

```
npm run dev
```

🎉 Look for a tiny floating widget at position (300, 800) on your screen!

🚨 COMMON PITFALLS (READ THIS TO AVOID PAIN!)

❌ "I don't see the widget!"

It's TINY (72x20px) - look carefully around coordinates (300, 800)
Check if npm run dev shows errors - if so, something failed above

❌ "Permission denied during sudo apt install"

You need to enter your WSL2 password (it won't show characters as you type)

❌ "Rust installation failed"

Restart your terminal after installing Rust, or run: source $HOME/.cargo/env

❌ "My API key doesn't work"

Get it from console.groq.com (must start with gsk_)
Replace your_groq_api_key_here with your ACTUAL key
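Both checks above (the `gsk_` prefix, and that the placeholder was actually replaced) can be automated before launching. A minimal TypeScript sketch that inspects the text of the `.env` file from Step 4; `checkEnvKey` is a hypothetical helper, not part of maVoice, and passing it only means the key looks right: a revoked or mistyped key still fails at Groq.

```typescript
const KEY_VAR = "VITE_GROQ_API_KEY=";
const PLACEHOLDER = "your_groq_api_key_here";

// Hypothetical pre-flight check for the contents of
// src-tauri/aquavoice-frontend/.env. Returns a problem description,
// or null when the key looks plausible.
function checkEnvKey(envText: string): string | null {
  // Find the line that assigns the key; all other lines are ignored.
  const line = envText
    .split(/\r?\n/)
    .map((l) => l.trim())
    .find((l) => l.startsWith(KEY_VAR));
  if (line === undefined) return "VITE_GROQ_API_KEY is not set";

  // Drop the variable name and any surrounding quotes.
  const value = line.slice(KEY_VAR.length).replace(/^["']|["']$/g, "");

  if (value === "") return "key is empty";
  if (value === PLACEHOLDER) return "placeholder key was not replaced";
  if (!value.startsWith("gsk_")) return "Groq keys start with gsk_";
  return null;
}

console.log(checkEnvKey(`VITE_GROQ_API_KEY=${PLACEHOLDER}`)); // prints: placeholder key was not replaced
```

In Node you could feed it the file contents via `readFileSync(".../.env", "utf8")` from `node:fs`.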

❌ "No audio recording"

Make sure your microphone isn't being used by another app
WSL2 + WSLg usually handles audio automatically; if not, run wsl --update (then wsl --shutdown) from Windows PowerShell and reopen your WSL2 terminal

🎯 What you'll see when it works:

A tiny floating widget (72x20px) appears on your Windows desktop
Right-click + drag to move it anywhere
Double-click to start recording
Single-click to stop and get instant transcription
Text automatically copied to Windows clipboard!

✅ SUCCESS VALIDATION CHECKLIST

After running npm run dev, verify these work:

Widget appears ✓
- Look for tiny floating button at (300, 800)
- Says "🎤 TALK" in blue
Recording works ✓
- Double-click widget → turns red with moving bars
- Speak for 2-3 seconds
- Single-click to stop → turns orange then green
Transcription works ✓
- Your speech appears as text
- Text automatically copied to clipboard
- Paste (Ctrl+V) in any Windows app to verify
Dragging works ✓
- Right-click + drag moves the widget anywhere

🎉 If all 4 work, you're GOOD TO GO!

Why This Works So Well

WSLg magic: Linux app appears as native Windows app
Audio passthrough: WSLg's built-in PulseAudio server forwards your Windows microphone to Linux apps
Zero configuration: No Windows audio driver headaches
Best performance: Native Linux speed with Windows convenience

Platform Setup Guides

🐧 Native Linux Setup

Prerequisites Check

```
# Check if you have all requirements
command -v node >/dev/null 2>&1 && echo "✅ Node.js installed" || echo "❌ Node.js missing"
command -v rustc >/dev/null 2>&1 && echo "✅ Rust installed" || echo "❌ Rust missing"
command -v cargo >/dev/null 2>&1 && echo "✅ Cargo installed" || echo "❌ Cargo missing"

# Check display server
echo "Display Server: ${XDG_SESSION_TYPE:-$([[ -n $WAYLAND_DISPLAY ]] && echo wayland || echo x11)}"
```
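The same decision as a TypeScript function, handy for scripts or tests. It is a sketch, not maVoice's own detection (which may differ): it follows the one-liner above, except that it reports `none` instead of assuming X11 when neither `WAYLAND_DISPLAY` nor `DISPLAY` is set. The function name is hypothetical.

```typescript
type DisplayServer = "wayland" | "x11" | "none";

// Hypothetical helper mirroring the shell check. The environment is passed in
// (e.g. process.env in Node) so it can be tested without a desktop session.
function detectDisplayServer(env: Record<string, string | undefined>): DisplayServer {
  // XDG_SESSION_TYPE is trusted when it names a graphical session;
  // other values such as "tty" fall through to the checks below.
  const session = env["XDG_SESSION_TYPE"];
  if (session === "wayland" || session === "x11") return session;

  // Otherwise infer from the display variables the session exports.
  if (env["WAYLAND_DISPLAY"]) return "wayland";
  if (env["DISPLAY"]) return "x11";
  return "none";
}

// WSLg exports both variables; Wayland takes precedence.
console.log(detectDisplayServer({ DISPLAY: ":0", WAYLAND_DISPLAY: "wayland-0" })); // prints: wayland
```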

Install System Dependencies

Debian/Ubuntu:

```
sudo apt update
# Build tools and Tauri libraries, ALSA headers for audio,
# and xdotool / wl-clipboard / wtype for text injection
sudo apt install -y \
    build-essential \
    pkg-config \
    libgtk-3-dev \
    libwebkit2gtk-4.1-dev \
    libsoup-3.0-dev \
    libjavascriptcoregtk-4.1-dev \
    libdbus-1-dev \
    libappindicator3-dev \
    librsvg2-dev \
    libasound2-dev \
    xdotool \
    wl-clipboard \
    wtype
```

Fedora:

```
sudo dnf install -y \
    gcc \
    pkg-config \
    gtk3-devel \
    webkit2gtk4.1-devel \
    libsoup3-devel \
    javascriptcoregtk4.1-devel \
    dbus-devel \
    libappindicator-gtk3-devel \
    librsvg2-devel \
    alsa-lib-devel \
    xdotool \
    wl-clipboard \
    wtype
```

Arch Linux:

```
sudo pacman -S --needed \
    base-devel \
    gtk3 \
    webkit2gtk-4.1 \
    libsoup3 \
    dbus \
    libappindicator-gtk3 \
    librsvg \
    alsa-lib \
    xdotool \
    wl-clipboard \
    wtype
```

Installation Steps

```
# 1. Clone the repository
git clone https://github.com/lliWcWill/maVoice-Linux.git
cd maVoice-Linux

# 2. Run automated installer
./install.sh

# 3. Configure API key
echo "VITE_GROQ_API_KEY=your_groq_api_key_here" > src-tauri/aquavoice-frontend/.env

# 4. Run in development
npm run dev

# 5. Or build for production
npm run build
# Find .deb package in: src-tauri/target/release/bundle/deb/
```

⚠️ Native Windows Setup (Not Recommended - Use WSL2 Instead)

🚨 WARNING: Native Windows has known issues:

❌ WASAPI audio problems (microphone conflicts)
❌ Complex setup (Visual Studio Build Tools required)
❌ Audio driver conflicts with other applications
❌ Clipboard integration issues
❌ Performance overhead

👆 USE WSL2 SETUP ABOVE INSTEAD! It solves all these problems.

🪟 WSL2 Setup (Legacy - Superseded by Enhanced WSL2 Above)

Prerequisites

Update WSL2 (from Windows PowerShell as Administrator):
```
wsl --update
wsl --version  # Ensure version 2
```

Verify WSLg Support:

```
# In WSL2 terminal
echo $DISPLAY  # Should show :0
echo $WAYLAND_DISPLAY  # Should show wayland-0
```

Install Dependencies:

```
# First, install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# Then install system dependencies
sudo apt update
sudo apt install -y \
    build-essential \
    pkg-config \
    libgtk-3-dev \
    libwebkit2gtk-4.1-dev \
    libsoup-3.0-dev \
    libjavascriptcoregtk-4.1-dev \
    libdbus-1-dev \
    libappindicator3-dev \
    librsvg2-dev \
    libasound2-dev \
    xdotool \
    wl-clipboard \
    wtype

# Test GUI support
sudo apt install -y x11-apps
xclock  # Should display a clock window
```

Common WSL2 Issues

No GUI appears: Ensure WSLg is enabled and GPU drivers are updated on Windows
Audio not working: WSL2 audio passthrough may need configuration
Widget hard to find: The 72x20px window is tiny - look carefully or temporarily increase size in tauri.conf.json

🐳 Docker Setup (Experimental)

```
# Build and run with X11 forwarding
docker-compose -f docker-compose.dev.yml up

# Or manually
docker build -t mavoice .
docker run -it \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    -v ~/.Xauthority:/root/.Xauthority \
    --device /dev/snd \
    mavoice
```

Note: Audio and clipboard integration may be limited in Docker.

System Requirements

| Component | Minimum | Recommended |
|---|---|---|
| OS | Linux (X11/Wayland) | Ubuntu 22.04+, Fedora 38+ |
| Node.js | 18.0.0 | 20.0.0+ |
| Rust | 1.70.0 | Latest stable |
| RAM | 2GB | 4GB+ |
| Display | Any | 1920x1080+ |
| Audio | PulseAudio/ALSA | PulseAudio |

Get Your Groq API Key

Visit console.groq.com
Sign up for a free account
Navigate to API Keys section
Create a new API key
Copy and save it securely

🎮 How to Use

Finding the Widget

When you first launch maVoice, look for a tiny floating widget:

```
Your Desktop:
┌─────────────────────────────────────────┐
│  File  Edit  View  Help                 │
│                                         │
│     ┌─────────────┐ ← Look here!       │
│     │ 🎤 maVoice  │   (x:300, y:800)    │
│     └─────────────┘                     │
│                                         │
└─────────────────────────────────────────┘
```

Basic Operations

Start Recording: Double-click the widget
- Widget turns red with animated bars
- Microphone activates immediately
Stop Recording: Single-click while recording
- Widget turns orange (processing)
- Then green (success) with transcribed text
- Text automatically copied to clipboard
Move the Widget: Right-click and drag (or Ctrl+Left-click)
Access Settings: Click the gear icon on the web interface

Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| `Ctrl+Shift+,` | Start/stop recording (global) |
| `Alt+Space` | Toggle recording |
| `Double Alt` | Quick record |
| `Spacebar` | Stop recording (while active) |
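`Double Alt` presumably means pressing Alt twice in quick succession. A minimal sketch of such a double-tap detector; the class name and the 400 ms window are assumptions, not maVoice's implementation, and timestamps are passed in so the logic can be tested without a keyboard.

```typescript
// Hypothetical double-tap detector for a "Double Alt" style shortcut.
class DoubleTapDetector {
  private lastTap: number | null = null;
  private readonly windowMs: number;

  // windowMs: maximum gap between the two taps (assumed value).
  constructor(windowMs = 400) {
    this.windowMs = windowMs;
  }

  // Call once per key release with a timestamp in milliseconds.
  // Returns true when this tap completes a double tap.
  tap(timestampMs: number): boolean {
    const isDouble =
      this.lastTap !== null && timestampMs - this.lastTap <= this.windowMs;
    // After a double tap, forget the pair so a third quick tap starts over.
    this.lastTap = isDouble ? null : timestampMs;
    return isDouble;
  }
}

const detector = new DoubleTapDetector();
console.log(detector.tap(1000), detector.tap(1250), detector.tap(1300)); // prints: false true false

// In the frontend it could be wired up like this (startQuickRecord is hypothetical):
// window.addEventListener("keyup", (e) => {
//   if (e.key === "Alt" && detector.tap(e.timeStamp)) startQuickRecord();
// });
```

A window listener like the one in the comment only sees keys while the maVoice window has focus; a shortcut that works from other applications, like the global `Ctrl+Shift+,`, would typically be registered through Tauri's global-shortcut support instead.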

Settings Configuration

Access settings at http://localhost:5173 (while npm run dev is running) or click the gear icon:

API Key: Store your Groq API key locally
Model Selection: Choose the Whisper model variant
Temperature: Sampling temperature (0.0-1.0); lower values give more deterministic transcripts
Language: Select from 100+ languages
Custom Prompt: Add technical terms, names, or style preferences
Max Tokens: Control response length
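The settings above map onto fields of the multipart request sent to Groq's OpenAI-compatible transcription endpoint (the one used by the curl command under Benchmarks). A minimal sketch that turns Settings values into those form fields; the `TranscriptionSettings` shape and the helper name are hypothetical. `model`, `language`, `prompt`, `temperature` and `response_format` are standard fields of that endpoint; Max Tokens has no obvious counterpart among its documented fields, so the sketch leaves it out.

```typescript
// Hypothetical shape of the Settings panel values.
interface TranscriptionSettings {
  model: string;        // e.g. "whisper-large-v3-turbo"
  language?: string;    // ISO-639-1 code such as "en"; omit to auto-detect
  prompt?: string;      // technical terms, names, spelling hints
  temperature?: number; // 0.0-1.0
}

const GROQ_TRANSCRIBE_URL = "https://api.groq.com/openai/v1/audio/transcriptions";

// Builds the text fields of the multipart request (the audio goes in a
// separate "file" field). Unset optional values are omitted so the API
// falls back to its defaults, e.g. automatic language detection.
function transcriptionFields(s: TranscriptionSettings): Record<string, string> {
  const fields: Record<string, string> = { model: s.model, response_format: "json" };
  if (s.language) fields["language"] = s.language;
  const prompt = s.prompt?.trim();
  if (prompt) fields["prompt"] = prompt;
  if (s.temperature !== undefined) {
    // Clamp to the 0.0-1.0 range offered in Settings.
    fields["temperature"] = String(Math.min(1, Math.max(0, s.temperature)));
  }
  return fields;
}

console.log(transcriptionFields({ model: "whisper-large-v3-turbo", temperature: 1.5 })["temperature"]); // prints: 1

// Sending it (webview or Node 18+), where `audio` is a Blob holding the
// recording and `apiKey` is your Groq key:
// const form = new FormData();
// form.append("file", audio, "recording.wav");
// for (const [k, v] of Object.entries(transcriptionFields(settings))) form.append(k, v);
// const res = await fetch(GROQ_TRANSCRIBE_URL, {
//   method: "POST",
//   headers: { Authorization: `Bearer ${apiKey}` },
//   body: form,
// });
// const { text } = await res.json();
```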

🔧 Troubleshooting

Quick Diagnostic Script

Create and run this script to check your setup:

```
#!/bin/bash
# Save as check-mavoice.sh and run with: bash check-mavoice.sh

echo "🔍 maVoice Diagnostic Check"
echo "=========================="

# Check Node.js
if command -v node >/dev/null 2>&1; then
    echo "✅ Node.js: $(node --version)"
else
    echo "❌ Node.js: Not installed"
fi

# Check Rust
if command -v rustc >/dev/null 2>&1; then
    echo "✅ Rust: $(rustc --version | cut -d' ' -f2)"
else
    echo "❌ Rust: Not installed"
fi

# Check display server
if [[ -n $WAYLAND_DISPLAY ]]; then
    echo "✅ Display: Wayland ($WAYLAND_DISPLAY)"
elif [[ -n $DISPLAY ]]; then
    echo "✅ Display: X11 ($DISPLAY)"
else
    echo "❌ Display: No display server detected"
fi

# Check audio
if command -v pactl >/dev/null 2>&1; then
    echo "✅ Audio: PulseAudio available"
elif [[ -d /dev/snd ]]; then
    echo "⚠️  Audio: ALSA only (may have issues)"
else
    echo "❌ Audio: No audio system detected"
fi

# Check critical dependencies
deps=("pkg-config" "xdotool" "wl-copy")
for dep in "${deps[@]}"; do
    if command -v "$dep" >/dev/null 2>&1; then
        echo "✅ $dep: Installed"
    else
        echo "❌ $dep: Missing"
    fi
done
```

Common Issues & Solutions

🚫 "Widget doesn't appear"

Check if process is running:
```
ps aux | grep mavoice
```
Look in the correct location (x:300, y:800):
- 300 px from the left and 800 px from the top, i.e. the lower-left area of a 1080p screen
- It's only 72x20 pixels!

Temporarily increase widget size:

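In src-tauri/tauri.conf.json, change the widget's window entry from width 72 and height 20 to larger values, and turn off transparency so it is easy to see (JSON does not allow comments, so leave them out of the file):

```
"width": 200,
"height": 100,
"transparent": false
```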

Check logs:

```
# Run with console output
npm run dev 2>&1 | tee mavoice.log
```

🎤 "No audio recording"

Check audio devices:

```
# List audio input devices (PulseAudio)
pactl list sources short

# Test microphone: record 5 seconds, then play it back (arecord/aplay come from alsa-utils)
arecord -d 5 test.wav && aplay test.wav
```

Select correct audio device:
- The app uses the system default
- Change default in your system audio settings
For WSL2 users:
- Audio reaches Linux through WSLg; if recording fails, run wsl --update from Windows PowerShell
- If it still fails, consider running native Linux or dual-boot

📋 "Clipboard not working"

Install clipboard utilities:

```
# For X11
sudo apt install xclip xsel

# For Wayland
sudo apt install wl-clipboard
```

Test clipboard:

```
echo "test" | xclip -selection clipboard
# or
echo "test" | wl-copy
```

Manual copy fallback:
- If auto-copy fails, the text appears in the widget
- You can manually select and copy

🌐 "Groq API errors"

Verify API key:

```
# Check if env file exists
cat src-tauri/aquavoice-frontend/.env
```

Test API directly:

```
curl https://api.groq.com/openai/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Common API issues:
- Rate limit exceeded (400 requests/minute; limits depend on your Groq account tier)
- Invalid API key format
- Network connectivity
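When the rate limit is exceeded, Groq responds with HTTP 429 (Too Many Requests). A minimal sketch of retrying such responses with exponential backoff; the helper name, retry count and delays are assumptions rather than maVoice behavior, and both the request and the sleep function are passed in so the logic runs without network access.

```typescript
// Hypothetical retry helper. `send` performs one request and resolves to
// anything with an HTTP status; a fetch Response fits this shape.
async function withRetry<T extends { status: number }>(
  send: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    // Only rate-limited responses are retried; anything else is returned as-is.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Exponential backoff: baseDelayMs, then 2x, 4x, ...
    await sleep(baseDelayMs * 2 ** attempt);
  }
}

// Usage with fetch (url, form and apiKey come from your own request code):
// const res = await withRetry(() =>
//   fetch(url, { method: "POST", headers: { Authorization: `Bearer ${apiKey}` }, body: form }),
// );
```

Invalid keys (401) and network failures are deliberately not retried, since waiting does not fix them.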

Error Messages Explained

| Error | Meaning | Solution |
|---|---|---|
| "Failed to start audio recording" | Microphone access issue | Check audio permissions and devices |
| "Groq API request failed" | API key or network issue | Verify API key and internet connection |
| "Clipboard operation failed" | Missing clipboard utility | Install xclip/wl-clipboard |
| "Window manager not detected" | No X11/Wayland | Ensure GUI environment is running |

🧑‍💻 Developer Guide

Architecture Overview

```
maVoice Architecture
┌─────────────────────────────────────────────┐
│                Frontend (React)              │
│  ┌─────────────────┬──────────────────┐    │
│  │ FloatingOverlay │   Settings UI    │    │
│  │   Component     │   Component      │    │
│  └────────┬────────┴────────┬─────────┘    │
│           │ Tauri IPC       │               │
└───────────┼─────────────────┼───────────────┘
            │                 │
┌───────────┼─────────────────┼───────────────┐
│           ▼                 ▼               │
│   ┌──────────────┬─────────────────┐       │
│   │ Audio Module │  System Module  │       │
│   │ (Recording)  │ (Text Injection)│       │
│   └──────┬───────┴────────┬────────┘       │
│          │                 │                │
│   ┌──────▼─────────────────▼───────┐       │
│   │        Groq API Module         │       │
│   │    (Whisper Transcription)     │       │
│   └────────────────────────────────┘       │
│             Backend (Rust/Tauri)            │
└─────────────────────────────────────────────┘
```
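The round trip in the diagram (widget, Tauri IPC, recording, Groq, text injection) can be driven from the frontend by a small controller. A minimal sketch: the command names `start_recording`, `stop_and_transcribe` and `inject_text` are hypothetical, not the actual handlers in main.rs, and `invoke` is passed in rather than imported (in a Tauri 2 frontend it would be `invoke` from `@tauri-apps/api/core`) so the flow can be tested without a running app.

```typescript
// Shaped like Tauri's invoke(); injected so this sketch runs outside Tauri.
type Invoke = (cmd: string, args?: Record<string, unknown>) => Promise<unknown>;

// Hypothetical frontend controller for one dictation cycle.
// All command names are illustrative, not maVoice's actual handlers.
class DictationController {
  private readonly invoke: Invoke;

  constructor(invoke: Invoke) {
    this.invoke = invoke;
  }

  // Double-click: ask the Audio Module to start capturing.
  async start(): Promise<void> {
    await this.invoke("start_recording");
  }

  // Single-click: stop, let the backend send the audio to Groq, then hand
  // the text to the System Module for clipboard / text injection.
  async stop(): Promise<string> {
    const text = await this.invoke("stop_and_transcribe");
    if (typeof text !== "string") throw new Error("unexpected transcription result");
    await this.invoke("inject_text", { text });
    return text;
  }
}

// In the real frontend (Tauri 2):
// import { invoke } from "@tauri-apps/api/core";
// const controller = new DictationController(invoke);
```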

Key Components

Backend (`src-tauri/src/`)

main.rs: Tauri application entry point and command handlers
audio/groq_recorder.rs: Audio recording optimized for Groq (16kHz mono)
api/groq.rs: Groq API client for Whisper transcription
system/text_inject.rs: Cross-platform text injection (X11/Wayland)
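According to the list above, `audio/groq_recorder.rs` records at 16 kHz mono. To show what such an upload contains, assuming it is sent as WAV, here is a minimal sketch that wraps 16-bit PCM samples in a standard 44-byte WAV (RIFF) header. It is TypeScript, not the Rust module's code, and the 16-bit sample depth and the `encodePcm16Wav` name are assumptions.

```typescript
// Hypothetical helper: wrap 16-bit little-endian PCM mono samples in a WAV container.
function encodePcm16Wav(samples: Int16Array, sampleRate = 16000): Uint8Array {
  const channels = 1;
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buf = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buf);
  const writeAscii = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                           // file size minus first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                                     // fmt chunk size for PCM
  view.setUint16(20, 1, true);                                      // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true);              // block align
  view.setUint16(34, bytesPerSample * 8, true);                     // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Sample data, little-endian as WAV requires.
  samples.forEach((s, i) => view.setInt16(44 + i * bytesPerSample, s, true));
  return new Uint8Array(buf);
}

// One second of silence at 16 kHz: 16,000 samples -> 32,000 data bytes + 44-byte header.
console.log(encodePcm16Wav(new Int16Array(16000)).length); // prints: 32044
```

At this format, 30 seconds of audio (the clip length in the Performance table) is 30 × 32,000 = 960,000 bytes of sample data, just under 1 MB per request.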

Frontend (`src-tauri/aquavoice-frontend/src/`)

App.tsx: Main application logic and global shortcuts
components/FloatingOverlay.tsx: The floating widget UI
Real-time Features:
- Audio level visualization
- Status transitions
- Drag-and-drop positioning
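For the audio level visualization, a common approach is to take the RMS (root mean square) of each block of samples, convert it to decibels, and map that onto a number of lit bars. A minimal sketch, assuming float samples in [-1, 1] (the format Web Audio's `AnalyserNode.getFloatTimeDomainData()` fills in), a 5-bar meter and a -60 dB floor; the function name, bar count and floor are illustrative, and maVoice may compute levels differently, for instance in the Rust backend.

```typescript
// Hypothetical level meter: Float32 samples in [-1, 1] -> number of lit bars.
function levelToBars(samples: Float32Array, bars = 5, floorDb = -60): number {
  if (samples.length === 0) return 0;

  // Root mean square: the block's average power expressed as an amplitude.
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i];
  const rms = Math.sqrt(sumSquares / samples.length);
  if (rms === 0) return 0;

  // Decibels relative to full scale (0 dB = amplitude 1), then map
  // [floorDb, 0] linearly onto [0, bars], clamping values outside it.
  const db = 20 * Math.log10(rms);
  const fraction = Math.min(1, Math.max(0, (db - floorDb) / -floorDb));
  return Math.round(fraction * bars);
}

// A signal at amplitude 0.1 is about -20 dB, two thirds of the way up the meter.
console.log(levelToBars(new Float32Array([0.1, -0.1, 0.1, -0.1]))); // prints: 3
```

Working in decibels matters because speech usually sits well below full scale: a linear mapping would leave most bars dark, while the logarithmic scale spreads ordinary speaking levels across the meter.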

Development Workflow

```
# 1. Watch for Rust changes (needs cargo-watch: cargo install cargo-watch)
cd src-tauri
cargo watch -x check

# 2. Run frontend dev server (separate terminal, from project root)
cd src-tauri/aquavoice-frontend
npm run dev

# 3. Run Tauri in dev mode (separate terminal)
npm run dev  # From project root

# 4. Build and test (from project root)
npm run build
# Test the built app
./src-tauri/target/release/mavoice
```

Modifying the Widget

Change Size/Position

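In src-tauri/tauri.conf.json (defaults: width 72, height 20, x 300, y 800), the widget is the entry under app.windows in Tauri 2's config layout; other fields of that entry stay as they are:

```
{
  "app": {
    "windows": [
      {
        "width": 100,
        "height": 30,
        "x": 500,
        "y": 100,
        "transparent": true,
        "alwaysOnTop": true
      }
    ]
  }
}
```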

Customize Appearance

```
// src-tauri/aquavoice-frontend/src/components/FloatingOverlay.tsx
const styles = {
  ready: "bg-blue-500",      // Change colors
  recording: "bg-red-500",
  processing: "bg-orange-500",
  completed: "bg-green-500"
};
```

Testing

```
# Run Rust tests
(cd src-tauri && cargo test)

# Lint frontend
(cd src-tauri/aquavoice-frontend && npm run lint)

# Show Tauri environment and dependency info
npm run tauri info
```

❓ FAQ

General Questions

Q: Why is the window so small? A: maVoice is designed as a minimal floating widget that stays out of your way. It's intentionally tiny (72x20px) to be unobtrusive while remaining always accessible.

Q: Can I resize the widget? A: The widget size is fixed by design, but developers can modify the dimensions in tauri.conf.json.

Q: Does it work with Wayland? A: Yes! maVoice supports both X11 and Wayland with automatic detection and appropriate fallbacks.

Q: Can I use it without Groq? A: Currently, maVoice is built specifically for Groq's Whisper API. Other providers may be added in future versions.

Technical Questions

Q: Why Tauri instead of Electron? A: Tauri provides native performance with minimal resource usage - perfect for an always-on widget. Our app uses <50MB RAM compared to Electron's 150MB+.

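Q: How secure is my API key? A: A key entered in Settings is stored locally in the app's localStorage and is sent only to Groq's API. Note that Vite inlines VITE_-prefixed variables into the built frontend, so if you set the key in .env, don't share builds made on your machine.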

Q: Can I contribute? A: Absolutely! Check our Contributing Guide and feel free to submit PRs, report bugs, or suggest features.

Q: Will Windows/macOS be supported? A: Yes! The codebase is already cross-platform ready. Official support is coming soon.

Troubleshooting Questions

Q: Widget appears but recording doesn't start? A: Check audio permissions and ensure your microphone is not in use by another application.

Q: Transcription is slow? A: While Groq is fast, network latency can affect speed. Ensure stable internet connection.

Q: Text injection not working? A: Install the required tools (xdotool for X11, wtype for Wayland) and check system permissions.

🏎️ Performance

maVoice leverages Groq's incredible inference speed:

| Metric | Value | Notes |
|---|---|---|
| Transcription Speed | < 500ms | For 30-second audio |
| Memory Usage (Idle) | < 50MB | Rust efficiency |
| Memory Usage (Active) | < 100MB | During recording |
| CPU Usage | < 5% | During transcription |
| Widget Render | 60 FPS | Smooth animations |
| Audio Format | 16kHz mono | Optimized for Groq |

Benchmarks

```
# Test transcription speed (requires GROQ_API_KEY in your environment and a test.wav file)
time curl -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "file=@test.wav" \
  -F "model=whisper-large-v3-turbo"
```

🤝 Contributing

We love contributions! Whether it's:

🐛 Bug reports
💡 Feature requests
🔧 Pull requests
📖 Documentation improvements
🌐 Translations

Quick Contribution Guide

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Make your changes
Run tests (cargo test in src-tauri, npm run lint in src-tauri/aquavoice-frontend)
Commit (git commit -m 'Add some AmazingFeature')
Push (git push origin feature/AmazingFeature)
Open a Pull Request

Check out our detailed Contributing Guide for more information.

🔐 Privacy & Security

Local First: Recording, clipboard, and text injection happen on your machine; audio leaves it only for transcription by Groq's API
No Telemetry: We don't track anything
Secure API: Your Groq API key is stored locally and sent only to Groq's API
Open Source: Audit the code yourself!

📜 License

maVoice is MIT licensed. See LICENSE for details.

🙏 Acknowledgments

Groq - For providing insanely fast inference
Whisper - OpenAI's amazing speech recognition model
Tauri - For making native apps actually enjoyable to build
Community - For feedback, contributions, and support
You - For choosing open-source!

Built with ❤️ by developers who were tired of slow dictation

maVoice - Where speed meets simplicity

⭐ Star us on GitHub if you find this useful!
