Voicefield

voicefield.dev · Documentation · npm

Phone-powered voice input for any desktop text field. Turn your phone into a wireless microphone for any web application.

Scan a QR code → speak into your phone → text appears in the web field. Real-time, open source, self-hostable.

How it works

┌─────────────────────────────┐
│  voicefield.dev / your host │  Static phone page (no data stored)
└────────────┬────────────────┘
             │ loads phone page
             ▼
┌────────────────────┐         ┌──────────────────────┐
│  Phone browser     │  POST   │  Your server         │
│  STT runs here     │────────▶│  @voicefield/server  │
│  (client-side)     │  text   │  (relay only)        │
└────────────────────┘         └──────────┬───────────┘
                                          │ SSE
                                          ▼
                               ┌──────────────────────┐
                               │  Desktop browser     │
                               │  @voicefield/react   │
                               └──────────────────────┘
  • Works out of the box — uses the browser's built-in Web Speech API, no API key needed
  • Upgrade to Soniox — for higher accuracy, add a Soniox API key (or bring your own STT provider)
  • No audio leaves the phone — STT runs client-side, your server only relays text
  • Phone page is static — defaults to voicefield.dev, or self-host on your own domain
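The relay step in the diagram can be sketched as a small in-memory publish/subscribe channel. This is an illustration only: `TextRelay`, `toSseFrame`, and the `{ text }` payload shape are hypothetical names chosen here, not the actual `@voicefield/server` internals.

```typescript
// Hypothetical sketch of a text-only relay; not the @voicefield/server API.
type Listener = (frame: string) => void

// Format one Server-Sent Events frame carrying a JSON payload.
// JSON.stringify never emits raw newlines, so one "data:" line is enough.
function toSseFrame(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`
}

class TextRelay {
  // One set of desktop listeners per session id; nothing is persisted.
  private listeners = new Map<string, Set<Listener>>()

  // Desktop side: subscribe to a session. Returns an unsubscribe function.
  subscribe(sessionId: string, listener: Listener): () => void {
    const set = this.listeners.get(sessionId) ?? new Set<Listener>()
    set.add(listener)
    this.listeners.set(sessionId, set)
    return () => {
      set.delete(listener)
      if (set.size === 0) this.listeners.delete(sessionId)
    }
  }

  // Phone side: forward a transcript to every listener of the session.
  // Returns the number of listeners that received the frame.
  publish(sessionId: string, text: string): number {
    const set = this.listeners.get(sessionId)
    if (!set) return 0
    const frame = toSseFrame({ text })
    set.forEach((listener) => listener(frame))
    return set.size
  }
}
```

In a real handler, the POST route would call `publish` and the SSE route would write each frame to the open response stream. Only text passes through the channel, which matches the "relay only" role shown above.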

Quick Start

npm install @voicefield/react @voicefield/server

No API key needed — works immediately with the browser's built-in speech recognition.

1. Add API route (Next.js App Router)

// app/api/voice/[...voicefield]/route.ts
import { createVoicefieldHandler } from '@voicefield/server'

const { GET, POST, OPTIONS } = createVoicefieldHandler({
  cors: { origins: ['https://voicefield.dev'] },
})

export { GET, POST, OPTIONS }

Want higher accuracy? Add Soniox or another cloud STT provider. See Upgrading to a cloud STT provider.

2. Mount the phone page (for local dev)

// app/mic/page.tsx
"use client"
export { Mic as default } from "@voicefield/react/phone"

3. Use in your component

import { useRef } from 'react'
import { useVoicefield, QRPopup } from '@voicefield/react'

function MyComponent() {
  const inputRef = useRef<HTMLInputElement>(null)

  const vf = useVoicefield({
    serverUrl: '/api/voice',
    language: 'en',
  })

  vf.register('search', 'Search', inputRef)

  return (
    <>
      <input ref={inputRef} />
      <button onClick={() => vf.showQR()}>🎤</button>
      <QRPopup
        pairingCode={vf.pairingCode}
        secret={vf.secret}
        serverUrl={vf.serverUrl}
        phoneUrl={vf.phoneUrl}
        isVisible={vf.isQRVisible}
        onClose={vf.hideQR}
      />
    </>
  )
}

That's it. With these three files in place, any registered field accepts voice input.

Example App

A working example lives in apps/example/:

pnpm install && pnpm build
cd apps/example && pnpm dev

Works immediately with Web Speech API. For Soniox, copy .env.local.example and add your key.

Testing with a phone (ngrok)

Phones need HTTPS for microphone access. Use ngrok to expose your local dev server:

# Terminal 1: start the example app
cd apps/example && pnpm dev   # runs on http://localhost:3000

# Terminal 2: expose via ngrok
ngrok http 3000

Open the ngrok HTTPS URL on your desktop, scan the QR code with your phone, and speak.

Upgrading to a Cloud STT Provider

The Web Speech API is sufficient for most use cases. For higher accuracy or broader language support, add a cloud provider such as Soniox:

npm install @soniox/node

// app/api/voice/[...voicefield]/route.ts
import { createVoicefieldHandler } from '@voicefield/server'
import { SonioxNodeClient } from '@soniox/node'

const soniox = new SonioxNodeClient({ api_key: process.env.SONIOX_API_KEY! })

const { GET, POST, OPTIONS } = createVoicefieldHandler({
  generateSttKey: async () => {
    const result = await soniox.auth.createTemporaryKey({
      usage_type: 'transcribe_websocket',
      expires_in_seconds: 1800,
    })
    return { temporaryApiKey: result.api_key, expiresAt: Date.now() + 1800_000 }
  },
  cors: { origins: ['https://voicefield.dev'] },
})

export { GET, POST, OPTIONS }

The provider is selected automatically — if generateSttKey is configured, the phone uses Soniox. Otherwise, it falls back to the browser's Web Speech API. You can also build your own provider.
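The selection rule described above can be sketched in a few lines. The type names and the `'soniox'` / `'web-speech'` labels are hypothetical and only illustrate the rule; they are not taken from the library. The `expiryFrom` helper is likewise an illustrative addition showing one way to derive `expiresAt` from the same TTL value passed to the provider, so the two numbers cannot drift apart.

```typescript
// Hypothetical types for illustration; not the @voicefield/server API.
type SttKey = { temporaryApiKey: string; expiresAt: number }
type HandlerConfig = { generateSttKey?: () => Promise<SttKey> }
type Provider = 'soniox' | 'web-speech'

// If a key generator is configured, use the cloud provider;
// otherwise fall back to the browser's Web Speech API.
function selectProvider(config: HandlerConfig): Provider {
  return typeof config.generateSttKey === 'function' ? 'soniox' : 'web-speech'
}

// Derive the expiry timestamp (milliseconds) from a TTL in seconds.
function expiryFrom(nowMs: number, ttlSeconds: number): number {
  return nowMs + ttlSeconds * 1000
}

const TTL_SECONDS = 1800

const withKey: HandlerConfig = {
  generateSttKey: async () => ({
    temporaryApiKey: 'temporary-key-placeholder', // placeholder, not a real key
    expiresAt: expiryFrom(Date.now(), TTL_SECONDS),
  }),
}

console.log(selectProvider(withKey)) // "soniox"
console.log(selectProvider({}))      // "web-speech"
```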

Packages

| Package | Description | npm |
| --- | --- | --- |
| @voicefield/core | Types and utilities (zero deps) | npm |
| @voicefield/react | React hook + QR popup + phone page | npm |
| @voicefield/server | Next.js API route handler (relay) | npm |

Deployment Modes

| Mode | Phone page | Server | HTTPS | Setup effort | Notes |
| --- | --- | --- | --- | --- | --- |
| Local (LAN) | Your /mic page | localhost | Not needed | Zero | Desktop mic only; phones need HTTPS |
| ngrok | voicefield.dev | ngrok tunnel | Automatic | 1 command | Phone mic works, best for dev |
| mkcert | Your /mic page | localhost + cert | Manual | Phone CA install | Phone mic works |
| Production | voicefield.dev | Your domain | Let's Encrypt | Standard deploy | Phone mic works |
| Self-hosted | Your domain | Your domain | Let's Encrypt | Deploy both | Phone mic works |

Local development (no tunnel needed)

For local dev, mount the phone page in your app and let Voicefield auto-detect your LAN IP:

const vf = useVoicefield({
  serverUrl: '/api/voice',
  phoneUrl: '',        // local mode — uses your server's /mic page
  language: 'en',
})

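The QR code points to http://192.168.x.x:PORT/mic, which a phone on the same WiFi network can load. Because that URL is plain HTTP, the phone's microphone will not be available.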

Important: This mode only works for desktop-to-desktop testing (mic in the same browser). Phones require HTTPS for microphone access — use ngrok or the default production mode instead:

ngrok http 3000

Then open the ngrok HTTPS URL on your desktop. The QR code will automatically point the phone to the HTTPS tunnel.

Production (hosted phone page)

const vf = useVoicefield({
  serverUrl: '/api/voice',
  // phoneUrl defaults to https://voicefield.dev
  language: 'en',
})

The phone loads voicefield.dev/mic (static, open source), while all API calls go to your server.
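The two `phoneUrl` settings shown in this section can be summarized with a small resolver. This is a sketch under assumptions: `resolvePhonePage` and `isHttps` are hypothetical helpers, and the fixed `/mic` path is taken from the examples above rather than from the library source.

```typescript
// Hypothetical helpers illustrating how phoneUrl maps to the page the QR code opens.
const DEFAULT_PHONE_URL = 'https://voicefield.dev'

// undefined -> hosted default, '' -> local mode (your own origin), otherwise a custom host.
// Note: the absolute '/mic' path replaces any path already present on the base URL.
function resolvePhonePage(phoneUrl: string | undefined, localOrigin: string): string {
  const base =
    phoneUrl === undefined ? DEFAULT_PHONE_URL : phoneUrl === '' ? localOrigin : phoneUrl
  return new URL('/mic', base).toString()
}

// Phones only grant microphone access to pages served over HTTPS.
function isHttps(pageUrl: string): boolean {
  return new URL(pageUrl).protocol === 'https:'
}

const lan = resolvePhonePage('', 'http://192.168.1.20:3000')
const hosted = resolvePhonePage(undefined, 'http://192.168.1.20:3000')

console.log(lan, isHttps(lan))       // http://192.168.1.20:3000/mic false
console.log(hosted, isHttps(hosted)) // https://voicefield.dev/mic true
```

The `isHttps` check makes the trade-off in the deployment table concrete: the LAN URL loads on a phone, but only the HTTPS variants can use the phone's microphone.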

Security

  • Audio stays on the phone — STT runs client-side, only text is relayed
  • In-memory sessions — no database, no persistence, 30-min TTL
  • Cryptographic pairing — 256-bit secret in QR, 384-bit session token
  • Single-use codes — 6-digit pairing code deleted after use
  • Your server controls everything — STT keys generated on your infra, provider of your choice

See Security Model for the full threat model and design.
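The properties listed above (in-memory state, 30-minute TTL, single-use 6-digit codes, a 256-bit secret, and a 384-bit session token) can be illustrated with a small store. This is a minimal sketch, not the project's implementation: `PairingStore`, its method names, and the choice to burn a code on any redeem attempt are assumptions made for this example.

```typescript
import { randomBytes, randomInt, timingSafeEqual } from 'node:crypto'

// Hypothetical sketch; names and policies are illustrative only.
const SESSION_TTL_MS = 30 * 60 * 1000 // 30-minute TTL

type Pairing = { secret: string; expiresAt: number }

// Constant-time string comparison; lengths are checked first because
// timingSafeEqual throws on inputs of different length.
function safeEqual(a: string, b: string): boolean {
  const left = Buffer.from(a, 'utf8')
  const right = Buffer.from(b, 'utf8')
  return left.length === right.length && timingSafeEqual(left, right)
}

class PairingStore {
  // Everything lives in this Map: no database, no persistence.
  private pairings = new Map<string, Pairing>()
  private now: () => number

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now
  }

  // Create a 6-digit code and a 256-bit secret (32 random bytes, hex-encoded).
  // A production store would also need to handle code collisions.
  create(): { code: string; secret: string } {
    const code = ('000000' + randomInt(0, 1000000)).slice(-6)
    const secret = randomBytes(32).toString('hex')
    this.pairings.set(code, { secret, expiresAt: this.now() + SESSION_TTL_MS })
    return { code, secret }
  }

  // Exchange code + secret for a 384-bit session token (48 random bytes).
  // The code is deleted on the first attempt, successful or not (an assumption).
  redeem(code: string, secret: string): string | null {
    const pairing = this.pairings.get(code)
    if (!pairing) return null
    this.pairings.delete(code)
    if (pairing.expiresAt <= this.now()) return null
    if (!safeEqual(pairing.secret, secret)) return null
    return randomBytes(48).toString('hex')
  }
}
```

Deleting the code before checking the secret means a wrong guess cannot be retried, at the cost of requiring a fresh QR code after a failed attempt.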

Documentation

| Document | Description |
| --- | --- |
| Architecture | System design, data flow, design decisions |
| API Reference | All endpoints, request/response shapes, error codes |
| Security | Threat model, auth flow, crypto primitives |
| Deployment | Detailed setup for all deployment modes |
| Troubleshooting | Common issues and fixes |
| Contributing | Dev setup, branching, code style, testing |

How-To Guides

| Guide | Description |
| --- | --- |
| Add voice to Next.js | Step-by-step integration |
| Multi-field forms | Register multiple fields, field switching |
| Controlled inputs | Setter function pattern for React state |
| Custom STT provider | Replace Soniox with another STT |
| Self-host phone page | Deploy your own phone page |

Why this architecture?

Why not just use the browser's SpeechRecognition API? That's exactly what Voicefield does by default — but with a twist: it runs on the phone's browser (better mic hardware) and relays only text to the desktop. For higher accuracy, you can upgrade to a cloud STT provider like Soniox without changing any client code.
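As a rough illustration of the default path, the sketch below wires a Web Speech API recognizer to a callback that forwards final transcripts. It is not the Voicefield phone page: `startDictation`, the `*Like` interfaces, and `sendText` are hypothetical, and the interfaces cover only the standard recognizer members used here (`lang`, `continuous`, `interimResults`, `onresult`, `start`).

```typescript
// Minimal structural types for the parts of the Web Speech API used below.
interface RecognitionResultLike {
  isFinal: boolean
  0: { transcript: string }
}
interface RecognitionEventLike {
  resultIndex: number
  results: ArrayLike<RecognitionResultLike>
}
interface RecognitionLike {
  lang: string
  continuous: boolean
  interimResults: boolean
  onresult: ((event: RecognitionEventLike) => void) | null
  start(): void
}

// Configure the recognizer and forward each final transcript as plain text.
// Audio never reaches sendText; only the recognized string does.
function startDictation(
  recognition: RecognitionLike,
  language: string,
  sendText: (text: string) => void,
): void {
  recognition.lang = language
  recognition.continuous = true
  recognition.interimResults = false
  recognition.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i]
      if (result && result.isFinal) sendText(result[0].transcript.trim())
    }
  }
  recognition.start()
}

// In a browser (hypothetical usage; postToRelay is a placeholder you would write):
//   const Ctor = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition
//   startDictation(new Ctor(), 'en-US', (text) => postToRelay(text))
```

Passing the recognizer in as a parameter keeps the function testable outside a browser and makes it easy to swap in a different STT source later.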

Why a relay server? The phone needs a way to send transcripts to the desktop. The relay is minimal — in-memory, no persistence, only text passes through. When using a cloud STT provider, the server also generates temporary API keys.

Why voicefield.dev? The phone page needs HTTPS for microphone access. Rather than making every developer set up HTTPS locally, the phone loads its UI from voicefield.dev (static, open source) while making all API calls to your server. For production, you can self-host the phone page.

Development

# Clone and install
git clone https://github.com/tatargabor/voicefield.git
cd voicefield
pnpm install

# Build all packages
pnpm build

# Run example app (works immediately, no API key needed)
cd apps/example && pnpm dev

Testing & Linting

pnpm test           # unit tests (vitest)
pnpm lint           # eslint
pnpm format         # prettier
pnpm format:check   # check formatting

# E2E tests
cd apps/example && npx playwright test

Publishing

./scripts/publish.sh patch   # bump all → build → npm publish → git tag → GitHub release
./scripts/publish.sh minor
./scripts/publish.sh major
./scripts/publish.sh --dry-run patch  # preview without changes

All packages use lockstep versioning. Requires clean working tree, gh CLI, and npm auth.

License

MIT
