Phone-powered voice input for any desktop text field. Turn your phone into a wireless microphone for any web application.
Scan a QR code → speak into your phone → text appears in the web field. Real-time, open source, self-hostable.
┌─────────────────────────────┐
│  voicefield.dev / your host │  Static phone page (no data stored)
└────────────┬────────────────┘
             │ loads phone page
             ▼
┌────────────────────┐         ┌──────────────────────┐
│  Phone browser     │  POST   │  Your server         │
│  STT runs here     │────────▶│  @voicefield/server  │
│  (client-side)     │  text   │  (relay only)        │
└────────────────────┘         └──────────┬───────────┘
                                          │ SSE
                                          ▼
                               ┌──────────────────────┐
                               │  Desktop browser     │
                               │  @voicefield/react   │
                               └──────────────────────┘
- Works out of the box — uses the browser's built-in Web Speech API, no API key needed
- Upgrade to Soniox — for higher accuracy, add a Soniox API key (or bring your own STT provider)
- No audio leaves the phone — STT runs client-side, your server only relays text
- Phone page is static — defaults to voicefield.dev, or self-host on your own domain
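The relay-only flow in the diagram (the phone POSTs text, the desktop receives it over SSE) can be modeled roughly as follows. This is an illustrative in-memory sketch, not the actual `@voicefield/server` code; `TextRelay`, `subscribe`, `publish`, and `toSseFrame` are hypothetical names chosen for this example.

```typescript
// Illustrative in-memory text relay. All names here are hypothetical;
// this is not the @voicefield/server implementation.

type Listener = (text: string) => void

class TextRelay {
  // One set of desktop listeners per session id.
  private listeners = new Map<string, Set<Listener>>()

  // Desktop side: subscribe to a session; returns an unsubscribe function.
  subscribe(sessionId: string, listener: Listener): () => void {
    const set = this.listeners.get(sessionId) ?? new Set<Listener>()
    set.add(listener)
    this.listeners.set(sessionId, set)
    return () => {
      set.delete(listener)
      if (set.size === 0) this.listeners.delete(sessionId)
    }
  }

  // Phone side: forward a transcript to every listener of the session.
  // Returns how many listeners received it. Nothing is stored.
  publish(sessionId: string, text: string): number {
    const set = this.listeners.get(sessionId)
    if (!set) return 0
    set.forEach((listener) => listener(text))
    return set.size
  }
}

// Format a payload as one Server-Sent Events frame ("data: ...\n\n").
function toSseFrame(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`
}

// Usage: one desktop subscriber, one transcript from the phone.
const relay = new TextRelay()
const received: string[] = []
const unsubscribe = relay.subscribe('session-1', (text) => {
  received.push(toSseFrame({ text }))
})
const delivered = relay.publish('session-1', 'hello world')
unsubscribe()
const afterUnsubscribe = relay.publish('session-1', 'ignored')
```

Because the relay only forwards text to listeners that are connected at that moment, nothing needs to be persisted, which matches the "relay only" role shown in the diagram.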
npm install @voicefield/react @voicefield/server
No API key needed: it works immediately with the browser's built-in speech recognition.
// app/api/voice/[...voicefield]/route.ts
import { createVoicefieldHandler } from '@voicefield/server'
const { GET, POST, OPTIONS } = createVoicefieldHandler({
cors: { origins: ['https://voicefield.dev'] },
})
export { GET, POST, OPTIONS }
Want higher accuracy? Add Soniox or another cloud STT provider. See Upgrading to a cloud STT provider.
// app/mic/page.tsx
"use client"
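export { Mic as default } from "@voicefield/react/phone"

// Desktop component: register a field and show the pairing QR code
import { useRef } from 'react'
import { useVoicefield, QRPopup } from '@voicefield/react'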
function MyComponent() {
const inputRef = useRef<HTMLInputElement>(null)
const vf = useVoicefield({
serverUrl: '/api/voice',
language: 'en',
})
vf.register('search', 'Search', inputRef)
return (
<>
<input ref={inputRef} />
<button onClick={() => vf.showQR()}>🎤</button>
<QRPopup
pairingCode={vf.pairingCode}
secret={vf.secret}
serverUrl={vf.serverUrl}
phoneUrl={vf.phoneUrl}
isVisible={vf.isQRVisible}
onClose={vf.hideQR}
/>
</>
)
}
That's it: three files, and any web field has voice input.
A working example lives in apps/example/:
pnpm install && pnpm build
cd apps/example && pnpm dev
This works immediately with the Web Speech API. For Soniox, copy .env.local.example and add your key.
Phones need HTTPS for microphone access. Use ngrok to expose your local dev server:
# Terminal 1: start the example app
cd apps/example && pnpm dev # runs on http://localhost:3000
# Terminal 2: expose via ngrok
ngrok http 3000
Open the ngrok HTTPS URL on your desktop, scan the QR code with your phone, and speak.
Web Speech API works great for most use cases. For higher accuracy or more language support, add a cloud provider like Soniox:
npm install @soniox/node
// app/api/voice/[...voicefield]/route.ts
import { createVoicefieldHandler } from '@voicefield/server'
import { SonioxNodeClient } from '@soniox/node'
const soniox = new SonioxNodeClient({ api_key: process.env.SONIOX_API_KEY! })
const { GET, POST, OPTIONS } = createVoicefieldHandler({
generateSttKey: async () => {
const result = await soniox.auth.createTemporaryKey({
usage_type: 'transcribe_websocket',
expires_in_seconds: 1800,
})
return { temporaryApiKey: result.api_key, expiresAt: Date.now() + 1800_000 }
},
cors: { origins: ['https://voicefield.dev'] },
})
export { GET, POST, OPTIONS }
The provider is selected automatically: if generateSttKey is configured, the phone uses Soniox; otherwise it falls back to the browser's Web Speech API. You can also build your own provider.
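The selection rule described above can be sketched as a small pure function. The names `HandlerConfig`, `selectProvider`, and `isKeyUsable` are hypothetical and exist only for this example; the real handler may decide differently in its details. The `SttKey` shape mirrors the object returned by `generateSttKey` in the snippet above.

```typescript
// Hypothetical sketch of the provider selection rule. These types and
// functions are illustrative, not part of the Voicefield API.

interface SttKey {
  temporaryApiKey: string
  expiresAt: number // epoch milliseconds
}

interface HandlerConfig {
  generateSttKey?: () => Promise<SttKey>
}

type ProviderName = 'web-speech' | 'cloud'

// A configured key generator means a cloud provider; otherwise fall back
// to the browser's built-in speech recognition.
function selectProvider(config: HandlerConfig): ProviderName {
  return typeof config.generateSttKey === 'function' ? 'cloud' : 'web-speech'
}

// A temporary key is only usable while its expiry lies in the future.
function isKeyUsable(key: SttKey, now: number = Date.now()): boolean {
  return key.expiresAt > now
}

const withoutKey = selectProvider({})
const withKey = selectProvider({
  generateSttKey: async () => ({
    temporaryApiKey: 'demo-key', // placeholder value
    expiresAt: Date.now() + 1_800_000, // 30 minutes, as in the example above
  }),
})
```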
| Package | Description |
|---|---|
| @voicefield/core | Types and utilities (zero deps) |
| @voicefield/react | React hook + QR popup + phone page |
| @voicefield/server | Next.js API route handler (relay) |
| Mode | Phone page | Server | HTTPS | Setup effort | Notes |
|---|---|---|---|---|---|
| Local (LAN) | Your /mic page | localhost | Not needed | Zero | Desktop mic only; phones need HTTPS |
| ngrok | voicefield.dev | ngrok tunnel | Automatic | 1 command | Phone mic works, best for dev |
| mkcert | Your /mic page | localhost + cert | Manual | Phone CA install | Phone mic works |
| Production | voicefield.dev | Your domain | Let's Encrypt | Standard deploy | Phone mic works |
| Self-hosted | Your domain | Your domain | Let's Encrypt | Deploy both | Phone mic works |
For local dev, mount the phone page in your app and let Voicefield auto-detect your LAN IP:
const vf = useVoicefield({
serverUrl: '/api/voice',
phoneUrl: '', // local mode — uses your server's /mic page
language: 'en',
})
The QR code points to http://192.168.x.x:PORT/mic, and the phone connects over WiFi.
Important: This mode only works for desktop-to-desktop testing (mic in the same browser). Phones require HTTPS for microphone access — use ngrok or the default production mode instead:
ngrok http 3000
Then open the ngrok HTTPS URL on your desktop. The QR code will automatically point the phone to the HTTPS tunnel.
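The HTTPS requirement comes from the browser's secure-context rule: microphone access is only available on HTTPS pages or on loopback hosts such as localhost. The sketch below approximates that rule to show why a plain-HTTP LAN address fails on a phone. `isLikelySecureOrigin` is a hypothetical helper and a simplification; in a real page, check `window.isSecureContext` instead.

```typescript
// Rough approximation of the browser "secure context" rule that gates
// microphone access. The function name is hypothetical and the check is
// simplified; real pages should read window.isSecureContext.

function isLikelySecureOrigin(pageUrl: string): boolean {
  const url = new URL(pageUrl)
  if (url.protocol === 'https:') return true

  // Browsers treat loopback hosts as trustworthy even over plain HTTP.
  const host = url.hostname
  const isLoopbackName = host === 'localhost' || host.endsWith('.localhost')
  const isLoopbackIpv4 = /^127(\.\d{1,3}){3}$/.test(host)
  return isLoopbackName || isLoopbackIpv4
}

// Desktop browser on the dev machine itself: loopback, so the mic works.
const desktopLocal = isLikelySecureOrigin('http://localhost:3000/mic')
// Phone on the same WiFi hitting the LAN IP over HTTP: not secure.
const phoneOverLan = isLikelySecureOrigin('http://192.168.1.20:3000/mic')
// Any HTTPS tunnel or production host (example.test is a placeholder).
const phoneOverHttps = isLikelySecureOrigin('https://example.test/mic')
```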
const vf = useVoicefield({
serverUrl: '/api/voice',
// phoneUrl defaults to https://voicefield.dev
language: 'en',
})
The phone loads voicefield.dev/mic (static, open source), and all API calls go to your server.
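To make the split between the phone page origin and your server concrete, here is a sketch of how a pairing link could combine the four values that `QRPopup` receives (`phoneUrl`, `serverUrl`, `pairingCode`, `secret`). The query-parameter names and the use of the URL fragment for the secret are assumptions made for illustration, not Voicefield's actual QR format; `buildPhoneLink` is a hypothetical helper.

```typescript
// Hypothetical sketch: the parameter names and fragment layout are
// assumptions for illustration, not Voicefield's actual QR format.

interface PairingLink {
  phoneUrl: string // origin that serves the static phone page
  serverUrl: string // absolute URL of your relay API
  pairingCode: string
  secret: string
}

function buildPhoneLink({ phoneUrl, serverUrl, pairingCode, secret }: PairingLink): string {
  const link = new URL('/mic', phoneUrl)
  link.searchParams.set('server', serverUrl)
  link.searchParams.set('code', pairingCode)
  // Fragments are not sent in HTTP requests, so in this sketch the host
  // of the phone page never receives the secret.
  link.hash = secret
  return link.toString()
}

const qrLink = buildPhoneLink({
  phoneUrl: 'https://voicefield.dev',
  serverUrl: 'https://app.example.com/api/voice', // placeholder server
  pairingCode: '123456',
  secret: 'c2VjcmV0', // placeholder value
})
const parsedQrLink = new URL(qrLink)
```

Since the page is served from one origin and calls an API on another, the server must allow that origin, which is what the `cors: { origins: ['https://voicefield.dev'] }` option in the route handler is for.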
- Audio stays on the phone — STT runs client-side, only text is relayed
- In-memory sessions — no database, no persistence, 30-min TTL
- Cryptographic pairing — 256-bit secret in QR, 384-bit session token
- Single-use codes — 6-digit pairing code deleted after use
- Your server controls everything — STT keys generated on your infra, provider of your choice
See Security Model for the full threat model and design.
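The properties listed above (in-memory sessions, a 30-minute TTL, a 256-bit secret, a 384-bit session token, and single-use 6-digit codes) can be modeled with Node's built-in `crypto` module. This is a minimal sketch under those stated sizes, not the `@voicefield/server` implementation; `PairingStore`, `create`, and `redeem` are hypothetical names, and deleting the code even on a failed attempt is a design choice of this sketch.

```typescript
import { randomBytes, randomInt, timingSafeEqual } from 'node:crypto'

// Illustrative model only. Class and method names are hypothetical;
// this is not the @voicefield/server implementation.

const SESSION_TTL_MS = 30 * 60 * 1000 // 30-minute TTL

interface Pairing {
  secret: string // 256-bit secret (hex) embedded in the QR code
  expiresAt: number // epoch milliseconds
}

class PairingStore {
  private pairings = new Map<string, Pairing>() // in-memory, no persistence

  // Desktop side: create a 6-digit code plus a 256-bit secret.
  create(now: number = Date.now()): { code: string; secret: string } {
    let code: string
    do {
      code = randomInt(0, 1_000_000).toString().padStart(6, '0')
    } while (this.pairings.has(code)) // avoid reusing an active code
    const secret = randomBytes(32).toString('hex')
    this.pairings.set(code, { secret, expiresAt: now + SESSION_TTL_MS })
    return { code, secret }
  }

  // Phone side: exchange code + secret for a 384-bit session token.
  // In this sketch the code is deleted on first use, even if the attempt fails.
  redeem(code: string, secret: string, now: number = Date.now()): string | null {
    const pairing = this.pairings.get(code)
    this.pairings.delete(code)
    if (!pairing || pairing.expiresAt <= now) return null

    // Constant-time comparison of the expected and presented secrets.
    const expected = Buffer.from(pairing.secret, 'hex')
    const presented = Buffer.from(secret, 'hex')
    if (expected.length !== presented.length || !timingSafeEqual(expected, presented)) return null

    return randomBytes(48).toString('hex') // 384-bit session token
  }
}

const store = new PairingStore()
const pairing = store.create(0)
const firstToken = store.redeem(pairing.code, pairing.secret, 1000)
const secondToken = store.redeem(pairing.code, pairing.secret, 1000) // code already used
```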
| Document | Description |
|---|---|
| Architecture | System design, data flow, design decisions |
| API Reference | All endpoints, request/response shapes, error codes |
| Security | Threat model, auth flow, crypto primitives |
| Deployment | Detailed setup for all deployment modes |
| Troubleshooting | Common issues and fixes |
| Contributing | Dev setup, branching, code style, testing |
| Guide | Description |
|---|---|
| Add voice to Next.js | Step-by-step integration |
| Multi-field forms | Register multiple fields, field switching |
| Controlled inputs | Setter function pattern for React state |
| Custom STT provider | Replace Soniox with another STT |
| Self-host phone page | Deploy your own phone page |
Why not just use the browser's SpeechRecognition API? That's exactly what Voicefield does by default — but with a twist: it runs on the phone's browser (better mic hardware) and relays only text to the desktop. For higher accuracy, you can upgrade to a cloud STT provider like Soniox without changing any client code.
Why a relay server? The phone needs a way to send transcripts to the desktop. The relay is minimal — in-memory, no persistence, only text passes through. When using a cloud STT provider, the server also generates temporary API keys.
Why voicefield.dev? The phone page needs HTTPS for microphone access. Rather than making every developer set up HTTPS locally, the phone loads its UI from voicefield.dev (static, open source) while making all API calls to your server. For production, you can self-host the phone page.
# Clone and install
git clone https://github.com/tatargabor/voicefield.git
cd voicefield
pnpm install
# Build all packages
pnpm build
# Run example app (works immediately, no API key needed)
cd apps/example && pnpm dev
pnpm test # unit tests (vitest)
pnpm lint # eslint
pnpm format # prettier
pnpm format:check # check formatting
# E2E tests
cd apps/example && npx playwright test
./scripts/publish.sh patch # bump all → build → npm publish → git tag → GitHub release
./scripts/publish.sh minor
./scripts/publish.sh major
./scripts/publish.sh --dry-run patch # preview without changes
All packages use lockstep versioning. Publishing requires a clean working tree, the gh CLI, and npm auth.
MIT