feat: offline pronunciation pipeline with NeMo CTC + GOP scoring by fasuizu-br · Pull Request #20 · silvioprog/englitune

fasuizu-br · 2026-01-31T15:47:46Z

Summary

Adds fully offline, privacy-first pronunciation evaluation using NeMo Conformer CTC Small (INT4, 17.1MB) via ONNX Runtime Web
Per-phoneme GOP scoring with Viterbi forced alignment — identifies exactly which sounds the student mispronounces
L1-aware Brazilian Portuguese accent adaptation (21 phonemes across 3 tiers) with decoded transcript verification
Replaces the Web Speech API approach from PR #19 (feat: add speaking practice with Web Speech API), which auto-corrected student speech and was therefore unusable for pronunciation evaluation
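To make the "Viterbi forced alignment + GOP" item concrete, here is a minimal sketch of the general technique, not the code in src/lib/speechUtils.ts. It assumes the model output is a `[frames][vocab]` matrix of log-posteriors, that the blank token has index 0, and that GOP is taken as the mean log-posterior of the expected phoneme over its aligned frames (one common simplified definition). The names `forcedAlign`, `gopScores`, and `Alignment` are hypothetical.

```typescript
// Hypothetical names and shapes; a sketch of the technique, not the PR's code.
type Alignment = { token: number; frames: number[] };

/** Viterbi forced alignment of `targets` against CTC log-posteriors [T][V]. */
function forcedAlign(logProbs: number[][], targets: number[], blank = 0): Alignment[] {
  const T = logProbs.length;
  if (T === 0) throw new Error("no frames to align");
  // Interleave blanks: [blank, t0, blank, t1, ..., blank].
  const ext: number[] = [blank];
  for (const t of targets) ext.push(t, blank);
  const S = ext.length;
  const dp = Array.from({ length: T }, () => new Array<number>(S).fill(-Infinity));
  const back = Array.from({ length: T }, () => new Array<number>(S).fill(0));

  // A path may start on the leading blank or on the first token.
  dp[0][0] = logProbs[0][ext[0]];
  if (S > 1) dp[0][1] = logProbs[0][ext[1]];

  for (let t = 1; t < T; t++) {
    for (let s = 0; s < S; s++) {
      let best = dp[t - 1][s];
      let from = s;
      if (s >= 1 && dp[t - 1][s - 1] > best) { best = dp[t - 1][s - 1]; from = s - 1; }
      // Skipping a blank is only legal between two different non-blank tokens.
      if (s >= 2 && ext[s] !== blank && ext[s] !== ext[s - 2] && dp[t - 1][s - 2] > best) {
        best = dp[t - 1][s - 2]; from = s - 2;
      }
      dp[t][s] = best + logProbs[t][ext[s]];
      back[t][s] = from;
    }
  }

  // A path may end on the last token or on the trailing blank.
  let s = S - 1;
  if (S > 1 && dp[T - 1][S - 2] > dp[T - 1][S - 1]) s = S - 2;
  if (dp[T - 1][s] === -Infinity) throw new Error("too few frames to align targets");

  // Backtrack, collecting the frames assigned to each non-blank target.
  const result: Alignment[] = targets.map((token) => ({ token, frames: [] as number[] }));
  for (let t = T - 1; t >= 0; t--) {
    if (ext[s] !== blank) result[(s - 1) / 2].frames.unshift(t);
    s = back[t][s];
  }
  return result;
}

/** Simplified GOP: mean log-posterior of the expected token over its aligned frames. */
function gopScores(logProbs: number[][], alignment: Alignment[]): number[] {
  return alignment.map(({ token, frames }) =>
    frames.reduce((sum, t) => sum + logProbs[t][token], 0) / frames.length);
}
```

Scores closer to 0 mean the model was confident the expected phoneme was produced; strongly negative scores flag likely mispronunciations. Other GOP variants normalize against the best competing phoneme instead.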

Technical details

Metric	Value
Model size	17.1 MB (INT4, 61% smaller than FP32)
WER (native)	2.9%
WER (BR accent)	17.8% (does NOT auto-correct)
Unit tests	155 passing
Cross-browser	Chrome, Firefox, Safari (incl. iOS)
Privacy	100% offline — no audio leaves device
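For readers unfamiliar with the WER rows above: word error rate is conventionally the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below shows that standard computation; the function name `wordErrorRate` is hypothetical and the PR does not say which tool produced its figures.

```typescript
/** Word error rate = (substitutions + deletions + insertions) / reference length. */
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) throw new Error("reference must contain at least one word");

  // Levenshtein distance over words, keeping only the previous DP row.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const cur = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      cur.push(Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = cur;
  }
  return prev[hyp.length] / ref.length;
}
```

A higher WER on accented speech is expected here: a recognizer that does not auto-correct transcribes what was actually said, which is what pronunciation scoring needs.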

Pipeline

AudioWorklet → WebWorker → ONNX inference → Viterbi alignment → GOP scoring → L1 adaptation
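The summary mentions decoded transcript verification, which implies turning the per-frame model output into a token sequence. A standard way to do that for CTC models is greedy decoding: take the argmax per frame, collapse consecutive repeats, then drop blanks. The sketch below assumes a `[frames][vocab]` score matrix and a blank at index 0; the real blank index depends on the exported vocabulary, and `ctcGreedyDecode` is a hypothetical name.

```typescript
/** CTC greedy decode: per-frame argmax, collapse repeats, remove blanks. */
function ctcGreedyDecode(logProbs: number[][], vocab: string[], blank = 0): string[] {
  const out: string[] = [];
  let prev = -1;
  for (const frame of logProbs) {
    // Argmax over the vocabulary for this frame.
    let best = 0;
    for (let v = 1; v < frame.length; v++) {
      if (frame[v] > frame[best]) best = v;
    }
    // Emit only when the label changes and is not blank.
    if (best !== blank && best !== prev) out.push(vocab[best]);
    prev = best;
  }
  return out;
}
```

A blank between two identical labels separates them, so a genuinely repeated token survives the collapse.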

L1 Scoring Tiers (Brazilian Portuguese)

Tier 1 (50% boost): TH, DH, R, NG, ZH — absent in Portuguese
Tier 2 (40% boost): AE, IH, AH, UH, EY, OW, ER, AY, AW, OY — vowel confusion
Tier 3 (25% boost): L, T, D, S, Z — context-dependent differences
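One way to read the tiers above is as a per-phoneme multiplier applied to the raw score. The sketch below is an assumption about how such a boost could be applied, not the PR's implementation: it assumes raw scores are normalized to the range 0 to 1, that a boost of 50% means multiplying by 1.5, and that the result is capped at 1. The table contains only the phonemes listed above; `L1_BOOST_PT_BR` and `adjustForL1` are hypothetical names.

```typescript
// Boost per phoneme, taken from the three tiers listed in this PR description.
const L1_BOOST_PT_BR: Record<string, number> = {
  // Tier 1: absent in Portuguese
  TH: 0.5, DH: 0.5, R: 0.5, NG: 0.5, ZH: 0.5,
  // Tier 2: vowel confusion
  AE: 0.4, IH: 0.4, AH: 0.4, UH: 0.4, EY: 0.4,
  OW: 0.4, ER: 0.4, AY: 0.4, AW: 0.4, OY: 0.4,
  // Tier 3: context-dependent differences
  L: 0.25, T: 0.25, D: 0.25, S: 0.25, Z: 0.25,
};

/** Assumed adjustment: scale the raw score by (1 + boost), capped at 1. */
function adjustForL1(phoneme: string, rawScore: number): number {
  const boost = L1_BOOST_PT_BR[phoneme] ?? 0; // phonemes outside the tiers are unchanged
  return Math.min(1, rawScore * (1 + boost));
}
```

Under these assumptions a raw 0.5 on TH becomes 0.75, while a phoneme outside the tiers keeps its raw score.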

Real Audio Validation (Speech Accent Archive, George Mason University)

12 Brazilian speakers + 6 native speakers of American English
50% BR-to-native gap recovered by L1 scoring
Statistical significance: p=0.0017, Cohen's d=1.28 (large effect)
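For context on the effect size above, Cohen's d is the difference between two group means divided by a standard deviation. The sketch below uses the pooled sample standard deviation, which is the most common convention; the PR does not state which variant was used, and `cohensD` is a hypothetical name.

```typescript
/** Cohen's d with pooled sample standard deviation (each group needs >= 2 values). */
function cohensD(a: number[], b: number[]): number {
  if (a.length < 2 || b.length < 2) throw new Error("each group needs at least 2 values");
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  // Unbiased sample variance (divides by n - 1).
  const variance = (xs: number[], m: number) =>
    xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);

  const ma = mean(a);
  const mb = mean(b);
  const pooledVar =
    ((a.length - 1) * variance(a, ma) + (b.length - 1) * variance(b, mb)) /
    (a.length + b.length - 2);
  return (ma - mb) / Math.sqrt(pooledVar);
}
```

By the usual rule of thumb, values around 0.8 or above are described as a large effect, which is consistent with the "large effect" label on d=1.28.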

Files changed

src/lib/speechUtils.ts — CTC processing, Viterbi alignment, GOP scoring, L1 adaptation
src/workers/stt-worker.ts — WebWorker for ONNX inference
src/hooks/useSpeechRecognition.ts — React hook (audio capture + worker communication)
src/components/Study/SpeakingPractice.tsx — UI with per-phoneme colored feedback
src/lib/types.ts — TypeScript types for pronunciation results
public/models/ — INT4 ONNX model (17.1MB) + token vocabulary

Test plan

npm run build passes cleanly
npx vitest run — 155 tests passing
Manual test: record pronunciation, verify per-phoneme scores appear
Verify L1 feedback tooltips show BR-specific phoneme adjustments
Test on mobile (Android Chrome, iOS Safari)
Verify RAM usage < 80MB in Chrome DevTools

Replace Web Speech API approach with a fully offline, privacy-first pronunciation evaluation system using NeMo Conformer CTC Small (INT4, 17.1MB) running via ONNX Runtime Web.

Key features:
- Per-phoneme GOP scoring with Viterbi forced alignment
- L1-aware Brazilian Portuguese accent adaptation (21 phonemes, 3 tiers)
- Decoded transcript verification with 1.5x boost for confirmed BR patterns
- 100% offline, 100% private: no audio leaves the device
- Cross-browser: Chrome, Firefox, Safari (including iOS)

Technical details:
- Model: NeMo Conformer CTC Small INT4 (17.1MB, 61% smaller than FP32)
- WER: 2.9% native, 17.8% BR accent (does NOT auto-correct speech)
- Pipeline: AudioWorklet → WebWorker → ONNX inference → Viterbi → GOP
- 155 unit tests passing

Validated with real audio from Speech Accent Archive (George Mason Univ):
- 12 Brazilian speakers + 6 native speakers of American English
- 50% BR-to-native gap recovered by L1 scoring
- Statistical significance: p=0.0017, Cohen's d=1.28

fasuizu-br force-pushed the feat/ctc-gop-pronunciation-pipeline branch from 28e8dc4 to f823235 on February 16, 2026 22:19

fasuizu-br wants to merge 1 commit into silvioprog:main from fasuizu-br:feat/ctc-gop-pronunciation-pipeline
