feat(engines): GlowTTS / Larynx inference adapter by JarbasAl · Pull Request #143 · TigreGotico/phoonnx

JarbasAl · 2026-06-05T13:54:16Z

Adds GlowTTS support — the flow-based engine behind Larynx, the mimic3/piper precursor. GlowTTS is two-stage (text→mel + a separate vocoder), so it reuses the vocoder registry built for Matcha-TTS.

What's in

GlowTTSAdapter — input/input_lengths/scales=[noise_scale,length_scale] → mel; finds the mel by its n_mels axis (Larynx emits an extra output) and runs the vocoder from engine_params.
glowtts_config.py — voice_config_from_larynx() builds a native VoiceConfig from a Larynx config.json + phonemes.txt (gruut phonemizer, blank-interspersed, 46-symbol table).
Engine.GLOWTTS + registration. Priority: GlowTTS shares the scales input with VITS, so it's probed first — distinguished by its mel (not waveform) output. VITS/Matcha detection unaffected.
Mirror: Larynx voices (cmu_aew, ljspeech) → OpenVoiceOS/phoonnx-glowtts with modernized native configs; the HiFi-GAN vocoder → OpenVoiceOS/phoonnx-vocoders. voice_index/glowtts.json links them (vocoder_url).
Docs: docs/glowtts.md.
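
To make the adapter's two moving parts concrete, here is a minimal sketch of the feed-dict construction and the "pick the mel by its n_mels axis" step. The input names `input`, `input_lengths` and `scales` come from the description above; the helper names, the dtypes, the default scale values and the `(batch, n_mels, frames)` layout are assumptions for illustration, not the actual `GlowTTSAdapter` code.

```python
import numpy as np


def build_feed(phoneme_ids, noise_scale=0.667, length_scale=1.0):
    """Build an ONNX feed dict for one utterance (hypothetical helper).

    The default scale values and dtypes are assumptions, not project defaults.
    """
    ids = np.asarray(phoneme_ids, dtype=np.int64)[None, :]  # add batch axis
    return {
        "input": ids,
        "input_lengths": np.array([ids.shape[1]], dtype=np.int64),
        "scales": np.array([noise_scale, length_scale], dtype=np.float32),
    }


def pick_mel(outputs, n_mels=80):
    """Return the first rank-3 output with an axis of length n_mels.

    The result is always laid out as (batch, n_mels, frames); other outputs
    (such as the extra one Larynx models emit) are ignored.
    """
    for out in outputs:
        arr = np.asarray(out)
        if arr.ndim != 3:
            continue
        if arr.shape[1] == n_mels:
            return arr
        if arr.shape[2] == n_mels:
            return arr.transpose(0, 2, 1)
    raise ValueError(f"no output with an axis of length {n_mels}")
```

With an ONNX Runtime session, the two helpers would sit on either side of `session.run(None, feed)`, and the selected mel would then be handed to the vocoder.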

Verified

Voices load from the index (auto-download model + vocoder) and synthesize end-to-end (en-US, gruut → mel → HiFi-GAN). 9 unit tests; full suite 176 passed, 1 skipped.

gruut is an optional runtime dependency (phonemization only) — not needed for import/CI.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added GlowTTS (Larynx) text-to-speech engine with configurable synthesis parameters
- Added Griffin-Lim vocoder as a fallback option for mel-to-audio conversion
- Added 58 pre-configured Larynx and Coqui voices to the registry (51 Larynx + 7 Coqui)
Documentation
- Added GlowTTS engine documentation with configuration and usage guidance
- Added comprehensive vocoder documentation covering all supported types and setup
- Updated Matcha engine documentation with vocoder integration details
Tests
- Added test coverage for GlowTTS engine and vocoder implementations

coderabbitai · 2026-06-05T13:54:23Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1af27460-22e4-45f5-a773-9cbdb23c7f19

📥 Commits

Reviewing files that changed from the base of the PR and between 0a3bd38 and 3d66966.

📒 Files selected for processing (3)

docs/vocoders.md
phoonnx/engines/vocoders/griffinlim.py
tests/test_glowtts.py

Walkthrough

This PR adds complete GlowTTS (Larynx) TTS engine support with a parametric Griffin-Lim vocoder fallback. Changes include the GlowTTS ONNX adapter, config conversion from Larynx/Coqui formats, mel preprocessing and Griffin-Lim vocoder implementation, voice registry with 50+ voice definitions, model manager integration for parametric vocoders, comprehensive documentation, and test coverage.

Changes

GlowTTS Engine and Vocoder Support

Layer / File(s)	Summary
Engine enumeration and registration `phoonnx/config.py`, `phoonnx/engines/__init__.py`	Add `Engine.GLOWTTS = "glowtts"` enum member and register `GlowTTSAdapter` with detection priority 42 between matcha (40) and vits (50).
GlowTTS adapter implementation `phoonnx/engines/glowtts.py`	Implement `GlowTTSAdapter` with ONNX feed-dict construction from phoneme IDs/lengths and noise/length scales, mel output detection via shape heuristics, mel-to-audio conversion via injected vocoder, and engine detection from config fields or ONNX session metadata.
GlowTTS config conversion bridges `phoonnx/engines/glowtts_config.py`	Implement `voice_config_from_larynx()` to load tokenizer from `phonemes.txt`, enforce blank interleaving (PAD id 0), and populate `VoiceConfig` from model/audio fields; implement `voice_config_from_coqui()` to derive vocabulary from graphemes or phonemes with configurable EOS/BOS/blank handling.
Griffin-Lim vocoder and mel preprocessing `phoonnx/engines/vocoders/base.py`, `phoonnx/engines/vocoders/griffinlim.py`, `phoonnx/engines/vocoders/__init__.py`, `phoonnx/engines/vocoders/raw.py`	Add parametric `GriffinLimVocoder` with mel basis caching and `librosa.griffinlim`-based audio synthesis; add `BaseVocoder._preprocess_mel()` for optional stats-normalized mel (per-channel mean/std normalization); update vocoder registry with `griffinlim` (priority 99) and `melgan` alias; apply preprocessing in `RawWaveformVocoder`.
Voice index and model manager integration `phoonnx/voice_index/glowtts.json`, `phoonnx/model_manager.py`	Populate `glowtts.json` with 50+ Larynx and Coqui GlowTTS voices (engine, phoneme type, vocoder type, URLs); extend `TTSModelInfo.engine_params()` to download/cache `vocoder.json` for parametric vocoders; add `glowtts.json` to default voice index merge.
Engine and vocoder documentation `docs/glowtts.md`, `docs/vocoders.md`, `docs/matcha.md`	Document GlowTTS two-stage flow, inference parameters, config conversion, voice indexing, vocoder selection/fallback, Coqui model conversion; introduce comprehensive vocoder registry guide with families, selection, preprocessing (`stats_norm`), builder API, swapping, and custom implementation; cross-link from matcha.md.
Comprehensive test coverage `tests/test_glowtts.py`	Add tests for GlowTTS adapter registration, engine detection from session outputs, feed-dict construction with scales, mel output selection and vocoding, default parameters; validate Larynx/Coqui config bridges (vocabulary, tokenizer, special tokens, roundtripping); test Griffin-Lim registration, mel-to-audio output, stats normalization preprocessing, and symmetric denormalization.
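
The detection ordering in the first table row can be sketched as a priority-sorted probe. The priorities (matcha 40, glowtts 42, vits 50) and the "shared `scales` input, distinguished by mel output" rule come from this PR; the `AdapterEntry` type, the `model_info` dict and every detector body below are hypothetical stand-ins, and "lower number is probed first" is an assumption consistent with GlowTTS being probed before VITS.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AdapterEntry:
    name: str
    detect_priority: int           # assumed: lower values are probed first
    detect: Callable[[dict], bool]


def _has_mel_output(info: dict, n_mels: int = 80) -> bool:
    # A mel is rank 3 with an n_mels axis; a waveform such as (1, 1, T) is not.
    shape = info.get("output_shape", ())
    return len(shape) == 3 and n_mels in shape[1:]


# Illustrative detectors only; the real adapters inspect config fields or
# ONNX session metadata.
ENTRIES = [
    AdapterEntry("vits", 50, lambda i: "scales" in i["inputs"]),
    AdapterEntry("glowtts", 42,
                 lambda i: "scales" in i["inputs"] and _has_mel_output(i)),
    AdapterEntry("matcha", 40, lambda i: i.get("engine") == "matcha"),
]


def detect_engine(entries, model_info: dict) -> Optional[str]:
    """Return the name of the first adapter, by priority, that claims the model."""
    for entry in sorted(entries, key=lambda e: e.detect_priority):
        if entry.detect(model_info):
            return entry.name
    return None
```

Because the GlowTTS probe is stricter than the VITS one, running it first lets a model with a `scales` input and a mel output resolve to GlowTTS, while a waveform-output model still falls through to VITS.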

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

TigreGotico/phoonnx#49: Both PRs extend phoonnx/config.py Engine enum and update phoonnx/model_manager.py voice loading/merging logic to support new TTS engines (GlowTTS in this PR).
TigreGotico/phoonnx#131: This PR registers GlowTTSAdapter using the same pluggable ONNX engine registry framework introduced in #131 (phoonnx/engines/__init__.py and BaseOnnxAdapter interface).

🐰 A GlowTTS hops into the garden,
With Griffin-Lim chirping in the breeze,
Mel and waveform dance together,
As vocoder chains blend with ease,
Larynx whispers, Coqui sings—
Two paths, one voice, infinite wings! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 11.76% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and concisely describes the main change: adding GlowTTS/Larynx inference adapter support to the codebase.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.

Add GlowTTS (flow-based acoustic + separate vocoder) support: the engine behind Larynx, the mimic3/piper precursor. It is two-stage like Matcha-TTS (text -> mel, then a vocoder), so the adapter reuses the vocoder registry.

- GlowTTSAdapter: input/input_lengths/scales=[noise_scale, length_scale] -> mel, picks the mel by its n_mels axis (Larynx emits an extra output) and runs the vocoder from engine_params.
- glowtts_config.py: voice_config_from_larynx() builds a native VoiceConfig from a Larynx config.json + phonemes.txt (gruut, blank-interspersed tokenization).
- Engine.GLOWTTS; registered with detect_priority before VITS (both have a `scales` input, but GlowTTS is identified by its mel output).
- Mirror Larynx voices (cmu_aew, ljspeech) to OpenVoiceOS/phoonnx-glowtts with modernized native configs + the HiFi-GAN vocoder to phoonnx-vocoders; voice_index/glowtts.json links them.

Verified: voices load from the index (auto-download model + vocoder) and synthesize end-to-end. 9 unit tests; full suite 176 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
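
As an illustration of the blank-interspersed tokenization mentioned in this commit, the sketch below inserts a blank id between every pair of phoneme ids. Using id 0 for the blank matches the PAD id named in the walkthrough; adding a blank at both ends as well is an assumption borrowed from the usual Glow-TTS/VITS convention, and the function name is hypothetical.

```python
def intersperse(ids, blank_id=0):
    """Return ids with blank_id placed before, between and after every item."""
    out = [blank_id] * (2 * len(ids) + 1)
    out[1::2] = ids  # original ids land on the odd positions
    return out


print(intersperse([5, 9, 3]))  # [0, 5, 0, 9, 0, 3, 0]
```

The sequence length grows from `n` to `2n + 1`, so `input_lengths` has to be computed after interspersing, not before.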

Mirror the full Larynx glow_tts voice set (9 languages: en/de/es/fr/it/nl/ru/sv/sw, 51 voices) to OpenVoiceOS/phoonnx-glowtts with native configs. Phonemizer is auto-detected per voice from phonemes.txt (IPA -> gruut, plain chars -> graphemes); all 51 are gruut. Each linked to the HiFi-GAN vocoder. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
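
A rough sketch of how a per-voice phonemizer could be guessed from `phonemes.txt` follows. The commit only states the outcome (IPA -> gruut, plain chars -> graphemes); the `<id> <symbol>` line format, the Unicode-range test and both function names are assumptions made for illustration, not the converter's actual logic.

```python
def load_symbols(phonemes_text):
    """Parse '<id> <symbol>' lines (assumed format) into a list of symbols."""
    symbols = []
    for line in phonemes_text.splitlines():
        if not line.strip():
            continue
        _id, _, symbol = line.partition(" ")
        symbols.append(symbol)
    return symbols


def guess_phonemizer(symbols):
    """Return 'gruut' if any symbol uses IPA-specific characters, else 'graphemes'.

    The heuristic checks the IPA Extensions and Spacing Modifier Letters
    blocks (U+0250 to U+02FF), which cover characters such as 'ə' and 'ˈ'.
    """
    def is_ipa(ch):
        return 0x0250 <= ord(ch) <= 0x02FF

    has_ipa = any(is_ipa(ch) for symbol in symbols for ch in symbol)
    return "gruut" if has_ipa else "graphemes"
```

A heuristic like this can misclassify an IPA inventory that happens to use only plain Latin letters, so an explicit per-voice override would be a sensible companion.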

Convert + mirror coqui-TTS GlowTTS voices (official zoo) alongside Larynx, with their finetuned, mel-matched vocoders for neural quality where available.

- GriffinLimVocoder: parametric mel->audio vocoder (no model file), matching coqui's AudioProcessor de-normalization (db_to_amp / symmetric norm). Universal fallback for voices with no mel-matched neural vocoder.
- "melgan" vocoder alias (multiband-melgan is a 1-output mel->audio ONNX).
- voice_config_from_coqui(): build a native VoiceConfig from a coqui GlowTTS config ([pad,eos,bos]+chars/phonemes vocab; graphemes or espeak).
- GlowTTSAdapter + model_manager: support a parametric vocoder (vocoder_type + config, no vocoder_url) so Griffin-Lim voices load via the standard path.
- voice_index/glowtts.json: 58 voices (51 Larynx + 7 coqui official); vocoders 53 hifigan / 2 melgan / 3 griffinlim.

Acoustic + HiFi-GAN/MelGAN vocoders are converted by standalone exporters that vendor only coqui's pure-torch model code (no coqui-tts dependency).

Verified: voices load from the index (auto-download model + vocoder) and synthesize. Full suite 182 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
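
The de-normalization that the Griffin-Lim vocoder has to undo can be sketched as two small steps: map the symmetric-normalized mel back to dB, then convert dB to linear amplitude. The formulas follow the commonly used symmetric-norm scheme; the default values (`min_level_db=-100`, `max_norm=4`, `ref_level_db=20`, `gain=20`) and the function names are assumptions, and a real voice would read them from its vocoder config.

```python
import numpy as np


def denormalize_symmetric(mel_norm, min_level_db=-100.0, max_norm=4.0,
                          ref_level_db=20.0):
    """Map a mel in [-max_norm, max_norm] back to a dB-scale mel."""
    clipped = np.clip(mel_norm, -max_norm, max_norm)
    mel_db = (clipped + max_norm) * (-min_level_db) / (2.0 * max_norm) + min_level_db
    return mel_db + ref_level_db


def db_to_amp(mel_db, gain=20.0):
    """Convert dB values to linear amplitude: 10 ** (dB / gain)."""
    return np.power(10.0, np.asarray(mel_db, dtype=np.float64) / gain)
```

The resulting linear mel magnitudes are what a Griffin-Lim implementation would take as input after mel-to-linear inversion; that part is omitted here because it depends on librosa.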

Multiband-MelGAN expects stats-normalized mels (scale_stats.npy mean/std), while GlowTTS emits dB-scale mels; feeding one to the other produced garbage.

Add a config-flagged _preprocess_mel step on BaseVocoder so a converted vocoder declares its input convention:

- stats_norm + mel_mean/mel_std -> standard-scale the mel (Coqui StandardScaler).

The melgan vocoder.json carries the stats (from the vocoder's scale_stats.npy), so the runtime applies (mel - mean)/std before the ONNX. Opt-in per flag: HiFi-GAN voices (no stats) are untouched.

en/ljspeech + uk/mai are neural MelGAN again (no Griffin-Lim fallback).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
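
A minimal sketch of the opt-in standard scaling described in this commit is shown below. The config keys `stats_norm`, `mel_mean` and `mel_std` are the ones named above; the standalone function, the `(batch, n_mels, frames)` layout and the per-channel broadcasting are assumptions about how `_preprocess_mel` might be shaped.

```python
import numpy as np


def preprocess_mel(mel, config):
    """Apply (mel - mean) / std per mel channel when the config opts in.

    mel is assumed to be shaped (batch, n_mels, frames). Configs without
    the stats_norm flag (for example HiFi-GAN voices) pass through unchanged.
    """
    if not config.get("stats_norm"):
        return mel
    mean = np.asarray(config["mel_mean"], dtype=np.float32).reshape(1, -1, 1)
    std = np.asarray(config["mel_std"], dtype=np.float32).reshape(1, -1, 1)
    return (mel - mean) / std
```

Keeping the statistics in the vocoder's own config means the acoustic model stays unaware of the vocoder's input convention, which is what lets the same GlowTTS voice be paired with either vocoder family.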

Add docs/vocoders.md documenting the shared vocoder registry used by GlowTTS, Matcha-TTS and OptiSpeech: the vocoder families (vocos/wavenext/hifigan/melgan/raw/griffinlim), how a voice links its vocoder in the index, the config-driven mel preprocessing flags (stats_norm), and how to use, replace, swap, and add vocoders. Cross-linked from glowtts.md and matcha.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-05T16:05:04Z

Systems nominal. Checks complete. 🛸

I've aggregated the results of the automated checks for this PR below.

🔍 Lint

Checking if everything is still on track. 🛤️

❌ ruff: issues found — see job log

📊 Coverage

Calculating the safety margins of your changes. 📐

❌ 38.8% total coverage

Files below 80% coverage (37 files)

File	Coverage	Missing lines
`phoonnx/cli.py`	0.0%	98
`phoonnx/thirdparty/kog2p/__init__.py`	0.0%	203
`phoonnx/thirdparty/mantoq/unicode_symbol2label.py`	0.0%	1
`phoonnx/thirdparty/bw2ipa.py`	7.5%	86
`phoonnx/thirdparty/mantoq/pyarabic/number.py`	7.7%	371
`phoonnx/thirdparty/mantoq/buck/phonetise_buckwalter.py`	10.4%	180
`phoonnx/thirdparty/hangul2ipa.py`	16.6%	372
`phoonnx/phonemizers/en.py`	17.5%	104
`phoonnx/thirdparty/mantoq/pyarabic/trans.py`	18.2%	135
`phoonnx/model_manager.py`	20.1%	211
`phoonnx/voice.py`	21.7%	220
`phoonnx/thirdparty/zh_num.py`	23.1%	83
`phoonnx/phonemizers/mul.py`	23.9%	236
`phoonnx/thirdparty/tashkeel/__init__.py`	23.9%	89
`phoonnx/phonemizers/zh.py`	27.0%	92
`phoonnx/phonemizers/ko.py`	30.4%	32
`phoonnx/phonemizers/gl.py`	31.1%	42
`phoonnx/phonemizers/ar.py`	31.2%	44
`phoonnx/thirdparty/mantoq/buck/tokenization.py`	32.5%	27
`phoonnx/thirdparty/phonikud/__init__.py`	35.3%	11
`phoonnx/phonemizers/ja.py`	36.0%	32
`phoonnx/phonemizers/fa.py`	36.4%	14
`phoonnx/phonemizers/pt.py`	38.1%	13
`phoonnx/thirdparty/mantoq/pyarabic/normalize.py`	38.1%	13
`phoonnx/thirdparty/mantoq/pyarabic/araby.py`	39.7%	298
`phoonnx/phonemizers/he.py`	40.0%	12
`phoonnx/phonemizers/vi.py`	40.0%	12
`phoonnx/phonemizers/base.py`	40.8%	71
`phoonnx/thirdparty/mantoq/pyarabic/stack.py`	45.5%	6
`phoonnx/thirdparty/mantoq/num2words.py`	47.6%	11
`phoonnx/phonemizers/mwl.py`	50.0%	8
`phoonnx/tokenizer.py`	52.4%	147
`phoonnx/thirdparty/mantoq/__init__.py`	60.0%	10
`phoonnx/thirdparty/mantoq/pyarabic/arabrepr.py`	60.0%	6
`phoonnx/config.py`	60.8%	130
`phoonnx/engines/vocoders/griffinlim.py`	61.4%	27
`phoonnx/engines/optispeech.py`	69.6%	24

Full report: download the coverage-report artifact.

🔒 Security (pip-audit)

Checking for any potential privacy concerns. 🕶️

✅ No known vulnerabilities found (61 packages scanned).

🏷️ Release Preview

Ensuring the release schedule is still on track. 🗓️

Current: 1.8.0a1 → Next: 1.9.0a1

Signal	Value
Label	`feature`
PR title	`feat(engines): GlowTTS / Larynx inference adapter`
Bump	minor

⚠️ No conventional commit prefix — alpha-only bump.
Suggested: fix: update the thing or feat: update the thing

🚀 Release Channel Compatibility

Predicted next version: 1.9.0a1

Channel	Status	Note	Current Constraint
Stable	⚪	Not in channel	-
Testing	⚪	Not in channel	-
Alpha	⚪	Not in channel	-

⚖️ License Check

Scanning for any non-commercial-only restrictions. 💰

❌ License violations detected (43 packages) — review required before merging.

Dependency	License Name	License Type	Misc
phoonnx:1.3.3	Error	Error	

License Type	Found
Error	1

License distribution: 14× MIT License, 7× Apache Software License, 5× MIT, 3× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 1× 3-Clause BSD License, 1× Apache Software License; BSD License, +8 more

Full breakdown — 43 packages

Package	Version	License	URL
`build`	1.5.0	MIT	link
`certifi`	2026.5.20	Mozilla Public License 2.0 (MPL 2.0)	link
`charset-normalizer`	3.4.7	MIT	link
`click`	8.4.1	BSD-3-Clause	link
`combo_lock`	0.3.1	Apache-2.0	link
`dateparser`	1.4.0	BSD License	link
`filelock`	3.29.1	MIT	link
`flatbuffers`	25.12.19	Apache Software License	link
`idna`	3.18	BSD-3-Clause	link
`json-database`	0.10.1	MIT	link
`kthread`	0.2.3	MIT License	link
`langcodes`	3.5.1	MIT License	link
`markdown-it-py`	4.2.0	MIT License	link
`mdurl`	0.1.2	MIT License	link
`memory-tempfile`	2.2.3	MIT License	link
`numpy`	2.4.6	BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0	link
`onnxruntime`	1.26.0	MIT License	link
`ovos-config`	2.1.1	Apache-2.0	link
`ovos-date-parser`	0.7.0a5	Apache Software License	link
`ovos-number-parser`	0.5.1	Apache Software License	link
`ovos-utils`	0.8.5	Apache-2.0	link
`packaging`	26.2	Apache-2.0 OR BSD-2-Clause	link
`pexpect`	4.9.0	ISC License (ISCL)	link
`phoonnx`	1.8.0a1	Apache Software License	link
`protobuf`	7.35.0	3-Clause BSD License	link
`ptyprocess`	0.7.0	ISC License (ISCL)	link
`pyee`	13.0.1	MIT License	link
`Pygments`	2.20.0	BSD-2-Clause	link
`pyproject_hooks`	1.2.0	MIT License	link
`python-dateutil`	2.9.0.post0	Apache Software License; BSD License	link
`pytz`	2026.2	MIT License	link
`PyYAML`	6.0.3	MIT License	link
`quebra-frases`	0.3.7	Apache Software License	link
`regex`	2026.5.9	Apache-2.0 AND CNRI-Python	link
`requests`	2.34.2	Apache Software License	link
`rich`	13.9.4	MIT License	link
`rich-click`	1.9.8	MIT License

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

🔨 Build Tests

Ensuring the gears are properly lubricated. 💧

✅ All versions pass

Python	Build	Install	Tests
3.10	✅	✅	✅
3.11	✅	✅	✅
3.12	✅	✅	✅
3.13	✅	✅	✅
3.14	✅	✅	✅

📋 Repo Health

Scanning for any signs of 'comment' bad breath. 🌬️

⚠️ Some required files are missing.

Latest Version: 1.8.0a1

✅ phoonnx/version.py — Version file
✅ README.md — README
❌ LICENSE — License file
✅ pyproject.toml — pyproject.toml
⚠️ setup.py — setup.py
✅ CHANGELOG.md — Changelog
✅ phoonnx/version.py has valid version block markers

Keeping the repository healthy and happy. 😊

librosa lives in the [train] extra, not core, so a core install hits ModuleNotFoundError when a Griffin-Lim voice loads, and CI build_tests failed on test_griffinlim_mel_to_audio. Give GriffinLimVocoder a clear ImportError with an install hint, and skip the GL synthesis test when librosa is absent. Neural vocoders (HiFi-GAN/MelGAN) and all other engines are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
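
The "clear ImportError with an install hint" pattern from this commit can be sketched as a small lazy-import helper. The `train` extra name comes from the commit message; the helper name, the exact wording of the message and the `pip install` spelling are assumptions.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed. "
            f"Install it with: pip install 'phoonnx[{extra}]'"
        ) from exc


# A Griffin-Lim vocoder would call this lazily, at construction time, so
# that importing the package itself never needs librosa:
# librosa = require_optional("librosa", "train")
```

On the test side, `pytest.importorskip("librosa")` is one way to skip the synthesis test when the dependency is absent.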

github-actions Bot added the feature label Jun 5, 2026

JarbasAl and others added 5 commits June 5, 2026 17:01

JarbasAl force-pushed the feat/glowtts-engine branch from ff37adb to 0a3bd38 Compare June 5, 2026 16:04

JarbasAl marked this pull request as ready for review June 5, 2026 16:05

github-actions Bot added feature and removed feature labels Jun 5, 2026

JarbasAl merged commit 7afefb6 into dev Jun 5, 2026
11 of 12 checks passed

coderabbitai Bot mentioned this pull request Jun 6, 2026

feat(voices): coqui VITS engine + 36 voices across 33 languages #149

Merged

JarbasAl merged 6 commits into dev from feat/glowtts-engine
