feat(engines): GlowTTS / Larynx inference adapter #143

Merged
JarbasAl merged 6 commits into dev from feat/glowtts-engine on Jun 5, 2026

JarbasAl commented Jun 5, 2026

Adds GlowTTS support — the flow-based engine behind Larynx, the mimic3/piper precursor. GlowTTS is two-stage (text→mel + a separate vocoder), so it reuses the vocoder registry built for Matcha-TTS.

What's in

  • GlowTTSAdapter: input/input_lengths/scales=[noise_scale, length_scale] → mel; finds the mel by its n_mels axis (Larynx emits an extra output) and runs the vocoder from engine_params.
  • glowtts_config.py: voice_config_from_larynx() builds a native VoiceConfig from a Larynx config.json + phonemes.txt (gruut phonemizer, blank-interspersed, 46-symbol table).
  • Engine.GLOWTTS + registration. Priority: GlowTTS shares the scales input with VITS, so it is probed first and distinguished by its mel (not waveform) output. VITS/Matcha detection is unaffected.
  • Mirror: Larynx voices (cmu_aew, ljspeech) → OpenVoiceOS/phoonnx-glowtts with modernized native configs; the HiFi-GAN vocoder → OpenVoiceOS/phoonnx-vocoders. voice_index/glowtts.json links them (vocoder_url).
  • Docs: docs/glowtts.md.
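The adapter bullet above can be pictured with a small numpy sketch: it packs the three inputs the PR names (input, input_lengths, scales) and picks the mel out of several outputs by looking for an n_mels-sized axis. The helper names, the 80-bin default, the [batch, n_mels, frames] layout, and the default scale values are assumptions for illustration, not phoonnx's actual GlowTTSAdapter code.

```python
import numpy as np

# Assumed defaults for illustration only; real voices take these from their config.
DEFAULT_NOISE_SCALE = 0.667
DEFAULT_LENGTH_SCALE = 1.0
N_MELS = 80


def build_glowtts_feed(phoneme_ids, noise_scale=DEFAULT_NOISE_SCALE,
                       length_scale=DEFAULT_LENGTH_SCALE):
    """Pack phoneme ids into the input / input_lengths / scales feed dict."""
    ids = np.asarray(phoneme_ids, dtype=np.int64)[np.newaxis, :]  # [1, T]
    lengths = np.array([ids.shape[1]], dtype=np.int64)             # [1]
    scales = np.array([noise_scale, length_scale], dtype=np.float32)
    return {"input": ids, "input_lengths": lengths, "scales": scales}


def select_mel(outputs, n_mels=N_MELS):
    """Return the first output that has an n_mels axis, as [n_mels, frames]."""
    for out in outputs:
        arr = np.asarray(out)
        if arr.ndim == 3 and arr.shape[0] == 1:
            arr = arr[0]  # drop the batch axis
        if arr.ndim != 2:
            continue
        if arr.shape[0] == n_mels:
            return arr
        if arr.shape[1] == n_mels:
            return arr.T  # [frames, n_mels] -> [n_mels, frames]
    raise ValueError(f"no output with a {n_mels}-bin mel axis")


# Larynx-style outputs: an extra non-mel tensor plus the [1, 80, frames] mel.
feed = build_glowtts_feed([0, 12, 0, 7, 0])
mel = select_mel([np.zeros((1, 5)), np.zeros((1, N_MELS, 42))])
print(mel.shape)  # (80, 42)
```

Matching on the mel axis rather than on output position is what lets an adapter tolerate Larynx's extra output without per-model configuration.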

Verified

Voices load from the index (auto-download model + vocoder) and synthesize end-to-end (en-US, gruut → mel → HiFi-GAN). 9 unit tests; full suite 176 passed, 1 skipped.

gruut is an optional runtime dependency (phonemization only) — not needed for import/CI.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added GlowTTS (Larynx) text-to-speech engine with configurable synthesis parameters
    • Added Griffin-Lim vocoder as a fallback option for mel-to-audio conversion
    • Added 30+ pre-configured Larynx and Coqui voices to the registry
  • Documentation

    • Added GlowTTS engine documentation with configuration and usage guidance
    • Added comprehensive vocoder documentation covering all supported types and setup
    • Updated Matcha engine documentation with vocoder integration details
  • Tests

    • Added test coverage for GlowTTS engine and vocoder implementations

coderabbitai Bot commented Jun 5, 2026

Warning

Review limit reached

@JarbasAl, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes and 15 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1af27460-22e4-45f5-a773-9cbdb23c7f19

📥 Commits

Reviewing files that changed from the base of the PR and between 0a3bd38 and 3d66966.

📒 Files selected for processing (3)
  • docs/vocoders.md
  • phoonnx/engines/vocoders/griffinlim.py
  • tests/test_glowtts.py
📝 Walkthrough

This PR adds complete GlowTTS (Larynx) TTS engine support with a parametric Griffin-Lim vocoder fallback. Changes include the GlowTTS ONNX adapter, config conversion from Larynx/Coqui formats, mel preprocessing and Griffin-Lim vocoder implementation, voice registry with 50+ voice definitions, model manager integration for parametric vocoders, comprehensive documentation, and test coverage.

Changes

GlowTTS Engine and Vocoder Support

| Layer | File(s) | Summary |
|---|---|---|
| Engine enumeration and registration | phoonnx/config.py, phoonnx/engines/__init__.py | Add Engine.GLOWTTS = "glowtts" enum member and register GlowTTSAdapter with detection priority 42, between matcha (40) and vits (50). |
| GlowTTS adapter implementation | phoonnx/engines/glowtts.py | Implement GlowTTSAdapter with ONNX feed-dict construction from phoneme IDs/lengths and noise/length scales, mel output detection via shape heuristics, mel-to-audio conversion via injected vocoder, and engine detection from config fields or ONNX session metadata. |
| GlowTTS config conversion bridges | phoonnx/engines/glowtts_config.py | Implement voice_config_from_larynx() to load the tokenizer from phonemes.txt, enforce blank interleaving (PAD id 0), and populate VoiceConfig from model/audio fields; implement voice_config_from_coqui() to derive the vocabulary from graphemes or phonemes with configurable EOS/BOS/blank handling. |
| Griffin-Lim vocoder and mel preprocessing | phoonnx/engines/vocoders/base.py, phoonnx/engines/vocoders/griffinlim.py, phoonnx/engines/vocoders/__init__.py, phoonnx/engines/vocoders/raw.py | Add parametric GriffinLimVocoder with mel basis caching and librosa.griffinlim-based audio synthesis; add BaseVocoder._preprocess_mel() for optional stats-normalized mels (per-channel mean/std normalization); update the vocoder registry with griffinlim (priority 99) and a melgan alias; apply preprocessing in RawWaveformVocoder. |
| Voice index and model manager integration | phoonnx/voice_index/glowtts.json, phoonnx/model_manager.py | Populate glowtts.json with 50+ Larynx and Coqui GlowTTS voices (engine, phoneme type, vocoder type, URLs); extend TTSModelInfo.engine_params() to download/cache vocoder.json for parametric vocoders; add glowtts.json to the default voice index merge. |
| Engine and vocoder documentation | docs/glowtts.md, docs/vocoders.md, docs/matcha.md | Document the GlowTTS two-stage flow, inference parameters, config conversion, voice indexing, vocoder selection/fallback, and Coqui model conversion; introduce a comprehensive vocoder registry guide with families, selection, preprocessing (stats_norm), builder API, swapping, and custom implementation; cross-link from matcha.md. |
| Comprehensive test coverage | tests/test_glowtts.py | Add tests for GlowTTS adapter registration, engine detection from session outputs, feed-dict construction with scales, mel output selection and vocoding, and default parameters; validate Larynx/Coqui config bridges (vocabulary, tokenizer, special tokens, roundtripping); test Griffin-Lim registration, mel-to-audio output, stats normalization preprocessing, and symmetric denormalization. |
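The detection ordering in the first table row can be sketched as a priority-sorted probe list: because a plain "has a scales input" check would also match GlowTTS graphs, the more specific GlowTTS probe (priority 42) has to run before the VITS one (priority 50). The predicates, the assumed 80-bin mel axis, and the shape convention (ints for static ONNX dims, strings for symbolic ones) are hypothetical; Matcha's probe at priority 40 is left out because its signature is not described here.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

Dim = Union[int, str]  # static ONNX dims are ints, symbolic dims are strings
Shape = Sequence[Dim]
Probe = Callable[[Sequence[str], Sequence[Shape]], bool]

N_MELS = 80  # assumed mel size


def has_mel_output(output_shapes: Sequence[Shape], n_mels: int = N_MELS) -> bool:
    """True if any output looks like [batch, n_mels, frames]."""
    return any(len(s) == 3 and s[1] == n_mels for s in output_shapes)


def looks_like_glowtts(inputs: Sequence[str], outputs: Sequence[Shape]) -> bool:
    return "scales" in inputs and has_mel_output(outputs)


def looks_like_vits(inputs: Sequence[str], outputs: Sequence[Shape]) -> bool:
    # Also true for GlowTTS graphs, which is why GlowTTS must be probed first.
    return "scales" in inputs


# (priority, engine name, probe); lower numbers are tried first.
PROBES: List[Tuple[int, str, Probe]] = [
    (50, "vits", looks_like_vits),
    (42, "glowtts", looks_like_glowtts),
]


def detect_engine(inputs: Sequence[str], outputs: Sequence[Shape]) -> Optional[str]:
    """Return the first engine whose probe matches, in priority order."""
    for _, name, probe in sorted(PROBES, key=lambda p: p[0]):
        if probe(inputs, outputs):
            return name
    return None


print(detect_engine(["input", "input_lengths", "scales"], [[1, 80, "frames"]]))  # glowtts
```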

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

  • TigreGotico/phoonnx#49: Both PRs extend phoonnx/config.py Engine enum and update phoonnx/model_manager.py voice loading/merging logic to support new TTS engines (GlowTTS in this PR).
  • TigreGotico/phoonnx#131: This PR registers GlowTTSAdapter using the same pluggable ONNX engine registry framework introduced in #131 (phoonnx/engines/__init__.py and BaseOnnxAdapter interface).

🐰 A GlowTTS hops into the garden,
With Griffin-Lim chirping in the breeze,
Mel and waveform dance together,
As vocoder chains blend with ease,
Larynx whispers, Coqui sings—
Two paths, one voice, infinite wings!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.76%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely describes the main change: adding GlowTTS/Larynx inference adapter support to the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000 ms. One of your pipelines takes longer than our 30000 ms fetch window to run, so the review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer, or run @coderabbitai review after the pipeline has finished.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

JarbasAl and others added 5 commits June 5, 2026 17:01
Add GlowTTS (flow-based acoustic + separate vocoder) support — the engine behind
Larynx, the mimic3/piper precursor. It is two-stage like Matcha-TTS (text -> mel,
then a vocoder), so the adapter reuses the vocoder registry.

- GlowTTSAdapter: input/input_lengths/scales=[noise_scale, length_scale] -> mel,
  picks the mel by its n_mels axis (Larynx emits an extra output) and runs the
  vocoder from engine_params.
- glowtts_config.py: voice_config_from_larynx() builds a native VoiceConfig from a
  Larynx config.json + phonemes.txt (gruut, blank-interspersed tokenization).
- Engine.GLOWTTS; registered with detect_priority before VITS (both have a
  `scales` input, but GlowTTS is identified by its mel output).
- Mirror Larynx voices (cmu_aew, ljspeech) to OpenVoiceOS/phoonnx-glowtts with
  modernized native configs + the HiFi-GAN vocoder to phoonnx-vocoders;
  voice_index/glowtts.json links them.
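Blank-interspersed tokenization, as voice_config_from_larynx() sets it up, means a blank (PAD id 0) sits between and around every phoneme id. A minimal sketch, assuming phonemes.txt holds one "<id> <symbol>" pair per line; the function names are hypothetical, and phonemization itself (gruut) is out of scope here.

```python
from typing import Dict, List, Sequence

PAD_ID = 0  # blank / PAD id, per the PR


def load_phoneme_table(text: str) -> Dict[str, int]:
    """Parse phonemes.txt content, assuming one '<id> <symbol>' pair per line."""
    table: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        idx, symbol = line.split(maxsplit=1)
        table[symbol] = int(idx)
    return table


def intersperse_blank(ids: Sequence[int], blank: int = PAD_ID) -> List[int]:
    """[a, b, c] -> [blank, a, blank, b, blank, c, blank]."""
    result = [blank] * (2 * len(ids) + 1)
    result[1::2] = list(ids)  # odd positions hold the real phoneme ids
    return result


def phonemes_to_ids(phonemes: Sequence[str], table: Dict[str, int]) -> List[int]:
    return intersperse_blank([table[p] for p in phonemes])


table = load_phoneme_table("0 _\n1 ^\n2 h\n3 ə\n")
print(phonemes_to_ids(["h", "ə"], table))  # [0, 2, 0, 3, 0]
```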

Verified: voices load from the index (auto-download model + vocoder) and
synthesize end-to-end. 9 unit tests; full suite 176 passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Mirror the full Larynx glow_tts voice set (9 languages: en/de/es/fr/it/nl/ru/sv/sw,
51 voices) to OpenVoiceOS/phoonnx-glowtts with native configs. Phonemizer is
auto-detected per voice from phonemes.txt (IPA -> gruut, plain chars -> graphemes);
all 51 are gruut. Each linked to the HiFi-GAN vocoder.
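The per-voice phonemizer auto-detection (IPA symbols select gruut, plain characters select graphemes) can be approximated with a Unicode-range heuristic. This is a hypothetical sketch, not the PR's detector: it flags a symbol table as IPA if any symbol uses the IPA Extensions or Spacing Modifier Letters blocks (U+0250 to U+02FF), which cover ə, ʃ, ˈ and ː but not ordinary accented or Cyrillic letters.

```python
from typing import Iterable


def is_ipa_char(ch: str) -> bool:
    # IPA Extensions (U+0250-U+02AF) and Spacing Modifier Letters (U+02B0-U+02FF):
    # symbols such as ə, ʃ, ˈ and ː that rarely appear in plain orthography.
    return 0x0250 <= ord(ch) <= 0x02FF


def guess_phonemizer(symbols: Iterable[str]) -> str:
    """Hypothetical heuristic: IPA-looking tables -> 'gruut', else 'graphemes'."""
    if any(is_ipa_char(ch) for sym in symbols for ch in sym):
        return "gruut"
    return "graphemes"


print(guess_phonemizer(["_", "^", "h", "ə", "ˈ"]))  # gruut
print(guess_phonemizer(["_", "a", "ä", "ß"]))       # graphemes
```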

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Convert + mirror coqui-TTS GlowTTS voices (official zoo) alongside Larynx, with
their finetuned, mel-matched vocoders for neural quality where available.

- GriffinLimVocoder: parametric mel->audio vocoder (no model file), matching
  coqui's AudioProcessor de-normalization (db_to_amp / symmetric norm). Universal
  fallback for voices with no mel-matched neural vocoder.
- "melgan" vocoder alias (multiband-melgan is a 1-output mel->audio ONNX).
- voice_config_from_coqui(): build a native VoiceConfig from a coqui GlowTTS
  config ([pad,eos,bos]+chars/phonemes vocab; graphemes or espeak).
- GlowTTSAdapter + model_manager: support a parametric vocoder (vocoder_type +
  config, no vocoder_url) so Griffin-Lim voices load via the standard path.
- voice_index/glowtts.json: 58 voices (51 Larynx + 7 coqui official); vocoders
  53 hifigan / 2 melgan / 3 griffinlim.
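Before Griffin-Lim can run, a Coqui-style normalized mel has to be mapped back to linear magnitudes. A numpy sketch of the symmetric de-normalization and db_to_amp steps mentioned in the first bullet; the default values (max_norm=4.0, min_level_db=-100, ref_level_db=20, spec_gain=20) are assumed typical Coqui settings, and a real voice should take them from its own config.

```python
import numpy as np


def denormalize_symmetric(mel_norm, max_norm=4.0, min_level_db=-100.0,
                          ref_level_db=20.0, clip_norm=True):
    """Map a mel normalized to [-max_norm, max_norm] back to decibels."""
    mel_norm = np.asarray(mel_norm, dtype=np.float64)
    if clip_norm:
        mel_norm = np.clip(mel_norm, -max_norm, max_norm)
    # Linear map: -max_norm -> min_level_db, +max_norm -> 0 dB (before the ref shift).
    mel_db = (mel_norm + max_norm) * (-min_level_db) / (2.0 * max_norm) + min_level_db
    return mel_db + ref_level_db


def db_to_amp(mel_db, spec_gain=20.0):
    """Invert a spec_gain * log10 amplitude-to-dB conversion."""
    return np.power(10.0, np.asarray(mel_db) / spec_gain)


mel_db = denormalize_symmetric(np.array([-4.0, 0.0, 4.0]))  # -80, -30 and 20 dB
amps = db_to_amp(mel_db)                                     # roughly 1e-4, 0.032, 10
```

The resulting linear-magnitude mel would still need projecting back to a linear spectrogram (for example with librosa.feature.inverse.mel_to_stft) before librosa.griffinlim can estimate the phase.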

Acoustic + HiFi-GAN/MelGAN vocoders are converted by standalone exporters that
vendor only coqui's pure-torch model code (no coqui-tts dependency). Verified:
voices load from the index (auto-download model + vocoder) and synthesize.
Full suite 182 passed.
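The [pad, eos, bos] + chars/phonemes vocabulary that voice_config_from_coqui() builds can be sketched as an ordered symbol table with optional BOS/EOS wrapping. The "_", "~" and "^" defaults, the trailing punctuation, per-character encoding, and the helper names are assumptions modeled loosely on Coqui's conventions, not code from the PR.

```python
from typing import Dict, Iterable, List


def build_symbol_table(symbols: Iterable[str], punctuations: str = "",
                       pad: str = "_", eos: str = "~", bos: str = "^") -> Dict[str, int]:
    """[pad, eos, bos] + symbols + punctuation -> contiguous ids; first id wins on duplicates."""
    ordered: List[str] = [pad, eos, bos] + list(symbols) + list(punctuations)
    table: Dict[str, int] = {}
    for sym in ordered:
        table.setdefault(sym, len(table))  # len() is taken before any insertion
    return table


def encode(text: str, table: Dict[str, int], add_bos: bool = False,
           add_eos: bool = False, eos: str = "~", bos: str = "^") -> List[int]:
    """Map known characters to ids (unknown ones are dropped), optionally adding BOS/EOS."""
    ids = [table[ch] for ch in text if ch in table]
    if add_bos:
        ids.insert(0, table[bos])
    if add_eos:
        ids.append(table[eos])
    return ids


table = build_symbol_table("abc", punctuations="!.")
print(encode("ab!", table, add_bos=True, add_eos=True))  # [2, 3, 4, 6, 1]
```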

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Multiband-MelGAN expects stats-normalized mels (scale_stats.npy mean/std), while
GlowTTS emits dB-scale mels — feeding one to the other produced garbage. Add a
config-flagged _preprocess_mel step on BaseVocoder so a converted vocoder declares
its input convention:

- stats_norm + mel_mean/mel_std -> standard-scale the mel (Coqui StandardScaler).

The melgan vocoder.json carries the stats (from the vocoder's scale_stats.npy), so
the runtime applies (mel - mean)/std before the ONNX. Opt-in per flag — HiFi-GAN
voices (no stats) are untouched. en/ljspeech + uk/mai are neural MelGAN again
(no Griffin-Lim fallback).
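The opt-in stats_norm step amounts to per-channel standard scaling. A minimal sketch using the config keys named above (stats_norm, mel_mean, mel_std); the function name stands in for BaseVocoder._preprocess_mel, and the [n_mels, frames] layout with one mean/std value per mel channel is an assumption.

```python
import numpy as np


def preprocess_mel(mel: np.ndarray, config: dict) -> np.ndarray:
    """Opt-in stats normalization: (mel - mean) / std per mel channel.

    mel is [n_mels, frames]; mel_mean / mel_std hold one value per channel.
    """
    if not config.get("stats_norm"):
        return mel  # no flag (e.g. HiFi-GAN voices): pass through untouched
    mean = np.asarray(config["mel_mean"], dtype=mel.dtype)[:, np.newaxis]
    std = np.asarray(config["mel_std"], dtype=mel.dtype)[:, np.newaxis]
    return (mel - mean) / std


mel = np.array([[1.0, 3.0], [10.0, 20.0]])
cfg = {"stats_norm": True, "mel_mean": [2.0, 15.0], "mel_std": [1.0, 5.0]}
scaled = preprocess_mel(mel, cfg)  # each channel becomes [-1, 1]
```

Keeping the step behind a flag means voices whose vocoder.json carries no stats are never touched, which matches the "HiFi-GAN voices are untouched" behaviour described in the commit.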

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add docs/vocoders.md documenting the shared vocoder registry used by GlowTTS,
Matcha-TTS and OptiSpeech: the vocoder families (vocos/wavenext/hifigan/melgan/
raw/griffinlim), how a voice links its vocoder in the index, the config-driven
mel preprocessing flags (stats_norm), and how to use, replace, swap, and add
vocoders. Cross-linked from glowtts.md and matcha.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
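The registry operations the vocoder docs cover (named families, the melgan alias, Griffin-Lim registered at priority 99 as the universal fallback, swapping one vocoder for another) can be sketched with a small registry. Everything here is hypothetical: the class and method names, the fallback rule, priorities other than 99, and the choice to alias melgan to raw are assumptions, and phoonnx's real registry API may differ.

```python
from typing import Callable, Dict, List, Optional, Tuple

import numpy as np

Vocoder = Callable[[np.ndarray], np.ndarray]  # mel [n_mels, frames] -> audio samples
Factory = Callable[[dict], Vocoder]           # builds a vocoder from its config


class VocoderRegistry:
    """Hypothetical registry: named factories, aliases and a last-resort fallback."""

    def __init__(self, fallback: str = "griffinlim") -> None:
        self._factories: Dict[str, Tuple[int, Factory]] = {}
        self._aliases: Dict[str, str] = {}
        self.fallback = fallback

    def register(self, name: str, factory: Factory, priority: int = 50) -> None:
        self._factories[name] = (priority, factory)  # re-registering swaps it

    def alias(self, alias: str, target: str) -> None:
        self._aliases[alias] = target

    def names(self) -> List[str]:
        # Lower numbers first, so a priority-99 fallback sorts last.
        return sorted(self._factories, key=lambda n: self._factories[n][0])

    def resolve(self, vocoder_type: Optional[str]) -> str:
        name = self._aliases.get(vocoder_type, vocoder_type) if vocoder_type else None
        return name if name in self._factories else self.fallback

    def build(self, vocoder_type: Optional[str], config: dict) -> Vocoder:
        return self._factories[self.resolve(vocoder_type)][1](config)


def silent_vocoder(config: dict) -> Vocoder:
    """Stand-in factory: emits hop_length zeros per mel frame."""
    hop = config.get("hop_length", 256)
    return lambda mel: np.zeros(mel.shape[1] * hop, dtype=np.float32)


registry = VocoderRegistry()
registry.register("hifigan", silent_vocoder, priority=10)
registry.register("raw", silent_vocoder, priority=20)
registry.register("griffinlim", silent_vocoder, priority=99)
registry.alias("melgan", "raw")  # assumed: melgan as a 1-output mel->audio model

print(registry.resolve("melgan"), registry.resolve(None))  # raw griffinlim
```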
JarbasAl force-pushed the feat/glowtts-engine branch from ff37adb to 0a3bd38 on June 5, 2026 16:04
github-actions Bot commented Jun 5, 2026

Systems nominal. Checks complete. 🛸

I've aggregated the results of the automated checks for this PR below.

🔍 Lint

Checking if everything is still on track. 🛤️

ruff: issues found — see job log

📊 Coverage

Calculating the safety margins of your changes. 📐

38.8% total coverage

Files below 80% coverage (37 files)
| File | Coverage | Missing lines |
|---|---|---|
| phoonnx/cli.py | 0.0% | 98 |
| phoonnx/thirdparty/kog2p/__init__.py | 0.0% | 203 |
| phoonnx/thirdparty/mantoq/unicode_symbol2label.py | 0.0% | 1 |
| phoonnx/thirdparty/bw2ipa.py | 7.5% | 86 |
| phoonnx/thirdparty/mantoq/pyarabic/number.py | 7.7% | 371 |
| phoonnx/thirdparty/mantoq/buck/phonetise_buckwalter.py | 10.4% | 180 |
| phoonnx/thirdparty/hangul2ipa.py | 16.6% | 372 |
| phoonnx/phonemizers/en.py | 17.5% | 104 |
| phoonnx/thirdparty/mantoq/pyarabic/trans.py | 18.2% | 135 |
| phoonnx/model_manager.py | 20.1% | 211 |
| phoonnx/voice.py | 21.7% | 220 |
| phoonnx/thirdparty/zh_num.py | 23.1% | 83 |
| phoonnx/phonemizers/mul.py | 23.9% | 236 |
| phoonnx/thirdparty/tashkeel/__init__.py | 23.9% | 89 |
| phoonnx/phonemizers/zh.py | 27.0% | 92 |
| phoonnx/phonemizers/ko.py | 30.4% | 32 |
| phoonnx/phonemizers/gl.py | 31.1% | 42 |
| phoonnx/phonemizers/ar.py | 31.2% | 44 |
| phoonnx/thirdparty/mantoq/buck/tokenization.py | 32.5% | 27 |
| phoonnx/thirdparty/phonikud/__init__.py | 35.3% | 11 |
| phoonnx/phonemizers/ja.py | 36.0% | 32 |
| phoonnx/phonemizers/fa.py | 36.4% | 14 |
| phoonnx/phonemizers/pt.py | 38.1% | 13 |
| phoonnx/thirdparty/mantoq/pyarabic/normalize.py | 38.1% | 13 |
| phoonnx/thirdparty/mantoq/pyarabic/araby.py | 39.7% | 298 |
| phoonnx/phonemizers/he.py | 40.0% | 12 |
| phoonnx/phonemizers/vi.py | 40.0% | 12 |
| phoonnx/phonemizers/base.py | 40.8% | 71 |
| phoonnx/thirdparty/mantoq/pyarabic/stack.py | 45.5% | 6 |
| phoonnx/thirdparty/mantoq/num2words.py | 47.6% | 11 |
| phoonnx/phonemizers/mwl.py | 50.0% | 8 |
| phoonnx/tokenizer.py | 52.4% | 147 |
| phoonnx/thirdparty/mantoq/__init__.py | 60.0% | 10 |
| phoonnx/thirdparty/mantoq/pyarabic/arabrepr.py | 60.0% | 6 |
| phoonnx/config.py | 60.8% | 130 |
| phoonnx/engines/vocoders/griffinlim.py | 61.4% | 27 |
| phoonnx/engines/optispeech.py | 69.6% | 24 |

Full report: download the coverage-report artifact.

🔒 Security (pip-audit)

Checking for any potential privacy concerns. 🕶️

✅ No known vulnerabilities found (61 packages scanned).

🏷️ Release Preview

Ensuring the release schedule is still on track. 🗓️

Current: 1.8.0a1 → Next: 1.9.0a1

| Signal | Value |
|---|---|
| Label | feature |
| PR title | feat(engines): GlowTTS / Larynx inference adapter |
| Bump | minor |

⚠️ No conventional commit prefix — alpha-only bump.
Suggested: fix: update the thing or feat: update the thing


🚀 Release Channel Compatibility

Predicted next version: 1.9.0a1

Channel Status Note Current Constraint
Stable Not in channel -
Testing Not in channel -
Alpha Not in channel -

⚖️ License Check

Scanning for any non-commercial-only restrictions. 💰

❌ License violations detected (43 packages) — review required before merging.

| Dependency | License Name | License Type | Misc |
|---|---|---|---|
| phoonnx:1.3.3 | Error | Error | |

| License Type | Found |
|---|---|
| Error | 1 |

License distribution: 14× MIT License, 7× Apache Software License, 5× MIT, 3× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 1× 3-Clause BSD License, 1× Apache Software License; BSD License, +8 more

Full breakdown — 43 packages
| Package | Version | License | URL |
|---|---|---|---|
| build | 1.5.0 | MIT | link |
| certifi | 2026.5.20 | Mozilla Public License 2.0 (MPL 2.0) | link |
| charset-normalizer | 3.4.7 | MIT | link |
| click | 8.4.1 | BSD-3-Clause | link |
| combo_lock | 0.3.1 | Apache-2.0 | link |
| dateparser | 1.4.0 | BSD License | link |
| filelock | 3.29.1 | MIT | link |
| flatbuffers | 25.12.19 | Apache Software License | link |
| idna | 3.18 | BSD-3-Clause | link |
| json-database | 0.10.1 | MIT | link |
| kthread | 0.2.3 | MIT License | link |
| langcodes | 3.5.1 | MIT License | link |
| markdown-it-py | 4.2.0 | MIT License | link |
| mdurl | 0.1.2 | MIT License | link |
| memory-tempfile | 2.2.3 | MIT License | link |
| numpy | 2.4.6 | BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0 | link |
| onnxruntime | 1.26.0 | MIT License | link |
| ovos-config | 2.1.1 | Apache-2.0 | link |
| ovos-date-parser | 0.7.0a5 | Apache Software License | link |
| ovos-number-parser | 0.5.1 | Apache Software License | link |
| ovos-utils | 0.8.5 | Apache-2.0 | link |
| packaging | 26.2 | Apache-2.0 OR BSD-2-Clause | link |
| pexpect | 4.9.0 | ISC License (ISCL) | link |
| phoonnx | 1.8.0a1 | Apache Software License | link |
| protobuf | 7.35.0 | 3-Clause BSD License | link |
| ptyprocess | 0.7.0 | ISC License (ISCL) | link |
| pyee | 13.0.1 | MIT License | link |
| Pygments | 2.20.0 | BSD-2-Clause | link |
| pyproject_hooks | 1.2.0 | MIT License | link |
| python-dateutil | 2.9.0.post0 | Apache Software License; BSD License | link |
| pytz | 2026.2 | MIT License | link |
| PyYAML | 6.0.3 | MIT License | link |
| quebra-frases | 0.3.7 | Apache Software License | link |
| regex | 2026.5.9 | Apache-2.0 AND CNRI-Python | link |
| requests | 2.34.2 | Apache Software License | link |
| rich | 13.9.4 | MIT License | link |
| rich-click | 1.9.8 | MIT License (Copyright (c) 2022 Phil Ewels) | link |
| six | 1.17.0 | MIT License | link |
| typing_extensions | 4.15.0 | PSF-2.0 | link |
| tzlocal | 5.3.1 | MIT License | link |
| unicode-rbnf | 2.4.0 | MIT License | |
| urllib3 | 2.7.0 | MIT | link |
| watchdog | 6.0.0 | Apache Software License | link |

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

🔨 Build Tests

Ensuring the gears are properly lubricated. 💧

✅ All versions pass

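| Python | Build | Install | Tests |
|---|---|---|---|
| 3.10 | ✅ | ✅ | ✅ |
| 3.11 | ✅ | ✅ | ✅ |
| 3.12 | ✅ | ✅ | ✅ |
| 3.13 | ✅ | ✅ | ✅ |
| 3.14 | ✅ | ✅ | ✅ |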

📋 Repo Health

Scanning for any signs of 'comment' bad breath. 🌬️

⚠️ Some required files are missing.

Latest Version: 1.8.0a1

phoonnx/version.py — Version file
README.md — README
LICENSE — License file
pyproject.toml — pyproject.toml
⚠️ setup.py — setup.py
CHANGELOG.md — Changelog
phoonnx/version.py has valid version block markers


Keeping the repository healthy and happy. 😊

JarbasAl marked this pull request as ready for review on June 5, 2026 16:05
github-actions Bot added the feature label and removed the feature label on Jun 5, 2026

librosa lives in the [train] extra, not core, so a core install hits
ModuleNotFoundError when a Griffin-Lim voice loads, and CI build_tests failed on
test_griffinlim_mel_to_audio. Give GriffinLimVocoder a clear ImportError with an
install hint, and skip the GL synthesis test when librosa is absent. Neural
vocoders (HiFi-GAN/MelGAN) and all other engines are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
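The fix follows a common optional-dependency pattern: import librosa lazily, turn a missing module into an ImportError that says how to install it, and skip the synthesis test when librosa is absent. A sketch of that pattern using only the standard library; require_librosa, HAS_LIBROSA and the test class are hypothetical names, and the real test would synthesize audio rather than just check that librosa.griffinlim exists.

```python
import importlib
import importlib.util
import unittest

# True when librosa can be imported (it ships in the [train] extra, not core).
HAS_LIBROSA = importlib.util.find_spec("librosa") is not None


def require_librosa():
    """Import librosa on first use; fail with an install hint instead of a bare error."""
    try:
        return importlib.import_module("librosa")
    except ModuleNotFoundError as err:
        raise ImportError(
            "Griffin-Lim vocoding needs librosa, which is not installed with the core "
            "package. Install it with `pip install librosa` or via the [train] extra."
        ) from err


class GriffinLimTest(unittest.TestCase):
    @unittest.skipUnless(HAS_LIBROSA, "librosa not installed")
    def test_griffinlim_mel_to_audio(self):
        librosa = require_librosa()
        self.assertTrue(callable(librosa.griffinlim))
```

Deferring the import to first use keeps a core install importable, so only voices that actually resolve to Griffin-Lim ever need the extra.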
JarbasAl merged commit 7afefb6 into dev on Jun 5, 2026
11 of 12 checks passed