feat(engines): VITS2 + StyleTTS2 family (pure StyleTTS2 + Kokoro, multilingual) by JarbasAl · Pull Request #153 · TigreGotico/phoonnx

JarbasAl · 2026-06-07T17:04:07Z

Two new architectures (VITS2, StyleTTS2) + the full Kokoro family (170 StyleTTS2 voices).

VITS2

frappuccino/vits2-ru-natasha: identical I/O to VITS, so it runs on the existing VitsAdapter. Indexed in vits2.json.

StyleTTS2 engine

New StyleTTS2Adapter (Engine.STYLETTS2): single-graph tokens + speed [+ style] [+ attention_mask] → waveform, end-to-end. Per-voice style packs flow via style_url → engine_params['style_path'] → configure().
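A minimal sketch of how such a feed might be assembled, for readers who want to see the shapes. It is not the adapter's code: the input names, `PAD_ID = 0`, the choice to index the style pack by the unpadded token count, and the clamping are all assumptions, and plain nested lists stand in for the int64/float32 arrays a real ONNX session would need.

```python
from typing import Dict, List, Optional, Sequence

PAD_ID = 0  # assumption: vocab id of the StyleTTS2 "$" pad symbol


def build_feed(
    phoneme_ids: Sequence[int],
    speed: float = 1.0,
    style_pack: Optional[List[List[float]]] = None,
) -> Dict[str, object]:
    """Assemble single-graph inputs: tokens + speed [+ style] + attention_mask."""
    tokens = [PAD_ID, *phoneme_ids, PAD_ID]  # pad at both ends
    feed: Dict[str, object] = {
        "tokens": [tokens],                     # shape [1, T]
        "attention_mask": [[1] * len(tokens)],  # shape [1, T], every position valid
        "speed": [float(speed)],
    }
    if style_pack is not None:
        # Length-indexed pack: one 256-dim row per token count.
        # Clamping to the last row also covers a fixed [1, 256] style.
        row = min(len(phoneme_ids), len(style_pack) - 1)
        feed["style"] = [style_pack[row]]       # shape [1, 256]
    return feed


# Usage: a dummy 510-row pack whose row i is filled with the value i.
pack = [[float(i)] * 256 for i in range(510)]
feed = build_feed([11, 12, 13, 14, 15], speed=1.0, style_pack=pack)
print(len(feed["tokens"][0]), feed["style"][0][0])  # 7 5.0
```

Omitting `style_pack` leaves the `style` key out of the feed, which corresponds to the pure StyleTTS2 case where the reference style is baked into the graph.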

Pure StyleTTS2 (ddatt/en-styletts2) — the real 5-onnx DDATT pipeline stitched into one graph with onnx.compose (ref style baked, no diffusion at inference). No re-export.

Kokoro — every public variant (170 voices)

Variant	Voices	G2P	Notes
v1.0	55	misaki en/ja/zh + espeak es/fr/hi/it/pt	base; European langs use misaki's espeak fallback
v1.1-zh (finetune)	103	misaki zh v1.1 (bopomofo) + en	100 Chinese voices; ZHG2P version switch
v0.19 (legacy)	11	misaki en	older weights

Key fixes:

- misaki zh version switch: ZHG2P(version=...) wired through get_phonemizer's model arg. v1.0 zh = IPA (tone marks ↓↗↘), v1.1 zh = bopomofo + tone numbers (ㄋㄧ2ㄏㄠ3); must match the model's vocab.
- espeak for European Kokoro: verified misaki's EspeakG2P == phoonnx espeak (identical IPA).
- fp16 NaNs on CPU (no fp16 kernels) → int8 model_quantized (potato-size, CPU-stable) for v1.1-zh/v0.19.

Coverage impact

- Archs: VITS2 + StyleTTS2. The onnx-stitch trick generalizes to split-onnx pipelines.
- G2P: misaki now exercised across en/ja/zh (both representations); European Kokoro via espeak.
- All 170 voices validated from the index (no NaN). Suite 226 green.
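The "no NaN" validation can be sketched as a small check on the synthesized samples. This is an illustrative helper, not the project's test code: `waveform_rms` is a hypothetical name, and the RMS value is included only as an extra guard against silent output.

```python
import math
from typing import Iterable, List


def waveform_rms(samples: Iterable[float]) -> float:
    """Return the RMS level; raise if the waveform is empty or not finite."""
    data: List[float] = list(samples)
    if not data:
        raise ValueError("empty waveform")
    if not all(math.isfinite(s) for s in data):
        raise ValueError("waveform contains NaN or Inf")
    return math.sqrt(sum(s * s for s in data) / len(data))


# Usage: one second of a 440 Hz tone at 16 kHz with amplitude 0.2.
tone = [0.2 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
print(round(waveform_rms(tone), 4))  # 0.1414
```

An fp16 graph that overflows on CPU would typically fail the finiteness check here rather than produce a plausible RMS.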

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added support for StyleTTS2 and Kokoro TTS engines with style-conditioned synthesis capabilities.
- Added language-specific phonemizers for English, Japanese, Chinese, Korean, and Vietnamese.
- Added Bopomofo alphabet support.
- Added Russian VITS2 voice model to the voice index.
Chores
- Updated optional language dependencies to include spacy support.
Tests
- Added test coverage for language-specific phonemizers and StyleTTS2 engine functionality.

- StyleTTS2Adapter (Engine.STYLETTS2): single-graph 'tokens + speed [+ style] [+ attention_mask] -> waveform', covering pure StyleTTS2 (baked reference style) and Kokoro (per-voice style pack, length-indexed). StyleTTS2 $-pad convention.
- Pure StyleTTS2 indexed (ddatt/en-styletts2): the DDATT 5-onnx pipeline STITCHED into one graph via onnx.compose (plbert->bert->final, ref_p/ref_s baked) -- no re-export. Validated through the pipeline (rms 0.14).
- VITS2 (frappuccino/vits2-ru-natasha): runs on the VitsAdapter (identical I/O); Russian graphemes. vits2.json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…le packs)

- StyleTTS2Adapter.configure() loads a per-voice style pack from engine_params['style_path'], reshaped to [N, 256] (Kokoro: 510 rows indexed by token length; a fixed style is [1, 256]).
- model_manager: style_url field + download_style() -> engine_params['style_path'] (mirrors the vocoder per-voice-asset flow).
- Indexed 29 English Kokoro voices (af/am/bf/bm) on the shared Kokoro-82M fp16 onnx (potato-size) + misaki G2P; per-voice .bin styles. Voices verified distinct.
- pyproject: spacy>=3.7 guard on the misaki extras (en/ja/vi/zh).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
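The [N, 256] reshape can be sketched as follows. This assumes the .bin file is a flat dump of little-endian float32 values, which the commit message does not state; `load_style_pack` is a hypothetical helper, and a real implementation would more likely use numpy than the stdlib `array` module.

```python
import array
import os
import sys
import tempfile
from typing import List

STYLE_DIM = 256  # row width named in the commit message


def load_style_pack(path: str) -> List[List[float]]:
    """Read raw float32 values and split them into rows of STYLE_DIM."""
    values = array.array("f")
    with open(path, "rb") as fh:
        values.frombytes(fh.read())
    if sys.byteorder == "big":
        values.byteswap()  # assumption: the file is little-endian
    if len(values) == 0 or len(values) % STYLE_DIM:
        raise ValueError(f"{path}: not a whole number of {STYLE_DIM}-float rows")
    return [list(values[i:i + STYLE_DIM]) for i in range(0, len(values), STYLE_DIM)]


# Usage: write a dummy 510-row pack, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "voice.bin")
    with open(demo_path, "wb") as fh:
        fh.write(array.array("f", [0.0] * (510 * STYLE_DIM)).tobytes())
    demo_pack = load_style_pack(demo_path)
print(len(demo_pack), len(demo_pack[0]))  # 510 256
```

A fixed style stored as 256 floats comes back as a single row, matching the [1, 256] case.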

coderabbitai · 2026-06-07T17:04:13Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fe986cf0-9db2-455f-ae3d-205363a261e8

📥 Commits

Reviewing files that changed from the base of the PR and between b2faeef and fb175a4.

📒 Files selected for processing (11)

phoonnx/config.py
phoonnx/engines/__init__.py
phoonnx/engines/styletts2.py
phoonnx/model_manager.py
phoonnx/phonemizers/__init__.py
phoonnx/phonemizers/mul.py
phoonnx/voice_index/styletts2.json
phoonnx/voice_index/vits2.json
pyproject.toml
tests/test_misaki_split.py
tests/test_styletts2.py

📝 Walkthrough

This PR introduces StyleTTS2/Kokoro engine support with optional per-voice style packs, expands MisakiPhonemizer with language-specific variants and alphabet selection, and updates config enums, model discovery, and dependencies to support both features.

Changes

StyleTTS2 Engine and Misaki Phonemizer Expansion

Layer / File(s)	Summary
Config enum and phonemizer dispatch extensions `phoonnx/config.py`	Engine enum gains `STYLETTS2`, Alphabet gains `BOPOMOFO`, and PhonemeType expands to language-specific `MISAKI_EN`, `MISAKI_JA`, `MISAKI_ZH`, `MISAKI_KO`, `MISAKI_VI` variants. The `get_phonemizer()` dispatch logic is updated to pass `alphabet` to phonemizer constructors and route new PhonemeType variants to their corresponding classes.
Misaki phonemizer alphabet support and language variants `phoonnx/phonemizers/mul.py`, `phoonnx/phonemizers/__init__.py`	MisakiPhonemizer constructor accepts `alphabet` parameter and introduces a `zh_version` property to select Chinese backend variant (IPA or BOPOMOFO). Five new subclasses (`MisakiEnPhonemizer`, `MisakiJaPhonemizer`, `MisakiZhPhonemizer`, `MisakiKoPhonemizer`, `MisakiViPhonemizer`) narrow `MISAKI_LANGS` by language while inheriting lazy-loading dispatch behavior.
Misaki phonemizer test suite `tests/test_misaki_split.py`	Tests validate that `get_phonemizer()` maps each `PhonemeType.MISAKI_*` to the correct Misaki subclass, verify `zh_version` is alphabet-driven, confirm language-scope narrowing via `MISAKI_LANGS`, and assert backward compatibility of the base `MisakiPhonemizer`.
StyleTTS2 adapter implementation and registration `phoonnx/engines/styletts2.py`, `phoonnx/engines/__init__.py`	New `StyleTTS2Adapter` handles token padding with StyleTTS2 pad ID at both ends, optional style pack indexing by token sequence length, and selects the largest output tensor as waveform. The adapter is registered with priority 33 and supports detection for both `styletts2` and `kokoro` engines. Phoneme IDs are converted to `int64`, attention mask is created, and speed parameter is passed through.
StyleTTS2 adapter test suite `tests/test_styletts2.py`	Tests confirm adapter registration and engine detection, validate feed dict padding (5 tokens + 2 pad = 7), verify style pack token-length indexing for Kokoro, validate waveform selection in output parsing, and confirm `configure()` loads style binary from engine parameters and reshapes to `[510, 256]`.
Model manager style URL and download support `phoonnx/model_manager.py`	TTSModelInfo adds optional `style_url` field and `download_style()` method to cache StyleTTS2/Kokoro style embeddings locally. `engine_params()` passes the resolved `style_path` into synthesis parameters. `merge_default_voices()` additionally loads `vits2.json` and `styletts2.json` bundled voice indexes.
Dependency and voice index updates `pyproject.toml`, `phoonnx/voice_index/vits2.json`	pyproject.toml adds `spacy>=3.7` to en, ja, vi, zh language extras. New vits2.json entry for `frappuccino/vits2-ru-natasha` with Hugging Face model/config URLs and phoneme/encoding metadata.
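The style_url -> download_style() -> engine_params['style_path'] flow described in the table can be sketched like this. `VoiceInfo` is a hypothetical stand-in for TTSModelInfo, the cache file naming is invented, and the injected `fetch` callable replaces whatever HTTP client the real model manager uses, so the sketch runs without network access.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class VoiceInfo:  # hypothetical stand-in for TTSModelInfo
    model_url: str
    style_url: Optional[str] = None

    def download_style(self, cache_dir: str, fetch: Callable[[str], bytes]) -> Optional[str]:
        """Cache the per-voice style pack locally and return its path."""
        if not self.style_url:
            return None
        name = hashlib.sha256(self.style_url.encode()).hexdigest()[:16] + ".bin"
        path = os.path.join(cache_dir, name)
        if not os.path.exists(path):  # download only on a cache miss
            os.makedirs(cache_dir, exist_ok=True)
            with open(path, "wb") as fh:
                fh.write(fetch(self.style_url))
        return path

    def engine_params(self, cache_dir: str, fetch: Callable[[str], bytes]) -> Dict[str, str]:
        """Expose the resolved style path to the engine, if the voice has one."""
        params: Dict[str, str] = {}
        style_path = self.download_style(cache_dir, fetch)
        if style_path:
            params["style_path"] = style_path
        return params


# Usage with a fake fetcher that records how often it is called.
calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"\x00" * 4


with tempfile.TemporaryDirectory() as cache:
    voice = VoiceInfo("https://example.invalid/model.onnx", "https://example.invalid/af.bin")
    first = voice.engine_params(cache, fake_fetch)
    second = voice.engine_params(cache, fake_fetch)
print(len(calls), first == second)  # 1 True
```

Voices without a `style_url` get an empty dict, so engines that do not use style packs are unaffected.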

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

TigreGotico/phoonnx#131: Introduces the BaseOnnxAdapter framework and phoonnx.engines registration mechanism that this PR extends with StyleTTS2Adapter.
TigreGotico/phoonnx#149: Also extends TTSModelManager.merge_default_voices() to load additional bundled voice indexes, though for different engines.
TigreGotico/phoonnx#70: Similarly modifies get_phonemizer() to incorporate Alphabet into phonemizer wiring for a different phonemizer family.

Poem

🐰 A hop through configs new and sound,
Language-split Misakis abound!
StyleTTS2 paints with style and speed,
For Kokoro's flowing voice we need.
Padded tokens, waveforms bright—
Speech synthesis takes flight! 🎵

github-actions · 2026-06-07T17:04:43Z

Greetings! I've analyzed your changes and have some results to share. 🖖

I've aggregated the results of the automated checks for this PR below.

📋 Repo Health

Scanning for any signs of 'comment' bad breath. 🌬️

⚠️ Some required files are missing.

Latest Version: 1.15.0a1

✅ phoonnx/version.py — Version file
✅ README.md — README
❌ LICENSE — License file
✅ pyproject.toml — pyproject.toml
⚠️ setup.py — setup.py
✅ CHANGELOG.md — Changelog
✅ phoonnx/version.py has valid version block markers

🔍 Lint

Everything looks good so far! ✅

❌ ruff: issues found — see job log

📊 Coverage

How well-protected is our logic? Let's find out! 🛡️

❌ 40.5% total coverage

Files below 80% coverage (37 files)

File	Coverage	Missing lines
`phoonnx/cli.py`	0.0%	98
`phoonnx/thirdparty/kog2p/__init__.py`	0.0%	203
`phoonnx/thirdparty/mantoq/unicode_symbol2label.py`	0.0%	1
`phoonnx/thirdparty/bw2ipa.py`	7.5%	86
`phoonnx/thirdparty/mantoq/pyarabic/number.py`	7.7%	371
`phoonnx/thirdparty/mantoq/buck/phonetise_buckwalter.py`	10.4%	180
`phoonnx/thirdparty/hangul2ipa.py`	16.6%	372
`phoonnx/phonemizers/en.py`	17.5%	104
`phoonnx/thirdparty/mantoq/pyarabic/trans.py`	18.2%	135
`phoonnx/model_manager.py`	19.4%	229
`phoonnx/voice.py`	21.7%	220
`phoonnx/thirdparty/zh_num.py`	23.1%	83
`phoonnx/thirdparty/tashkeel/__init__.py`	23.9%	89
`phoonnx/phonemizers/zh.py`	27.0%	92
`phoonnx/phonemizers/mul.py`	27.6%	234
`phoonnx/phonemizers/ko.py`	30.4%	32
`phoonnx/phonemizers/gl.py`	31.1%	42
`phoonnx/phonemizers/ar.py`	31.2%	44
`phoonnx/thirdparty/mantoq/buck/tokenization.py`	32.5%	27
`phoonnx/thirdparty/phonikud/__init__.py`	35.3%	11
`phoonnx/phonemizers/ja.py`	36.0%	32
`phoonnx/phonemizers/fa.py`	36.4%	14
`phoonnx/phonemizers/pt.py`	38.1%	13
`phoonnx/thirdparty/mantoq/pyarabic/normalize.py`	38.1%	13
`phoonnx/thirdparty/mantoq/pyarabic/araby.py`	39.7%	298
`phoonnx/phonemizers/he.py`	40.0%	12
`phoonnx/phonemizers/vi.py`	40.0%	12
`phoonnx/phonemizers/base.py`	40.8%	71
`phoonnx/thirdparty/mantoq/pyarabic/stack.py`	45.5%	6
`phoonnx/thirdparty/mantoq/num2words.py`	47.6%	11
`phoonnx/phonemizers/mwl.py`	50.0%	8
`phoonnx/tokenizer.py`	52.4%	147
`phoonnx/thirdparty/mantoq/__init__.py`	60.0%	10
`phoonnx/thirdparty/mantoq/pyarabic/arabrepr.py`	60.0%	6
`phoonnx/engines/vocoders/griffinlim.py`	61.4%	27
`phoonnx/config.py`	65.8%	120
`phoonnx/engines/optispeech.py`	69.6%	24

Full report: download the coverage-report artifact.

⚖️ License Check

Checking for any restrictive patent clauses. 📜

❌ License violations detected (43 packages) — review required before merging.

Dependency	License Name	License Type	Misc
phoonnx:1.3.3	Error	Error	

License Type	Found
Error	1

License distribution: 14× MIT License, 7× Apache Software License, 5× MIT, 3× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 1× 3-Clause BSD License, 1× Apache Software License; BSD License, +8 more

Full breakdown — 43 packages

Package	Version	License	URL
`build`	1.5.0	MIT	link
`certifi`	2026.5.20	Mozilla Public License 2.0 (MPL 2.0)	link
`charset-normalizer`	3.4.7	MIT	link
`click`	8.4.1	BSD-3-Clause	link
`combo_lock`	0.3.1	Apache-2.0	link
`dateparser`	1.4.0	BSD License	link
`filelock`	3.29.1	MIT	link
`flatbuffers`	25.12.19	Apache Software License	link
`idna`	3.18	BSD-3-Clause	link
`json-database`	0.10.1	MIT	link
`kthread`	0.2.3	MIT License	link
`langcodes`	3.5.1	MIT License	link
`markdown-it-py`	4.2.0	MIT License	link
`mdurl`	0.1.2	MIT License	link
`memory-tempfile`	2.2.3	MIT License	link
`numpy`	2.4.6	BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0	link
`onnxruntime`	1.26.0	MIT License	link
`ovos-config`	2.1.1	Apache-2.0	link
`ovos-date-parser`	0.7.0a5	Apache Software License	link
`ovos-number-parser`	0.5.1	Apache Software License	link
`ovos-utils`	0.8.5	Apache-2.0	link
`packaging`	26.2	Apache-2.0 OR BSD-2-Clause	link
`pexpect`	4.9.0	ISC License (ISCL)	link
`phoonnx`	1.15.0a1	Apache Software License	link
`protobuf`	7.35.0	3-Clause BSD License	link
`ptyprocess`	0.7.0	ISC License (ISCL)	link
`pyee`	13.0.1	MIT License	link
`Pygments`	2.20.0	BSD-2-Clause	link
`pyproject_hooks`	1.2.0	MIT License	link
`python-dateutil`	2.9.0.post0	Apache Software License; BSD License	link
`pytz`	2026.2	MIT License	link
`PyYAML`	6.0.3	MIT License	link
`quebra-frases`	0.3.7	Apache Software License	link
`regex`	2026.5.9	Apache-2.0 AND CNRI-Python	link
`requests`	2.34.2	Apache Software License	link
`rich`	13.9.4	MIT License	link
`rich-click`	1.9.8	MIT License

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

🔨 Build Tests

The build pipeline has finished its work. 🏁

✅ All versions pass

Python	Build	Install	Tests
3.10	✅	✅	✅
3.11	✅	✅	✅
3.12	✅	✅	✅
3.13	✅	✅	✅
3.14	✅	✅	✅

🏷️ Release Preview

A look ahead at the next milestone. 🚩

Current: 1.15.0a1 → Next: 1.16.0a1

Signal	Value
Label	`feature`
PR title	`feat(engines): VITS2 + StyleTTS2 family (pure StyleTTS2 + Kokoro, multilingual)`
Bump	minor

⚠️ No conventional commit prefix — alpha-only bump.
Suggested: fix: update the thing or feat: update the thing

🚀 Release Channel Compatibility

Predicted next version: 1.16.0a1

Channel	Status	Note	Current Constraint
Stable	⚪	Not in channel	-
Testing	⚪	Not in channel	-
Alpha	⚪	Not in channel	-

🔒 Security (pip-audit)

Ensuring our dependency tree is clean of rot. 🌳

✅ No known vulnerabilities found (61 packages scanned).

Thanks for making OVOS better today! 🙌

Index 13 Asian-language Kokoro voices on the StyleTTS2 engine, exercising the misaki ja (JAG2P, openjtalk/unidic) and zh (ZHG2P, pypinyin/jieba) G2Ps:

- ja (5): jf_alpha/gongitsune/nezumi/tebukuro, jm_kumo
- zh (8): zf_xiaobei/xiaoni/xiaoxiao/xiaoyi, zm_yunjian/yunxi/yunxia/yunyang

Per-language config (lang_code drives the misaki dispatch); shared Kokoro-82M fp16 onnx + per-voice style packs. Validated from the index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…uropean

- misaki zh G2P version switch: ZHG2P(version=...) wired through get_phonemizer's model arg (phonemizer_model). v1.0 zh = IPA (tone marks), v1.1 zh = bopomofo + tone numbers; the version must match the model's vocab.
- Kokoro v1.1-zh finetune (int8, CPU-stable potato-size): 100 Chinese (zf/zm, version 1.1) + 3 English voices.
- Kokoro v1.0 European voices via espeak (misaki's EspeakG2P fallback): es/fr/hi/it/pt (13).
- Kokoro v0.19 legacy (int8): 11 English voices.
- fp16 onnx NaNs on CPU (no fp16 kernels) -> int8 model_quantized for v1.1-zh/v0.19.

styletts2.json: 170 voices. All validated from the index (no NaN).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ia alphabet

misaki is not a thin wrapper: en ships ~6MB curated lexicons + spacy, ja a cutlet romanizer + lexicon, zh adds tone sandhi + frontend on pypinyin; only the espeak fallback (es/fr/hi/it/pt) is a passthrough.

Split the single dispatching MisakiPhonemizer into per-language phoneme types: MISAKI_EN, MISAKI_JA, MISAKI_ZH, MISAKI_KO, MISAKI_VI.

The zh IPA-vs-bopomofo difference is just a representation, so it's the ALPHABET, not a separate class or version param:

- MISAKI_ZH + Alphabet.IPA -> misaki v1.0 (IPA + tone marks)
- MISAKI_ZH + Alphabet.BOPOMOFO -> v1.1 (bopomofo + tone numbers)

Added Alphabet.BOPOMOFO; misaki phonemizers default to IPA. The base class stays a back-compat dispatcher for the legacy 'misaki' type. Kokoro voices re-indexed to the explicit types (v1.1-zh = misaki_zh + bopomofo). Suite 229.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
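The alphabet-driven dispatch described in this commit can be sketched as below. The enum classes are cut-down illustrations rather than the ones in phoonnx/config.py: the string values (other than "misaki_zh" and "bopomofo", which appear in the commit text) are assumptions, `resolve_misaki` is a hypothetical helper, and the "1.0"/"1.1" strings are labels for the two Kokoro generations rather than the exact argument passed to ZHG2P.

```python
from enum import Enum
from typing import Optional, Tuple


class Alphabet(str, Enum):  # only the two members relevant here
    IPA = "ipa"
    BOPOMOFO = "bopomofo"


class PhonemeType(str, Enum):  # the per-language split named in the commit
    MISAKI_EN = "misaki_en"
    MISAKI_JA = "misaki_ja"
    MISAKI_ZH = "misaki_zh"
    MISAKI_KO = "misaki_ko"
    MISAKI_VI = "misaki_vi"


def resolve_misaki(
    phoneme_type: PhonemeType, alphabet: Alphabet = Alphabet.IPA
) -> Tuple[str, Optional[str]]:
    """Return (language, zh_version); zh_version is None for non-Chinese types."""
    lang = phoneme_type.value.split("_", 1)[1]
    if phoneme_type is not PhonemeType.MISAKI_ZH:
        return lang, None
    # The representation picks the backend: bopomofo -> v1.1, IPA -> v1.0.
    return lang, "1.1" if alphabet is Alphabet.BOPOMOFO else "1.0"


print(resolve_misaki(PhonemeType.MISAKI_ZH, Alphabet.BOPOMOFO))  # ('zh', '1.1')
print(resolve_misaki(PhonemeType.MISAKI_ZH))                     # ('zh', '1.0')
print(resolve_misaki(PhonemeType.MISAKI_EN))                     # ('en', None)
```

Keying the choice on the alphabet means a voice config only has to state which symbols its vocab expects, and the matching misaki backend follows from that.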

JarbasAl and others added 2 commits June 7, 2026 17:34

github-actions Bot added feature and removed feature labels Jun 7, 2026

JarbasAl changed the title ~~feat(engines): VITS2 + StyleTTS2 family (pure StyleTTS2 + Kokoro)~~ feat(engines): VITS2 + StyleTTS2 family (pure StyleTTS2 + Kokoro, multilingual) Jun 7, 2026

github-actions Bot added feature and removed feature labels Jun 7, 2026

JarbasAl marked this pull request as ready for review June 7, 2026 18:54

JarbasAl merged commit 9b07bf1 into dev Jun 7, 2026
11 of 12 checks passed

github-actions Bot added feature and removed feature labels Jun 7, 2026

coderabbitai Bot mentioned this pull request Jun 8, 2026

feat(engines): YourTTS engine + zero-shot voice cloning (speaker encoder + registry) #156

Merged

JarbasAl merged 5 commits into dev from feat/vits2-styletts
