documentation: supported voices and languages by JarbasAl · Pull Request #80 · TigreGotico/phoonnx

JarbasAl · 2025-11-23T22:54:08Z

Summary by CodeRabbit

Documentation
- README updated to note support for 1000+ languages and voices.
- New auto-generated VOICES.md listing available voices, languages, engines, and phoneme types.
New Features
- Automatic generation/update of the comprehensive voices documentation at startup.
Bug Fixes / Improvements
- Normalized language tags for more consistent language handling (some voice language labels updated).
- One voice entry was removed from the public listing.

✏️ Tip: You can customize this high-level summary in your review settings.

coderabbitai · 2025-11-23T22:54:19Z

Warning

Rate limit exceeded

@JarbasAl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 14 minutes and 3 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 7d6d1d9 and 3f67255.

📒 Files selected for processing (1)

phoonnx/model_manager.py (4 hunks)

Walkthrough

Adds voice-language normalization, a new utility to generate VOICES.md from loaded voices, README content linking that document, and small script adjustments to canonicalize language tags and remove one voice entry; also removes an unnecessary clear() call in the model manager startup path.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation `README.md` | Added a sentence referencing support for 1000+ languages and voices and linked `VOICES.md`. |
| Voice markdown generation `phoonnx/model_manager.py` | Added public `generate_voices_markdown(manager, output_file="../VOICES.md")` to produce a sorted Markdown table of all loaded voices (language, voice_id, Engine, PhonemeType, counts); invoked at module end after merging default voices; removed an explicit `clear()` call in `__main__`. |
| Language tag normalization `phoonnx/config.py`, `phoonnx/model_manager.py` | Use `langcodes.standardize_tag()` in `VoiceConfig.__post_init__` and TTSModelManager initialization to normalize `lang_code` with guarded fallback on failure. |
| Voice registration scripts `scripts/index_voices.py` | Normalized languages for MMS and updated two Hebrew entries to `he-IL`; commented out `jerichosiahaya/vits-tts-id` entry so it is no longer registered. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant App as __main__
    participant Mgr as TTSModelManager
    participant Gen as generate_voices_markdown()
    participant FS as FileSystem

    App->>Mgr: merge_default_voices(store=True)
    Mgr-->>App: voices loaded (lang codes normalized)
    App->>Gen: generate_voices_markdown(manager)
    Gen->>Mgr: request all voices
    Mgr-->>Gen: voice records (id, lang_code, engine, phoneme)
    note right of Gen `#E6F7FF`: sort by lang_code, then voice_id
    Gen->>Gen: format Markdown table, compute totals
    Gen->>FS: write "../VOICES.md"
    FS-->>Gen: write success / error
    Gen-->>App: complete (log result)
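
The flow in the diagram can be sketched in a few lines. This is a minimal illustration, not the code from the PR: the `Voice` record, its field names, and the helper `voices_to_markdown` are assumptions chosen to mirror the columns named in the walkthrough, and the phoneme-type values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    # Hypothetical record; the real voice objects in phoonnx may differ.
    voice_id: str
    lang_code: str
    engine: str
    phoneme_type: str


def voices_to_markdown(voices):
    """Render voices as a Markdown table sorted by lang_code, then voice_id."""
    rows = sorted(voices, key=lambda v: (v.lang_code, v.voice_id))
    lines = [
        "| Language | Voice ID | Engine | Phoneme Type |",
        "|---|---|---|---|",
    ]
    for v in rows:
        lines.append(f"| {v.lang_code} | {v.voice_id} | {v.engine} | {v.phoneme_type} |")
    # Totals: number of voices and number of distinct language tags.
    n_langs = len({v.lang_code for v in rows})
    lines.append("")
    lines.append(f"Total voices: {len(rows)}")
    lines.append(f"Total languages: {n_langs}")
    return "\n".join(lines) + "\n"


demo = [
    Voice("b-voice", "pt-PT", "piper", "espeak"),
    Voice("a-voice", "pt-PT", "piper", "espeak"),
    Voice("c-voice", "en-US", "mms", "raw"),
]
markdown = voices_to_markdown(demo)
print(markdown)
```

Sorting on the tuple `(lang_code, voice_id)` groups voices by language tag first, which is why consistent tag normalization matters for the ordering of the generated table.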

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

- Review voice markdown formatting and sorting (language tag ordering vs. human-readable).
- Verify safe fallback behavior when standardize_tag() raises or returns unexpected values.
- Check the effect of removing clear() on in-memory state and default-voice merge semantics.
- Confirm the commented-out voice removal in scripts/index_voices.py is intentional.

Possibly related PRs

- feat: more piper english community voices #76 (overlaps changes in voice registry and will affect the generated VOICES.md content)
- feat: community piper voices + pygoruut support #73 (related voice/lang handling changes in model_manager and config)
- feat: opm and cli interface #51 (similar extensions to TTSModelManager APIs and voice introspection)

Poem

🐇 I hopped through tags both grand and small,

I tuned each voice and trimmed the sprawl.
A markdown meadow, rows aligned,
A thousand songs for all to find.
— yours, the rabbit with a pen and a grin 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'documentation: supported voices and languages' is directly related to the main changes. The PR adds a new VOICES.md documentation file listing supported voices and languages, normalizes language codes for consistency, and integrates documentation generation into the startup flow. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

phoonnx/model_manager.py (1)

389-396: Consider using Path for more robust file handling.

The relative path "../VOICES.md" works but could be made more explicit and robust.

Consider this approach:

from pathlib import Path
from typing import Optional, Union

def generate_voices_markdown(manager: TTSModelManager, output_file: Optional[Union[str, Path]] = None):
    """
    Generates a Markdown table of all supported voices and saves it to a file.

    Args:
        manager (TTSModelManager): The manager with loaded voices.
        output_file (str | Path | None): Path of the file to save the Markdown table to.
                          Defaults to VOICES.md in the repository root.
    """
    if output_file is None:
        # Resolve relative to this module rather than the current working directory.
        output_file = Path(__file__).parent.parent / "VOICES.md"
    else:
        output_file = Path(output_file)
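
To see why the module-relative default is more predictable than `"../VOICES.md"`, here is a small standalone sketch of the same resolution logic. The helper name `resolve_output` and the `/repo/...` paths are hypothetical and used only for illustration; the repository layout (package directory one level below the root) is an assumption.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_output(output_file: Optional[Union[str, Path]], module_file: str) -> Path:
    """Return the target path; default to VOICES.md one level above the package."""
    if output_file is None:
        # parent = package directory, parent.parent = repository root (assumed layout)
        return Path(module_file).parent.parent / "VOICES.md"
    return Path(output_file)


# Hypothetical module location used only for illustration.
default_target = resolve_output(None, "/repo/phoonnx/model_manager.py")
explicit_target = resolve_output("docs/VOICES.md", "/repo/phoonnx/model_manager.py")
print(default_target)
print(explicit_target)
```

The default no longer depends on the directory the process was started from, whereas a bare `"../VOICES.md"` is resolved against the current working directory.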

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7acb24d and d5fbd81.

📒 Files selected for processing (2)

README.md (1 hunks)
phoonnx/model_manager.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

phoonnx/model_manager.py (1)

scripts/index_voices.py (2)

all_voices (18-19)

supported_langs (22-23)

🪛 LanguageTool

README.md

[grammar] ~16-~16: Ensure spelling is correct
Context: ...ges and voices from various frameworks (phoonnx, piper, mimic3, coqui, MMS, transformer...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (2)

README.md (1)

16-17: LGTM!

The documentation accurately reflects the library's extensive voice support and provides a helpful reference to VOICES.md. The static analysis hint flagging "phoonnx" is a false positive—it's the project name.

phoonnx/model_manager.py (1)

281-281: No issue found with the removal of manager.clear().

The change is correct for this context. Since TTSModelManager() creates a fresh instance with an empty cache, calling clear() before merge_default_voices(store=True) was redundant. The merged default voices load correctly without it.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5fbd81 and 7d6d1d9.

📒 Files selected for processing (3)

phoonnx/config.py (2 hunks)
phoonnx/model_manager.py (4 hunks)
scripts/index_voices.py (4 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

phoonnx/model_manager.py (1)

scripts/index_voices.py (2)

all_voices (18-19)

supported_langs (22-23)

🪛 Ruff (0.14.5)

phoonnx/model_manager.py

66-66: Do not use bare except

(E722)

66-67: try-except-pass detected, consider logging the exception

(S110)

phoonnx/config.py

157-157: Do not use bare except

(E722)
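
One way to address both Ruff findings is to catch a specific exception and log it instead of passing silently. The sketch below is not the code from the PR: `normalize_lang` and `_fake_standardize` are hypothetical names, and catching `ValueError` rests on the assumption that `langcodes` reports malformed tags with an exception derived from `ValueError`, which should be checked against the installed version.

```python
import logging

LOG = logging.getLogger(__name__)


def normalize_lang(tag, standardize=None):
    """Return a standardized language tag, or the input unchanged on failure."""
    if standardize is None:
        try:
            # Third-party dependency named in the walkthrough.
            from langcodes import standardize_tag as standardize
        except ImportError:
            LOG.warning("langcodes not installed; keeping %r as-is", tag)
            return tag
    try:
        return standardize(tag)
    except ValueError as err:
        # Narrow except instead of a bare one; log rather than pass silently.
        LOG.warning("could not standardize %r: %s", tag, err)
        return tag


def _fake_standardize(tag):
    # Stand-in used so the demo runs without langcodes installed.
    if not tag.replace("-", "").replace("_", "").isalnum():
        raise ValueError(f"bad tag: {tag}")
    return tag.replace("_", "-")


good = normalize_lang("pt_PT", standardize=_fake_standardize)
bad = normalize_lang("??", standardize=_fake_standardize)
print(good, bad)
```

Passing the standardizer in as a parameter keeps the fallback path easy to test without the real dependency.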

🔇 Additional comments (5)

scripts/index_voices.py (3)

233-233: LGTM! Hebrew language code standardization.

The change from "he" to "he-IL" provides more specific locale information and aligns with the PR's language normalization goal.

Also applies to: 243-243

258-258: LGTM! Language tag normalization for MMS voices.

Applying standardize_tag to MMS language codes ensures consistency across the voice catalog and aligns with normalization in other parts of the PR.

1035-1044: LGTM! Voice entry appropriately disabled.

The TODO comment clearly indicates the issue (no ONNX model available). Commenting out rather than removing preserves the entry for future conversion.

phoonnx/model_manager.py (2)

394-447: LGTM! Function successfully moved to module level.

The generate_voices_markdown function has been properly refactored to module level, addressing the previous review feedback. The implementation is well-structured with:

- Clear sorting logic (by language, then voice ID)
- Helpful warnings about duplicates
- Robust error handling for file I/O operations
- Fallback console output on write failure

Based on learnings, this addresses the past review comment requesting module-level extraction.
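
The duplicate warning and the write-with-fallback behavior listed above could look roughly like this. It is a hedged sketch under assumed names (`find_duplicates`, `write_or_print`), not the function body from the PR.

```python
import logging
from collections import Counter
from pathlib import Path

LOG = logging.getLogger(__name__)


def find_duplicates(voice_ids):
    """Return the sorted list of voice IDs that occur more than once."""
    counts = Counter(voice_ids)
    return sorted(vid for vid, n in counts.items() if n > 1)


def write_or_print(text, output_file):
    """Write text to output_file; on I/O failure print it instead.

    Returns True if the file was written, False if the fallback was used.
    """
    try:
        Path(output_file).write_text(text, encoding="utf-8")
    except OSError as err:
        LOG.error("could not write %s: %s", output_file, err)
        print(text)  # fallback console output
        return False
    return True


dupes = find_duplicates(["a", "b", "a", "c", "b"])
for vid in dupes:
    LOG.warning("duplicate voice id: %s", vid)
```

Catching `OSError` covers missing directories and permission errors while still letting unrelated bugs surface.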

286-286: The removal of manager.clear() is safe for this startup flow.

The concern about stale cache persisting is not applicable here. The constructor does not call load() or trigger a cache.reload(), so self.voices starts empty. The merge_default_voices() method then populates the cache from the bundled JSON files via cache.update(). Removing the redundant clear() call before merge_default_voices() is correct—it was clearing an already-empty cache.

If the cache needs explicit reloading from disk on subsequent runs, that would require a separate manager.load() call, which is not present in the startup flow.
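
The reasoning can be checked with a toy model. `ToyManager` below is a hypothetical stand-in that only mimics the behavior described in this comment (an empty cache after construction, populated by a merge step); it is not the phoonnx `TTSModelManager`.

```python
class ToyManager:
    """Hypothetical stand-in for a voice manager; not the phoonnx class."""

    def __init__(self):
        # The constructor does not load anything, so the cache starts empty.
        self.voices = {}

    def clear(self):
        self.voices.clear()

    def merge_default_voices(self, defaults):
        # Populate the cache from bundled defaults.
        self.voices.update(defaults)


defaults = {"voice-a": "en-US", "voice-b": "he-IL"}

with_clear = ToyManager()
with_clear.clear()  # no-op on an empty cache
with_clear.merge_default_voices(defaults)

without_clear = ToyManager()
without_clear.merge_default_voices(defaults)

print(with_clear.voices == without_clear.voices)
```

If the constructor ever starts loading a persisted cache, the two paths would diverge and the `clear()` call would matter again.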

@JarbasAl

* refactor!: tokenizer class + deprecate phoneme_ids.py (#70)
* fix: coqui compatibility refactor!: tokenizer class + deprecate phoneme_ids.py fix: missing cotovia data files feat: add new galician models from proxecto nós
* log
* fix
* fix
* Merge pull request #71 from TigreGotico/coderabbitai/docstrings/cb634ab 📝 Add docstrings to `tokenizer`
* adjust
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Increment Version to 1.0.0a1
* Update Changelog
* feat: community piper voices + pygoruut support (#73)
* feat: community piper voices + pygoruut support update model manager voice index Total voices: 284 Total langs: 67
* fix neurlang voice-id
* reorder funcs for readability
* 📝 Add docstrings to `models_galore` (#74) Docstrings generation was requested by @JarbasAl.
* #73 (comment) The following files were modified:
* `phoonnx/model_manager.py`
* `phoonnx/util.py`
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Increment Version to 1.1.0a1
* Update Changelog
* feat: more piper english community voices (#76) Total voices: 314
* Increment Version to 1.2.0a1
* Update Changelog
* feat: transformers support (#78) feat: MMS voices refactor: move index to static .json files
* Increment Version to 1.3.0a1
* Update Changelog
* documentation: supported voices and languages (#80)
* documentation: supported voices and languages
* documentation: supported voices and languages
* documentation: supported voices and languages
* Increment Version to 1.3.0a2
* Update Changelog
* documentation: supported voices and languages (#82)
* Increment Version to 1.3.0a3
* Update Changelog
* fix: failing MMS models indexing (#84)
* Increment Version to 1.3.0a4
* Update Changelog
* fix: improve lang code standardization (#86)
* fix: improve lang code standardization
* siimplify error handling
* Increment Version to 1.3.1a1
* Update Changelog
* Add renovate.json (#89)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Increment Version to 1.3.1a2
* Update Changelog
* fix: mantoq2ipa + improve lang code normalization (#90)
* fix: improve lang code standardization
* siimplify error handling
* fix: better arabic ipa g2p
* fix tests
* rrm unused arg
* Increment Version to 1.3.2a1
* chore(deps): update actions/checkout action to v6 (#92)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Increment Version to 1.3.2a2
* chore(deps): update actions/setup-python action to v6 (#96)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Increment Version to 1.3.2a3
* Update Changelog
* add more voices (#99)
* Increment Version to 1.3.2a4
* Update Changelog
* 📝 Add docstrings to `patch-2` (#102) Docstrings generation was requested by @JarbasAl.
* #101 (comment) The following files were modified:
* `phoonnx/model_manager.py`
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Increment Version to 1.3.2a5
* Update Changelog
* Add files via upload
* fix: dont chunk on commas, update voice index (#104)
* fix: dont chunk on commas, update voice index
* fix: dont chunk on commas, update voice index
* 📝 Add docstrings to `fixes` (#105) Docstrings generation was requested by @JarbasAl.
* #104 (comment) The following files were modified:
* `phoonnx/model_manager.py`
* `phoonnx/opm.py`
* `phoonnx/phonemizers/base.py`
* `phoonnx_train/vits/dataset.py`
* `phoonnx_train/vits/lightning.py`
* `scripts/index_voices.py`
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* fix: dont chunk on commas, update voice index
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Increment Version to 1.3.3a1
* Update Changelog
* fix: lazy load VoiceConfig (#107) delay network requests until needed 📝 Add docstrings to `fixes` (#108) Docstrings generation was requested by @JarbasAl.
* #107 (comment) The following files were modified:
* `phoonnx/model_manager.py`
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Increment Version to 1.3.3a2
* Update Changelog
---------
Co-authored-by: JarbasAI <33701864+JarbasAl@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: JarbasAl <JarbasAl@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

documentation: supported voices and languages

d5fbd81

coderabbitai Bot reviewed Nov 23, 2025


Comment thread phoonnx/model_manager.py

JarbasAl added 2 commits November 23, 2025 23:26

documentation: supported voices and languages

7d6d1d9

documentation: supported voices and languages

3f67255

coderabbitai Bot reviewed Nov 23, 2025


Comment thread phoonnx/config.py

Comment thread phoonnx/model_manager.py

JarbasAl merged commit 1366c63 into dev Nov 23, 2025
3 checks passed

coderabbitai Bot mentioned this pull request Nov 24, 2025

fix: improve lang code standardization #86

Merged

coderabbitai Bot mentioned this pull request Dec 27, 2025

fix: mantoq2ipa + improve lang code normalization #90

Merged

coderabbitai Bot mentioned this pull request Jun 5, 2026

feat(engines): pluggable multi-engine inference framework #131

Merged

JarbasAl merged 3 commits into dev from voice_docs
