fix: auto-download dicta ONNX model for Hebrew diacritization by voidborne-d · Pull Request #512 · resemble-ai/chatterbox

voidborne-d · 2026-04-22T01:00:07Z

Summary

Fixes #467 — Hebrew diacritization silently fails because add_hebrew_diacritics() calls Dicta() with no arguments, but dicta_onnx.Dicta.__init__ requires a model_path argument.

Problem

dicta_onnx.Dicta requires a path to a separately-downloaded ONNX model file (~300 MB). The current code:

_dicta = Dicta()  # TypeError: missing required argument 'model_path'

The TypeError is caught by the except Exception handler and logged as a warning. Hebrew TTS receives bare consonants with no niqqud, producing gibberish speech output.
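
The silent-failure pattern can be reproduced without dicta-onnx installed. The sketch below is an illustration only: `FakeDicta` is a hypothetical stand-in whose constructor requires `model_path`, and `add_diacritics_buggy` is a simplified guess at the shape of the handler described above, not the project's actual code.

```python
import logging

logger = logging.getLogger("hebrew_demo")


class FakeDicta:
    """Hypothetical stand-in for dicta_onnx.Dicta: model_path is required."""

    def __init__(self, model_path):
        self.model_path = model_path


def add_diacritics_buggy(text):
    """Simplified illustration of the buggy pattern (assumed shape)."""
    try:
        FakeDicta()  # raises TypeError: model_path was not supplied
    except Exception as exc:
        # The broad handler downgrades a programming error to a warning ...
        logger.warning("Hebrew diacritization failed: %s", exc)
        # ... and the caller gets the input back with no niqqud added.
        return text
    return text + " (diacritized)"


result = add_diacritics_buggy("שלום")
```

Because `TypeError` is a subclass of `Exception`, the call never raises to the caller; the only trace of the bug is a log line.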

Fix

Auto-download model: New _get_dicta_model_path() function auto-downloads the int8 ONNX model from the official dicta-onnx GitHub release on first use, caching it under $XDG_CACHE_HOME/chatterbox/dicta/ (or ~/.cache/chatterbox/dicta/)
Env-var override: DICTA_MODEL_PATH lets users point to a local .onnx file (useful for airgapped/Docker environments)
Atomic write: Uses tmpfile + os.replace so a partial download never poisons the cache
Better warnings: Tell users exactly how to install dicta-onnx or set the env-var, instead of a generic 'failed' message
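
The resolution order described above (env-var override, then cache, then atomic download) could look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's `_get_dicta_model_path()`: `MODEL_URL` and `MODEL_FILENAME` are placeholders (the real release URL and file name are not given here), the `fetch` parameter is a hypothetical injection point for testing, and raising `FileNotFoundError` on a bad override is an assumption about behaviour.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# Placeholders only: the real release URL and file name are not stated here.
MODEL_URL = "https://example.invalid/dicta-int8.onnx"
MODEL_FILENAME = "dicta-int8.onnx"


def default_cache_dir():
    """Return $XDG_CACHE_HOME/chatterbox/dicta, or ~/.cache/chatterbox/dicta."""
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return Path(base) / "chatterbox" / "dicta"


def get_model_path(fetch=urllib.request.urlretrieve):
    """Resolve the model path: env-var override, then cache, then download."""
    override = os.environ.get("DICTA_MODEL_PATH")
    if override:
        path = Path(override).expanduser()  # supports ~/models/foo.onnx
        if not path.is_file():
            raise FileNotFoundError(f"DICTA_MODEL_PATH does not exist: {path}")
        return path

    cache_dir = default_cache_dir()
    target = cache_dir / MODEL_FILENAME
    if target.is_file():
        return target  # cache hit: no download

    cache_dir.mkdir(parents=True, exist_ok=True)
    # Download to a temp file in the same directory so os.replace stays on
    # one filesystem and swaps the file into place atomically.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    os.close(fd)
    try:
        fetch(MODEL_URL, tmp_name)
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the partial file so a failed download never poisons the cache.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```

Writing to a sibling temp file first means the final path either does not exist or holds a complete download, so the cache-hit check never sees a truncated model.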

Tests

17 regression tests in tests/test_hebrew_diacritization.py covering:

Env-var override (valid file, missing file, tilde expansion)
Cache creation and download on miss
Cache hit without re-download
Partial download cleanup on failure
model_path correctly passed to Dicta()
Clear warnings when dicta_onnx or model file is missing
Source-code audit: no bare Dicta() calls remain
Full round-trip integration with fake Dicta
Singleton reuse across multiple calls
Exact reproduction of issue #467 (Multilingual Hebrew diacritization silently fails: dicta_onnx called without required model_path)

All tests are lightweight (no GPU, no real model download, no torch/torchaudio dependency).

python3 -m pytest tests/test_hebrew_diacritization.py -v
17 passed in 0.07s
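
The "source-code audit" item above is not shown in this description. One way such a check might be written is to parse the module and flag any call to `Dicta` that passes no arguments; the helper name `find_bare_dicta_calls` and the use of `ast` are assumptions for illustration, not necessarily how the PR's test works.

```python
import ast


def find_bare_dicta_calls(source):
    """Return the line numbers of calls to Dicta that pass no arguments."""
    bare = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `Dicta()` and `dicta_onnx.Dicta()`.
        if isinstance(func, ast.Name):
            name = func.id
        else:
            name = getattr(func, "attr", None)
        if name == "Dicta" and not node.args and not node.keywords:
            bare.append(node.lineno)
    return sorted(bare)


buggy_source = "_dicta = Dicta()\n"
fixed_source = "_dicta = Dicta(model_path)\n"
```

Parsing the syntax tree, rather than searching for the string `Dicta()`, avoids false positives from comments and docstrings that mention the old call.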


voidborne-d wants to merge 1 commit into resemble-ai:master from voidborne-d:fix/hebrew-diacritization-model-path
