[Bug] Letter-by-letter pronunciation not possible #2619
Describe the bug
I'd like to force the TTS model to pronounce a word letter by letter, e.g. "ARD" should be pronounced "A R D" (/ˌeɪˌɑːɹdˈiː/). In systems with SSML support (#752) you could use <speak><say-as interpret-as="verbatim">ard</say-as></speak>, but another way would be fine as well.
Espeak supports this, even for words not in its dictionary, when periods are added between the characters: espeak-ng --ipa -v en-us "A.R.D." is read as /ˌeɪˌɑːɹdˈiː/.
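For illustration, the period-separated form mentioned above can be generated and passed to Espeak from Python. This is only a sketch: the helper names spell_out and espeak_ipa are hypothetical, the -q flag (suppress audio playback) is an addition to the command quoted above, and it assumes espeak-ng is installed and on the PATH.

```python
import shutil
import subprocess


def spell_out(word: str) -> str:
    """Insert a period after every character, e.g. 'ARD' -> 'A.R.D.'."""
    return "".join(ch + "." for ch in word.upper())


def espeak_ipa(text: str, voice: str = "en-us") -> str:
    """Return espeak-ng's IPA transcription of `text` (hypothetical helper)."""
    if shutil.which("espeak-ng") is None:
        raise RuntimeError("espeak-ng not found on PATH")
    # --ipa and -v come from the command in the issue; -q is added here
    # so that nothing is played aloud.
    result = subprocess.run(
        ["espeak-ng", "-q", "--ipa", "-v", voice, text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(spell_out("ARD"))  # A.R.D.
```

Calling espeak_ipa(spell_out("ARD")) should then give the letter-by-letter transcription, provided the whole string reaches Espeak in one piece.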
This doesn't work in Coqui because the input for Espeak is split at punctuation characters and each chunk ["A", "R", "D"] is phonemized separately:
TTS/TTS/tts/utils/text/phonemizers/base.py
Line 129 in bc0a532
This results in the word pronunciation of "a" being chosen rather than the letter pronunciation (ɐ instead of eɪ). I could change _phonemize_preprocess() to pass the input to Espeak with punctuation included, but I'm not sure about the side effects. Is there a specific reason to do it this way?
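The effect of the splitting step can be reproduced in isolation. The following is a minimal standalone sketch, not the actual Coqui code: the function name split_at_punctuation and the punctuation set are assumptions made for illustration only.

```python
import re

# Assumed punctuation set for illustration; the set used by the real
# phonemizer may differ.
PUNCTUATION = ".,;:!?"


def split_at_punctuation(text: str) -> list[str]:
    """Split `text` at runs of punctuation and drop empty chunks."""
    pattern = "[" + re.escape(PUNCTUATION) + "]+"
    return [chunk.strip() for chunk in re.split(pattern, text) if chunk.strip()]


chunks = split_at_punctuation("A.R.D.")
print(chunks)  # ['A', 'R', 'D']
```

Once the periods are gone, each chunk reaches Espeak as a standalone token, so the context that marks "A" as a spelled-out letter rather than the word "a" is lost.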
To Reproduce
from TTS.api import TTS
p = TTS(model_name="tts_models/en/ljspeech/vits", gpu=False).synthesizer.tts_model.tokenizer.phonemizer
p.phonemize("A.R.D.")
Output: 'ˈɐ.ˈɑːɹ.d|ˈiː.'
Expected behavior
Expected output: ˌeɪˌɑːɹdˈiː
Environment
{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.0+cu117",
        "TTS": "0.10.2",
        "numpy": "1.22.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.8",
        "version": "#42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2"
    }
}