The lexicon is a tool that tells Festival TTS how to pronounce certain words.
Entries are written in an easy-to-use script based on the CMU Pronouncing Dictionary,
which is translated into a Lisp (Festival's Scheme dialect) script for each voice during generation.
Each word in Festival is built from a sequence of syllables, and each syllable is made up of phonemes, the distinct sounds used in speech.
The following table lists the phonemes currently used by the CMU Pronouncing Dictionary.
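The word → syllable → phoneme structure maps naturally onto Festival's `lex.add.entry` form, where each syllable is written as `((phonemes) stress)`. The sketch below shows one way a generator could emit that Scheme form from already-parsed syllables. The `Syllable` type, the `to_festival_entry` helper, and the `nil` part-of-speech field are assumptions for illustration, not ss13-vox's actual code.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    """One syllable: lowercase Festival-style phonemes plus an emphasis level."""
    phonemes: list[str]
    stress: int  # 0 = no emphasis, 1 = primary emphasis


def to_festival_entry(word: str, syllables: list[Syllable], pos: str = "nil") -> str:
    """Render a word as a Festival (Scheme) lex.add.entry call.

    Hypothetical helper: the POS field defaults to nil (no part of speech).
    """
    parts = []
    for syl in syllables:
        phones = " ".join(syl.phonemes)
        parts.append(f"(({phones}) {syl.stress})")
    return f"(lex.add.entry '(\"{word}\" {pos} ({' '.join(parts)})))"


seizure = [Syllable(["s", "iy"], 1), Syllable(["zh", "er"], 0)]
print(to_festival_entry("seizure", seizure))
# (lex.add.entry '("seizure" nil (((s iy) 1) ((zh er) 0))))
```

Keeping syllables as structured data until the last step means the same parsed entry could be rendered once per voice, which matches the per-voice generation described above.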
Phoneme | Example | Translation |
---|---|---|
AA | odd | "aa d" |
AE | at | "ae t" |
AH | hut | "hh ah t" |
AO | ought | "ao t" |
AW | cow | "k aw" |
AY | hide | "hh ay d" |
B | be | "b iy" |
CH | cheese | "ch iy z" |
D | dee | "d iy" |
DH | thee | "dh iy" |
EH | ed | "eh d" |
ER | hurt | "hh er t" |
EY | ate | "ey t" |
F | fee | "f iy" |
G | green | "g r iy n" |
HH | he | "hh iy" |
IH | it | "ih t" |
IY | eat | "iy t" |
JH | gee | "jh iy" |
K | key | "k iy" |
L | lee | "l iy" |
M | me | "m iy" |
N | knee | "n iy" |
NG | ping | "p ih ng" |
OW | oat | "ow t" |
OY | toy | "t oy" |
P | pee | "p iy" |
R | read | "r iy d" |
S | sea | "s iy" |
SH | she | "sh iy" |
T | tea | "t iy" |
TH | theta | "th ey t ah" |
UH | hood | "hh uh d" |
UW | two | "t uw" |
V | vee | "v iy" |
W | we | "w iy" |
Y | yield | "y iy l d" |
Z | zee | "z iy" |
ZH | seizure | "s iy" 'zh er' |
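The Translation column appears to be the dictionary entry lowercased with its stress digits dropped; cmudict itself writes "seizure" as `S IY1 ZH ER0`, with a 0/1/2 stress digit on each vowel. A minimal sketch of that conversion, assuming the standard cmudict line layout (word, then space-separated phonemes); `cmudict_to_translation` is a hypothetical helper name.

```python
import re


def cmudict_to_translation(line: str) -> tuple[str, str, list[int]]:
    """Split a cmudict line into (word, lowercase phonemes, vowel stress digits).

    Assumes lines like "SEIZURE  S IY1 ZH ER0": the word, then ARPAbet
    phonemes, where vowels carry a trailing 0/1/2 stress digit.
    """
    word, *phones = line.split()
    cleaned = []
    stresses = []
    for phone in phones:
        match = re.fullmatch(r"([A-Z]+)([012])?", phone)
        if match is None:
            raise ValueError(f"unexpected phoneme token: {phone!r}")
        base, digit = match.groups()
        cleaned.append(base.lower())
        if digit is not None:
            stresses.append(int(digit))
    return word.lower(), " ".join(cleaned), stresses


print(cmudict_to_translation("SEIZURE  S IY1 ZH ER0"))
# ('seizure', 's iy zh er', [1, 0])
```

Any secondary stress (2) returned here would still need to be folded down to 0 or 1, since ss13-vox only uses two emphasis levels.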
Some other phonemes that appear to work:

- `pau`: A short pause.
- `@`: Possibly an alias for `ae`. Not used, as it is non-standard and unreliable with some voices.
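Since only the table's phonemes plus `pau` are expected to work, a generator could reject anything else before handing it to Festival. A minimal sketch, assuming syllables are space-separated lowercase phonemes; `ALLOWED` and `unknown_phonemes` are hypothetical names, and `@` is deliberately excluded as described above.

```python
# Phonemes from the CMU table, lowercased as they appear in the lexicon.
CMU_PHONEMES = {
    "aa", "ae", "ah", "ao", "aw", "ay", "b", "ch", "d", "dh", "eh", "er",
    "ey", "f", "g", "hh", "ih", "iy", "jh", "k", "l", "m", "n", "ng", "ow",
    "oy", "p", "r", "s", "sh", "t", "th", "uh", "uw", "v", "w", "y", "z", "zh",
}
# Extra symbol that appears to work (a short pause); "@" is intentionally omitted.
EXTRA_PHONEMES = {"pau"}
ALLOWED = CMU_PHONEMES | EXTRA_PHONEMES


def unknown_phonemes(syllable: str) -> list[str]:
    """Return the tokens in a space-separated syllable that are not allowed."""
    return [tok for tok in syllable.split() if tok not in ALLOWED]


print(unknown_phonemes("s iy @ pau"))  # ['@']
```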
The lexicon supports emphasis as a number from 0 to 2, giving three levels of emphasis.
However, to keep things simple, ss13-vox only supports levels 0 and 1, and uses the enclosing quote marks to indicate the emphasis level.
Quote Mark | Emphasis Level | Notes |
---|---|---|
" | 1 | Primary emphasis. |
' | 0 | No emphasis. |
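The quote-mark convention can be parsed with a small regular expression that captures the opening quote, the phonemes, and the matching closing quote. A minimal sketch, assuming an entry's pronunciation is a run of quoted syllables such as `"s iy" 'zh er'`; `parse_syllables` and `EMPHASIS` are hypothetical names, not part of ss13-vox.

```python
import re

# Quote mark -> emphasis level, per the table above.
EMPHASIS = {'"': 1, "'": 0}
# One quoted syllable: an opening quote, phonemes, then the same quote again.
_SYLLABLE = re.compile(r"""\s*(["'])([^"']*)\1\s*""")


def parse_syllables(spec: str) -> list[tuple[list[str], int]]:
    """Parse quote-delimited syllables into (phonemes, emphasis) pairs."""
    spec = spec.strip()
    result = []
    pos = 0
    while pos < len(spec):
        match = _SYLLABLE.match(spec, pos)
        if match is None:
            raise ValueError(f"unquoted text at position {pos}: {spec[pos:]!r}")
        quote, body = match.groups()
        result.append((body.split(), EMPHASIS[quote]))
        pos = match.end()
    return result


print(parse_syllables('"s iy" \'zh er\''))
# [(['s', 'iy'], 1), (['zh', 'er'], 0)]
```

Requiring every character to fall inside a quoted group means a stray unquoted phoneme raises an error instead of being silently dropped.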
In general, we try to give each word exactly one syllable with primary emphasis, leaving the rest without emphasis.
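Because this is a convention ("in general") rather than a hard rule, a generator might flag violations as warnings instead of rejecting the entry. A minimal sketch, where the hypothetical helper `primary_emphasis_warning` takes the per-syllable emphasis levels of one word.

```python
from typing import Optional


def primary_emphasis_warning(word: str, stresses: list[int]) -> Optional[str]:
    """Return a warning unless exactly one syllable has primary emphasis (1)."""
    primaries = stresses.count(1)
    if primaries == 1:
        return None
    if primaries == 0:
        return f"{word}: no syllable has primary emphasis"
    return f"{word}: {primaries} syllables have primary emphasis"


print(primary_emphasis_warning("seizure", [1, 0]))  # None
print(primary_emphasis_warning("seizure", [1, 1]))
# seizure: 2 syllables have primary emphasis
```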