
'unicode-ipa-syls' does not preserve syllable breaks when running '--phones' #8

@necarlson97

Description

I understand that unicode-ipa-syls was added for --phones2phones conversion from CMU, but it would also be very helpful if it could be used to generate syllable-separated phonemes from text, or with other formats.

Apologies if there is a technical limitation here. I am personally unsure how to extract syllables from eSpeak phonemes, and perhaps that explains why the current behavior does not provide syllable breaks. (Presumably CMU has some method?)

My ultimate goal is to extract syllable-level phonetics from whole-sentence text. (Of course, there are other tools that split text into syllables, but generating phonemes from already-split text greatly degrades the quality of the output. eSpeak seems to do a great job of identifying heteronyms and applying other context-specific phonetic rules when fed whole sentences rather than individual words or syllables. So I am looking for a format that preserves both the specific phonemes and the syllable breaks.)
So you can see why I was elated to find unicode-ipa-syls, and deflated to see that it does not work here.

(I am no linguist, so forgive me if I misunderstand anything. There may be a trivial answer to this.)

In any case, whether or not it is possible to get 'unicode-ipa-syls' to work more broadly, it might at least be worth a note in the docs.
And thank you again for sharing your work; it has already been very useful to me (and to many others).

Here is an example of the current behavior:

lexconvert --phones unicode-ipa "heretic"
hˈɛɹətˌɪk

lexconvert --phones unicode-ipa-syls "heretic"
hˈɛɹətˌɪk

lexconvert --phones2phones unicode-ipa unicode-ipa-syls "hˈɛɹətˌɪk"
hˈɛɹətˌɪk

lexconvert --phones2phones espeak unicode-ipa-syls "h'Er@t,Ik"
hˈɛɹətˌɪk

espeak -x -q "I'm excited to live in a time with live music"
 aIm Eks'aItI2d t@ l'Iv I2n a# t'aIm wID l'aIv mj'u:zIk

lexconvert --phones2phones espeak unicode-ipa-syls "aIm Eks'aItI2d t@ l'Iv I2n a# t'aIm wID l'aIv mj'u:zIk"
aɪm ɛksˈaɪtɪd tə lˈɪv ɪn ə tˈaɪm wɪð lˈaɪv mjˈʉːzɪk
