Is the rule 15 correctly implemented? #6

Closed
Joovvhan opened this issue Aug 30, 2020 · 0 comments


Dear contributors,
Thank you for your great work!

I have been trying to improve TTS quality while keeping the amount of data unchanged.

I thought using the g2pk package would improve the model by significantly reducing the number of distinct tokens fed into it. Rule 8, for example, reduces 21 (Jongsung) tokens to 7 (pronounceable Jongsung) tokens.

I combined GlowTTS, g2pk, and Multi-band MelGAN, trained them on the KSS dataset, and obtained the following result.

G2PK Comparison Demo

It seems that g2pk grapheme tokens are much better than just using Jamo tokens!

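However, I found that the g2pk conversion result differs slightly from how Koreans commonly pronounce the sentence.

Since I am no expert in the Korean language, I referred to 한국어 어문 규범 (the official Korean language regulations) and 부산대학교 표준발음 변환기 (the Pusan National University standard pronunciation converter).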

For a sample sentence from the KSS,

| Source | Result |
| --- | --- |
| Original Sentence | 저는 귀가 어두운데 다른 사람의 얘기를 아주 잘 들어 준다는 말을 많이 들어왔어요. |
| G2PK | 저는 귀가 어두운데 다른 사라믜 얘기르 라주 잘 드러 준다는 마를 마니 드러와써요. |
| 부산대학교 | 저는 귀가 어두운데 다른 사라메 얘기를 아주 잘 드러 준다는 마를 마니 드러와써요 |

I suspect that "르 라" is the problem, since Koreans do not commonly speak like that.

I found the following function in the source, regular.py.


```python
def link3(inp, descriptive=False, verbose=False):
    rule = rule_id2text["15"]
    out = inp

    pairs = [ ("ᆨ ᄋ", " ᄀ"),
              ...
              ("ᆹ ᄋ", "ᆸ ᄊ") ]

    for str1, str2 in pairs:
        out = out.replace(str1, str2)

    gloss(verbose, out, inp, rule)
    return out
```
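
To see how a replacement list of this shape could produce "르 라", here is a minimal, self-contained sketch. It is not the g2pk implementation: it assumes the text is decomposed into conjoining jamo (done here with `unicodedata`, which may differ from what g2pk uses), and the second pair is an assumption inferred from the observed output, since the quoted list is elided. The name `link_across_space` is hypothetical.

```python
import unicodedata

# (final consonant + space + silent initial ᄋ) -> (space + initial consonant)
PAIRS = [
    ("\u11A8 \u110B", " \u1100"),  # ᆨ ᄋ -> ᄀ, shown in the quoted link3
    ("\u11AF \u110B", " \u1105"),  # ᆯ ᄋ -> ᄅ, ASSUMED (not visible in the quote)
]


def link_across_space(text: str) -> str:
    """Decompose Hangul into jamo, apply the pairs, then recompose."""
    out = unicodedata.normalize("NFD", text)
    for src, dst in PAIRS:
        out = out.replace(src, dst)
    return unicodedata.normalize("NFC", out)


print(link_across_space("얘기를 아주"))
```

Because the replacement is purely textual, it fires on every matching final consonant followed by a space and a vowel-initial syllable, regardless of which vowel or which kind of morpheme follows.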

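From 한국어 어문 규범:
제15항 받침 뒤에 모음 ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’ 들로 시작되는 실질 형태소가 연결되는 경우에는, 대표음으로 바꾸어서 뒤 음절 첫소리로 옮겨 발음한다.
(Rule 15: when a final consonant is followed by a content morpheme (실질 형태소) beginning with one of the vowels ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’, the final consonant is changed to its representative sound (대표음) and pronounced as the initial sound of the following syllable.)

It seems that link3 considers neither the 실질 형태소 condition nor the vowel condition ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’.

Are these conditions handled elsewhere in the source?

If not, I would like to implement them myself. Please let me know if you have already improved this part.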


The real problem is that the g2pk conversion result above is actually the correct answer according to 한국어 어문 규범!

"아주" starts with "ㅏ" and is a 실질 형태소, and the 대표음 of the "ㄹ" in "를" is "ㄹ".

So, "를 아주" should be pronounced "르 라주" according to 한국어 어문 규범.
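
The vowel condition of rule 15 can be sketched as follows. This is only an illustration under stated assumptions: it covers just the single final consonants that are already their own 대표음 (ㄱ, ㄴ, ㄷ, ㄹ, ㅁ, ㅂ), it does not attempt the 실질 형태소 check (which would need a morphological analyzer), and the name `link_rule15_vowels_only` is hypothetical.

```python
import re
import unicodedata

# Vowels named in rule 15 (ㅏ, ㅓ, ㅗ, ㅜ, ㅟ) as conjoining medial jamo.
RULE15_VOWELS = "\u1161\u1165\u1169\u116E\u1171"

# Final consonant -> matching initial consonant (subset chosen for this sketch).
FINAL_TO_INITIAL = {
    "\u11A8": "\u1100",  # ㄱ
    "\u11AB": "\u1102",  # ㄴ
    "\u11AE": "\u1103",  # ㄷ
    "\u11AF": "\u1105",  # ㄹ
    "\u11B7": "\u1106",  # ㅁ
    "\u11B8": "\u1107",  # ㅂ
}

# final + space + silent ᄋ, only when the next vowel is one of the five.
PATTERN = re.compile(
    "([" + "".join(FINAL_TO_INITIAL) + "]) \u110B(?=[" + RULE15_VOWELS + "])"
)


def link_rule15_vowels_only(text: str) -> str:
    """Move the final consonant across the space only before ㅏ, ㅓ, ㅗ, ㅜ, ㅟ."""
    decomposed = unicodedata.normalize("NFD", text)
    linked = PATTERN.sub(lambda m: " " + FINAL_TO_INITIAL[m.group(1)], decomposed)
    return unicodedata.normalize("NFC", linked)


print(link_rule15_vowels_only("를 아주"))  # ㅏ is in the list, so linking applies
print(link_rule15_vowels_only("를 이미"))  # ㅣ is not in the list, so no linking
```

Even with the vowel check in place, "를 아주" is still linked, which matches the reading of the rule above.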

I have been thinking about this issue for several weeks, and I have concluded that Korean speakers tend to insert a pause, like a comma, at the space " " between words when they feel it is needed. "얘기를 아주" becomes "얘기를, 아주" so that "아" is pronounced as "아" and not as "라". However, I have not found a good algorithm that applies rule 15 selectively in a way that matches this intuition.

As a quick fix, I simply disabled link3 and labeled the result "G2PK no 15" on the demo page.
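
One way to switch a single rule off without editing its body is to skip it in the conversion loop. The sketch below is hypothetical and self-contained: `link3_sketch`, `RULES`, and `convert` are illustrative names, not part of g2pk, and the toy rule handles only a final ㄹ.

```python
import unicodedata


def link3_sketch(text: str) -> str:
    """Toy stand-in for link3: move a final ㄹ across a space onto a vowel-initial syllable."""
    decomposed = unicodedata.normalize("NFD", text)
    linked = decomposed.replace("\u11AF \u110B", " \u1105")  # ᆯ ᄋ -> ᄅ
    return unicodedata.normalize("NFC", linked)


# Hypothetical rule table keyed by rule number; only rule 15 is sketched here.
RULES = {"15": link3_sketch}


def convert(text: str, disabled=frozenset()) -> str:
    """Apply every rule in order, skipping rule numbers listed in `disabled`."""
    for rule_id, rule in RULES.items():
        if rule_id not in disabled:
            text = rule(text)
    return text


print(convert("얘기를 아주"))                   # rule 15 applied
print(convert("얘기를 아주", disabled={"15"}))  # rule 15 skipped, input unchanged
```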

I have already achieved a satisfactory experimental result, and it seems reasonable to extend my research on phoneme and grapheme alignment.

But, as I mentioned earlier, I am not an expert in Korean or in linguistics in general.

So, I would appreciate an opinion from a real linguist on how to properly improve my TTS results and G2P conversion.

So if you have any opinion regarding my questions, please share it.

Thanks.

5Hyeons pushed a commit to 5Hyeons/g2pK that referenced this issue Aug 5, 2022