Dear contributors,
Thank you for your great work!
I have been trying to improve TTS quality without increasing the amount of training data.
I expected that using the g2pk package would improve the model by significantly reducing the number of distinct tokens fed into it. Rule 8, for example, reduces 21 final-consonant (jongseong) tokens to the 7 final consonants that are actually pronounceable.
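As a rough illustration of why this shrinks the token set, the sketch below maps each single final consonant to the representative sound it is pronounced as at the end of a syllable (rules 8 and 9 of the standard pronunciation rules). It is a simplified assumption, not g2pk code: consonant clusters (ㄳ, ㄺ, ...) and ㅎ are left out because they follow additional rules, and `REPRESENTATIVE_FINAL` and `neutralize_final` are hypothetical names.

```python
# Hypothetical table (not from g2pk): single final consonants mapped to their
# representative sound. Clusters and ㅎ are omitted because extra rules apply.
REPRESENTATIVE_FINAL = {
    "ㄱ": "ㄱ", "ㄲ": "ㄱ", "ㅋ": "ㄱ",
    "ㄴ": "ㄴ",
    "ㄷ": "ㄷ", "ㅅ": "ㄷ", "ㅆ": "ㄷ", "ㅈ": "ㄷ", "ㅊ": "ㄷ", "ㅌ": "ㄷ",
    "ㄹ": "ㄹ",
    "ㅁ": "ㅁ",
    "ㅂ": "ㅂ", "ㅍ": "ㅂ",
    "ㅇ": "ㅇ",
}


def neutralize_final(final: str) -> str:
    """Return the representative sound of a single final consonant."""
    return REPRESENTATIVE_FINAL[final]


if __name__ == "__main__":
    spoken = sorted(set(REPRESENTATIVE_FINAL.values()))
    print(len(REPRESENTATIVE_FINAL), "single finals ->", len(spoken), "sounds:", spoken)
```

Even in this reduced form, 15 written single finals collapse into 7 output symbols, which is the kind of vocabulary reduction the TTS model benefits from.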
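I combined GlowTTS, g2pk, and Multi-band MelGAN, trained the model on the KSS dataset, and obtained the following results.
G2PK Comparison Demo
It seems that g2pk grapheme tokens work much better than plain Jamo tokens!
Yet, I found that the g2pk conversion result is slightly different from how Korean speakers usually pronounce the text.
Since I am no expert in the Korean language, I referred to 한국어 어문 규범 (the Korean language norms) and 부산대학교 표준발음 변환기 (the Pusan National University standard pronunciation converter).
For a sample sentence from KSS, g2pk converts "를 아주" into "르 라주".
I think "르 라" is the problem, since Korean speakers do not usually pronounce it that way.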
I found the following function in the source file regular.py:
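```python
def link3(inp, descriptive=False, verbose=False):
    # rule_id2text and gloss are defined elsewhere in the g2pk package
    rule = rule_id2text["15"]
    out = inp

    # Each pair rewrites "final consonant + space + silent initial ㅇ"
    # so that the final is pronounced at the start of the next word.
    pairs = [ ("ᆨ ᄋ", " ᄀ"),
              # ...
              ("ᆹ ᄋ", "ᆸ ᄊ") ]

    for str1, str2 in pairs:
        out = out.replace(str1, str2)

    gloss(verbose, out, inp, rule)
    return out
```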
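To see what this replacement loop does to actual text, here is a standalone sketch that mimics it on Unicode-decomposed jamo. It assumes that g2pk operates on conjoining jamo with words separated by a single space, and that the elided part of `pairs` contains an entry for final ㄹ of the form ("ᆯ ᄋ", " ᄅ"), by analogy with the pairs shown above; `link_final_to_onset` is a hypothetical name.

```python
import unicodedata

# Assumed pair, by analogy with ("ᆨ ᄋ", " ᄀ"): final ㄹ (U+11AF) + space +
# silent initial ㅇ (U+110B) becomes space + initial ㄹ (U+1105).
PAIRS = [("\u11af \u110b", " \u1105")]


def link_final_to_onset(text: str) -> str:
    """Decompose to jamo, move the final across the space, then recompose."""
    jamo = unicodedata.normalize("NFD", text)  # e.g. 를 -> ᄅ ᅳ ᆯ
    for src, dst in PAIRS:
        jamo = jamo.replace(src, dst)
    return unicodedata.normalize("NFC", jamo)  # jamo -> precomposed syllables


print(link_final_to_onset("애기를 아주"))
```

Because the match only looks at the final consonant, the space, and the silent ㅇ, it fires for every word that starts with ㅇ, regardless of which vowel or morpheme follows.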
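From 한국어 어문 규범, Article 15 (제15항): "When a final consonant (받침) is followed by a content morpheme (실질 형태소) beginning with the vowel ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’, the final consonant is changed to its representative sound (대표음) and then moved to the onset of the following syllable."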
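A minimal sketch of a stricter version of the rule, checking the two conditions quoted above: the next word must start with one of ㅏ, ㅓ, ㅗ, ㅜ, ㅟ, and it must be a content morpheme. Deciding the second condition requires a morphological analyzer, so here it is simply a boolean supplied by the caller (an assumption). The function `apply_rule15` and its signature are hypothetical, only single final consonants are handled, and inputs are assumed to be non-empty strings of Hangul syllables.

```python
# Hangul syllable arithmetic: syllable = 0xAC00 + (L * 21 + V) * 28 + T
BASE, N_V, N_T = 0xAC00, 21, 28

# Final (T) index -> initial (L) index of its representative sound.
# Single finals only; ㅇ (21) and ㅎ (27) are not moved, clusters are omitted.
FINAL_TO_ONSET = {1: 0, 2: 0, 24: 0,                        # ㄱ ㄲ ㅋ -> ㄱ
                  4: 2,                                     # ㄴ -> ㄴ
                  7: 3, 19: 3, 20: 3, 22: 3, 23: 3, 25: 3,  # ㄷ ㅅ ㅆ ㅈ ㅊ ㅌ -> ㄷ
                  8: 5,                                     # ㄹ -> ㄹ
                  16: 6,                                    # ㅁ -> ㅁ
                  17: 7, 26: 7}                             # ㅂ ㅍ -> ㅂ
SILENT_ONSET = 11                  # initial ㅇ
RULE15_VOWELS = {0, 4, 8, 13, 16}  # ㅏ ㅓ ㅗ ㅜ ㅟ


def split(syl):
    """Split a Hangul syllable into (initial, vowel, final) indices."""
    code = ord(syl) - BASE
    return code // (N_V * N_T), (code // N_T) % N_V, code % N_T


def join(l, v, t=0):
    """Build a Hangul syllable from (initial, vowel, final) indices."""
    return chr(BASE + (l * N_V + v) * N_T + t)


def apply_rule15(word, next_word, next_is_content_morpheme):
    """Return (word, next_word) after the stricter rule-15 linking."""
    if not next_is_content_morpheme:
        return word, next_word
    l1, v1, t1 = split(word[-1])
    l2, v2, t2 = split(next_word[0])
    if t1 not in FINAL_TO_ONSET or l2 != SILENT_ONSET or v2 not in RULE15_VOWELS:
        return word, next_word
    # Drop the final from the last syllable and use its representative sound
    # as the onset of the next word; the next syllable's own final is kept as written.
    return (word[:-1] + join(l1, v1),
            join(FINAL_TO_ONSET[t1], v2, t2) + next_word[1:])
```

For example, `apply_rule15("밭", "아래", True)` gives `("바", "다래")`, and `apply_rule15("애기를", "아주", True)` gives `("애기르", "라주")`, which is exactly the result discussed below.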
However, link3 does not seem to check whether the following word is a content morpheme (실질 형태소) or whether it begins with one of the vowels ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’.
Are these conditions checked somewhere else in the source?
If not, I would like to implement it myself. Please let me know if you have already improved this part.
The real problem is that, according to 한국어 어문 규범, the g2pk conversion result above is actually correct!
"아주" is a content morpheme (실질 형태소) that starts with "ㅏ", and the representative sound (대표음) of the final "ㄹ" in "를" is "ㄹ" itself.
So, according to 한국어 어문 규범, "를 아주" should be pronounced "르 라주".
I have been thinking about this issue for several weeks, and I have concluded that Korean speakers tend to treat the space between words as a comma (a short pause) when they think it is needed. For example, "애기를 아주" is read as "애기를, 아주", so that "아" is pronounced as "아" and not as "라". Yet, I have not found a good algorithm to apply rule 15 selectively in a way that matches my intuition.
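One way to express the comma observation in code, sketched under assumptions: if a pause is written as a comma and the comma survives until the linking step, a rule that only matches "final + single space + silent ㅇ" leaves "애기를, 아주" untouched while still linking "애기를 아주". Whether g2pk actually keeps punctuation at that stage has not been verified here, and `link_unless_pause` and its small final-to-onset table are hypothetical.

```python
import re
import unicodedata

# Final consonant (U+11A8..U+11C2) + exactly one space + silent initial ㅇ (U+110B).
# A comma or any other character between the words prevents the match.
LINK = re.compile("([\u11a8-\u11c2]) \u110b")

# Assumed subset of final -> initial mappings (finals that keep their sound).
FINAL_TO_ONSET = {"\u11a8": "\u1100",  # ㄱ
                  "\u11ab": "\u1102",  # ㄴ
                  "\u11af": "\u1105",  # ㄹ
                  "\u11b7": "\u1106",  # ㅁ
                  "\u11b8": "\u1107"}  # ㅂ


def link_unless_pause(text: str) -> str:
    """Link finals across a plain space, but not across a written pause."""
    jamo = unicodedata.normalize("NFD", text)

    def move(m):
        onset = FINAL_TO_ONSET.get(m.group(1))
        # Unknown finals are left exactly as they were.
        return " " + onset if onset else m.group(0)

    return unicodedata.normalize("NFC", LINK.sub(move, jamo))


print(link_unless_pause("애기를 아주"), "/", link_unless_pause("애기를, 아주"))
```

This does not decide where a pause belongs; it only shows that, once a pause is marked in the text, a space-sensitive linking rule respects it without extra logic.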
As a quick fix, I simply disabled link3 and labeled this variant "G2PK no 15" on the demo page.
I have already achieved satisfactory experimental results, so it seems reasonable to extend my research to phoneme-grapheme alignment.
But, as I mentioned earlier, I am not an expert in Korean or in linguistics.
So, I would appreciate an opinion from a real linguist on how to properly improve my TTS results and the G2P conversion.
If you have any thoughts on my questions, please share them.
Thanks.
5Hyeons pushed a commit to 5Hyeons/g2pK that referenced this issue on Aug 5, 2022.