Mandarin AISHELL1 #85
lifeiteng
commented
Apr 13, 2023
- refactor TextTokenizer
phonemized = []
for _text in text:
    _text = re.sub(" +", " ", _text.strip())
    _text = _text.replace(" ", separator.word)
"此项 工作 还能 怎么 改进" => 'ci3_xiang4_gong1_zuo4_hai2_neng2_zen3_me5_gai3_jin4'. If we want to keep the original spaces, this seems to be a problem: the word boundaries can no longer be recovered from the output.
Alternatively, use something like '%' to replace the original spaces => 'ci3_xiang4_%gong1_zuo4%hai2_neng2%zen3_me5%_gai3_jin4'.
I think we should use different tokens for the boundaries between syllables inside a word and for the boundaries between words:
ci3-xiang4_gong1-zuo4 ...
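A minimal sketch of the two-separator idea, assuming the text has already been segmented into words and converted to pinyin syllables. The helper names `join_syllables` and `split_syllables` are hypothetical and are not part of this PR's TextTokenizer; the separators `-` and `_` follow the example in the comment.

```python
def join_syllables(words, syllable_sep="-", word_sep="_"):
    """Join syllables with syllable_sep inside a word and word_sep between words."""
    return word_sep.join(syllable_sep.join(syllables) for syllables in words)


def split_syllables(phonemized, syllable_sep="-", word_sep="_"):
    """Recover the word/syllable structure from the joined string."""
    return [word.split(syllable_sep) for word in phonemized.split(word_sep)]


# Hand-written pinyin for "此项 工作 还能 怎么 改进" (illustrative input only).
words = [
    ["ci3", "xiang4"],
    ["gong1", "zuo4"],
    ["hai2", "neng2"],
    ["zen3", "me5"],
    ["gai3", "jin4"],
]

phonemized = join_syllables(words)
print(phonemized)
# The original segmentation survives a round trip because the two
# boundary types use different separator tokens.
print(split_syllables(phonemized) == words)
```

With a single separator for both levels, the same round trip would return one flat list of syllables, which is the loss of spacing described in the first comment.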
What about commas and sentence-final punctuation (periods, question marks, exclamation marks)?
These marks give the synthesizer explicit cues for pauses and emotion.
The AISHELL dataset may not contain them, but we can still add support for them for future compatibility.
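One possible way to handle this, sketched under assumptions: split the input on full-width punctuation before phonemizing and keep each mark as its own token. `PUNCT_MAP` and `split_keep_punct` are hypothetical names, and mapping to ASCII marks is an illustrative choice, not something this PR defines.

```python
import re

# Assumed mapping from full-width Chinese punctuation to ASCII tokens.
PUNCT_MAP = {"，": ",", "。": ".", "？": "?", "！": "!"}
_PUNCT_RE = re.compile("([" + "".join(PUNCT_MAP) + "])")


def split_keep_punct(text):
    """Split text on punctuation, keeping each mark as a separate token."""
    pieces = []
    # The capturing group makes re.split return the delimiters as well.
    for chunk in _PUNCT_RE.split(text):
        chunk = chunk.strip()
        if not chunk:
            continue
        pieces.append(PUNCT_MAP.get(chunk, chunk))
    return pieces


print(split_keep_punct("此项 工作，还能 怎么 改进？"))
```

The text chunks could then be phonemized as before, with the punctuation tokens passed through unchanged so the model sees them as pause and intonation cues.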
Any suggestions for the code? Or do you mean this is a problem with AISHELL1 itself?