Mandarin AISHELL1 #85

Merged
merged 13 commits into from
Apr 14, 2023

@lifeiteng (Owner) commented Apr 13, 2023

  • refactor TextTokenizer

import re

phonemized = []
for _text in text:
    # Trim, collapse runs of spaces, then mark word boundaries with separator.word
    _text = re.sub(" +", " ", _text.strip())
    _text = _text.replace(" ", separator.word)
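For reference, here is a self-contained sketch of the two normalization lines under review. It assumes a hypothetical `Separator` dataclass in place of whatever separator object `TextTokenizer` actually receives. It also collects the normalized strings, whereas the real loop presumably goes on to phonemize them.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Separator:
    """Hypothetical stand-in for the PR's separator object; only `word` is used."""
    word: str = "_"


def normalize(text: List[str], separator: Separator) -> List[str]:
    """Apply the two normalization steps from the diff to each input string."""
    normalized = []
    for _text in text:
        _text = re.sub(" +", " ", _text.strip())    # trim, collapse repeated spaces
        _text = _text.replace(" ", separator.word)  # each space becomes a word boundary
        normalized.append(_text)
    return normalized


print(normalize(["此项  工作 还能 "], Separator()))  # ['此项_工作_还能']
```

The word boundary is only preserved if `separator.word` differs from whatever joins syllables inside a word. That requirement is what the review comments below are about.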
@zhaomingwork (Collaborator) commented Apr 13, 2023:

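"此项 工作 还能 怎么 改进" ("How else can this work be improved?") => 'ci3_xiang4_gong1_zuo4_hai2_neng2_zen3_me5_gai3_jin4'. If we want to keep the original spaces, this looks like a problem, because syllable boundaries and word boundaries both become '_'.
Alternatively, we could use something like '%' to replace the original spaces => 'ci3_xiang4%gong1_zuo4%hai2_neng2%zen3_me5%gai3_jin4'.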
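The following is a minimal sketch of the ambiguity. It assumes a hypothetical hard-coded pinyin table (`PINYIN`) in place of a real G2P front end, and a hypothetical helper `to_pinyin`. When syllables and words are joined with the same '_' separator, two different segmentations of the same characters produce identical strings. A dedicated word separator such as '%' keeps them apart.

```python
# Hypothetical toy pinyin table standing in for a real G2P front end.
PINYIN = {"此": "ci3", "项": "xiang4", "工": "gong1", "作": "zuo4"}


def to_pinyin(words, syllable_sep="_", word_sep="_"):
    """Join each word's syllables with syllable_sep, and words with word_sep."""
    return word_sep.join(
        syllable_sep.join(PINYIN[ch] for ch in word) for word in words
    )


seg_a = "此项 工作".split()  # ['此项', '工作']
seg_b = "此 项工作".split()  # ['此', '项工作']

# One separator for both levels: both segmentations collapse to the same string.
print(to_pinyin(seg_a))  # ci3_xiang4_gong1_zuo4
print(to_pinyin(seg_b))  # ci3_xiang4_gong1_zuo4

# A distinct word separator preserves where the original spaces were.
print(to_pinyin(seg_a, word_sep="%"))  # ci3_xiang4%gong1_zuo4
print(to_pinyin(seg_b, word_sep="%"))  # ci3%xiang4_gong1_zuo4
```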

@lifeiteng (Owner, Author) replied:

I think we should use different tokens for syllable boundaries inside a word and for boundaries between words:
ci3-xiang4_gong1-zuo4 ...
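The two-level scheme can be sketched as follows. This assumes a hypothetical per-character pinyin table and hypothetical helpers `encode` and `decode_words`. '-' joins syllables within a word and '_' marks the original spaces, so the word structure can be recovered from the output.

```python
from typing import List

# Hypothetical per-character pinyin table for this example sentence only; a
# real front end needs a dictionary or G2P model to resolve polyphones
# (e.g. 还 can also be read huan2).
PINYIN = {
    "此": "ci3", "项": "xiang4", "工": "gong1", "作": "zuo4", "还": "hai2",
    "能": "neng2", "怎": "zen3", "么": "me5", "改": "gai3", "进": "jin4",
}

SYLLABLE_SEP = "-"  # boundary between syllables inside a word
WORD_SEP = "_"      # boundary between words (the original spaces)


def encode(text: str) -> str:
    """Turn space-segmented Chinese text into a two-level pinyin string."""
    words = text.split()  # split() with no argument also collapses repeated spaces
    return WORD_SEP.join(
        SYLLABLE_SEP.join(PINYIN[ch] for ch in word) for word in words
    )


def decode_words(encoded: str) -> List[List[str]]:
    """Recover the word structure: one list of syllables per word."""
    return [word.split(SYLLABLE_SEP) for word in encoded.split(WORD_SEP)]


encoded = encode("此项 工作 还能 怎么 改进")
print(encoded)                    # ci3-xiang4_gong1-zuo4_hai2-neng2_zen3-me5_gai3-jin4
print(decode_words(encoded)[:2])  # [['ci3', 'xiang4'], ['gong1', 'zuo4']]
```

Tone-numbered pinyin syllables contain only letters and digits, so neither '-' nor '_' can collide with syllable content. In the PR's code, the word-level token would come from `separator.word`; the sketch uses plain constants instead.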

Collaborator replied:

What about commas (逗号) and sentence-final marks such as periods (句号), question marks (问号), and exclamation marks (感叹号)?
These marks give the synthesizer explicit cues for pauses and emotion.
The AISHELL dataset may not contain them, but we can still add them for future compatibility.
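One way to keep punctuation is sketched below. It assumes a hypothetical mapping (`PUNCT_TOKENS`) from full-width Chinese marks to standalone tokens, and it reuses the '-' / '_' scheme proposed above. Each mark becomes its own unit between word separators, so it survives as a pause/prosody token. The token spellings and the pinyin table are illustrative assumptions, not an agreed format.

```python
import re

# Hypothetical per-character pinyin table covering only this example.
PINYIN = {
    "此": "ci3", "项": "xiang4", "工": "gong1", "作": "zuo4", "还": "hai2",
    "能": "neng2", "怎": "zen3", "么": "me5", "改": "gai3", "进": "jin4",
}

# Hypothetical mapping from full-width punctuation to standalone tokens:
# comma (逗号), period (句号), question mark (问号), exclamation mark (感叹号).
PUNCT_TOKENS = {"，": ",", "。": ".", "？": "?", "！": "!"}
# The capturing group makes re.split keep the punctuation marks in its output.
PUNCT_RE = re.compile("([" + "".join(PUNCT_TOKENS) + "])")

SYLLABLE_SEP = "-"  # joins syllables inside a word
WORD_SEP = "_"      # separates words and punctuation tokens


def encode(text: str) -> str:
    """Encode space-segmented text, keeping punctuation as separate tokens."""
    units = []
    for chunk in text.split():
        for piece in PUNCT_RE.split(chunk):
            if not piece:
                continue  # re.split yields "" next to leading/trailing marks
            if piece in PUNCT_TOKENS:
                units.append(PUNCT_TOKENS[piece])
            else:
                units.append(SYLLABLE_SEP.join(PINYIN[ch] for ch in piece))
    return WORD_SEP.join(units)


print(encode("此项 工作，还能 怎么 改进？"))
# ci3-xiang4_gong1-zuo4_,_hai2-neng2_zen3-me5_gai3-jin4_?
```

Marks outside `PUNCT_TOKENS` (for example 、) would fall through to the pinyin lookup and raise `KeyError`. A real front end would need an explicit policy for those, such as mapping them to a generic pause token or dropping them.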

Collaborator replied:

Any suggestions for the code? Or do you mean this is a problem with AISHELL1 itself?

4 participants