12 Hours - Chinese Mandarin Entertainment Anchor Style Multi-emotional Synthesis Corpus. It is recorded by a native Chinese speaker. The text covers six emotions plus modal particles, and phonemes and tones are balanced. A professional phonetician took part in the annotation. The corpus closely matches the research and development needs of speech synthesis.
For more details, please refer to the link: https://www.nexdata.ai/datasets/tts/1304?source=Github
48,000 Hz, 24-bit, uncompressed WAV, mono channel
professional recording studio
seven emotion categories (neutral, happiness, anger, sadness, surprise, fear, disgust) + sentences with filler words
professional character voice; role: an 18-year-old girl who works as an entertainment anchor and enjoys singing and dancing
microphone
Mandarin
word and pinyin transcription, prosodic boundary annotation, phoneme boundary annotation
Neutral data: no less than 1.6 hours; data with filler words: no less than 0.4 hours; each of the remaining six emotion types: no less than 1.67 hours
Commercial License
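
As a rough aid for checking a delivery against the specification above, the following Python sketch tests whether a WAV header reports 48,000 Hz, 24-bit, mono, and adds up the stated per-category minimum durations. It uses only the standard-library `wave` module. The function names (`wav_matches_spec`, `minimum_total_hours`), the dictionary keys, and the idea of checking files one by one are illustrative assumptions; the actual file layout and naming of the corpus are not described here.

```python
import wave

# Audio format from the specification: 48,000 Hz, 24-bit (3 bytes), mono.
EXPECTED_SAMPLE_RATE = 48000
EXPECTED_SAMPLE_WIDTH_BYTES = 3
EXPECTED_CHANNELS = 1

# Stated minimum hours per category. The key names are hypothetical labels,
# not identifiers used by the corpus itself.
NON_NEUTRAL_EMOTIONS = ["happiness", "anger", "sadness", "surprise", "fear", "disgust"]
MIN_HOURS = {"neutral": 1.6, "filler_word": 0.4}
MIN_HOURS.update({emotion: 1.67 for emotion in NON_NEUTRAL_EMOTIONS})


def wav_matches_spec(path):
    """Return True if the WAV header at `path` reports 48 kHz, 24-bit, mono."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == EXPECTED_SAMPLE_RATE
            and wf.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wf.getnchannels() == EXPECTED_CHANNELS
        )


def minimum_total_hours():
    """Sum the stated per-category minimums, rounded to two decimals."""
    return round(sum(MIN_HOURS.values()), 2)
```

Under these assumptions, the per-category minimums sum to 1.6 + 0.4 + 6 × 1.67 = 12.02 hours, which is consistent with the 12-hour total in the title.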