Looks like it uses a Japanese voicepack to generate English lyrics

Continuing the discussion from https://github.com/ASLP-lab/DiffRhythm/issues/15.

If I don't install the Japanese voices from https://github.com/numediart/MBROLA-voices, it crashes with `failed to load voice "ja"`.
But the lyrics I provided, https://github.com/ASLP-lab/DiffRhythm/blob/main/infer/example/eg_en.lrc, are entirely in English.

So, probably because it uses the Japanese voicepack, the generated vocals barely correspond to the provided lyrics: only fragments of words are recognizable.

Here is an example of the generated audio (zipped; the WAV was converted to a 192k MP3 using ffmpeg):
[output.zip](https://github.com/user-attachments/files/19103920/output.zip)
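
For anyone who wants to compress their own output the same way, the WAV-to-MP3 step could be scripted roughly as below. This is only a sketch: the file name `output.wav` and the helper names `build_ffmpeg_command` and `convert_to_mp3` are illustrative assumptions, not the exact command used for the attachment, and it assumes `ffmpeg` is on the `PATH`.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(wav_path: str, bitrate: str = "192k") -> list[str]:
    """Build an ffmpeg command that re-encodes a WAV file as MP3.

    The output file sits next to the input, with the suffix changed to .mp3.
    """
    mp3_path = str(Path(wav_path).with_suffix(".mp3"))
    # -y overwrites an existing output file; -b:a sets the audio bitrate.
    return ["ffmpeg", "-y", "-i", wav_path, "-b:a", bitrate, mp3_path]


def convert_to_mp3(wav_path: str, bitrate: str = "192k") -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(wav_path, bitrate), check=True)


if __name__ == "__main__":
    # Hypothetical file name; replace with the actual generated WAV.
    convert_to_mp3("output.wav")
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to print and inspect the exact command before running it.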

