Following up on #15.
If I don't install the Japanese voices from https://github.com/numediart/MBROLA-voices, it crashes with the error: failed to load voice "ja".
But the lyrics I provided, https://github.com/ASLP-lab/DiffRhythm/blob/main/infer/example/eg_en.lrc, are entirely in English.
So, probably because it uses the Japanese voice pack, the lyrics in the generated audio barely correspond to the provided lyrics: only fragments of words are recognizable.
Here is an example of the generated audio (zipped; the WAV was converted to a 192k MP3 using ffmpeg):
output.zip