Adding support for a new synthesis engine
OpenUtau is an open source singing synthesis platform. Anyone developing a singing synthesis engine can use OpenUtau as its UI. This article explains how to add support for a new synthesis engine, taking DiffSinger as an example. DiffSinger is a typical machine learning based singing synthesis engine that uses the classic packaging format.
Singer packaging format is how users of your synthesis engine will build and distribute their voicebanks. We suggest using the classic packaging format if possible, which is the same packaging format that UTAU, ENUNU and DiffSinger singers use. Using the classic packaging format will make your voicebank compatible with the latest universal features of OpenUtau in future updates.
Here is the folder structure of the classic packaging format:
your_singer
├─ character.txt
├─ character.yaml
├─ ...the other files and folders containing the voice data.
character.txt and character.yaml contain the basic information about your voicebank, including its name, its singer type, and the phonemizer it uses.
character.txt example:
name=Zhibin Diffsinger
image=zhibin.png
author=Chisong
voice=Chisong
web=http://zhibin.club/
character.yaml example:
text_file_encoding: utf-8
portrait_opacity: 0.67
default_phonemizer: OpenUtau.Core.DiffSinger.DiffSingerPhonemizer
singer_type: diffsinger
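To make the key=value layout of character.txt concrete, here is a minimal Python sketch that reads the example above into a dictionary. It is only an illustration of the file format: OpenUtau's own loader is written in C#, the function name `parse_character_txt` is hypothetical, and the sketch assumes the text has already been decoded with the right encoding.

```python
def parse_character_txt(text):
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    fields = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


sample = """name=Zhibin Diffsinger
image=zhibin.png
author=Chisong
voice=Chisong
web=http://zhibin.club/
"""

info = parse_character_txt(sample)
print(info["name"])
```

Lines without an `=` are ignored instead of raising an error, which keeps the sketch tolerant of stray blank lines in hand-edited files.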
Tips for designing your packaging format:
- Use only lowercase ASCII characters for filenames if possible, because ASCII characters don't get garbled on PCs set to different locales.
- Use a special prefix for file names and folder names. For example, in DiffSinger voicebanks, dsdur, dspitch, dsvariance, dsconfig and dsvocoder all start with "ds". This prefix makes it clear that these files are used by the DiffSinger renderer.
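The two tips above can be checked mechanically. The following Python sketch is one possible way to do so, under assumptions: the allowed character set (lowercase letters, digits, `.`, `_`, `-`) and the helper name `check_names` are hypothetical choices, not rules enforced by OpenUtau, and "MyModel" is a made-up name added to show a failure.

```python
import re

# Assumed "safe" character set: lowercase ASCII letters, digits, '.', '_' and '-'.
SAFE_NAME = re.compile(r"[a-z0-9._-]+")


def check_names(names, prefix):
    """Return (unsafe, unprefixed) lists for the given file or folder names."""
    unsafe = [n for n in names if not SAFE_NAME.fullmatch(n)]
    unprefixed = [n for n in names if not n.startswith(prefix)]
    return unsafe, unprefixed


names = ["dsdur", "dspitch", "dsvariance", "dsconfig", "dsvocoder", "MyModel"]
unsafe, unprefixed = check_names(names, "ds")
print("unsafe:", unsafe)
print("missing prefix:", unprefixed)
```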
To define a singer type, you'll need to create a singer class that inherits USinger. Here are some important features you need to implement in this class:
- Object initializer: Load the avatar, the subbanks (voice colors) and the phoneme list here.
- FreeMemory: If your voicebank stores large resources in memory, use this function to free them when the singer is no longer used. Note that the voicebank may be used again even after this method is called, and this method may be called even when the singer has not been used.
An example is DiffSingerSinger.
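The FreeMemory contract described above (it may be called before the singer is ever used, and the singer may be used again afterwards) suggests loading large resources lazily. The Python sketch below illustrates that pattern only; the class `SketchSinger` and all of its members are hypothetical and do not mirror the real USinger API, which is C#.

```python
class SketchSinger:
    """Hypothetical singer that loads its large resource on first use."""

    def __init__(self, name):
        self.name = name      # cheap metadata is loaded up front
        self._model = None    # the large resource is loaded lazily
        self.load_count = 0   # counts how many times the resource was loaded

    def _ensure_loaded(self):
        if self._model is None:
            self._model = {"weights": [0.0] * 4}  # stand-in for a real model
            self.load_count += 1
        return self._model

    def render(self):
        model = self._ensure_loaded()
        return len(model["weights"])

    def free_memory(self):
        # Safe at any time: before first use, or repeatedly in a row.
        self._model = None


singer = SketchSinger("demo")
singer.free_memory()  # called before any use: nothing to free, no error
singer.render()       # first use loads the resource
singer.free_memory()  # release it
singer.render()       # used again after freeing: reloads transparently
print(singer.load_count)
```

Because every use goes through `_ensure_loaded`, freeing never leaves the object in a broken state; the only cost of an early free is a reload.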
You also have to register your singer type everywhere else in OpenUtau's codebase that behaves differently depending on the singer type, including:
- OpenUtau.Core/Ustx/USinger.cs
- OpenUtau.Core/Classic/ClassicSingerLoader.cs
- OpenUtau.Core/Render/Renderers.cs (associate your singer type with your renderer)
- OpenUtau/ViewModels/SingerSetupViewModel.cs
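Conceptually, associating a singer type with a renderer is a lookup table from type to renderer. The Python sketch below shows that idea in isolation. It is not how Renderers.cs is written: the names `RENDERERS`, `register_renderer`, `renderer_for`, `MyEngineRenderer` and the type string "myengine" are all hypothetical.

```python
RENDERERS = {}  # singer type -> callable that creates a renderer


def register_renderer(singer_type, factory):
    """Associate a singer type with a renderer factory, rejecting duplicates."""
    if singer_type in RENDERERS:
        raise ValueError(f"singer type {singer_type!r} is already registered")
    RENDERERS[singer_type] = factory


def renderer_for(singer_type):
    """Create the renderer registered for a singer type."""
    factory = RENDERERS.get(singer_type)
    if factory is None:
        raise LookupError(f"no renderer registered for {singer_type!r}")
    return factory()


class MyEngineRenderer:
    """Placeholder for a renderer implementation."""


register_renderer("myengine", MyEngineRenderer)
renderer = renderer_for("myengine")
print(type(renderer).__name__)
```

Failing loudly on an unknown singer type makes a forgotten registration visible immediately instead of silently falling back to another renderer.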
Starting from here, we'll add our synthesis code into OpenUtau. Create a new folder under OpenUtau.Core and put all the code for your synthesis engine there.
A renderer should implement IRenderer. Here are the main features of a renderer:
- Render: Synthesize audio based on RenderPhrase input.
- LoadRenderedPitch: Auto-generate a pitch curve based on RenderPhrase input (optional, use SupportsRenderPitch to declare).
- SupportsExpression: Declare the expressions that your engine supports.
- GetSuggestedExpressions: If your engine supports custom expressions, use this API to provide them to the user.
An example is DiffSingerRenderer.
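The division of responsibilities listed above can be sketched as an abstract base class. This is a Python illustration under stated assumptions, not the real IRenderer: the actual C# interface has different signatures, and here a phrase is reduced to a hypothetical dict with a `duration_samples` key and expressions to plain strings.

```python
from abc import ABC, abstractmethod


class RendererSketch(ABC):
    """Python mirror of the four renderer features; not the real IRenderer."""

    # Set to True only if load_rendered_pitch is actually implemented.
    supports_render_pitch = False

    @abstractmethod
    def render(self, phrase):
        """Return synthesized samples for one phrase."""

    def load_rendered_pitch(self, phrase):
        """Optional: return an auto-generated pitch curve, or None."""
        return None

    @abstractmethod
    def supports_expression(self, abbr):
        """Return True if the engine honors the given expression."""

    def get_suggested_expressions(self):
        """Optional: return custom expressions to offer to the user."""
        return []


class SilentRenderer(RendererSketch):
    """Toy engine that outputs silence and accepts one made-up expression."""

    SUPPORTED = {"dyn"}  # placeholder expression name for this sketch

    def render(self, phrase):
        return [0.0] * phrase["duration_samples"]

    def supports_expression(self, abbr):
        return abbr in self.SUPPORTED


r = SilentRenderer()
print(len(r.render({"duration_samples": 3})))
```

The optional features have default implementations so that a minimal engine only has to provide `render` and `supports_expression`.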
In OpenUtau, phonemizers are independent of renderers. If existing phonemizers already meet your needs, you won't need to write your own.
If you're going to write a phonemizer, see Developing new phonemizers.
If your phonemizer is based on machine learning models, you can inherit from MachineLearningPhonemizer and implement the ProcessPart function, which takes phrases and writes the timing results into partResult. DiffSingerBasePhonemizer is an example.