Merge pull request CNChTu#31 from OOPPEENN/OOPPEENN-patch-1
Update ReadME
CNChTu authored Jul 23, 2023
2 parents 77cef35 + 3ef174e commit a49f446
Showing 2 changed files with 9 additions and 5 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -118,7 +118,8 @@ python train.py -c configs/config.yaml
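Combining a shallow diffusion model trained only to k_step_max depth with the Naive model may give even higher quality than pure full-depth diffusion, while also training faster. However, the Naive model may have pitch-range (f0 range) issues.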
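To make the shallow-diffusion idea concrete, here is a minimal sketch. It assumes a standard DDPM forward process with a linear beta schedule (1e-4 to 0.02), which may not match this project's configuration, and it treats the reverse step as a hypothetical `denoise_step(x, k)` callable standing in for the trained network. Only the control flow is the point: full diffusion starts from pure noise and runs every step, while the shallow variant starts from the Naive model's output noised to `k_step_max` and runs only `k_step_max` steps.

```python
import math
import random
from typing import Callable, List

Mel = List[float]  # a flattened mel-spectrogram, for illustration only
DenoiseStep = Callable[[Mel, int], Mel]  # hypothetical: maps x_k to x_{k-1}


def make_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                    beta_end: float = 0.02) -> List[float]:
    """alpha_bar_k = prod_{i<=k} (1 - beta_i) for an assumed linear beta schedule."""
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: Mel, k: int, alpha_bars: List[float], rng: random.Random) -> Mel:
    """Jump directly to step k of the forward (noising) process:
    x_k = sqrt(alpha_bar_k) * x0 + sqrt(1 - alpha_bar_k) * noise."""
    if not 1 <= k <= len(alpha_bars):
        raise ValueError("k must be in 1..num_steps")
    ab = alpha_bars[k - 1]
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x0]


def full_diffusion(length: int, denoise_step: DenoiseStep, num_steps: int,
                   rng: random.Random) -> Mel:
    """Full-depth sampling: start from pure Gaussian noise and run every reverse step."""
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for k in range(num_steps, 0, -1):
        x = denoise_step(x, k)
    return x


def shallow_diffusion(naive_mel: Mel, denoise_step: DenoiseStep, k_step_max: int,
                      alpha_bars: List[float], rng: random.Random) -> Mel:
    """Shallow sampling: noise the Naive model's output up to step k_step_max,
    then run only the last k_step_max reverse steps."""
    x = q_sample(naive_mel, k_step_max, alpha_bars, rng)
    for k in range(k_step_max, 0, -1):
        x = denoise_step(x, k)
    return x
```

For example, with 1000 total steps and `k_step_max=100`, the shallow path calls the denoiser 100 times instead of 1000, and the diffusion model only ever has to handle steps 1 to `k_step_max`; that is one way to read why training is faster. Because the Naive model's output is the starting point, its range limitations also matter here.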
****

### 2.1 Pre-trained diffusion model trained over the full diffusion process
(Note: whisper-ppg corresponds to the Whisper medium weights, and whisper-ppg-large corresponds to the Whisper large-v2 weights.)
| Units Encoder | Network size | Datasets | Download |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|--------------------------------------------|-----------------------------------------------------------------------------------------------------|
| [contentvec768l12 (Recommended)](https://ibm.ent.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr) | 512*20 | VCTK<br/>m4singer | [HuggingFace](https://huggingface.co/ChiTu/Diffusion-SVC/resolve/main/v0.1/contentvec768l12.7z) |
@@ -127,7 +128,8 @@ python train.py -c configs/config.yaml

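Additionally, here is a just-for-fun base model encoded with contentvec768l12 and trained on `m4singer`/`opencpop`/`vctk`. It is not recommended, and there is no guarantee it works without problems: [Download](https://huggingface.co/ChiTu/Diffusion-SVC/resolve/main/v0.1/contentvec768l12%2Bmakefunny.7z)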
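The download links in these tables are direct HuggingFace `resolve/main` URLs, so they can be fetched with the Python standard library. A minimal sketch follows; the `pretrain` target directory and the helper names `local_path` and `fetch` are hypothetical, not part of the project. It keeps the parent folder in the local path because several shallow models share the same file name, `model_0.pt`.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse
from urllib.request import urlretrieve


def local_path(url: str, dest_dir: str = "pretrain") -> Path:
    """Map a download URL to a local path made of its last two URL segments.

    Keeping the parent folder (e.g. 'shallow_512_30') stops files that share
    a name, such as several 'model_0.pt', from overwriting each other.
    Percent-escapes like '%2B' are decoded ('+').
    """
    parts = PurePosixPath(unquote(urlparse(url).path)).parts
    return Path(dest_dir, *parts[-2:])


def fetch(url: str, dest_dir: str = "pretrain") -> Path:
    """Download url once into dest_dir; skip the download if the file already exists."""
    dest = local_path(url, dest_dir)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, dest)
    return dest
```

For example, `fetch("https://huggingface.co/datasets/ms903/Diff-SVC-refactor-pre-trained-model/resolve/main/Diffusion-SVC/shallow_512_30/model_0.pt")` would save to `pretrain/shallow_512_30/model_0.pt`. The `.7z` and `.zip` archives still have to be extracted after downloading.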

### 2.2 Pre-trained diffusion model trained only to k_step_max depth
(Note: whisper-ppg corresponds to the Whisper medium weights, and whisper-ppg-large corresponds to the Whisper large-v2 weights.)
| Encoder | Network size | k_step_max | Datasets | Shallow diffusion model download |
|--------------------------------------------------------------------------------|--------|------------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| [contentvec768l12](https://ibm.ent.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr) | 512*30 | 100 | VCTK<br/>m4singer | [HuggingFace](https://huggingface.co/datasets/ms903/Diff-SVC-refactor-pre-trained-model/resolve/main/Diffusion-SVC/shallow_512_30/model_0.pt) |
8 changes: 5 additions & 3 deletions README_en.md
@@ -120,7 +120,8 @@ python train.py -c configs/config.yaml
Combining a shallow diffusion model trained only to k_step_max depth with the Naive model may give higher quality than a pure full-depth diffusion model, and it also trains faster. However, the Naive model may have f0 range issues.
****

### 2.1 Pre-trained diffusion model trained at full depth
(Note: whisper-ppg corresponds to the Whisper medium weights, while whisper-ppg-large corresponds to the Whisper large-v2 weights.)
| Units Encoder | Network size | Datasets | Model |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|--------------------------------------------|-----------------------------------------------------------------------------------------------------|
| [contentvec768l12 (Recommended)](https://ibm.ent.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr) | 512*20 | VCTK<br/>m4singer | [HuggingFace](https://huggingface.co/ChiTu/Diffusion-SVC/resolve/main/v0.1/contentvec768l12.7z) |
@@ -129,12 +130,13 @@ The combination of shallow Diffusion model that only train k_step_max depth and

Here is an additional just-for-fun pre-trained model using the contentvec768l12 encoder, trained on `m4singer`/`opencpop`/`vctk`. It is not recommended, and there is no guarantee it works without problems: [Download](https://huggingface.co/ChiTu/Diffusion-SVC/resolve/main/v0.1/contentvec768l12%2Bmakefunny.7z).

### 2.2 Pre-trained diffusion model trained only to k_step_max depth
(Note: whisper-ppg corresponds to the Whisper medium weights, while whisper-ppg-large corresponds to the Whisper large-v2 weights.)
| Units Encoder | Network size | k_step_max | Datasets | Diffusion Model |
|--------------------------------------------------------------------------------|--------------|------------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| [contentvec768l12](https://ibm.ent.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr) | 512*30 | 100 | VCTK<br/>m4singer | [HuggingFace](https://huggingface.co/datasets/ms903/Diff-SVC-refactor-pre-trained-model/resolve/main/Diffusion-SVC/shallow_512_30/model_0.pt) |
| [contentvec768l12](https://ibm.ent.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr) | 512*20 | 200 | VCTK<br/>m4singer | [HuggingFace](https://huggingface.co/datasets/ms903/Diff-SVC-refactor-pre-trained-model/resolve/main/Diffusion-SVC/shallow_512_20/model_0.pt) |
| [whisper-ppg (only usable with sovits)](https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt) | 768*30 | 200 | PTDB<br/>m4singer<br/>kiritan<br/>opencpop<br/>pjs_corpus<br/>popcs | [HuggingFace](https://huggingface.co/OOPPEENN/Diffusion-SVC-pretrained-models/resolve/main/whisper_medium_vol_76830_k200.zip) |
- **Experiments found that the Naive model has f0 range issues when trained on small datasets. Prefer fine-tuning the Naive model for fewer steps, or use the DDSP model directly, which has no f0 range limit.**

### 2.3 Naive and DDSP pre-trained models to pair with 2.2