1. add huggingface SVS; 2. add inference logic from raw inputs; 3. fix typo in readme.

MoonInTheRiver · MoonInTheRiver · commit 60b950ce53cd · 2022-08-02T11:53:42.000+08:00
diff --git a/docs/README-SVS.md b/docs/README-SVS.md
@@ -7,7 +7,7 @@
 ## DiffSinger (SVS)
 
 ### PART1. [Run DiffSinger on PopCS](README-SVS-popcs.md)
-In PART1, we only focus on spectrum modeling (acoustic model) and assume the ground-truth (GT) F0 to be given as the pitch information following these papers [1][2][3]. If you want to conduct experiments on F0 prediction, please move to PART2.
+In PART1, we only focus on spectrum modeling (acoustic model) and assume the ground-truth (GT) F0 to be given as the pitch information following these papers [1][2][3]. If you want to conduct experiments with F0 prediction, please move to PART2.
 
 Thus, the pipeline of this part can be summarized as:
 
@@ -57,20 +57,20 @@ Thus, the pipeline of [2.B](README-SVS-opencpop-e2e.md) can be summarized as:
 Click here for detailed instructions: [link](README-SVS-opencpop-e2e.md).
 
 ### FAQ
-Q: Why do I need F0 in Vocoders?
+Q1: Why do I need F0 in Vocoders?
 
-A: See vocoder parts in HiFiSinger, DiffSinger or SingGAN. This is a common practice now.
+A1: See vocoder parts in HiFiSinger, DiffSinger or SingGAN. This is a common practice now.
 
-Q: Why not run MIDI version SVS on PopCS dataset? or Why not release MIDI labels for PopCS dataset?
+Q2: Why not run MIDI version SVS on PopCS dataset? or Why not release MIDI labels for PopCS dataset?
 
-A: Our laboratory has no funds to label PopCS dataset. But there are funds for labeling other singing datasets, which is coming soon.
+A2: Our laboratory has no funds to label PopCS dataset. But there are funds for labeling other singing dataset, which is coming soon.
 
-Q: Why " 'HifiGAN' object has no attribute 'model' "?
+Q3: Why " 'HifiGAN' object has no attribute 'model' "?
 
-A: Please put the pretrained vocoders in your `checkpoints` dictionary.
+A3: Please put the pretrained vocoders in your `checkpoints` dictionary.
 
-Q: How to check whether I use GT information or predicted information during inference from packed test set?
+Q4: How to check whether I use GT information or predicted information during inference from packed test set?
 
-A: Please see codes [here](https://github.com/MoonInTheRiver/DiffSinger/blob/55e2f46068af6e69940a9f8f02d306c24a940cab/tasks/tts/fs2.py#L343).
+A4: Please see codes [here](https://github.com/MoonInTheRiver/DiffSinger/blob/55e2f46068af6e69940a9f8f02d306c24a940cab/tasks/tts/fs2.py#L343).
 
 ...
diff --git a/inference/svs/base_svs_infer.py b/inference/svs/base_svs_infer.py
@@ -123,7 +123,7 @@ def preprocess_word_level_input(self, inp):
             #  0        0          1
             if len(note_in_this_word) > 1:  # is_slur = True, we should repeat the YUNMU to match the 2nd, 3rd... notes.
                 for idx in range(1, len(note_in_this_word)):
-                    ph_lst.append(ph_in_this_word[1])
+                    ph_lst.append(ph_in_this_word[-1])
                     note_lst.append(note_in_this_word[idx])
                     midi_dur_lst.append(midi_dur_in_this_word[idx])
                     is_slur.append(1)
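
Review note on the hunk above: the change from `[1]` to `[-1]` matters for words that have only one phoneme. The sketch below is a minimal, self-contained illustration and not the repository code. `expand_word` is a hypothetical helper, the phoneme spellings are illustrative, and the first loop (pairing every phoneme with the first note) is an assumption about the code that precedes the hunk.

```python
def expand_word(ph_in_this_word, note_in_this_word, midi_dur_in_this_word):
    """Align one word's phonemes with its notes (illustrative sketch only)."""
    ph_lst, note_lst, midi_dur_lst, is_slur = [], [], [], []

    # Assumption: each phoneme of the word is first paired with the first note.
    for ph in ph_in_this_word:
        ph_lst.append(ph)
        note_lst.append(note_in_this_word[0])
        midi_dur_lst.append(midi_dur_in_this_word[0])
        is_slur.append(0)

    # Slur: repeat the last phoneme (the final) for the 2nd, 3rd, ... notes.
    # Indexing with [-1] also works when the word has a single phoneme,
    # whereas [1] would raise IndexError on a one-element list.
    for idx in range(1, len(note_in_this_word)):
        ph_lst.append(ph_in_this_word[-1])
        note_lst.append(note_in_this_word[idx])
        midi_dur_lst.append(midi_dur_in_this_word[idx])
        is_slur.append(1)

    return ph_lst, note_lst, midi_dur_lst, is_slur


# Two phonemes, two notes: [1] and [-1] pick the same phoneme.
two_ph = expand_word(["ch", "ang"], ["A#4/Bb4", "F#4/Gb4"], ["0.509550", "0.183420"])

# One phoneme, two notes: only [-1] is a valid index here.
one_ph = expand_word(["a"], ["A4", "F#4"], ["0.4", "0.4"])
```

For words with two phonemes both indices select the same element, so the old code only failed on single-phoneme words that carry a slur.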
diff --git a/inference/svs/gradio/gradio_settings.yaml b/inference/svs/gradio/gradio_settings.yaml
@@ -15,6 +15,12 @@ example_inputs:
     你 说 你 不 SP 懂 为 何 在 这 时 牵 手 AP<sep>D#4/Eb4 | D#4/Eb4 | D#4/Eb4 | D#4/Eb4 | rest | D#4/Eb4 | D4 | D4 | D4 | D#4/Eb4 | F4 | D#4/Eb4 | D4 | rest<sep>0.113740 | 0.329060 | 0.287950 | 0.133480 | 0.150900 | 0.484730 | 0.242010 | 0.180820 | 0.343570 | 0.152050 | 0.266720 | 0.280310 | 0.633300 | 0.444590
   - |-
     小酒窝长睫毛AP是你最美的记号<sep>C#4/Db4 | F#4/Gb4 | G#4/Ab4 | A#4/Bb4 F#4/Gb4 | F#4/Gb4 C#4/Db4 | C#4/Db4 | rest | C#4/Db4 | A#4/Bb4 | G#4/Ab4 | A#4/Bb4 | G#4/Ab4 | F4 | C#4/Db4<sep>0.407140 | 0.376190 | 0.242180 | 0.509550 0.183420 | 0.315400 0.235020 | 0.361660 | 0.223070 | 0.377270 | 0.340550 | 0.299620 | 0.344510 | 0.283770 | 0.323390 | 0.360340
+  - |-
+    小酒窝长睫毛AP那是可爱猪宝宝<sep>C#4/Db4 | F#4/Gb4 | G#4/Ab4 | A#4/Bb4 F#4/Gb4 | F#4/Gb4 C#4/Db4 | C#4/Db4 | rest | C#4/Db4 | A#4/Bb4 | G#4/Ab4 | A#4/Bb4 | G#4/Ab4 | F4 | C#4/Db4<sep>0.407140 | 0.376190 | 0.242180 | 0.509550 0.183420 | 0.315400 0.235020 | 0.361660 | 0.223070 | 0.377270 | 0.340550 | 0.299620 | 0.344510 | 0.283770 | 0.323390 | 0.360340
+  - |-
+    我真的SP爱你SP句句不轻易<sep>D4 | A4 | F#4 |  rest | A4 | D4 | rest | B4 | A4 F#4 | F#4 | A4 | A4<sep>0.8 | 0.4 | 0.967 | 0.3 | 0.4 | 0.967 | 0.4 | 0.8 | 0.4 0.4 | 0.25 | 0.967 | 0.9
+  - |-
+    好冷啊 AP 我在东北玩泥巴<sep>F4 | F4 | D4 | rest | D4 | D4 | C4 | C4 | B3 | C4 | D4<sep>0.5 | 0.3 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.25 | 0.25 | 0.4
 
 #inference_cls: inference.svs.ds_cascade.DiffSingerCascadeInfer
 #exp_name: 0303_opencpop_ds58_midi
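
Review note on the new `example_inputs` entries: each entry has three `<sep>`-separated fields (lyrics, notes, durations in seconds), with `|` separating words and a space separating the notes of a slurred word. The following is a rough sketch of how such a string could be split and sanity-checked. `parse_example_input` is a hypothetical helper written for this note and is not part of the inference code.

```python
def parse_example_input(inp):
    """Split a 'text<sep>notes<sep>durations' string (illustrative sketch only)."""
    fields = inp.split("<sep>")
    if len(fields) != 3:
        raise ValueError("expected exactly three <sep>-separated fields")
    text, notes_field, durs_field = fields

    # One group per word; a group with several items marks a slur.
    note_groups = [group.split() for group in notes_field.split("|")]
    dur_groups = [[float(d) for d in group.split()] for group in durs_field.split("|")]

    # Every word needs the same number of notes and durations.
    if len(note_groups) != len(dur_groups):
        raise ValueError("note and duration word counts differ")
    for notes, durs in zip(note_groups, dur_groups):
        if len(notes) != len(durs):
            raise ValueError("a word has mismatched notes and durations")

    return text.strip(), note_groups, dur_groups


example = (
    "我真的SP爱你SP句句不轻易<sep>"
    "D4 | A4 | F#4 |  rest | A4 | D4 | rest | B4 | A4 F#4 | F#4 | A4 | A4<sep>"
    "0.8 | 0.4 | 0.967 | 0.3 | 0.4 | 0.967 | 0.4 | 0.8 | 0.4 0.4 | 0.25 | 0.967 | 0.9"
)
text, note_groups, dur_groups = parse_example_input(example)
```

In this entry the ninth word is the only slur: it carries the two notes `A4 F#4` and the two durations `0.4 0.4`.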