This is a TensorFlow implementation of speaker-independent source separation.
We have implemented two models: (1) TasNet and (2) the Cross-Domain Joint Embedding and Clustering Network (CDNet) [1].
We have also implemented an alternative solution to the label ambiguity problem, described in Interrupted and Cascaded PIT [2].
The code runs on TensorFlow 2. However, instead of executing eagerly, we build the graph first, as in TF v1.
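As a rough illustration of this graph-first style, here is a minimal sketch using TF2's compat module; the placeholder shape and the dummy op are stand-ins for illustration, not this repo's actual graph:

```python
import numpy as np
import tensorflow as tf

# Illustrative only: TF2 is installed, but eager execution is disabled and a
# v1-style graph + session is built, as described above.
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=[None, 16000])  # 1 s at 16 kHz
y = tf.reduce_mean(tf.abs(x))  # stand-in for the separation network

with tf.compat.v1.Session() as sess:
    print(sess.run(y, feed_dict={x: np.zeros((2, 16000), np.float32)}))
```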
| No. | Model | Label Assignment | SDRi (Validation) | SDRi (Test) |
|---|---|---|---|---|
| (1) | TasNet | PIT | 16.2 dB | 15.8 dB |
| (2) | CDNet | PIT | 17.1 dB | 16.9 dB |
| (3) | TasNet | Fixed Assign (L=100) | 17.3 dB | 16.9 dB |
| (4) | TasNet | Fixed Assign (L=80) | 17.7 dB | 17.4 dB |
| (5) | TasNet, init from (4) | PIT | 18.0 dB | 17.7 dB |
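For reference, PIT (permutation invariant training) resolves label ambiguity by scoring every output-to-speaker permutation and keeping the best one per utterance. Below is a minimal sketch of a 2-speaker PIT loss with an SI-SNR objective; the function names, tensor shapes, and the choice of SI-SNR are assumptions for illustration, not this repo's exact code:

```python
import itertools
import tensorflow as tf

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB; est and ref have shape [..., time]."""
    ref_energy = tf.reduce_sum(ref * ref, axis=-1, keepdims=True)
    proj = tf.reduce_sum(est * ref, axis=-1, keepdims=True) * ref / (ref_energy + eps)
    noise = est - proj
    ratio = tf.reduce_sum(proj * proj, axis=-1) / (tf.reduce_sum(noise * noise, axis=-1) + eps)
    return 10.0 * tf.math.log(ratio + eps) / tf.math.log(10.0)

def pit_loss(est, ref):
    """est, ref: [batch, speakers, time]. Score every speaker permutation,
    keep the best one per utterance, then average over the batch."""
    num_spk = est.shape[1]
    per_perm = []
    for perm in itertools.permutations(range(num_spk)):
        permuted = tf.stack([ref[:, p] for p in perm], axis=1)
        per_perm.append(-tf.reduce_mean(si_snr(est, permuted), axis=-1))  # [batch]
    return tf.reduce_mean(tf.reduce_min(tf.stack(per_perm, axis=-1), axis=-1))
```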
To train:
```
python main.py -m train -c json/config.json
```
To test:
```
python main.py -m test -c models/name/config.json -ckpt chosen_checkpoint
```
1st stage: train a model with PIT:
```
python main.py -m train -c json/tasnet-1.json
```
2nd stage: extract the label assignment learned by the 1st-stage model by calling `write_pretrained_perm('tasnet-1', 100)` in `util.py` (the second argument is the epoch; default 100). This generates a fixed label-assignment file at `models/tasnet-1/perm_idx/100.csv`. Then train with that fixed assignment (a sketch of consuming this file follows the stage list):
```
python main.py -m train -c json/tasnet-2.json
```
3rd stage: load the 2nd-stage model parameters and continue training with PIT:
```
python main.py -m train -c json/tasnet-3.json
```
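As a rough illustration of the 2nd stage, here is what consuming the fixed-assignment file might look like. The CSV layout (one best-permutation index per training utterance) and the function names are assumptions, and `si_snr` refers to the helper in the PIT sketch above:

```python
import csv
import itertools
import tensorflow as tf

# Hypothetical reader for the file produced by write_pretrained_perm; the CSV
# layout assumed here is one best-permutation index per training utterance.
def load_fixed_perms(path="models/tasnet-1/perm_idx/100.csv"):
    with open(path) as f:
        return [int(row[0]) for row in csv.reader(f)]

def fixed_assign_loss(est, ref, perm_id, num_speakers=2):
    """Score against one stored permutation instead of searching all of them
    (uses the si_snr helper from the PIT sketch above)."""
    perm = list(itertools.permutations(range(num_speakers)))[perm_id]
    permuted = tf.stack([ref[:, p] for p in perm], axis=1)
    return -tf.reduce_mean(si_snr(est, permuted))
```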
A detailed description of all configurable parameters can be found in json/tasnet-1.json
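As a rough sketch only, a config file might look like the dictionary below; every key here is a hypothetical placeholder, and the real schema is whatever json/tasnet-1.json defines:

```python
import json

# Hypothetical config fields for illustration; consult json/tasnet-1.json
# for the real parameter names and values this repo uses.
example_config = {
    "model": "tasnet",          # which architecture to build
    "num_speakers": 2,          # 2 for WSJ0-2mix, 3 for WSJ0-3mix
    "batch_size": 4,
    "learning_rate": 1e-3,
    "label_assignment": "pit",  # or a fixed-assignment stage
}

print(json.dumps(example_config, indent=2))
```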
| Argument | Valid Inputs | Default | Description |
|---|---|---|---|
| mode | train / test | train | Whether to train a model or evaluate a checkpoint |
| config | string | config.json | Path to JSON-formatted config file |
| ckpt | string | None | Path to the model's checkpoint. If not specified, the latest checkpoint is loaded automatically. |
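The "load the latest checkpoint" behavior above can be pictured with standard TF2 checkpoint utilities; the model, optimizer, and directory below are placeholders, not this repo's actual objects:

```python
import tensorflow as tf

# Placeholder model/optimizer; the repo builds its own graph and variables.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam()

ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
manager = tf.train.CheckpointManager(ckpt, directory="models/name", max_to_keep=5)

# With -ckpt omitted, restoring the most recent checkpoint looks like:
ckpt.restore(manager.latest_checkpoint)
```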
Convert from SPHERE to wav:
```
bash convert_wsj0.sh
```
Generate WSJ0-2mix (Wall Street Journal with 2-speaker mixtures) or WSJ0-3mix:
- Download the official code, or use my modified versions in `create_wav_2speakers.m` and `create_wav_3speakers.m`
- Download voicebox
Steps to run Octave on Linux:
(1) Run `octave-cli`
(2) Load the package: `pkg load <pkg-name>`
(3) Run `create_wav_2speakers.m` or `create_wav_3speakers.m`
If you find this repo interesting, you can refer to the following papers for more details.
[1] G. Yang, C. Tuan, H. Lee and L. Lee, "Improved Speech Separation with Time-and-Frequency Cross-Domain Joint Embedding and Clustering," Proc. Interspeech 2019, pp. 1363-1367, doi: 10.21437/Interspeech.2019-2181.
[2] G. Yang, S. Wu, Y. Mao, H. Lee and L. Lee, "Interrupted and Cascaded Permutation Invariant Training for Speech Separation," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 6369-6373, doi: 10.1109/ICASSP40776.2020.9053697.