Results

  • Setup: fbank80, num_frms200, epoch150, ArcMargin, aug_prob0.6, speed_perturb (no spec_aug)
  • Scoring: cosine (after subtracting the mean embedding of vox2_dev), AS-Norm, QMF
  • Metric: EER(%)
  • 🔥 UPDATE 2024.09.03: We support the SimAM_ResNet pretrained on VoxBlink2 and fine-tuned on VoxCeleb2!
  • 🔥 UPDATE 2024.08.27: We support SSL models as the feature front-end; take a look at the WavLM recipe!
  • UPDATE 2022.07.19: We apply the same setup as the winning system of CNSRC 2022 (see the cnceleb recipe for details) and obtain a significant performance improvement:
    • LR scheduler warmup from 0
    • Remove one embedding layer in ResNet models
    • Add large margin fine-tuning strategy (LM)
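The scoring steps named in the Scoring bullet can be sketched as follows. This is a minimal NumPy illustration, not the recipe's own scoring code: the function names, the random stand-in embeddings, and `top_n=300` are all assumptions chosen for the example. QMF is not covered here.

```python
import numpy as np


def cosine_scores(enroll, test, mean):
    """Cosine similarity between rows of `enroll` and `test`
    after subtracting a global mean embedding from both."""
    e = enroll - mean
    t = test - mean
    e = e / np.linalg.norm(e, axis=-1, keepdims=True)
    t = t / np.linalg.norm(t, axis=-1, keepdims=True)
    return e @ t.T


def as_norm(raw, enroll_vs_cohort, test_vs_cohort, top_n=300):
    """Adaptive symmetric score normalization for a single trial.

    raw: raw score of the trial.
    enroll_vs_cohort / test_vs_cohort: 1-D arrays with the scores of the
    enrollment / test embedding against every cohort embedding.
    Only the top_n highest cohort scores on each side are used.
    """
    def top_stats(x):
        top = np.sort(x)[-top_n:]
        return top.mean(), top.std()

    mu_e, sd_e = top_stats(enroll_vs_cohort)
    mu_t, sd_t = top_stats(test_vs_cohort)
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)


# Hypothetical data: random vectors stand in for real speaker embeddings.
rng = np.random.default_rng(0)
dim = 256
mean = rng.normal(size=dim)                    # stand-in for the vox2_dev mean
cohort = rng.normal(size=(1000, dim)) + mean   # stand-in cohort embeddings
enroll = rng.normal(size=(1, dim)) + mean
test = rng.normal(size=(1, dim)) + mean

raw = cosine_scores(enroll, test, mean)[0, 0]
cohort_e = cosine_scores(enroll, cohort, mean)[0]
cohort_t = cosine_scores(test, cohort, mean)[0]
print("raw:", raw, "as-norm:", as_norm(raw, cohort_e, cohort_t))
```

Normalizing against the closest cohort scores on both sides puts every trial on a comparable scale, which is why a single decision threshold behaves better after AS-Norm.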
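| Model | Params | Flops | LM | AS-Norm | QMF | vox1-O-clean | vox1-E-clean | vox1-H-clean |
|:------|:------:|:-----:|:--:|:-------:|:---:|:------------:|:------------:|:------------:|
| XVEC-TSTP-emb512 | 4.61M | 0.53G | × | × | × | 1.989 | 1.950 | 3.412 |
| | | | × | √ | × | 1.834 | 1.846 | 3.124 |
| | | | √ | × | × | 1.749 | 1.721 | 2.944 |
| | | | √ | √ | × | 1.590 | 1.641 | 2.726 |
| ECAPA_TDNN_GLOB_c512-ASTP-emb192 | 6.19M | 1.04G | × | × | × | 1.069 | 1.209 | 2.310 |
| | | | × | √ | × | 0.957 | 1.128 | 2.105 |
| | | | √ | × | × | 0.878 | 1.072 | 2.007 |
| | | | √ | √ | × | 0.782 | 1.005 | 1.824 |
| ECAPA_TDNN_GLOB_c1024-ASTP-emb192 | 14.65M | 2.65G | × | × | × | 0.856 | 1.072 | 2.059 |
| | | | × | √ | × | 0.808 | 0.990 | 1.874 |
| | | | √ | × | × | 0.798 | 0.993 | 1.883 |
| | | | √ | √ | × | 0.728 | 0.929 | 1.721 |
| | | | √ | √ | √ | 0.707 | 0.894 | 1.615 |
| ResNet34-TSTP-emb256 | 6.63M | 4.55G | × | × | × | 0.867 | 1.049 | 1.959 |
| | | | × | √ | × | 0.787 | 0.964 | 1.726 |
| | | | × | √ | √ | 0.718 | 0.911 | 1.606 |
| | | | √ | × | × | 0.797 | 0.937 | 1.695 |
| | | | √ | √ | × | 0.723 | 0.867 | 1.532 |
| | | | √ | √ | √ | 0.659 | 0.821 | 1.437 |
| ResNet221-TSTP-emb256 | 23.79M | 21.29G | × | × | × | 0.569 | 0.774 | 1.464 |
| | | | × | √ | × | 0.479 | 0.707 | 1.290 |
| | | | √ | × | × | 0.580 | 0.729 | 1.351 |
| | | | √ | √ | × | 0.505 | 0.676 | 1.213 |
| ResNet293-TSTP-emb256 | 28.62M | 28.10G | × | × | × | 0.595 | 0.756 | 1.433 |
| | | | × | √ | × | 0.537 | 0.701 | 1.276 |
| | | | √ | × | × | 0.532 | 0.707 | 1.311 |
| | | | √ | √ | × | 0.447 | 0.657 | 1.183 |
| | | | √ | √ | √ | 0.425 | 0.641 | 1.146 |
| RepVGG_TINY_A0 | 6.26M | 4.65G | × | × | × | 0.909 | 1.034 | 1.943 |
| | | | × | √ | × | 0.824 | 0.953 | 1.709 |
| CAM++ | 7.18M | 1.15G | × | × | × | 0.803 | 0.932 | 1.860 |
| | | | × | √ | × | 0.718 | 0.879 | 1.735 |
| | | | √ | × | × | 0.707 | 0.845 | 1.664 |
| | | | √ | √ | × | 0.659 | 0.803 | 1.569 |
| ERes2Net34_Base | 7.88M | 3.43G | × | × | × | 0.914 | 1.065 | 1.986 |
| | | | × | √ | × | 0.803 | 0.976 | 1.787 |
| | | | √ | × | × | 0.824 | 0.968 | 1.776 |
| | | | √ | √ | × | 0.744 | 0.896 | 1.603 |
| Res2Net34_Base | 4.68M | 1.77G | × | × | × | 1.351 | 1.347 | 2.478 |
| | | | × | √ | × | 1.234 | 1.232 | 2.162 |
| Gemini_DFResNet114 | 6.53M | 5.42G | × | × | × | 0.787 | 0.963 | 1.760 |
| | | | × | √ | × | 0.707 | 0.889 | 1.546 |
| | | | √ | × | × | 0.771 | 0.906 | 1.599 |
| | | | √ | √ | × | 0.638 | 0.839 | 1.427 |
| SimAM_ResNet34 (VoxBlink2 Pretrain) | 25.2M | | √ | × | × | 0.415 | 0.615 | 1.121 |
| | | | √ | √ | × | 0.372 | 0.581 | 1.049 |
| | | | √ | √ | √ | 0.372 | 0.559 | 0.997 |
| SimAM_ResNet100 (VoxBlink2 Pretrain) | 50.2M | | √ | × | × | 0.229 | 0.458 | 0.868 |
| | | | √ | √ | × | 0.207 | 0.424 | 0.804 |
| | | | √ | √ | √ | 0.202 | 0.421 | 0.795 |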
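All numbers in the table are equal error rates in percent. For readers unfamiliar with the metric, here is a minimal sketch of how an EER can be computed from trial scores. It is an illustrative threshold sweep written for this README, not the scoring code used by the recipe, and `compute_eer` is a hypothetical name.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Return the equal error rate in percent.

    Sweeps every observed score as a decision threshold and picks the
    point where false-acceptance and false-rejection rates are closest.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False rejection: same-speaker trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance: different-speaker trials scored at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return 100.0 * (far[idx] + frr[idx]) / 2.0


# Toy trial list: one target trial scores lower than two non-target trials.
targets = [0.9, 0.8, 0.3, 0.7]
nontargets = [0.1, 0.2, 0.4, 0.6]
print(compute_eer(targets, nontargets))  # 25.0
```

The sweep is quadratic in the number of trials, so it is only meant for small examples; real trial lists usually call for a sorted, cumulative-count implementation.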

PLDA results

If you are interested in PLDA scoring (which is inferior to simple cosine scoring under the margin-based setting), simply run:

local/score_plda.sh --stage 1 --stop-stage 3 --exp_dir exp_name

The results on ResNet34 (large-margin fine-tuning, no AS-Norm) are:

| Scoring method | vox1-O-clean | vox1-E-clean | vox1-H-clean |
|:--------------:|:------------:|:------------:|:------------:|
| PLDA | 1.207 | 1.350 | 2.528 |

WavLM results

  • Pre-trained frontend: the WavLM Large model, using multi-layer features
  • Speaker model: ECAPA_TDNN_GLOB_c512-ASTP-emb192
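  • Training strategy: Frozen => Joint ft => Joint lmft

bash run_wavlm.sh --stage 3 --stop_stage 9

| Training strategy | AS-Norm | QMF | vox1-O-clean | vox1-E-clean | vox1-H-clean |
|:------------------|:-------:|:---:|:------------:|:------------:|:------------:|
| Frozen | × | × | 0.595 | 0.719 | 1.501 |
| | √ | × | 0.548 | 0.656 | 1.355 |
| | √ | √ | 0.489 | 0.619 | 1.224 |
| Frozen => Joint ft | × | × | 0.542 | 0.635 | 1.355 |
| | √ | × | 0.521 | 0.594 | 1.237 |
| | √ | √ | 0.494 | 0.576 | 1.205 |
| Frozen => Joint ft => Joint lmft | × | × | 0.521 | 0.626 | 1.344 |
| | √ | × | 0.495 | 0.588 | 1.247 |
| | √ | √ | 0.415 | 0.551 | 1.118 |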
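The setup above says only that multi-layer WavLM features are used; it does not say how the layers are combined. One common choice is a learnable softmax-weighted sum over the layer outputs, sketched below under that assumption. The function name `weighted_layer_sum`, the random inputs, and the zero-initialized weights are hypothetical; the shapes follow WavLM Large (24 transformer layers plus the embedding output, 1024 dimensions).

```python
import numpy as np


def weighted_layer_sum(hidden_states, layer_logits):
    """Combine per-layer features into one feature sequence.

    hidden_states: array of shape (num_layers, frames, dim).
    layer_logits: array of shape (num_layers,), one trainable scalar per layer.
    Returns an array of shape (frames, dim).
    """
    # Numerically stable softmax over the layer axis.
    weights = np.exp(layer_logits - layer_logits.max())
    weights = weights / weights.sum()
    # Contract the layer axis: sum_l weights[l] * hidden_states[l].
    return np.tensordot(weights, hidden_states, axes=(0, 0))


# Hypothetical input: random values stand in for real WavLM outputs.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(25, 50, 1024))
layer_logits = np.zeros(25)  # equal logits give a plain average over layers
features = weighted_layer_sum(hidden_states, layer_logits)
print(features.shape)  # (50, 1024)
```

In the frozen stage only such combination weights and the speaker model would receive gradients, while the joint fine-tuning stages would also update the WavLM parameters.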