Problems of Generating tr_label_phn during Inference #15
Comments
Hi, I think you are right about this: the phn_dict used at inference should always be the one generated when training on SpeechOcean762. Otherwise the model's predictions will not be meaningful.
However, even with the bug fixed, the input phn would still have a relatively large impact on the prediction. This is because 1) different phones have different error priors; and 2) whether a phone is pronounced correctly depends on the canonical phone, e.g., a phone pronounced as /e/ is correct if the canonical phone is /e/, but wrong if the canonical phone is /a:/. We did an ablation study in the paper. -Yuan
Hi @YuanGongND, did you update the tutorial?
No, I don't plan to do so, as 1) it is not promised in the paper, and we have already released whatever we have; and 2) it is more related to Kaldi than to GOPT. Please understand that we are not a company, so we cannot provide full support for the project. -Yuan
Hi @LyWangPX, could you please explain how you solved this problem? I am running into the same problem: my score does not change even when I completely change the content of the .wav file.
Hi, have you updated the inference tutorial, or are your inference scripts correct? |
Hi, if I understand correctly, we can use
In my own inference experiment, I noticed that the score is determined mainly by the phn rather than by the .wav.
There was an extreme pattern for multiple sound files of the same word:
Word A 4 4 4 4 4 4
Word B 5 5 5 5 5 5
Even after messing up the .wav files, the results remain the same.
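One way to confirm this symptom is to hold the phone labels fixed, replace the acoustic features with noise, and measure how far the predictions move. The sketch below assumes a hypothetical predict_fn(feats, phns) wrapper around the model that returns an array of scores; neither that wrapper nor the helper name audio_sensitivity comes from the repository.

```python
import numpy as np

def audio_sensitivity(predict_fn, feats, phns, seed=0):
    """Mean absolute change in predictions when feats are replaced by noise.

    predict_fn is a hypothetical callable (feats, phns) -> array of scores.
    A result near zero suggests the scores are driven by phns alone.
    """
    rng = np.random.default_rng(seed)
    baseline = np.asarray(predict_fn(feats, phns), dtype=float)
    # Gaussian noise with the same shape and roughly the same scale as feats.
    noise = rng.standard_normal(np.shape(feats)) * (np.std(feats) + 1e-8)
    perturbed = np.asarray(predict_fn(noise, phns), dtype=float)
    return float(np.mean(np.abs(baseline - perturbed)))
```

A value close to zero across many utterances would match the pattern above, where every file of the same word receives the same score.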
Then I found a potential reason:
In gen_seq_data_phn.py, tr_label_phn or te_label_phn is generated by the phn_dict that is specific to the dataset we want to use. However, the pretrained model is based on SpeechOcean762. When running inference on any other dataset, the model receives labels specific to the inference dataset rather than to SpeechOcean762, causing inconsistent inference results. The correct method is to always use the phn_dict generated when training on SpeechOcean762.
I will update the inference tutorial if you think it is necessary.
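The idea can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helpers build_phn_dict, save_phn_dict, load_phn_dict, and encode_phones are hypothetical, as is storing the mapping as JSON, and phone labels are assumed to be plain strings. The point is only that the mapping is built once from the SpeechOcean762 training phones, saved, and then loaded unchanged at inference.

```python
import json

def build_phn_dict(train_phones):
    """Map each distinct training phone to an integer index (sorted for reproducibility)."""
    return {p: i for i, p in enumerate(sorted(set(train_phones)))}

def save_phn_dict(phn_dict, path):
    """Persist the training-time mapping so inference can reuse it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(phn_dict, f)

def load_phn_dict(path):
    """Load the mapping that was saved at training time."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def encode_phones(phones, phn_dict):
    """Encode phones with the training-time mapping.

    Unknown phones raise an error instead of silently receiving a new index,
    which is the failure mode described in this issue.
    """
    missing = sorted({p for p in phones if p not in phn_dict})
    if missing:
        raise KeyError(f"phones not in training phn_dict: {missing}")
    return [phn_dict[p] for p in phones]
```

At inference time, only load_phn_dict and encode_phones would be called; build_phn_dict should never be run on the new dataset.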