
Problems of Generating tr_label_phn during Inference #15

Open
LyWangPX opened this issue Feb 7, 2023 · 6 comments
Labels
bug Something isn't working

Comments


LyWangPX commented Feb 7, 2023

In my own inference experiments, I noticed that the score is mainly determined not by the .wav but by the phn.
There was an extreme pattern across multiple sound files of the same word:

Word A 4 4 4 4 4 4
Word B 5 5 5 5 5 5

Even after messing up the .wav files, the results remained the same.
I then found a potential cause:

In gen_seq_data_phn.py, tr_label_phn or te_label_phn is generated from a phn_dict that is specific to the dataset we want to use. However, the pretrained model is trained on SpeechOcean762. When running inference on any other dataset, the model receives labels specific to the inference dataset rather than to the SpeechOcean762 dataset, causing inconsistent inference results.

The correct approach is to always reuse the phn_dict generated when training on SpeechOcean762.
I will update the inference tutorial if you think it is necessary.
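To illustrate the mismatch (a minimal sketch; the phone names below are made up and are not the actual SpeechOcean762 inventory): rebuilding phn_dict from each dataset's own phones assigns different integer IDs to the same phone, so the pretrained model receives inputs encoded with the wrong mapping. The fix is to always encode inference phones with the training-time dict.

```python
# Hypothetical illustration of the phn_dict mismatch.
# Phone inventories here are assumptions, not the real SpeechOcean762 dict.

def build_phn_dict(phones):
    """Assign integer IDs in order of first appearance,
    as the per-dataset generation in gen_seq_data_phn.py effectively does."""
    phn_dict = {}
    for p in phones:
        if p not in phn_dict:
            phn_dict[p] = len(phn_dict)
    return phn_dict

train_dict = build_phn_dict(['AA', 'AE', 'AH', 'B'])  # dict the model was trained with
infer_dict = build_phn_dict(['B', 'AH', 'AA'])        # dict rebuilt on a new dataset

# The same phone maps to different IDs under the two dicts:
print(train_dict['B'], infer_dict['B'])  # prints: 3 0

# Fix: encode the inference phones with the training-time dict.
labels = [train_dict[p] for p in ['B', 'AH', 'AA']]
print(labels)  # prints: [3, 2, 0]
```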

@LyWangPX LyWangPX changed the title problems of generating tr_label_phn Problems of Generating tr_label_phn during Inference Feb 7, 2023
@YuanGongND YuanGongND added the question Further information is requested label Feb 10, 2023

YuanGongND commented Feb 10, 2023

Hi, I think you are correct on this (i.e., the right approach is to always reuse the phn_dict generated when training on SpeechOcean762). Otherwise the model won't produce anything correct.

I notice the score is mainly determined not by the .wav but by the phn.

However, even with the bug fixed, the input phn would still have a relatively large impact on the prediction. This is because 1) different phns have different error priors; and 2) whether a phone is pronounced correctly depends on the canonical phone, e.g., a phone pronounced as /e/ is correct if the canonical phone is /e/, but wrong if the canonical phone is /a:/. We did an ablation study on this in the paper.

-Yuan

@YuanGongND YuanGongND added bug Something isn't working and removed question Further information is requested labels Feb 11, 2023
@amandeepbaberwal

Hi @YuanGongND, did you update the tutorial?

@YuanGongND

@amandeepbaberwal

No, I don't plan to do so, as 1) it was not promised in the paper (we have already released everything we have); and 2) it is more related to Kaldi than to GOPT.

Please understand that we are not a company, so we cannot provide full support for this project.

-Yuan

@amandeepbaberwal

Hi @LyWangPX, could you please explain how you solved this problem? I am running into the same issue: my score does not change even when I completely replace the content of the .wav file.

@jianliu-ml

Hi, have you updated the inference tutorial? Or are your inference scripts already working correctly?


jianliu-ml commented Dec 30, 2024

Hi, if I understand correctly, we can use data/train/lang_nosp/phones-pure.txt to generate the (pure-)phone dict.
A sample code is:

def gen_phn_dict():
    # Build the phone -> index mapping from the Kaldi-generated phones-pure.txt
    # (one "PHONE ID" pair per line), so inference reuses the same dict as
    # SpeechOcean762 training.
    phn_dict = {}
    with open('data/lang_nosp/phones-pure.txt') as filein:
        for line in filein:
            phone_s, _ = line.split()
            if phone_s not in phn_dict:
                phn_dict[phone_s] = len(phn_dict)
    return phn_dict
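A quick self-contained sanity check of the parsing logic above (the stand-in file contents are an assumption for illustration, not the real phones-pure.txt, which comes from the Kaldi data preparation):

```python
import os
import tempfile

# Write a tiny stand-in phones-pure.txt (contents are made up for the demo).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write("AA 0\nAE 1\nAH 2\nAH 2\n")  # duplicate line to show de-duplication
    path = f.name

# Same parsing logic as gen_phn_dict above, applied to the stand-in file.
phn_dict = {}
with open(path) as filein:
    for line in filein:
        phone_s, _ = line.split()
        if phone_s not in phn_dict:
            phn_dict[phone_s] = len(phn_dict)

print(phn_dict)  # prints: {'AA': 0, 'AE': 1, 'AH': 2}
os.unlink(path)
```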
