
Problems of Generating tr_label_phn during Inference #15

Open
LyWangPX opened this issue Feb 7, 2023 · 6 comments
Labels
bug Something isn't working

Comments


LyWangPX commented Feb 7, 2023

In my own inference experiments, I noticed that the score is mainly determined not by the .wav but by the phn.
There was an extreme pattern across multiple sound files of the same word:

Word A 4 4 4 4 4 4
Word B 5 5 5 5 5 5

Even after messing up the .wav files, the results remained the same.
I then found a potential cause:

In gen_seq_data_phn.py, tr_label_phn or te_label_phn is generated from a phn_dict that is specific to the dataset we want to use. However, the pretrained model is trained on SpeechOcean762. When running inference on any other dataset, the model receives labels specific to the inference dataset rather than to the SpeechOcean762 dataset, causing inconsistent inference results.

The correct approach is to always reuse the phn_dict generated when training on SpeechOcean762.
I will update the inference tutorial if you think it is necessary.
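To illustrate the mismatch (a minimal sketch; the phone names below are made up and are not the actual SpeechOcean762 inventory): rebuilding phn_dict from each dataset's own phones assigns different integer IDs to the same phone, so the pretrained model receives inputs encoded with the wrong mapping. The fix is to always encode inference phones with the training-time dict.

```python
# Hypothetical illustration of the phn_dict mismatch.
# Phone inventories here are assumptions, not the real SpeechOcean762 dict.

def build_phn_dict(phones):
    """Assign integer IDs in order of first appearance,
    as the per-dataset generation in gen_seq_data_phn.py effectively does."""
    phn_dict = {}
    for p in phones:
        if p not in phn_dict:
            phn_dict[p] = len(phn_dict)
    return phn_dict

train_dict = build_phn_dict(['AA', 'AE', 'AH', 'B'])  # dict the model was trained with
infer_dict = build_phn_dict(['B', 'AH', 'AA'])        # dict rebuilt on a new dataset

# The same phone maps to different IDs under the two dicts:
print(train_dict['B'], infer_dict['B'])  # prints: 3 0

# Fix: encode the inference phones with the training-time dict.
labels = [train_dict[p] for p in ['B', 'AH', 'AA']]
print(labels)  # prints: [3, 2, 0]
```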

@LyWangPX LyWangPX changed the title problems of generating tr_label_phn Problems of Generating tr_label_phn during Inference Feb 7, 2023
@YuanGongND YuanGongND added the question Further information is requested label Feb 10, 2023

YuanGongND commented Feb 10, 2023

Hi, I think you are correct on this (i.e., the right approach is to always reuse the phn_dict generated when training on SpeechOcean762). Otherwise the model won't produce anything correct.

I notice the score is mainly determined not by the .wav but by the phn.

However, even with the bug fixed, the input phn would still have a relatively large impact on the prediction. This is because 1) different phns have different error priors; and 2) whether a phone is pronounced correctly depends on the canonical phone, e.g., a phone pronounced as /e/ is correct if the canonical phone is /e/, but wrong if the canonical phone is /a:/. We did an ablation study on this in the paper.

-Yuan

@YuanGongND YuanGongND added bug Something isn't working and removed question Further information is requested labels Feb 11, 2023
@amandeepbaberwal

Hi @YuanGongND, did you update the tutorial?

@YuanGongND

@amandeepbaberwal

No, I don't plan to do so, as 1) it was not promised in the paper (we have already released everything we have); and 2) it is more related to Kaldi than to GOPT.

Please understand that we are not a company, so we cannot provide full support for this project.

-Yuan

@amandeepbaberwal

Hi @LyWangPX, could you please explain how you solved this problem? I am running into the same issue: my score does not change even when I completely replace the content of the .wav file.

@jianliu-ml

Hi, have you updated the inference tutorial? Or are your inference scripts already working correctly?


jianliu-ml commented Dec 30, 2024

Hi, if I understand correctly, we can use data/train/lang_nosp/phones-pure.txt to generate the (pure-)phone dict.
A sample code is:

def gen_phn_dict():
    # Build the phone -> index mapping from the Kaldi-generated phones-pure.txt
    # (one "PHONE ID" pair per line), so inference reuses the same dict as
    # SpeechOcean762 training.
    phn_dict = {}
    with open('data/lang_nosp/phones-pure.txt') as filein:
        for line in filein:
            phone_s, _ = line.split()
            if phone_s not in phn_dict:
                phn_dict[phone_s] = len(phn_dict)
    return phn_dict
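A quick self-contained sanity check of the parsing logic above (the stand-in file contents are an assumption for illustration, not the real phones-pure.txt, which comes from the Kaldi data preparation):

```python
import os
import tempfile

# Write a tiny stand-in phones-pure.txt (contents are made up for the demo).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write("AA 0\nAE 1\nAH 2\nAH 2\n")  # duplicate line to show de-duplication
    path = f.name

# Same parsing logic as gen_phn_dict above, applied to the stand-in file.
phn_dict = {}
with open(path) as filein:
    for line in filein:
        phone_s, _ = line.split()
        if phone_s not in phn_dict:
            phn_dict[phone_s] = len(phn_dict)

print(phn_dict)  # prints: {'AA': 0, 'AE': 1, 'AH': 2}
os.unlink(path)
```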
