merging utterances based on the start time and end time to form a paragraph. #3953

aarora8 · 2020-02-22T22:07:57Z

It addresses two issues raised in the CHiME group.

The first issue was that the pre-trained models produced a different result: around 2% degradation in WER on the dev set and around 0.8% on the eval set. In my run, however, the degradation was around 3% on the dev set. This was possibly because the previous results were reported without array synchronization.
The second issue was that utterances were not sorted by start time and end time when merging each speaker's utterances in each session to form a paragraph.

In the S01.json file of the eval set, array U02 is the reference array at the start of the session and again in the middle, but the code merged all utterances of array U02 together at the start. This caused incorrect scoring for the eval set. With this fix, the eval set WER changed from 85.42 to 78.08.
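The ordering fix can be sketched roughly as below. This is a minimal illustration, not the code changed in this PR: the function name `merge_utterances` and the field names (`session`, `speaker`, `start`, `end`, `words`) are hypothetical, and times are assumed to be in seconds.

```python
from collections import defaultdict


def merge_utterances(utterances):
    """Merge utterances into one paragraph per (session, speaker).

    Hypothetical input format: each utterance is a dict with the keys
    'session', 'speaker', 'start', 'end' (seconds) and 'words'.
    """
    groups = defaultdict(list)
    for utt in utterances:
        groups[(utt["session"], utt["speaker"])].append(utt)

    paragraphs = {}
    for key, utts in groups.items():
        # Sort by start time, then end time, so the paragraph follows the
        # session timeline regardless of the order the utterances were read in.
        utts.sort(key=lambda u: (u["start"], u["end"]))
        paragraphs[key] = " ".join(u["words"] for u in utts)
    return paragraphs


# Toy example: the utterances are not listed in time order.
utts = [
    {"session": "S01", "speaker": "P01", "start": 0.0, "end": 2.0, "words": "hello there"},
    {"session": "S01", "speaker": "P01", "start": 9.0, "end": 11.0, "words": "see you"},
    {"session": "S01", "speaker": "P01", "start": 4.0, "end": 6.0, "words": "how are you"},
]
print(merge_utterances(utts)[("S01", "P01")])  # hello there how are you see you
```

Without the sort, the paragraph would follow the input order and the reference and hypothesis paragraphs could be misaligned in time.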

Results after running the setup with the pre-trained model
Dev: %WER 84.33 [ 49653 / 58881, 1529 ins, 35813 del, 12311 sub ]
Eval: %WER 85.42 [ 47093 / 55132, 1583 ins, 32671 del, 12839 sub ]

Results after fixing scoring and running the setup with the pre-trained model
Dev: %WER 84.33 [ 49653 / 58881, 1529 ins, 35813 del, 12311 sub ]
Eval: %WER 78.08 [ 43046 / 55132, 957 ins, 32045 del, 10044 sub ]
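
As a sanity check on the numbers above, the WER is the sum of insertions, deletions and substitutions divided by the number of reference words. The small helper below (`wer_percent` is an illustrative name, not part of the recipe) recomputes the percentages from the bracketed counts.

```python
def wer_percent(ins, dels, subs, ref_words):
    """Word error rate in percent: (ins + del + sub) / reference words."""
    return 100.0 * (ins + dels + subs) / ref_words


# Counts copied from the scoring lines above: (ins, del, sub, reference words).
results = {
    "dev": (1529, 35813, 12311, 58881),
    "eval before fix": (1583, 32671, 12839, 55132),
    "eval after fix": (957, 32045, 10044, 55132),
}
for name, counts in results.items():
    print(f"{name}: {wer_percent(*counts):.2f}")
```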
@sw005320

danpovey · 2020-02-23T03:43:46Z

OK, I assume you guys have some kind of internal process to give the OK for these things, please let me know when it's ready to merge.

sw005320 · 2020-02-24T20:50:54Z

> OK, I assume you guys have some kind of internal process to give the OK for these things, please let me know when it's ready to merge.

This is OK and ready to merge.
For the record, I also posted a thread about it in the CHiME challenge Google group.

bug fix sorting utterances by their start time and end time

5293ffb

danpovey merged commit a257387 into kaldi-asr:master Feb 25, 2020

aarora8 deleted the chime6_feb_c02 branch December 5, 2020 04:42
