
merging utterances based on the start time and end time to form a paragraph. #3953

Merged 1 commit on Feb 25, 2020


@aarora8 aarora8 commented Feb 22, 2020

This PR addresses two issues raised in the CHiME group.

  • The first issue was that the pre-trained models gave results different from those previously reported: a WER degradation of around 2% on the dev set and around 0.8% on the eval set. In my run, however, the degradation was around 3% on the dev set. This is possibly because the previous results were reported without array synchronization.
  • The second issue was that utterances were not sorted by start time and end time when merging each speaker's utterances in each session to form a paragraph.

In the S01.json file of the eval set, array U02 is the reference array at the start of the session and again in the middle, but the code merged all utterances of array U02 together at the start. This caused incorrect scoring for the eval set. With this fix, the eval-set WER changed from 85.42% to 78.08%.
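The ordering problem described above can be illustrated with a minimal sketch. This is not the code changed in this PR. The function names, the `H:MM:SS.ss` time format, and the field names (`session_id`, `speaker`, `ref`, `start_time`, `end_time`, `words`) are assumptions made for illustration only.

```python
from collections import defaultdict


def hms_to_seconds(timestamp):
    """Convert an 'H:MM:SS.ss' string (assumed format) to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def merge_utterances(utterances):
    """Build one paragraph per (session, speaker), ordered by time.

    Utterances are sorted by (start, end) time, regardless of which
    reference array they belong to, before their words are joined.
    """
    groups = defaultdict(list)
    for utt in utterances:
        groups[(utt["session_id"], utt["speaker"])].append(utt)

    paragraphs = {}
    for key, utts in groups.items():
        ordered = sorted(
            utts,
            key=lambda u: (hms_to_seconds(u["start_time"]),
                           hms_to_seconds(u["end_time"])),
        )
        paragraphs[key] = " ".join(u["words"] for u in ordered)
    return paragraphs


# Hypothetical input: U02 is the reference array at the start and again
# later, with a U03 utterance in between. The list is deliberately given
# with both U02 utterances first, mimicking a merge grouped by array.
example = [
    {"session_id": "S01", "speaker": "P01", "ref": "U02",
     "start_time": "0:00:10.00", "end_time": "0:00:12.00", "words": "first"},
    {"session_id": "S01", "speaker": "P01", "ref": "U02",
     "start_time": "0:10:00.00", "end_time": "0:10:03.00", "words": "third"},
    {"session_id": "S01", "speaker": "P01", "ref": "U03",
     "start_time": "0:05:00.00", "end_time": "0:05:02.00", "words": "second"},
]

print(merge_utterances(example)[("S01", "P01")])
```

Joining in input order would put the two U02 utterances next to each other, while sorting by time restores the spoken order, which is what the reference paragraph needs for scoring.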

Results after running the setup with the pre-trained model
Dev: %WER 84.33 [ 49653 / 58881, 1529 ins, 35813 del, 12311 sub ]
Eval: %WER 85.42 [ 47093 / 55132, 1583 ins, 32671 del, 12839 sub ]

Results after fixing scoring and running the setup with the pre-trained model
Dev: %WER 84.33 [ 49653 / 58881, 1529 ins, 35813 del, 12311 sub ]
Eval: %WER 78.08 [ 43046 / 55132, 957 ins, 32045 del, 10044 sub ]
@sw005320
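As a quick consistency check on the figures above, the WER lines can be recomputed from their error counts, assuming the usual definition WER = (insertions + deletions + substitutions) / reference words. The helper name `wer_percent` is hypothetical; the counts are copied from the results in this comment.

```python
def wer_percent(insertions, deletions, substitutions, ref_words):
    """Return (total errors, WER in percent) from the error counts."""
    errors = insertions + deletions + substitutions
    return errors, 100.0 * errors / ref_words


# (insertions, deletions, substitutions, reference words) per result line.
results = {
    "Dev": (1529, 35813, 12311, 58881),
    "Eval before fix": (1583, 32671, 12839, 55132),
    "Eval after fix": (957, 32045, 10044, 55132),
}

for name, counts in results.items():
    errors, wer = wer_percent(*counts)
    print(f"{name}: %WER {wer:.2f} [ {errors} / {counts[3]} ]")
```

The reference word count for the eval set is the same before and after the fix (55132), so the change in WER comes entirely from the hypothesis being scored against correctly ordered reference text.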

@danpovey

OK, I assume you guys have some kind of internal process to give the OK for these things, please let me know when it's ready to merge.

@sw005320

> OK, I assume you guys have some kind of internal process to give the OK for these things, please let me know when it's ready to merge.

This is OK and ready to merge.
For the record, I also posted a thread about it in the CHiME challenge Google group.

@danpovey danpovey merged commit a257387 into kaldi-asr:master Feb 25, 2020
@aarora8 aarora8 deleted the chime6_feb_c02 branch December 5, 2020 04:42