Fixed rounding issue in utils/data/get_uniform_subsegments.py #2200

Merged 1 commit on Feb 1, 2018


mmaciej2
Contributor

@mmaciej2 mmaciej2 commented Feb 1, 2018

Using the "int" function, which truncates, instead of proper rounding when creating the segment ID could cause two segments to erroneously receive the same ID.

@danpovey
Contributor

danpovey commented Feb 1, 2018

Can you give an example of the problem? I have a hard time seeing how this could happen.

@mmaciej2
Contributor Author

mmaciej2 commented Feb 1, 2018

@danpovey The problem happens when the start_time that is read out of the segments file falls just below a value with two decimal places (I assume because of some kind of machine precision issue). The sub-segment is then listed as running from something like time 0.99999 to 2.9999. The ID is built by truncating those times, while the time marks are rounded, so the result is a segments file with something like this:
utt_000099_000299 1.00 3.00
If there is a segment going from 0.99 to 2.99, it will receive the same ID, i.e.
utt_000099_000299 0.99 2.99

I think there's an argument that you should not be trying to produce subsegments that are only one frame apart, so the ID collision isn't an issue, but nevertheless the code in the master branch produces IDs that are inconsistent with the time marks by 0.01 seconds.
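
To make the collision concrete, here is a minimal sketch of the two ways of building the ID. The function names and the exact ID format are assumptions modeled on the example lines above, not the actual code in utils/data/get_uniform_subsegments.py.

```python
# Hypothetical helpers; the ID layout (utt_SSSSSS_EEEEEE, times in
# hundredths of a second) is inferred from the example in this comment.

def segment_id_truncated(utt, start, end):
    # int() truncates toward zero, so 99.999 becomes 99.
    return "{0}_{1:06d}_{2:06d}".format(utt, int(start * 100), int(end * 100))


def segment_id_rounded(utt, start, end):
    # round() first, so 99.999 becomes 100, matching the "%.2f" time marks.
    return "{0}_{1:06d}_{2:06d}".format(
        utt, int(round(start * 100)), int(round(end * 100)))


def time_marks(start, end):
    # Time marks as they would be written with two decimal places.
    return "{0:.2f} {1:.2f}".format(start, end)


if __name__ == "__main__":
    for start, end in [(0.99999, 2.9999), (0.99, 2.99)]:
        print(segment_id_truncated("utt", start, end),
              segment_id_rounded("utt", start, end),
              time_marks(start, end))
```

With truncation both segments get the ID utt_000099_000299. With rounding the first one becomes utt_000100_000300, which agrees with its 1.00 3.00 time marks.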

If you would like me to point you to a particular segments file and set of parameters that have this problem, I can set that up as well.

@danpovey danpovey merged commit c82560d into kaldi-asr:master Feb 1, 2018
@mmaciej2 mmaciej2 deleted the subsegmentation-fix branch February 1, 2018 23:16