Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[src] Make arpa2fst robust against ARPA files without <s>. #2167

Merged
merged 3 commits into from
Feb 1, 2018

Conversation

galv
Copy link
Contributor

@galv galv commented Jan 21, 2018

Previously, arpa2fst would silently create a mal-formed G.fst (it
would have not start state) which would create an empty LG.fst when it
was composed. This wouldn't be caught until the end of
utils/mkgraph.sh.

It would be good to test for the presence of </s>, but VectorFst's do
not provide quick access to the final state (since there could be
multiple final statates in OpenFST's representation).

I tested by running this on my own personal workflow which (by accident) did not have <s> and </s> initially, and made sure it caught the problem. I also made sure that existing unit tests pass. I can make a new unit test if necessary.

Previously, arpa2fst would silently create a mal-formed G.fst (it
would have not start state) which would create an empty LG.fst when it
was composed. This wouldn't be caught until the end of
utils/mkgraph.sh.

It would be good to test for the presence of </s>, but VectorFst's do
not provide quick access to the final state (since there could be
multiple final statates in OpenFST's representation).
const fst::StdVectorFst& Fst() const { return fst_; }
fst::StdVectorFst* MutableFst() { return &fst_; }
const fst::StdVectorFst& Fst() const { CheckFstInvariants(); return fst_; }
fst::StdVectorFst* MutableFst() { CheckFstInvariants(); return &fst_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool, but I'd prefer if you put this check at the end of ReadComplete(). It gives fewer things to worry about.
And probably rename CheckFstInvariants() to Check(), and make it private-- just because I normally use that name for functions that do a check of some kind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestions are reasonable. I'll plan to address them tonight. By the way, CheckFstInvariants() was already private.

@danpovey
Copy link
Contributor

@galv, don't forget about this!

@galv
Copy link
Contributor Author

galv commented Jan 31, 2018

Just want to make a unit test first. I'll do it tomorrow night (or perhaps earlier).

@danpovey
Copy link
Contributor

I really don't think a unit test is necessary for this, it's a fairly simple fix.

@galv
Copy link
Contributor Author

galv commented Feb 1, 2018

Sorry I didn't see your message. I did the unit test anyway just now because I was scared. I just pushed two commits. 4e6ed54 for cleanup based on your fix and a0944bb for the unit test.

It was also a good opportunity for me to learn what a backoff weight actually is in an n-gram language model, as well. I missed that part of education.

@danpovey danpovey merged commit 2de3b38 into kaldi-asr:master Feb 1, 2018
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants