-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[src] Make arpa2fst robust against ARPA files without <s>. #2167
Conversation
Previously, arpa2fst would silently create a mal-formed G.fst (it would have not start state) which would create an empty LG.fst when it was composed. This wouldn't be caught until the end of utils/mkgraph.sh. It would be good to test for the presence of </s>, but VectorFst's do not provide quick access to the final state (since there could be multiple final statates in OpenFST's representation).
src/lm/arpa-lm-compiler.h
Outdated
const fst::StdVectorFst& Fst() const { return fst_; } | ||
fst::StdVectorFst* MutableFst() { return &fst_; } | ||
const fst::StdVectorFst& Fst() const { CheckFstInvariants(); return fst_; } | ||
fst::StdVectorFst* MutableFst() { CheckFstInvariants(); return &fst_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool, but I'd prefer if you put this check at the end of ReadComplete(). It gives fewer things to worry about.
And probably rename CheckFstInvariants() to Check(), and make it private-- just because I normally use that name for functions that do a check of some kind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestions are reasonable. I'll plan to address them tonight. By the way, CheckFstInvariants() was already private.
@galv, don't forget about this! |
Just want to make a unit test first. I'll do it tomorrow night (or perhaps earlier). |
I really don't think a unit test is necessary for this, it's a fairly simple fix. |
Sorry I didn't see your message. I did the unit test anyway just now because I was scared. I just pushed two commits. 4e6ed54 for cleanup based on your fix and a0944bb for the unit test. It was also a good opportunity for me to learn what a backoff weight actually is in an n-gram language model, as well. I missed that part of education. |
Previously, arpa2fst would silently create a mal-formed G.fst (it
would have not start state) which would create an empty LG.fst when it
was composed. This wouldn't be caught until the end of
utils/mkgraph.sh.
It would be good to test for the presence of
</s>
, but VectorFst's donot provide quick access to the final state (since there could be
multiple final statates in OpenFST's representation).
I tested by running this on my own personal workflow which (by accident) did not have
<s>
and</s>
initially, and made sure it caught the problem. I also made sure that existing unit tests pass. I can make a new unit test if necessary.