Fix a potential bug in carpa when calculating backoff logprob #4195
Conversation
I think the bug isn't there; I think the bug is at line 794: it does |
I think you are right. I will try this soon. |
great find! |
I have tested the modification, and it works as expected. Another place in function GetArc() is also modified to get the identical result. |
@kkm000 please merge if you're OK with the change; it's your code (or touches your code) |
LGTM @jtrmal. No, I wrote the compiler; I think carpa was a later addition, so I'm getting no credit for this. By the way, not related to this change but just in case, a tidbit to keep in mind about the messiness of std::numeric_limits<T>::min(): for floating-point types it returns the smallest positive normalized value, not the most negative one. The most negative value is lowest(), and the smallest positive denormal is denorm_min().
In other words, min() is not the minimum.
Also, for regular floats of 32, 64 and 80 bit, and, I think but not sure, for all IEEE-754[-like?] floating point numbers, lowest() == -max(). For reference, the table (values as printed on a typical x86-64 build, where long double is the 80-bit extended format):
| `T` | `lowest()` | `denorm_min()` | `min()` | `max()` |
| ---: | ---: | ---: | ---: | ---: |
| float | -3.40282e+38 | 1.4013e-45 | 1.17549e-38 | 3.40282e+38 |
| double | -1.79769e+308 | 4.94066e-324 | 2.22507e-308 | 1.79769e+308 |
| long double | -1.18973e+4932 | 3.6452e-4951 | 3.3621e-4932 | 1.18973e+4932 |
and the code to produce it, partly lifted from cppreference.com:
#include <limits>
#include <iostream>
int main()
{
std::cout << "| `T` | `lowest()` | `denorm_min()` | `min()` | `max()` |\n"
<< "| ---: | ---: | ---: | ---: | ---: |\n"
<< "| float | "
<< std::numeric_limits<float>::lowest() << " | "
<< std::numeric_limits<float>::denorm_min() << " | "
<< std::numeric_limits<float>::min() << " | "
<< std::numeric_limits<float>::max() << " |\n"
<< "| double | "
<< std::numeric_limits<double>::lowest() << " | "
<< std::numeric_limits<double>::denorm_min() << " | "
<< std::numeric_limits<double>::min() << " | "
<< std::numeric_limits<double>::max() << " |\n"
<< "| long double | "
<< std::numeric_limits<long double>::lowest() << " | "
<< std::numeric_limits<long double>::denorm_min() << " | "
<< std::numeric_limits<long double>::min() << " | "
<< std::numeric_limits<long double>::max() << " |\n";
}

And, in the end, I think I should put this into the Wiki somewhere. |
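As an illustration of why min()'s naming bites (a hypothetical sketch, not code from this thread): seeding a running maximum with min() instead of lowest() silently discards every negative candidate.

#include <algorithm>
#include <initializer_list>
#include <iostream>
#include <limits>
int main()
{
  // min() is the smallest POSITIVE normalized float, not the most negative one.
  float best_wrong = std::numeric_limits<float>::min();     // +1.17549e-38
  float best_right = std::numeric_limits<float>::lowest();  // -3.40282e+38
  for (float x : {-2.0f, -1.0f, -3.0f}) {
    best_wrong = std::max(best_wrong, x);
    best_right = std::max(best_right, x);
  }
  std::cout << best_wrong << "\n";  // prints 1.17549e-38: all candidates lost
  std::cout << best_right << "\n";  // prints -1: the actual maximum
}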
Suppose there is an n-gram A-B. If B is not in the vocabulary list, it should be treated as <unk>, or assigned -inf logprob. In the latter case, it seems that A-B should also be -inf, but in the current code it is actually backoff(A).
Usually this happens when the vocabulary of the rescoring language model is not a superset of the decoding language model's, so some words in the lattice are unknown when rescoring.
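To make the mechanism concrete, here is a toy sketch of an ARPA-style backoff recursion (the data structure and all names are invented for illustration; the actual Kaldi carpa code differs), showing both the -inf check at the bottom of the recursion and how its absence would yield backoff(A) for an out-of-vocabulary B:

#include <cstdint>
#include <iostream>
#include <limits>
#include <map>
#include <vector>

// Toy stand-in for a backoff LM: each history maps to its backoff weight
// and the logprobs of the words seen after it. Invented for illustration.
struct LmState {
  float backoff = 0.0f;
  std::map<int32_t, float> logprob;  // word -> logprob of (hist, word)
};
std::map<std::vector<int32_t>, LmState> lm;  // history -> state

float GetNgramLogprobRecurse(int32_t word, std::vector<int32_t> hist)
{
  auto it = lm.find(hist);
  if (it != lm.end()) {
    auto w = it->second.logprob.find(word);
    if (w != it->second.logprob.end())
      return w->second;  // exact n-gram hit
  }
  if (hist.empty()) {
    // Recursion bottom: even the unigram is missing, so <word> is OOV.
    // The bug amounts to never producing -inf here, so an OOV word B after
    // history A scores just the accumulated backoff weights, i.e.
    // backoff(A). Returning -inf (or the <unk> logprob) is the fix.
    return -std::numeric_limits<float>::infinity();
  }
  // Back off: charge the backoff weight of <hist>, then retry with the
  // history shortened by its oldest word.
  float backoff = (it != lm.end()) ? it->second.backoff : 0.0f;
  hist.erase(hist.begin());
  return backoff + GetNgramLogprobRecurse(word, hist);
}

int main()
{
  const int32_t A = 1, B = 2;     // B never occurs in the toy LM
  lm[{}] = {0.0f, {{A, -1.0f}}};  // unigrams: only A is in the vocabulary
  lm[{A}] = {-0.5f, {}};          // history "A": backoff weight only
  // Prints -inf. Without the OOV check it would print backoff(A) = -0.5,
  // the mild penalty described above.
  std::cout << GetNgramLogprobRecurse(B, {A}) << "\n";
}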
Here is an example of the bad effect. Suppose 1, 2, 3, ... are the states in the lattice, and A, B, C, ... are the words on the arcs. Consider this lattice:
1---(A)---2---(B)---3---(C)---5
           \                 /
            --(D)---4---(E)--
where A-D-E is supposed to be the correct result, and B is a low-frequency word that does not appear in the vocabulary list. Presumably the weight (cost) of 2-(B)-3 is large, so the one-best algorithm is less likely to choose this path.
Now, in the rescoring procedure, we first subtract the old LM weights, so 2-(B)-3 becomes cheaper than the other arcs. Normally we would not need to worry about that, because the weight is soon added back as another large number (B is low frequency), or the path is removed entirely (B does not appear in the vocabulary list).
However, since A-B triggers the bug, the logprob added back may be only backoff(A); the weight of 2-(B)-3 then stays small, and the result may be mistakenly set to A-B-C.
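For instance (numbers invented for illustration): suppose the decoding LM had charged 2-(B)-3 a cost of 15. Rescoring first subtracts that 15, leaving the arc temporarily cheap. Correct rescoring would then add back an effectively infinite cost, because B is out of vocabulary, and the path would die; with the bug, only backoff(A), say a cost of 3, is added back, so the A-B-C path ends up about 12 cheaper than it should be and can overtake A-D-E.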
This is also where carpa differs from the fst-like structure: in the fst-like structure, arcs through B are removed when B is out of vocabulary. That would definitely reduce errors when the language model is small, but such removal is unexpected behavior when the model is large.