In line2pairs, lines 53 and 56 check whether the input ngram and the output ngram overlap (completely or partially).
For complete overlap, I reckon you have to check whether the first token of each ngram is the same and whether both ngrams have the same length. However, the length check is performed with the input_order and output_order variables, which don't represent those particular ngrams' lengths but the maximum ngram length to search for in the line. For example, with input_order = 1, output_order = 2 and overlap = True, the input_order == output_order check never passes, so you eventually get ngrams paired with themselves.
The same thing happens with the partial overlap check.
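For the complete-overlap case, here is a minimal sketch of the two predicates side by side. It assumes, as the issue does, that `i`/`l` are the start positions and `j`/`k` the actual lengths of the input and output ngrams; the function names are hypothetical and not taken from line2pairs.

```python
def same_ngram_current(i: int, j: int, l: int, k: int,
                       input_order: int, output_order: int) -> bool:
    """Complete-overlap test as described in the issue: compares the maximum orders."""
    return i == l and input_order == output_order


def same_ngram_proposed(i: int, j: int, l: int, k: int) -> bool:
    """Proposed test: same start position and same actual length."""
    return i == l and j == k


# The issue's example: input_order = 1, output_order = 2.
# Input unigram at position 0 (i=0, j=1) vs. output unigram at position 0 (l=0, k=1)
# is the same ngram, so the pair should be skipped.
print(same_ngram_current(0, 1, 0, 1, input_order=1, output_order=2))  # False: self-pair is kept
print(same_ngram_proposed(0, 1, 0, 1))                                # True: self-pair is skipped

# The current test can also misfire the other way when the orders are equal:
# an input unigram and an output bigram starting at the same position are different
# ngrams, yet they are skipped.
print(same_ngram_current(0, 1, 0, 2, input_order=2, output_order=2))  # True
print(same_ngram_proposed(0, 1, 0, 2))                                # False
```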
Shouldn't line 53 be
if i == l and j == k:
instead of
if i == l and input_order == output_order:
And shouldn't line 56 be
if len(set(range(i, i + j)) & set(range(l, l + k))) > 0:
instead of the current check, which uses input_order and output_order as the span lengths in place of j and k?
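For reference, a minimal sketch of an ngram/ngram pairing loop that applies both proposed checks. It is hypothetical: `ngram_pairs`, the `win` parameter, its window semantics (output ngrams start within `win` positions of the input ngram's start) and the `_` joiner are assumptions, not line2pairs' actual code; only the two `if` conditions come from the proposal.

```python
from typing import List, Tuple


def ngram_pairs(tokens: List[str], input_order: int, output_order: int,
                win: int, overlap: bool) -> List[Tuple[str, str]]:
    """Hypothetical pair generator using the proposed overlap checks.

    i, j: start and length of the input ngram; l, k: start and length of the output ngram.
    """
    pairs = []
    n = len(tokens)
    for i in range(n):
        for j in range(1, input_order + 1):
            if i + j > n:
                break
            center = "_".join(tokens[i:i + j])
            # Assumed window: output ngrams start within `win` positions of i.
            for l in range(max(0, i - win), min(n, i + win + 1)):
                for k in range(1, output_order + 1):
                    if l + k > n:
                        break
                    if overlap:
                        # Only skip the input ngram paired with itself.
                        if i == l and j == k:
                            continue
                    else:
                        # Skip any output ngram sharing at least one position with the input ngram.
                        if len(set(range(i, i + j)) & set(range(l, l + k))) > 0:
                            continue
                    pairs.append((center, "_".join(tokens[l:l + k])))
    return pairs


pairs = ngram_pairs(["the", "cat", "sat"], input_order=1, output_order=2, win=1, overlap=True)
print([p for p in pairs if p[0] == p[1]])  # [] : no ngram is paired with itself
```

In the overlap setting, partially overlapping pairs such as ("the", "the_cat") are still produced; only exact self-pairs are dropped.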
Thanks a lot. You are right.
I have fixed the problem.
I also analyzed the impact of this bug on the final results.
It has only a minor impact in the overlap setting.
However, it can influence the results when the non-overlap setting is used and the window size is small.
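One way to see why small windows matter in the non-overlap setting: pairs wrongly dropped by a span check sized with the maximum orders all start next to the input ngram, so they make up a larger share of all pairs when the window is small. This sketch assumes the old partial-overlap check sized both spans by input_order and output_order (as the issue suggests) and reuses a made-up window definition; `count_nonoverlap_pairs` is hypothetical, and this is an illustration rather than the analysis behind the reply.

```python
def count_nonoverlap_pairs(n_tokens: int, input_order: int, output_order: int,
                           win: int, use_max_orders: bool) -> int:
    """Count pairs kept by a hypothetical non-overlap filter.

    use_max_orders=True sizes both spans by input_order/output_order (assumed old check);
    use_max_orders=False uses the actual lengths j and k (the proposed fix).
    """
    kept = 0
    for i in range(n_tokens):
        for j in range(1, input_order + 1):
            if i + j > n_tokens:
                break
            for l in range(max(0, i - win), min(n_tokens, i + win + 1)):
                for k in range(1, output_order + 1):
                    if l + k > n_tokens:
                        break
                    in_len, out_len = (input_order, output_order) if use_max_orders else (j, k)
                    if set(range(i, i + in_len)) & set(range(l, l + out_len)):
                        continue  # spans share a position: treated as overlapping
                    kept += 1
    return kept


for win in (1, 2, 5, 10):
    correct = count_nonoverlap_pairs(100, 2, 2, win, use_max_orders=False)
    current = count_nonoverlap_pairs(100, 2, 2, win, use_max_orders=True)
    print(f"win={win}: {1 - current / correct:.0%} of valid pairs dropped")
```

With input_order = output_order = 2 and win = 1, every candidate output starts within one position of the input and therefore intersects a two-token span, so the max-order check drops every pair in that configuration.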