potential issue with SARI n-gram add-score

Hi, I have observed a situation with the SARI implementation where a system output can receive a score below 100 even when it is identical to the reference (in the single-reference case).

Basically, if the reference does not introduce any new tokens relative to the source, the output receives a unigram add-score of 0.00, but 100 for all n-grams with n > 1.

Take the following example:
```python
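# Assumption: corpus_sari comes from the package's SARI module;
# the import path below is a guess and may need adjusting.
from easse.sari import corpus_sari

sources = ["Shu Abe (born June 7 1984) is a former Japanese football player."]
predictions = ["Shu Abe (born June 7 1984) is a Japanese football player."]
references = [["Shu Abe (born June 7 1984) is a Japanese football player."]]

sari_score = corpus_sari(sources, predictions, references)
print(sari_score)
# Observed output: 91.66666666666667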
```

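In this case, the add score is 75.0, i.e. the average over the four n-gram orders: (0 + 100 + 100 + 100) / 4. The unigram score stays at 0 because there are no new unigrams (a result of the `if sys_total > 0:` checks in `compute_precision_recall_f1()`), but there are technically new bigrams, trigrams, and 4-grams around the location of the deleted word (`["a japanese", "a japanese football", "is a japanese"]`, etc.). The overall 91.67 is consistent with keep and deletion scores of 100: (75 + 100 + 100) / 3.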
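To make the effect easier to see, here is a minimal sketch that lists the n-grams present in the reference but not in the source for n = 1..4. It is not the library's code: the helper names (`ngrams`, `added_ngrams`) are hypothetical, tokenization is simplified to lowercasing and whitespace splitting, and the per-order score is hard-coded to 0 when there is nothing to add and 100 otherwise (valid here only because the prediction equals the reference).

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as space-joined strings) in a token list."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def added_ngrams(source, reference, n):
    """N-grams that occur in the reference but not in the source."""
    # Simplified tokenization (assumption): lowercase + whitespace split.
    src_tokens = source.lower().split()
    ref_tokens = reference.lower().split()
    return ngrams(ref_tokens, n) - ngrams(src_tokens, n)


source = "Shu Abe (born June 7 1984) is a former Japanese football player."
reference = "Shu Abe (born June 7 1984) is a Japanese football player."

added = {n: added_ngrams(source, reference, n) for n in range(1, 5)}
for n, grams in added.items():
    print(n, sorted(grams))

# Prediction == reference, so every order with something to add scores 100;
# an order with nothing to add stays at 0 (mirroring the guarded division).
per_order = [100.0 if added[n] else 0.0 for n in range(1, 5)]
add_score = sum(per_order) / len(per_order)

# Assumption: keep and deletion are both 100 for this example.
sari = (add_score + 100.0 + 100.0) / 3
print(add_score, sari)
```

Under these assumptions the unigram set is empty while the higher orders are not, which reproduces the 75.0 add score and the 91.67 total.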

I am just curious whether this is the expected behaviour, or whether a definitive 0.00 or 100.0 for the add-score would be more desirable in this case.
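For reference, this is roughly how the overall score for the example would change under each option. It is only arithmetic, assuming the overall score is the plain mean of the add, keep, and deletion scores and that keep and deletion are both 100 here; `overall_sari` is a hypothetical helper, not a library function.

```python
def overall_sari(add, keep=100.0, delete=100.0):
    """Mean of the three operation scores (assumed aggregation)."""
    return (add + keep + delete) / 3


options = {
    "current (add = 75)": 75.0,
    "add forced to 0": 0.0,
    "add forced to 100": 100.0,
}
results = {label: overall_sari(add) for label, add in options.items()}
for label, score in results.items():
    print(f"{label}: {score:.2f}")
```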

Thanks in advance for any insight. 

potential issue with SARI n-gram add-score #99
