potential issue with SARI n-gram add-score #99

Open
@liamcripwell

Description

Hi, I have observed a particular situation with the SARI implementation where a system output can receive a score below 100 even when it is identical to the reference (in the single-reference case).

Basically, if the reference does not introduce any new tokens relative to the source, the output receives a 0.00 unigram add-score, but can still receive 100 for the higher-order n-grams (n > 1).

Take the following example:

from easse.sari import corpus_sari  # assuming the EASSE package

sources = ["Shu Abe (born June 7 1984) is a former Japanese football player."]
predictions = ["Shu Abe (born June 7 1984) is a Japanese football player."]
references = [["Shu Abe (born June 7 1984) is a Japanese football player."]]
sari_score = corpus_sari(sources, predictions, references)
print(sari_score)

>>> 91.66666666666667

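In this case, the add score is 75.0. There are no new unigrams, so the unigram add-score is 0.00 (because of the if sys_total > 0: checks in compute_precision_recall_f1()), but there are technically new bigrams, trigrams, and 4-grams around the location of the deleted word (["a japanese", "a japanese football", "is a japanese"], etc.), and each of those orders scores 100. Averaging over the four n-gram orders gives (0 + 100 + 100 + 100) / 4 = 75.0, and the final score is (100 + 100 + 75) / 3 = 91.67, with keep and delete both at 100.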
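To make the arithmetic concrete, here is a minimal sketch that reproduces the numbers by hand. It is not the library's code: the helpers `ngrams` and `added_ngrams` are hypothetical, tokenization is a plain lowercase whitespace split, n-grams are treated as sets rather than counts, and the keep and delete scores are simply assumed to be 100 because the prediction equals the reference.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def added_ngrams(source, output, n):
    """N-grams present in the output but absent from the source (hypothetical helper)."""
    src = source.lower().split()
    out = output.lower().split()
    return ngrams(out, n) - ngrams(src, n)


source = "Shu Abe (born June 7 1984) is a former Japanese football player."
prediction = "Shu Abe (born June 7 1984) is a Japanese football player."
reference = prediction  # single reference, identical to the prediction

add_scores = []
for n in range(1, 5):
    sys_added = added_ngrams(source, prediction, n)
    ref_added = added_ngrams(source, reference, n)
    if sys_added:
        correct = sys_added & ref_added
        precision = len(correct) / len(sys_added)
        recall = len(correct) / len(ref_added) if ref_added else 0.0
        total = precision + recall
        score = 100.0 * (2 * precision * recall / total) if total > 0 else 0.0
    else:
        # Mirrors the behaviour described above: nothing added means a score of 0.
        score = 0.0
    add_scores.append(score)
    print(f"n={n}: {len(sys_added)} added n-gram(s), add-score {score}")

add_score = sum(add_scores) / len(add_scores)
keep_score = del_score = 100.0  # assumption: prediction matches the reference exactly
sari = (add_score + keep_score + del_score) / 3
print(add_score, sari)
```

Under these assumptions the unigram order contributes 0 while the bigram, trigram, and 4-gram orders each contribute 100, which gives the same 75.0 add score and 91.67 overall score reported above.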

I am just curious whether this is the expected behaviour, or whether a definitive 0.00 or 100.0 result for the add-score would be more desirable.
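For comparison, a small sketch of what the overall score would be for this example under each option. The function `sari_from_components` is a hypothetical helper, and the keep and delete scores of 100 are inferred from the reported 91.67 together with the 75.0 add score, not read from the library.

```python
def sari_from_components(add_score, keep_score=100.0, del_score=100.0):
    """Average the three SARI components (hypothetical helper, equal weights assumed)."""
    return (add_score + keep_score + del_score) / 3


options = {
    "add-score fixed at 0.00": 0.0,
    "current behaviour (mean over n = 1..4)": 75.0,
    "add-score fixed at 100.0": 100.0,
}

for label, add_score in options.items():
    print(f"{label}: SARI = {sari_from_components(add_score):.2f}")
```

Only the 100.0 option would give an identical prediction a perfect score in this example; the 0.00 option would lower it further, to roughly 66.67.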

Thanks in advance for any insight.
