Unify input order of ROUGEScore and BERTScore with other NLG metrics #687

ashutoshml · 2021-12-18T12:39:21Z

What does this PR do?

Fixes #686

Before submitting

Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃
Yes


codecov · 2021-12-18T12:42:31Z

Codecov Report

Merging #687 (92c637a) into master (293af54) will decrease coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #687   +/-   ##
=====================================
- Coverage      95%    95%   -0%     
=====================================
  Files         166    166           
  Lines        6413   6413           
=====================================
- Hits         6105   6103    -2     
- Misses        308    310    +2

stancld

LGTM!

tests/text/test_bertscore.py

torchmetrics/text/bert.py

stancld · 2021-12-18T15:17:49Z

tests/text/inputs.py

 _inputs_error_rate_batch_size_1 = Input(**ERROR_RATES_BATCHES_1)

 _inputs_error_rate_batch_size_2 = Input(**ERROR_RATES_BATCHES_2)
+
+_inputs_multiple_sentences_multiple_reference = Input(**ARTICLES_INPUT)


Actually, there's a single reference for a given hypothesis.

Yes. Should I call it _inputs_multiple_sentences_single_reference?

Possibly we can. Or maybe we can leave the references in the test file for now. We aim to adjust BERTScore to handle multiple references (#647), similar to the updates you made for ROUGEScore, so we can eventually reuse the already defined _inputs_multiple_references.

Right. For now, I'll keep it as is in the current PR. We can rename it when issue #647 is completed.

I'd also like to standardize the naming conventions across all NLG metrics: preds (called hypothesis in some places) and targets (called references in some places).

@SkafteNicki ^^

IMO we should go with predictions and targets everywhere, since this is then more consistent with metrics in other domains.

agree here :]

Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>

awaelchli

LGTM. Not familiar with these metrics in particular, but I assume this change can be done because metric(pred, target) = metric(target, pred)

ashutoshml · 2021-12-19T15:26:46Z

LGTM. Not familiar with these metrics in particular, but I assume this change can be done because metric(pred, target) = metric(target, pred)

Symmetry might not hold completely. The precision and recall values flip in ROUGEScore calculations, and in BERTScore sending X [SEP] Y results in a different score than Y [SEP] X. Also, since ROUGEScore now allows multi-reference inputs, the API will throw an error if we interchange preds with targets. Multi-reference support is also planned for BERTScore.
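To illustrate the precision/recall flip, here is a minimal sketch using a toy unigram-overlap score on whitespace tokens. It is not the torchmetrics ROUGEScore implementation; the function name `unigram_precision_recall` and the example sentences are made up for illustration.

```python
from collections import Counter
from typing import Tuple


def unigram_precision_recall(preds: str, target: str) -> Tuple[float, float]:
    """Toy ROUGE-1-style precision and recall (illustrative, not the library code)."""
    pred_tokens = preds.split()
    target_tokens = target.split()
    # Clipped overlap: a token counts at most as often as it occurs on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(target_tokens) if target_tokens else 0.0
    return precision, recall


pred = "the cat sat"
target = "the cat sat on the mat"

# Intended order: every predicted token is in the target, but half the target is missed.
p, r = unigram_precision_recall(pred, target)
# Swapped order: the overlap is unchanged, but the denominators trade places.
p_swapped, r_swapped = unigram_precision_recall(target, pred)

print(p, r)                  # 1.0 0.5
print(p_swapped, r_swapped)  # 0.5 1.0
```

The overlap count is the same in both calls, so an F-measure built from it would stay the same here, while precision and recall individually swap.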

Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>

justusschock

Major concerns about backwards compatibility here. The logic itself is fine, though.

justusschock · 2021-12-22T10:15:11Z

tests/text/inputs.py

 _inputs_error_rate_batch_size_1 = Input(**ERROR_RATES_BATCHES_1)

 _inputs_error_rate_batch_size_2 = Input(**ERROR_RATES_BATCHES_2)
+
+_inputs_multiple_sentences_multiple_reference = Input(**ARTICLES_INPUT)


IMO we should go with predictions and targets everywhere, since this is then more consistent with metrics in other domains.

justusschock · 2021-12-22T10:16:05Z

torchmetrics/functional/text/bert.py

    references: Union[List[str], Dict[str, Tensor]],
+    predictions: Union[List[str], Dict[str, Tensor]],


this is a breaking change. Not sure if we can do it that easily

justusschock · 2021-12-22T10:16:33Z

torchmetrics/functional/text/rouge.py

    targets: Union[str, Sequence[str], Sequence[Sequence[str]]],
+    preds: Union[str, Sequence[str]],


same concerns about breaking change
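A small sketch of why the reorder is breaking for positional callers but not for keyword callers. The two functions below are hypothetical stand-ins for an old and a new signature, not the actual torchmetrics functions.

```python
def precision_old(preds: str, targets: str) -> float:
    """Hypothetical pre-change signature: predictions first."""
    pred_tokens = preds.split()
    target_tokens = set(targets.split())
    return sum(tok in target_tokens for tok in pred_tokens) / len(pred_tokens)


def precision_new(targets: str, preds: str) -> float:
    """Hypothetical post-change signature: targets first, same computation."""
    return precision_old(preds=preds, targets=targets)


pred, target = "the cat sat", "the cat sat on a mat"

# Positional calls written for the old order silently change meaning.
positional_old = precision_old(pred, target)
positional_new = precision_new(pred, target)

# Keyword calls are unaffected by the reorder.
keyword_old = precision_old(preds=pred, targets=target)
keyword_new = precision_new(preds=pred, targets=target)

print(positional_old, positional_new)  # 1.0 0.5
print(keyword_old, keyword_new)        # 1.0 1.0
```

No exception is raised in the positional case; the score simply changes, which is what makes this kind of change hard to notice downstream.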

justusschock · 2021-12-22T10:17:40Z

torchmetrics/text/bert.py

@@ -192,15 +192,15 @@ def __init__(
            self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
            self.user_tokenizer = False

-    def update(self, predictions: List[str], references: List[str]) -> None:  # type: ignore
+    def update(self, references: List[str], predictions: List[str]) -> None:  # type: ignore


justusschock · 2021-12-22T10:18:10Z

torchmetrics/text/rouge.py

@@ -126,7 +126,7 @@ def __init__(
                self.add_state(f"{rouge_key}_{score}", [], dist_reduce_fx=None)

    def update(  # type: ignore
-        self, preds: Union[str, Sequence[str]], targets: Union[str, Sequence[str], Sequence[Sequence[str]]]
+        self, targets: Union[str, Sequence[str], Sequence[Sequence[str]]], preds: Union[str, Sequence[str]]


SkafteNicki · 2021-12-22T18:30:21Z

Changes are not backward compatible as @justusschock mentions. Maybe worth inserting a warning in the __init__ of the class and the functional version:

import warnings
warnings.warn("Input order of preds and targets was changed to targets first and predictions second in v0.7. Warning will be removed in v0.8")
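As a rough sketch of how such a warning could sit inside a functional metric, and how a test could check for it: `toy_metric` is a hypothetical stand-in, and the use of `UserWarning` and `stacklevel=2` is an assumption, not something decided in this thread.

```python
import warnings

_ORDER_CHANGE_MSG = (
    "Input order of preds and targets was changed to targets first and "
    "predictions second in v0.7. Warning will be removed in v0.8"
)


def toy_metric(targets, preds):
    """Hypothetical functional metric: fraction of exact matches."""
    # stacklevel=2 attributes the warning to the caller's line rather than this one.
    warnings.warn(_ORDER_CHANGE_MSG, UserWarning, stacklevel=2)
    matches = sum(t == p for t, p in zip(targets, preds))
    return matches / len(targets)


# Record warnings so that a test can assert the message was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = toy_metric(targets=["a", "b"], preds=["a", "c"])

print(score)        # 0.5
print(len(caught))  # 1
```

For the class version, the same `warnings.warn` call would go in `__init__`, so it fires once per metric instance instead of on every `update`.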

ashutoshml · 2021-12-23T11:31:57Z

Since the requirements of this PR have been redesigned, it would require a complete rework. I'll submit a new PR soon. Closing this one.

ashutoshml added 3 commits December 18, 2021 17:51

Make input order of ROUGEScore and BERTScore consistent with other NL…

50f1d93

…G metrics

Move comment from test_bertscore to common test inputs

d94aa36

Update CHANGELOG.md

c59ac49

ashutoshml requested review from ananyahjha93, Borda, ethanwharris, justusschock, SeanNaren, SkafteNicki and tchaton as code owners December 18, 2021 12:39

stancld reviewed Dec 18, 2021


Apply suggestions from code review

0db755c

Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>

Borda added the API / design and refactoring labels Dec 18, 2021

Borda added this to the v0.7 milestone Dec 18, 2021

awaelchli reviewed Dec 19, 2021


maximsch2 and others added 4 commits December 21, 2021 13:08

Update paper.md (Lightning-AI#690)

938ceee

ci: rename oldest

67f78ab

Merge branch 'master' into orderfixation

d02ea76

CI: set HF caching (Lightning-AI#691)

63d0d75

Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>

justusschock requested changes Dec 22, 2021


Borda requested review from awaelchli and stancld December 22, 2021 10:25

Merge branch 'master' into orderfixation

7ddc088

Borda force-pushed the master branch from 63d0d75 to 0135327 Compare December 22, 2021 11:58

ashutoshml requested a review from williamFalcon as a code owner December 22, 2021 11:58

mergify bot added the has conflicts label Dec 22, 2021

Merge branch 'master' into orderfixation

92c637a

mergify bot removed the has conflicts label Dec 22, 2021

This was referenced Dec 23, 2021

Unify the input order for text (NLG) metrics #686

Closed

add Extended Edit Distance (EED) metric #668

Merged

ashutoshml closed this Dec 23, 2021

ashutoshml deleted the orderfixation branch December 24, 2021 07:44

Borda added the topic: Text label Aug 25, 2023
