Unify input order of ROUGEScore and BERTScore with other NLG metrics #687
Codecov Report
@@           Coverage Diff           @@
##           master    #687   +/-   ##
=====================================
- Coverage      95%     95%    -0%
=====================================
  Files         166     166
  Lines        6413    6413
=====================================
- Hits         6105    6103     -2
- Misses        308     310     +2
LGTM!
_inputs_error_rate_batch_size_1 = Input(**ERROR_RATES_BATCHES_1)

_inputs_error_rate_batch_size_2 = Input(**ERROR_RATES_BATCHES_2)

_inputs_multiple_sentences_multiple_reference = Input(**ARTICLES_INPUT)
Actually, there's a single reference for a given hypothesis.
Yes. Should I call it _inputs_multiple_sentences_single_reference?
Possibly we can. Or maybe we can leave the references in the test file for now. We aim to adjust BERTScore to handle multiple references (#647), similar to the updates you made for ROUGEScore, so we can eventually use the already defined _inputs_multiple_references.
Right. For now, I'll keep it as is in the current PR. We can rename it when issue #647 is completed.
I also think we should standardize the naming of preds (in some places hypothesis) and targets (in some places references) across all NLG metrics.
@SkafteNicki ^^
IMO we should go with predictions and targets everywhere, since this is then more consistent with metrics in other domains.
agree here :]
Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
LGTM. Not familiar with these metrics in particular, but I assume this change can be done because metric(pred, target) = metric(target, pred)
Symmetry might not hold completely. Basically, the
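To illustrate the symmetry question, here is a minimal sketch using a toy unigram-overlap score. The function `unigram_overlap_scores` is a hypothetical stand-in written for this example and is not the torchmetrics implementation; it only assumes that a ROUGE-1-style metric reports precision, recall and an F-measure. Under that assumption, swapping the inputs leaves the F-measure unchanged but exchanges precision and recall.

```python
from collections import Counter
from typing import Dict


def unigram_overlap_scores(pred: str, target: str) -> Dict[str, float]:
    """Toy ROUGE-1-style scores (hypothetical helper, not library code)."""
    pred_counts = Counter(pred.split())
    target_counts = Counter(target.split())
    # Clipped overlap: a token counts only as often as it appears in both texts.
    overlap = sum((pred_counts & target_counts).values())
    precision = overlap / max(sum(pred_counts.values()), 1)
    recall = overlap / max(sum(target_counts.values()), 1)
    denom = precision + recall
    fmeasure = 2 * precision * recall / denom if denom > 0 else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


pred = "the cat sat"
target = "the cat sat on the mat"

forward = unigram_overlap_scores(pred, target)  # intended argument order
swapped = unigram_overlap_scores(target, pred)  # arguments exchanged

print("forward:", forward)
print("swapped:", swapped)
```

So a caller who only reads the F-measure would not notice a swapped call, while one who reads precision or recall would get the two values exchanged.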
Major concerns on backwards compatibility here. From the logic itself it's fine though
references: Union[List[str], Dict[str, Tensor]],
predictions: Union[List[str], Dict[str, Tensor]],
this is a breaking change. Not sure if we can do it that easily
targets: Union[str, Sequence[str], Sequence[Sequence[str]]],
preds: Union[str, Sequence[str]],
same concerns about breaking change
@@ -192,15 +192,15 @@ def __init__(
         self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
         self.user_tokenizer = False

-    def update(self, predictions: List[str], references: List[str]) -> None:  # type: ignore
+    def update(self, references: List[str], predictions: List[str]) -> None:  # type: ignore
also a BC
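A minimal sketch of why reordering the parameters is backward incompatible. `update_old` and `update_new` are hypothetical stand-ins for the two signatures in the diff, not the actual metric methods. Existing positional calls keep running but bind the arguments the other way round, whereas keyword calls are unaffected.

```python
from typing import Dict, List


def update_old(predictions: List[str], references: List[str]) -> Dict[str, List[str]]:
    # Stand-in for the signature before this PR (hypothetical helper).
    return {"predictions": predictions, "references": references}


def update_new(references: List[str], predictions: List[str]) -> Dict[str, List[str]]:
    # Stand-in for the signature proposed in this PR (hypothetical helper).
    return {"predictions": predictions, "references": references}


preds = ["hello there"]
refs = ["hello world"]

# A positional call written against the old order is silently swapped by the new one.
old_positional = update_old(preds, refs)
new_positional = update_new(preds, refs)

# A keyword call binds the same way under both signatures.
old_keyword = update_old(predictions=preds, references=refs)
new_keyword = update_new(predictions=preds, references=refs)

print(old_positional == new_positional)  # positional calls diverge
print(old_keyword == new_keyword)        # keyword calls agree
```

No exception is raised in the positional case because both arguments have the same type, which is what makes this kind of change hard to detect downstream.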
@@ -126,7 +126,7 @@ def __init__(
             self.add_state(f"{rouge_key}_{score}", [], dist_reduce_fx=None)

     def update(  # type: ignore
-        self, preds: Union[str, Sequence[str]], targets: Union[str, Sequence[str], Sequence[Sequence[str]]]
+        self, targets: Union[str, Sequence[str], Sequence[Sequence[str]]], preds: Union[str, Sequence[str]]
also a BC
Changes are not backward compatible, as @justusschock mentions. Maybe worth inserting a warning:
import warnings
warnings.warn("Input order of preds and targets was changed to targets first and predictions second in v0.7. This warning will be removed in v0.8.")
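A minimal sketch of how such a warning could be wired in, assuming it is raised from the call that takes the reordered inputs. `update_with_warning` and `_ORDER_CHANGE_MSG` are hypothetical names used only for illustration, and the message text is adapted from the suggestion above; the only library call relied on is the standard `warnings.warn`.

```python
import warnings
from typing import List, Tuple

# Message adapted from the review suggestion (wording is an assumption).
_ORDER_CHANGE_MSG = (
    "Input order of preds and targets was changed to targets first and"
    " predictions second in v0.7. This warning will be removed in v0.8."
)


def update_with_warning(targets: List[str], preds: List[str]) -> Tuple[List[str], List[str]]:
    # Hypothetical wrapper, only to show where the warning could be raised.
    # stacklevel=2 attributes the warning to the caller, not to this function.
    warnings.warn(_ORDER_CHANGE_MSG, UserWarning, stacklevel=2)
    return targets, preds


# Capture the warning to show that it is emitted once per call.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = update_with_warning(["hello world"], ["hello there"])

print(len(caught), caught[0].category.__name__)
```

Whether the warning belongs in the update call, in the constructor, or at import time is a design choice this sketch does not settle.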
Since the requirements of this PR have been redesigned, it would require a complete rework. I'll submit a new PR soon. Closing this one.
What does this PR do?
Fixes #686
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃
Yes