Allow sample_weight / class_weight to be applied to metrics #7482
Conversation
keras/engine/training.py (Outdated)

```diff
@@ -604,7 +564,7 @@ class Model(Container):
         """

     def compile(self, optimizer, loss, metrics=None, loss_weights=None,
-                sample_weight_mode=None, **kwargs):
+                sample_weight_mode=None, weight_metrics=False, **kwargs):
```
Shouldn't it be `weigh_metrics`?
Yeah that is a better word choice. Fixed.
c3d438f to 1ea33b9 (Compare)
The problem I have with this feature is that sample weights only exist as a way to modulate gradient contributions for different samples (or classes) during training. It's a training modulator, like gradient clipping, for instance. It isn't supported at all in inference mode. The purpose of per-sample gradient weighting during training is to get better unweighted metrics. I understand that some people want to modulate their metrics as well. That sounds like a different feature altogether though.
I think this is one use case for sample weights (like severe class imbalance), whereas another is to indicate that you care more about some samples than others because of <insert business reason>. In this case, you are sometimes interested in optimizing a weighted metric. For example, the context in which I encountered this was a classification problem where each positive sample has some attached monetary value. It is more important to me that a positive sample with a high monetary value is classified correctly than a positive sample with a low monetary value. In this scenario, it seems to me that it is appropriate to use the sample weight, which is a function of this monetary value, in both the gradient contributions and the metric contributions. What do you think?
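For concreteness, a rough sketch of what I mean (the values, the normalization, and the model/data names are just illustrative):

```python
import numpy as np

# Hypothetical per-sample monetary values attached to the positive class.
monetary_value = np.array([5.0, 120.0, 40.0, 1.0])

# Derive sample weights from the monetary value (normalized so the mean weight is 1).
sample_weight = monetary_value / monetary_value.mean()

# With metric weighting enabled in compile(), these weights would modulate both the
# gradient contributions and the reported metrics.
model.fit(x_train, y_train, sample_weight=sample_weight)
```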
Can you elaborate on why this is a different feature? I feel like this PR is that feature!
I meant that if it's a different feature (as opposed to a natural generalization of an existing feature), it would require a new API keyword in order to prevent user confusion (although they are already confused, so I guess it's too late). I agree that there seems to be enough user demand to warrant this feature. I am okay with merging it. I also agree that reusing the existing `sample_weight` argument makes sense. Remains to ponder whether `weight_metrics` is the best name for the new keyword.
I am not sure I can think of a better term that is not overly verbose / confusing. A couple other options are:

```python
model.fit(..., sample_weight=sample_weight, metric_sample_weight=sample_weight)
```

You'd also have to add the equivalent argument to `evaluate()`, which starts to get verbose.

```python
model.compile(..., weighted_metrics=['accuracy'])
```

This is somewhat appealing because you could track both weighted and unweighted metrics easily:

```python
model.compile(..., metrics=['accuracy'], weighted_metrics=['accuracy'])
```
Do you know why the tests are failing? Seems unrelated. They were passing previously.
Looks unrelated. Re-running tests.
The following API:

```python
model.compile(..., weighted_metrics=['accuracy'])
```

could be a good solution. But the initial proposal is likely to be more user-friendly. Still unsure at this point...
It's cool that `weighted_metrics` are now supported. However, as opposed to plain `metrics`, they don't seem to be saved into (or loaded from) checkpoints, which means I need to re-compile (hopefully that works!).

Apropos of the discussion regarding the use of the feature: I use sample weights to mark invalid samples in my targets, or more generally, to specify confidence that the given target value is correct.
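As a sketch of that pattern (the sentinel value and the array names are assumptions):

```python
import numpy as np

# Hypothetical: targets labelled -1 are invalid / unknown.
valid = (y_train != -1)

# Zero weight silences invalid samples; fractional weights could express label confidence.
sample_weight = np.where(valid, 1.0, 0.0).astype('float32')

model.fit(x_train, y_train, sample_weight=sample_weight)
```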
Out of curiosity, why not pass the weights into the custom metric or loss function? I'm not too familiar with the Keras codebase, but it seems that here (as of writing, at least) we should be able to introspect the metric function and see how many required positional arguments it takes. For backwards compatibility, we could infer proper usage, e.g. as sketched below.

If a two-argument function is provided and weighting is specified, we could default to the current logic. My use case is that I want to normalize my weights and reduce my score array in a different manner.
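Something along these lines (a sketch only, not how Keras currently resolves metrics; `call_metric` is a made-up helper):

```python
import inspect

def call_metric(fn, y_true, y_pred, sample_weight=None):
    # Count the metric's declared parameters to decide how to call it.
    n_params = len(inspect.signature(fn).parameters)
    if sample_weight is not None and n_params >= 3:
        # Three-argument metrics receive the weights and can reduce however they like.
        return fn(y_true, y_pred, sample_weight)
    # Two-argument metrics keep the existing (y_true, y_pred) contract.
    return fn(y_true, y_pred)
```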
I noticed a lot of issues related to sample weights not being applied to metrics. There is currently not an easy way to build a custom metric to achieve this, as a custom metric can only accept `y_true` and `y_pred`. It is probably possible with a callback metric, but that seems like a lot of work for something that is in popular demand and should be supported as a first-class option. I noticed this was attempted in #4335, but was abandoned. I also noticed that the setting was per-metric in that PR, whereas I am proposing the setting at the `compile()` level such that its application is consistent between `fit()` and `evaluate()`.

Example usage:
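(The snippets below are a sketch; the optimizer, loss, and data names are illustrative.)

```python
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# sample_weight modulates the loss only; 'accuracy' is reported unweighted.
model.fit(x_train, y_train, sample_weight=sample_weight)
model.evaluate(x_test, y_test, sample_weight=sample_weight)
```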
Now with `weight_metrics=True`:
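```python
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'],
              weight_metrics=True)

# The same sample_weight now also weights 'accuracy' in both fit() and evaluate().
model.fit(x_train, y_train, sample_weight=sample_weight)
model.evaluate(x_test, y_test, sample_weight=sample_weight)
```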
I think the consistency here is nice. Additionally, if `sample_weight` is passed to `evaluate()` and `weight_metrics=True` in `compile()`, any metrics will also be weighted.