
Allow sample_weight / class_weight to be applied to metrics #7482


Closed
nicolewhite wants to merge 3 commits

Conversation

@nicolewhite
Contributor

nicolewhite commented Jul 31, 2017

I noticed a lot of issues related to sample weights not being applied to metrics.

There is currently no easy way to build a custom metric that achieves this, as a custom metric can only accept y_true and y_pred. It is probably possible with a callback, but that seems like a lot of work for something that is in popular demand and should be supported as a first-class option. I noticed this was attempted in #4335, but that PR was abandoned. I also noticed that the setting was per-metric in that PR, whereas I am proposing the setting at the compile() level so that its application is consistent between fit() and evaluate(). Example usage:

from keras.models import Sequential
from keras.layers import Dense

import numpy as np

X = np.random.normal(size=(100, 10))
y = np.random.randint(2, size=100)
sample_weight = np.random.normal(size=100)
loss = 'binary_crossentropy'

model = Sequential()
model.add(Dense(10, input_shape=(10, )))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss=loss, optimizer='rmsprop', metrics=[loss], weight_metrics=False)

model.fit(X, y, epochs=5, verbose=2, sample_weight=sample_weight)
Epoch 1/5
0s - loss: -2.1792e-02 - binary_crossentropy: 0.9223
Epoch 2/5
0s - loss: -3.2389e-02 - binary_crossentropy: 0.9294
Epoch 3/5
0s - loss: -3.5780e-02 - binary_crossentropy: 0.9285
Epoch 4/5
0s - loss: -3.6447e-02 - binary_crossentropy: 0.9311
Epoch 5/5
0s - loss: -3.8856e-02 - binary_crossentropy: 0.9264

Now with weight_metrics=True:

model.compile(loss=loss, optimizer='rmsprop', metrics=[loss], weight_metrics=True)

model.fit(X, y, epochs=5, verbose=2, sample_weight=sample_weight)
Epoch 1/5
0s - loss: -3.7481e-02 - binary_crossentropy: -3.7481e-02
Epoch 2/5
0s - loss: -4.5657e-02 - binary_crossentropy: -4.5657e-02
Epoch 3/5
0s - loss: -5.1353e-02 - binary_crossentropy: -5.1353e-02
Epoch 4/5
0s - loss: -5.3839e-02 - binary_crossentropy: -5.3839e-02
Epoch 5/5
0s - loss: -5.6900e-02 - binary_crossentropy: -5.6900e-02

I think the consistency here is nice. Additionally, if sample_weight is passed to evaluate() and weight_metrics=True was set in compile(), the metrics will also be weighted.

model.evaluate(X, y, verbose=2, sample_weight=sample_weight)
[-0.084627518355846407, -0.084627518355846407]

Review thread on the diff, at the changed compile() signature:

@@ -604,7 +564,7 @@ class Model(Container):
         """

     def compile(self, optimizer, loss, metrics=None, loss_weights=None,
-                sample_weight_mode=None, **kwargs):
+                sample_weight_mode=None, weight_metrics=False, **kwargs):
Collaborator

Shouldn't it be weigh_metrics?

Contributor Author

Yeah that is a better word choice. Fixed.

@fchollet
Collaborator

fchollet commented Aug 1, 2017

The problem I have with this feature is that sample weights only exist as a way to modulate gradient contributions for different samples (or classes) during training. It's a training modulator, like gradient clipping, for instance. It isn't supported at all in inference mode. The purpose of per-sample gradient weighting during training is to get better unweighted metrics.

I understand that some people want to modulate their metrics as well. That sounds like a different feature altogether though.

@nicolewhite
Contributor Author

> sample weights only exist as a way to modulate gradient contributions for different samples (or classes) during training

I think this is one use case for sample weights (like severe class imbalance), whereas another is to indicate that you care more about some samples than others because of <insert business reason>. In that case, you are sometimes interested in optimizing a weighted metric. For example, the context in which I encountered this was a classification problem where each positive sample has some attached monetary value. It is more important to me that a positive sample with a high monetary value is classified correctly than one with a low monetary value. In this scenario, it seems appropriate to use the sample weight, which is a function of this monetary value, in both gradient contributions and metric contributions. What do you think?

> I understand that some people want to modulate their metrics as well. That sounds like a different feature altogether though.

Can you elaborate on why this is a different feature? I feel like this PR is that feature!
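
To make that use case concrete, here is roughly what it could look like with the weight_metrics option proposed in this PR (the revenue array and the way the weights are derived from it are purely illustrative; X, y, and model are the objects from the example above):

import numpy as np

# Illustrative only: give positive samples a weight proportional to their monetary value.
revenue = np.random.uniform(100, 10000, size=100)
sample_weight = np.where(y == 1, revenue / revenue.mean(), 1.0)

model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              metrics=['binary_crossentropy'], weight_metrics=True)
model.fit(X, y, epochs=5, verbose=2, sample_weight=sample_weight)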

@fchollet
Collaborator

fchollet commented Aug 2, 2017

> Can you elaborate on why this is a different feature? I feel like this PR is that feature!

I meant that if it's a different feature (as opposed to a natural generalization of an existing feature), it would require a new API keyword in order to prevent user confusion (although users are already confused, so I guess it's too late).

I agree that there seems to be enough user demand to warrant this feature. I am okay with merging it. I also agree that reusing the sample_weight argument is the sensible thing to do. Given these assumptions, it follows that the switch between the two behaviors should be a compile argument (like sample_weight_mode).

It remains to ponder whether weigh_metrics is the best possible argument name. This would become a permanent part of the public API, so it's worth thinking about twice. What would be some other options?

@nicolewhite
Contributor Author

I am not sure I can think of a better term that is not overly verbose or confusing. A couple of other options:

  1. Pass in separate weights for the metrics as metric_sample_weight. But then you'll end up with people mostly just passing the same value twice:
model.fit(..., sample_weight=sample_weight, metric_sample_weight=sample_weight)

You'd also have to add metric_class_weight, which is not ideal.

  2. Allow a list of metrics that will be weighted, via weighted_metrics:
model.compile(..., weighted_metrics=['accuracy'])

This is somewhat appealing because you could track both weighted and unweighted metrics easily.

model.compile(..., metrics=['accuracy'], weighted_metrics=['accuracy'])
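
As a fuller sketch of option 2 (weighted_metrics here is the hypothetical compile() argument being discussed, not an existing one; X, y, sample_weight, and model are the objects from the original example):

model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'],            # reported unweighted
              weighted_metrics=['accuracy'])   # reported with sample_weight applied

model.fit(X, y, epochs=5, verbose=2, sample_weight=sample_weight)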

@nicolewhite
Contributor Author

Do you know why the tests are failing? Seems unrelated. They were passing prior to the weight_metrics -> weigh_metrics change.

@fchollet
Collaborator

fchollet commented Aug 4, 2017

Looks unrelated. Re-running tests.

@fchollet fchollet closed this Aug 4, 2017
@fchollet fchollet reopened this Aug 4, 2017
@fchollet
Collaborator

fchollet commented Aug 4, 2017

The following API

model.compile(..., weighted_metrics=['accuracy'])

could be a good solution. But the initial proposal is likely to be more user-friendly. Still unsure at this point...

@nicolewhite nicolewhite deleted the sample-weights branch August 14, 2017 19:54
@mitkeyastromouse

It's cool that "weighted_metrics" is now supported. However, unlike plain "metrics", it doesn't seem to be saved into (or loaded from) checkpoints, which means I need to re-compile (hopefully that works!).

Apropos of the discussion regarding the use of the feature: I use sample_weights to mark invalid samples in my targets, or more generally, to specify confidence that the given target value is correct.
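
A minimal sketch of that masking/confidence pattern (the sentinel value and names are illustrative, not from this thread):

import numpy as np

y = np.array([0., 1., -1., 1., 0.])           # suppose -1 marks an invalid target
sample_weight = np.where(y == -1, 0.0, 1.0)   # zero weight silences invalid samples

# Values between 0 and 1 could likewise encode confidence in a target.
# model.fit(X, y, sample_weight=sample_weight, ...)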

@trianta2

Out of curiosity, why not pass the weights into the custom metric or loss function?

I'm not too familiar with the keras codebase, but it seems that here (as of writing at least) we should be able to introspect the metric function and see how many required positional arguments it takes.

For backwards compatibility, we could infer proper usage, e.g.:

2 args --> fn(y_true, y_pred)
3 args --> fn(y_true, y_pred, weights)
4 args --> fn(y_true, y_pred, weights, mask)

If a 2 arg function is provided and weighting is specified, we could default to the current logic.

My use case is that I want to normalize my weights and reduce my score array in a different manner.
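
A rough sketch of that introspection-based dispatch (not how Keras actually wires up metrics; wrap_metric and its behavior are illustrative):

import inspect

def wrap_metric(fn, weights=None, mask=None):
    # Count the metric function's positional parameters and call it with as many
    # of (y_true, y_pred, weights, mask) as it declares.
    n_args = len(inspect.signature(fn).parameters)

    def metric(y_true, y_pred):
        if n_args >= 4:
            return fn(y_true, y_pred, weights, mask)
        if n_args == 3:
            return fn(y_true, y_pred, weights)
        return fn(y_true, y_pred)   # 2-arg metric: fall back to the current behavior

    return metric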
