Fix bugs with `Mean`, `Accuracy` and `BinaryAccuracy` metrics. #19847

hertschuh · 2024-06-12T22:24:49Z

- `reduce_to_samplewise_values` would not reduce `sample_weights` correctly because the number of dimensions of `values` was checked.
- `reduce_to_samplewise_values` needs to explicitly broadcast `sample_weights`. Before, it was implicitly broadcast in the multiplication with `values`. However, the explicit broadcast is needed so that `num_samples` is computed correctly for the averaging. Without it, the result is wrong when `sample_weights` is of rank 2 or more and a broadcast happens during the multiplication. This logic existed in `tf_keras`: https://github.com/keras-team/tf-keras/blob/master/tf_keras/metrics/base_metric.py#L508
- `Accuracy` and `BinaryAccuracy` were doing a mean reduction too early, before multiplying by `sample_weights`. This matters when the rank of `sample_weights` is the same as `y_true` and `y_pred`.
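
To illustrate the second point, here is a minimal NumPy sketch of why the weights have to be broadcast explicitly before `num_samples` is computed. It is not the Keras implementation: `weighted_mean` and its `broadcast_weights` flag are hypothetical names, and the sketch assumes `values` and `sample_weights` already have the same rank.

```python
import numpy as np


def weighted_mean(values, sample_weights, broadcast_weights):
    """Hypothetical helper: weighted mean with optional explicit broadcast."""
    values = np.asarray(values, dtype="float64")
    w = np.asarray(sample_weights, dtype="float64")
    if broadcast_weights:
        # Give the weights the full shape of `values` so that their
        # reduction below counts every element that was weighted.
        w = np.broadcast_to(w, values.shape)
    # The multiplication broadcasts implicitly in either case.
    weighted = values * w
    # Mean-reduce everything except the batch axis.
    weighted = weighted.mean(axis=tuple(range(1, weighted.ndim)))
    w = w.mean(axis=tuple(range(1, w.ndim)))
    total = weighted.sum()
    num_samples = w.sum()
    return total / num_samples


# All values are 1, so any correct weighted mean must be 1.
values = np.ones((2, 3))
weights = np.array([[1.0, 2.0, 3.0]])  # rank 2, broadcast along the batch axis

implicit = weighted_mean(values, weights, broadcast_weights=False)
explicit = weighted_mean(values, weights, broadcast_weights=True)
print(implicit, explicit)
```

With the implicit broadcast only, the weights are summed over one row while the weighted values are summed over two, so the denominator is too small.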

codecov-commenter · 2024-06-12T22:31:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.92%. Comparing base (4551644) to head (604094c).
Report is 1 commit behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #19847      +/-   ##
==========================================
- Coverage   78.85%   73.92%   -4.93%     
==========================================
  Files         498      498              
  Lines       45853    45855       +2     
  Branches     8449     8450       +1     
==========================================
- Hits        36156    33899    -2257     
- Misses       7995    10312    +2317     
+ Partials     1702     1644      -58

| Flag | Coverage Δ | |
|---|---|---|
| keras | `73.81% <100.00%> (-4.90%)` | ⬇️ |
| keras-jax | `?` | |
| keras-numpy | `56.63% <100.00%> (-0.01%)` | ⬇️ |
| keras-tensorflow | `63.68% <100.00%> (-0.01%)` | ⬇️ |
| keras-torch | `62.36% <100.00%> (-0.01%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

fchollet

Thanks for the PR!

fchollet · 2024-06-12T23:51:19Z

keras/src/metrics/accuracy_metrics.py

-        ops.cast(ops.equal(y_true, y_pred), dtype=backend.floatx()),
-        axis=-1,
-    )
+    return ops.cast(ops.equal(y_true, y_pred), dtype=backend.floatx())


accuracy and binary_accuracy should reduce on the last dimension at this stage (unless the input has rank 1, in which case the last dimension is the batch dimension). All metrics functions do this.

So if your inputs have rank 2, then the function will return one accuracy scalar score per entry in the batch (e.g. (32, 4) -> (32,))
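
The convention described in this comment can be sketched in NumPy as follows. This only illustrates the last-axis reduction being discussed, not the code that was merged; `per_sample_accuracy` is a hypothetical name.

```python
import numpy as np


def per_sample_accuracy(y_true, y_pred):
    """Hypothetical helper: one accuracy score per batch entry."""
    matches = (np.asarray(y_true) == np.asarray(y_pred)).astype("float32")
    if matches.ndim > 1:
        # Reduce on the last dimension, e.g. (32, 4) -> (32,).
        return matches.mean(axis=-1)
    # Rank 1: the only dimension is the batch dimension, so keep it.
    return matches


scores_2d = per_sample_accuracy(np.zeros((32, 4)), np.zeros((32, 4)))
scores_1d = per_sample_accuracy(np.zeros((32,)), np.zeros((32,)))
print(scores_2d.shape, scores_1d.shape)
```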

Yes, the reduction is applied in a generic way here (if weights have a smaller rank):
https://github.com/keras-team/keras/blob/master/keras/src/metrics/reduction_metrics.py#L29-L31
and/or here:
https://github.com/keras-team/keras/blob/master/keras/src/metrics/reduction_metrics.py#L39-L41
So it does work as expected for rank 2 and above.

Let me make sure it works as expected for rank 1 ys and add unit tests.

The issue is that nothing prevents sample_weights from having the same rank as y_true and y_pred. In that case, the code used to multiply the average of the equality by the average of the weights. Now it averages the equality multiplied by the weights.
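
A small NumPy sketch of the difference between the two orderings, using made-up inputs where the weights have the same rank as the equality tensor. The variable names are illustrative only.

```python
import numpy as np

# Equality of y_true and y_pred for one sample with four positions.
eq = np.array([[1.0, 0.0, 1.0, 0.0]])
# Same-rank weights that zero out exactly the mismatched positions.
w = np.array([[1.0, 0.0, 1.0, 0.0]])

# Before: average of the equality times average of the weights.
total_before = (eq.mean(axis=-1) * w.mean(axis=-1)).sum()
# Now: average of the equality multiplied by the weights.
total_now = (eq * w).mean(axis=-1).sum()

num_samples = w.mean(axis=-1).sum()
acc_before = total_before / num_samples
acc_now = total_now / num_samples
print(acc_before, acc_now)
```

Only the matching positions carry weight here, so the weighted accuracy should be 1, which is what the second ordering gives.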

The full sequence is now:

1. compute equality (called values)
2. tweak values and sample_weights to get to the same rank, which can involve a mean on the last few dimensions of values
3. multiply values by sample_weights
4. mean reduction of values and sample_weights to rank 1
5. sum values (total) and sample_weights (num_samples)
6. divide total by num_samples.

Before, there was an extra mean between steps 1 and 2. It did not matter if rank(sample_weights) < rank(values) because step 2 does a mean to lower the rank of values.
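
The sequence above can be sketched end to end in NumPy. This is a simplified illustration under stated assumptions, not the Keras code: `weighted_accuracy` is a hypothetical name, there is no masking or state accumulation, and `sample_weights` is assumed to have a rank no greater than that of `y_true` and to be broadcastable to the shape of the (possibly reduced) values.

```python
import numpy as np


def weighted_accuracy(y_true, y_pred, sample_weights):
    """Hypothetical helper following the six steps described above."""
    # 1. Compute equality (called values).
    values = (np.asarray(y_true) == np.asarray(y_pred)).astype("float64")
    w = np.asarray(sample_weights, dtype="float64")
    # 2. Bring values and weights to the same rank: mean over the
    #    trailing dimensions of values that the weights do not have,
    #    then broadcast the weights explicitly.
    if values.ndim > w.ndim:
        values = values.mean(axis=tuple(range(w.ndim, values.ndim)))
    w = np.broadcast_to(w, values.shape)
    # 3. Multiply values by the weights.
    values = values * w
    # 4. Mean-reduce values and weights to rank 1.
    if values.ndim > 1:
        axes = tuple(range(1, values.ndim))
        values = values.mean(axis=axes)
        w = w.mean(axis=axes)
    # 5. Sum values (total) and weights (num_samples).
    total = values.sum()
    num_samples = w.sum()
    # 6. Divide total by num_samples.
    return total / num_samples


y_true = np.array([[1, 0, 1, 1], [0, 0, 1, 1]])
y_pred = np.array([[1, 1, 1, 0], [0, 0, 1, 1]])

# Weights with the same rank as y_true: only matching positions count.
same_rank = weighted_accuracy(
    y_true, y_pred, [[1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
)
# One weight per sample: step 2 first averages each row of the equality.
per_sample = weighted_accuracy(y_true, y_pred, [1.0, 3.0])
print(same_rank, per_sample)
```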

Note that it now follows the same pattern as tf_keras where the accuracy function is not the one doing the reduction: https://github.com/keras-team/tf-keras/blob/master/tf_keras/metrics/accuracy_metrics.py#L354-L364
but the reduction is done generically in the Reduce class: https://github.com/keras-team/tf-keras/blob/master/tf_keras/metrics/base_metric.py#L520-L522 and https://github.com/keras-team/tf-keras/blob/master/tf_keras/metrics/base_metric.py#L525


fchollet

LGTM, thanks for the explanation!



google-ml-butler bot added the size:M label Jun 12, 2024

google-ml-butler bot assigned gbaned Jun 12, 2024

hertschuh requested a review from fchollet June 12, 2024 22:42

google-ml-butler bot added the awaiting review label Jun 12, 2024

fchollet reviewed Jun 12, 2024


hertschuh force-pushed the metrics_reduction branch from cc3a9c5 to 604094c Compare June 13, 2024 00:43

fchollet approved these changes Jun 13, 2024


google-ml-butler bot added kokoro:force-run ready to pull Ready to be merged into the codebase labels Jun 13, 2024

kokoro-team removed the kokoro:force-run label Jun 13, 2024

fchollet merged commit a4e8554 into keras-team:master Jun 13, 2024

google-ml-butler bot removed awaiting review ready to pull Ready to be merged into the codebase labels Jun 13, 2024

hertschuh deleted the metrics_reduction branch June 13, 2024 17:32
