Skip to content

fix: extend zero_division parameter to percentage and range-based metrics#3122

Open
mahimn01 wants to merge 1 commit into
unit8co:masterfrom
mahimn01:fix/metrics-zero-division-pct
Open

fix: extend zero_division parameter to percentage and range-based metrics#3122
mahimn01 wants to merge 1 commit into
unit8co:masterfrom
mahimn01:fix/metrics-zero-division-pct

Conversation

@mahimn01

Copy link
Copy Markdown

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Mirrors the design of #3059 for the percentage / range-based metric family. Adopts the breaking-change 🔴 marker convention discussed in #3080.

Summary

Percentage and range-based metrics (wmape, ope, arre, marre, coefficient_of_variation) previously either raised a hard ValueError or silently returned nan/inf when their denominator was zero. This made batch evaluation pipelines brittle for constant or all-zero components.

This PR extends the zero_division parameter pattern introduced in #3059 to these five metrics:

  • "warn" (default) — returns np.nan and emits a warning.
  • "raise" — preserves the legacy ValueError.

A new _safe_pct_divide helper sits next to _safe_scaled_divide in darts/metrics/utils.py. The two differ only in fill semantics: percentage metrics multiply the ratio by 100, so a 1.0 fill for the 0/0 case (the scaled-metric "on par with naive") would surface as 100 % error and be misleading — hence np.nan instead.

Two adjacent bugs are fixed in the same change:

  • ope previously checked sum > 0 and rejected actual_series with a strictly negative sum (e.g. financial return series). The check is now sum != 0 via the helper.
  • wmape's docstring claimed ValueError if actual_series contains some zeros, but the implementation divides by sum(|y_true|) and only the all-zero case ever triggered any path. Docstring corrected.

Behavior change

Input shape Behavior
Healthy inputs (non-zero denominator) Unchanged — all 455 existing tests pass without modification
Zero denominator + default "warn" Returns np.nan and logs a warning (previously: silent NaN/inf, or hard ValueError)
Zero denominator + "raise" Raises ValueError (legacy behavior preserved as opt-in)
Negative-sum actual_series in ope Now works correctly (previously: incorrectly raised)

Per the discussion in #3080, the CHANGELOG entry for the parameter addition carries the breaking-change 🔴 marker because the default behavior flips from raising to warning.

Test plan

  • New parametrized test_pct_metrics_zero_division covering all 5 metrics: warn-mode returns NaN with warning, raise-mode raises ValueError, invalid string rejected, healthy inputs unaffected.
  • New test_ope_accepts_negative_sum verifying the sum != 0 fix using a series with a strictly negative sum.
  • Updated existing test_ope_zero and test_arre to opt into legacy behavior with zero_division="raise".
  • uv run pytest darts/tests/metrics/test_metrics.py461 passed (was 455, +6 new).
  • uv run pre-commit run --files <changed> → all hooks pass (ruff, ruff-format, trailing whitespace, end-of-file).

Other Information

Files changed (+208/-31):

  • darts/metrics/utils.py_safe_pct_divide helper
  • darts/metrics/metrics.py — 5 metrics extended
  • darts/tests/metrics/test_metrics.py — 6 new tests + 2 legacy-test updates
  • CHANGELOG.md — entries under **Fixed** (the bug fixes) and **Improved** (the new parameter, with 🔴 marker)

@mahimn01 mahimn01 requested a review from dennisbader as a code owner May 26, 2026 15:23
…rics

Percentage and range-based metrics (`wmape`, `ope`, `arre`, `marre`,
`coefficient_of_variation`) previously either raised a hard `ValueError`
or silently returned `nan`/`inf` when their denominator was zero. This
made batch evaluation pipelines brittle for constant or all-zero
components.

Mirrors the `zero_division` design introduced in unit8co#3059 for the scaled
error family: `"warn"` (default) returns `np.nan` and emits a warning,
`"raise"` preserves the legacy `ValueError`. A new `_safe_pct_divide`
helper sits next to `_safe_scaled_divide`; the two differ only in fill
semantics — percentage metrics multiply the ratio by 100 so a `1.0`
fill for the 0/0 case (the scaled-metric "on par with naive") would
surface as `100 %` error and be misleading, hence `np.nan` instead.

Two adjacent bugs surface and are fixed in the same change:
* `ope` previously checked `sum > 0` and rejected `actual_series` with
  a strictly negative sum (e.g. financial return series). The check
  is now `sum != 0` via the helper.
* `wmape`'s docstring claimed `ValueError if actual_series contains
  some zeros`, but the implementation divides by `sum(|y_true|)` and
  only the all-zero case ever triggered the path. Docstring corrected.

The CHANGELOG entry for the parameter addition carries the breaking-
change marker per the convention discussed in unit8co#3080 (the post-mortem
on unit8co#3059), since the default behavior flips from raising to warning.

Adds a parametrized regression test covering all five metrics and an
explicit OPE-with-negative-sum test. Existing `test_ope_zero` and the
arre/marre legacy raise check are updated to opt into the legacy
behavior with `zero_division="raise"`.
@mahimn01 mahimn01 force-pushed the fix/metrics-zero-division-pct branch from 3565785 to d55581a Compare May 31, 2026 01:04
@mahimn01

mahimn01 commented Jun 8, 2026

Copy link
Copy Markdown
Author

Hi @dennisbader — gentle nudge on this when you get a chance. CI hasn't triggered yet (looks like it needs a maintainer approve-and-run for first-time contributors). It extends the #3059 zero_division pattern to the percentage/range metrics (wmape, ope, arre, marre, coefficient_of_variation), with tests. Thanks!

@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.48%. Comparing base (b0ccd3a) to head (d55581a).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3122      +/-   ##
==========================================
- Coverage   96.55%   96.48%   -0.07%     
==========================================
  Files         160      160              
  Lines       17285    17293       +8     
==========================================
- Hits        16689    16686       -3     
- Misses        596      607      +11     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jakubchlapek

Copy link
Copy Markdown
Collaborator

Hey @mahimn01, thanks for this PR and contributing :). Will review it shortly as this is something we should definitely add, but it would be also great if you could create an issue for this first. Thanks!

@jakubchlapek jakubchlapek left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mahimn01, I've taken an initial look at the code, it looks fine, thanks. The ope clarification makes sense to me. One thing that we would definitely need to add is analogous support to all the percentage metrics (e.g. ape, mape, sape, smape, wmape, ope, arre, marre).
Before reviewing further I would like to discuss the fill semantics. For the scaled metrics the user could pass in the insample metric leading to the zero_division issue in the first place. Here as we don't supply that, there is no issue of matching the periodic series with forecasts. To align this to the existing solution I think what would make more sense than returning nans is returning a perfect forecast (e.g. 0/0 for MAPE returns 0% error). This would then depend on the metric and best score, so we can think on how to structure this best (maybe collapse it with the _safe_scaled_divide as the logic will be quite similar I think.) Let me know what you think on this :)

Comment thread darts/metrics/metrics.py
q=q,
)
y_max, y_min = np.nanmax(y_true, axis=TIME_AX), np.nanmin(y_true, axis=TIME_AX)
if not (y_max > y_min).all():

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this removed?

Comment on lines +1520 to +1550

def test_ope_accepts_negative_sum(self, caplog):
"""OPE must accept ``actual_series`` whose sum is negative (e.g.
financial return series). Only an exact zero sum triggers the
zero-division handling.

The previous ``y_true_sum > 0`` guard incorrectly rejected this
valid input.
"""
# mean ~ 0 but sum strictly negative
neg_sum_values = np.array([
1.0,
-2.0,
1.0,
-3.0,
1.0,
-2.0,
1.0,
-3.0,
1.0,
-2.0,
]).reshape(-1, 1)
actual = TimeSeries.from_values(neg_sum_values)
pred = TimeSeries.from_values(neg_sum_values + 0.1)

caplog.clear()
with caplog.at_level(logging.WARNING):
result = metrics.ope(actual, pred)
assert "denominator" not in caplog.text
assert np.isfinite(result)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it would be nicer to instead create a helper for negative series and ship this check as part of the test_ope() test (similarly to how it's structured there)

@mahimn01

mahimn01 commented Jun 9, 2026

Copy link
Copy Markdown
Author

Thanks @jakubchlapek, this makes sense — best-score for 0/0 is the right call, and it actually fixes a latent bug I'd missed: with NaN-everywhere, a genuinely perfect forecast on a degenerate series scores NaN (ope on a zero-sum series with an identical prediction; arre/marre on a constant series predicted perfectly).

I'll collapse _safe_pct_divide into _safe_scaled_divide — one helper with a configurable 0/0 fill (1.0 scaled, 0.0 percentage), x/0 → NaN, and "raise" preserved (keeping each caller's existing message). A nice side effect of the shared helper's element-wise fill: for arre/marre it fixes a case the current code gets wrong — a constant-actual column is currently all-NaN even at timesteps where the per-step error is 0; element-wise gives 0/0 → 0 and x/0 → NaN within the same column.

On scope (full breakdown in the issue): wmape/ope/arre/marre/coefficient_of_variation are already in this PR — I'll switch them to the unified helper. Good news on the rest of the family: sape/smape already encode 0/0 → 0.0 inline and have no x/0 case, so the only delta there is exposing the explicit zero_division knob for the "raise" option. ape/mape are the trickier ones — they still hard-raise on any y_t == 0, and because their denominator is the per-timestep actual (not a column scalar) they need element-wise handling; relaxing MAPE's default is also a larger behavioral change. Would you prefer ape/mape in this PR (default kept as raise, opt into fill via zero_division) or as a focused follow-up?

One small flag: coefficient_of_variation is RMSE/mean (not abs-wrapped, and the mean can be negative), so a zero mean is a legitimate non-degenerate case — I've proposed 0/0 → 0.0 / x/0 → NaN for it but happy to adjust.

Inline replies:

  • (arre, "why removed?") that was the old if not (y_max > y_min).all(): raise(...) guard — the zero-range check moved into the helper, gated behind zero_division (default fills; "raise" re-raises, preserving the legacy behavior). The old message also said "MARRE" inside arre(), so that copy-paste slip drops out.
  • (test) agreed — I'll fold the negative-sum check into test_ope() with a small negative-series helper. I'll also add a 0/0 best-score case to the zero-division test (the current fixtures are all x/0, so they stay NaN and need no change).

I've opened #3132 to track the design. Thanks for the thoughtful review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants