fix: extend zero_division parameter to percentage and range-based metrics by mahimn01 · Pull Request #3122 · unit8co/darts

mahimn01 · 2026-05-26T15:23:06Z

Checklist before merging this PR:

Mentioned all issues that this PR fixes or addresses.
Summarized the updates of this PR under Summary.
Added an entry under Unreleased in the Changelog.

Mirrors the design of #3059 for the percentage / range-based metric family. Adopts the breaking-change 🔴 marker convention discussed in #3080.

Summary

Percentage and range-based metrics (wmape, ope, arre, marre, coefficient_of_variation) previously either raised a hard ValueError or silently returned nan/inf when their denominator was zero. This made batch evaluation pipelines brittle for constant or all-zero components.

This PR extends the zero_division parameter pattern introduced in #3059 to these five metrics:

"warn" (default) — returns np.nan and emits a warning.
"raise" — preserves the legacy ValueError.

A new _safe_pct_divide helper sits next to _safe_scaled_divide in darts/metrics/utils.py. The two differ only in fill semantics: percentage metrics multiply the ratio by 100, so a 1.0 fill for the 0/0 case (the scaled-metric "on par with naive") would surface as 100 % error and be misleading — hence np.nan instead.

Two adjacent bugs are fixed in the same change:

ope previously checked sum > 0 and rejected actual_series with a strictly negative sum (e.g. financial return series). The check is now sum != 0 via the helper.
wmape's docstring claimed ValueError if actual_series contains some zeros, but the implementation divides by sum(|y_true|) and only the all-zero case ever triggered any path. Docstring corrected.

Behavior change

Input shape	Behavior
Healthy inputs (non-zero denominator)	Unchanged — all 455 existing tests pass without modification
Zero denominator + default `"warn"`	Returns `np.nan` and logs a warning (previously: silent NaN/inf, or hard `ValueError`)
Zero denominator + `"raise"`	Raises `ValueError` (legacy behavior preserved as opt-in)
Negative-sum `actual_series` in `ope`	Now works correctly (previously: incorrectly raised)

Per the discussion in #3080, the CHANGELOG entry for the parameter addition carries the breaking-change 🔴 marker because the default behavior flips from raising to warning.

Test plan

New parametrized test_pct_metrics_zero_division covering all 5 metrics: warn-mode returns NaN with warning, raise-mode raises ValueError, invalid string rejected, healthy inputs unaffected.
New test_ope_accepts_negative_sum verifying the sum != 0 fix using a series with a strictly negative sum.
Updated existing test_ope_zero and test_arre to opt into legacy behavior with zero_division="raise".
uv run pytest darts/tests/metrics/test_metrics.py → 461 passed (was 455, +6 new).
uv run pre-commit run --files <changed> → all hooks pass (ruff, ruff-format, trailing whitespace, end-of-file).

Other Information

Files changed (+208/-31):

darts/metrics/utils.py — _safe_pct_divide helper
darts/metrics/metrics.py — 5 metrics extended
darts/tests/metrics/test_metrics.py — 6 new tests + 2 legacy-test updates
CHANGELOG.md — entries under **Fixed** (the bug fixes) and **Improved** (the new parameter, with 🔴 marker)

…rics Percentage and range-based metrics (`wmape`, `ope`, `arre`, `marre`, `coefficient_of_variation`) previously either raised a hard `ValueError` or silently returned `nan`/`inf` when their denominator was zero. This made batch evaluation pipelines brittle for constant or all-zero components. Mirrors the `zero_division` design introduced in unit8co#3059 for the scaled error family: `"warn"` (default) returns `np.nan` and emits a warning, `"raise"` preserves the legacy `ValueError`. A new `_safe_pct_divide` helper sits next to `_safe_scaled_divide`; the two differ only in fill semantics — percentage metrics multiply the ratio by 100 so a `1.0` fill for the 0/0 case (the scaled-metric "on par with naive") would surface as `100 %` error and be misleading, hence `np.nan` instead. Two adjacent bugs surface and are fixed in the same change: * `ope` previously checked `sum > 0` and rejected `actual_series` with a strictly negative sum (e.g. financial return series). The check is now `sum != 0` via the helper. * `wmape`'s docstring claimed `ValueError if actual_series contains some zeros`, but the implementation divides by `sum(|y_true|)` and only the all-zero case ever triggered the path. Docstring corrected. The CHANGELOG entry for the parameter addition carries the breaking- change marker per the convention discussed in unit8co#3080 (the post-mortem on unit8co#3059), since the default behavior flips from raising to warning. Adds a parametrized regression test covering all five metrics and an explicit OPE-with-negative-sum test. Existing `test_ope_zero` and the arre/marre legacy raise check are updated to opt into the legacy behavior with `zero_division="raise"`.

mahimn01 · 2026-06-08T15:50:29Z

Hi @dennisbader — gentle nudge on this when you get a chance. CI hasn't triggered yet (looks like it needs a maintainer approve-and-run for first-time contributors). It extends the #3059 zero_division pattern to the percentage/range metrics (wmape, ope, arre, marre, coefficient_of_variation), with tests. Thanks!

codecov · 2026-06-09T07:37:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.48%. Comparing base (b0ccd3a) to head (d55581a).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3122      +/-   ##
==========================================
- Coverage   96.55%   96.48%   -0.07%     
==========================================
  Files         160      160              
  Lines       17285    17293       +8     
==========================================
- Hits        16689    16686       -3     
- Misses        596      607      +11

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

jakubchlapek · 2026-06-09T09:46:54Z

Hey @mahimn01, thanks for this PR and contributing :). Will review it shortly as this is something we should definitely add, but it would be also great if you could create an issue for this first. Thanks!

jakubchlapek

Hey @mahimn01, I've taken an initial look at the code, it looks fine, thanks. The ope clarification makes sense to me. One thing that we would definitely need to add is analogous support to all the percentage metrics (e.g. ape, mape, sape, smape, wmape, ope, arre, marre).
Before reviewing further I would like to discuss the fill semantics. For the scaled metrics the user could pass in the insample metric leading to the zero_division issue in the first place. Here as we don't supply that, there is no issue of matching the periodic series with forecasts. To align this to the existing solution I think what would make more sense than returning nans is returning a perfect forecast (e.g. 0/0 for MAPE returns 0% error). This would then depend on the metric and best score, so we can think on how to structure this best (maybe collapse it with the _safe_scaled_divide as the logic will be quite similar I think.) Let me know what you think on this :)

jakubchlapek · 2026-06-09T11:57:43Z

        q=q,
    )
    y_max, y_min = np.nanmax(y_true, axis=TIME_AX), np.nanmin(y_true, axis=TIME_AX)
-    if not (y_max > y_min).all():


why was this removed?

jakubchlapek · 2026-06-09T12:24:06Z

+
+    def test_ope_accepts_negative_sum(self, caplog):
+        """OPE must accept ``actual_series`` whose sum is negative (e.g.
+        financial return series). Only an exact zero sum triggers the
+        zero-division handling.
+
+        The previous ``y_true_sum > 0`` guard incorrectly rejected this
+        valid input.
+        """
+        # mean ~ 0 but sum strictly negative
+        neg_sum_values = np.array([
+            1.0,
+            -2.0,
+            1.0,
+            -3.0,
+            1.0,
+            -2.0,
+            1.0,
+            -3.0,
+            1.0,
+            -2.0,
+        ]).reshape(-1, 1)
+        actual = TimeSeries.from_values(neg_sum_values)
+        pred = TimeSeries.from_values(neg_sum_values + 0.1)
+
+        caplog.clear()
+        with caplog.at_level(logging.WARNING):
+            result = metrics.ope(actual, pred)
+        assert "denominator" not in caplog.text
+        assert np.isfinite(result)
+


i think it would be nicer to instead create a helper for negative series and ship this check as part of the test_ope() test (similarly to how it's structured there)

mahimn01 · 2026-06-09T18:53:12Z

Thanks @jakubchlapek, this makes sense — best-score for 0/0 is the right call, and it actually fixes a latent bug I'd missed: with NaN-everywhere, a genuinely perfect forecast on a degenerate series scores NaN (ope on a zero-sum series with an identical prediction; arre/marre on a constant series predicted perfectly).

I'll collapse _safe_pct_divide into _safe_scaled_divide — one helper with a configurable 0/0 fill (1.0 scaled, 0.0 percentage), x/0 → NaN, and "raise" preserved (keeping each caller's existing message). A nice side effect of the shared helper's element-wise fill: for arre/marre it fixes a case the current code gets wrong — a constant-actual column is currently all-NaN even at timesteps where the per-step error is 0; element-wise gives 0/0 → 0 and x/0 → NaN within the same column.

On scope (full breakdown in the issue): wmape/ope/arre/marre/coefficient_of_variation are already in this PR — I'll switch them to the unified helper. Good news on the rest of the family: sape/smape already encode 0/0 → 0.0 inline and have no x/0 case, so the only delta there is exposing the explicit zero_division knob for the "raise" option. ape/mape are the trickier ones — they still hard-raise on any y_t == 0, and because their denominator is the per-timestep actual (not a column scalar) they need element-wise handling; relaxing MAPE's default is also a larger behavioral change. Would you prefer ape/mape in this PR (default kept as raise, opt into fill via zero_division) or as a focused follow-up?

One small flag: coefficient_of_variation is RMSE/mean (not abs-wrapped, and the mean can be negative), so a zero mean is a legitimate non-degenerate case — I've proposed 0/0 → 0.0 / x/0 → NaN for it but happy to adjust.

Inline replies:

(arre, "why removed?") that was the old if not (y_max > y_min).all(): raise(...) guard — the zero-range check moved into the helper, gated behind zero_division (default fills; "raise" re-raises, preserving the legacy behavior). The old message also said "MARRE" inside arre(), so that copy-paste slip drops out.
(test) agreed — I'll fold the negative-sum check into test_ope() with a small negative-series helper. I'll also add a 0/0 best-score case to the zero-division test (the current fixtures are all x/0, so they stay NaN and need no change).

I've opened #3132 to track the design. Thanks for the thoughtful review!

mahimn01 requested a review from dennisbader as a code owner May 26, 2026 15:23

mahimn01 force-pushed the fix/metrics-zero-division-pct branch from 3565785 to d55581a Compare May 31, 2026 01:04

jakubchlapek reviewed Jun 9, 2026

View reviewed changes

mahimn01 mentioned this pull request Jun 9, 2026

Consistent zero-denominator handling for percentage and range metrics #3132

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: extend zero_division parameter to percentage and range-based metrics#3122

fix: extend zero_division parameter to percentage and range-based metrics#3122
mahimn01 wants to merge 1 commit into
unit8co:masterfrom
mahimn01:fix/metrics-zero-division-pct

mahimn01 commented May 26, 2026

Uh oh!

mahimn01 commented Jun 8, 2026

Uh oh!

codecov Bot commented Jun 9, 2026

Uh oh!

jakubchlapek commented Jun 9, 2026

Uh oh!

jakubchlapek left a comment

Uh oh!

jakubchlapek Jun 9, 2026

Uh oh!

jakubchlapek Jun 9, 2026

Uh oh!

mahimn01 commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mahimn01 commented May 26, 2026

Summary

Behavior change

Test plan

Other Information

Uh oh!

mahimn01 commented Jun 8, 2026

Uh oh!

codecov Bot commented Jun 9, 2026

Codecov Report

Uh oh!

jakubchlapek commented Jun 9, 2026

Uh oh!

jakubchlapek left a comment

Choose a reason for hiding this comment

Uh oh!

jakubchlapek Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

jakubchlapek Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

mahimn01 commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants