fix: extend zero_division parameter to percentage and range-based metrics#3122
fix: extend zero_division parameter to percentage and range-based metrics#3122mahimn01 wants to merge 1 commit into
Conversation
…rics Percentage and range-based metrics (`wmape`, `ope`, `arre`, `marre`, `coefficient_of_variation`) previously either raised a hard `ValueError` or silently returned `nan`/`inf` when their denominator was zero. This made batch evaluation pipelines brittle for constant or all-zero components. Mirrors the `zero_division` design introduced in unit8co#3059 for the scaled error family: `"warn"` (default) returns `np.nan` and emits a warning, `"raise"` preserves the legacy `ValueError`. A new `_safe_pct_divide` helper sits next to `_safe_scaled_divide`; the two differ only in fill semantics — percentage metrics multiply the ratio by 100 so a `1.0` fill for the 0/0 case (the scaled-metric "on par with naive") would surface as `100 %` error and be misleading, hence `np.nan` instead. Two adjacent bugs surface and are fixed in the same change: * `ope` previously checked `sum > 0` and rejected `actual_series` with a strictly negative sum (e.g. financial return series). The check is now `sum != 0` via the helper. * `wmape`'s docstring claimed `ValueError if actual_series contains some zeros`, but the implementation divides by `sum(|y_true|)` and only the all-zero case ever triggered the path. Docstring corrected. The CHANGELOG entry for the parameter addition carries the breaking- change marker per the convention discussed in unit8co#3080 (the post-mortem on unit8co#3059), since the default behavior flips from raising to warning. Adds a parametrized regression test covering all five metrics and an explicit OPE-with-negative-sum test. Existing `test_ope_zero` and the arre/marre legacy raise check are updated to opt into the legacy behavior with `zero_division="raise"`.
3565785 to
d55581a
Compare
|
Hi @dennisbader — gentle nudge on this when you get a chance. CI hasn't triggered yet (looks like it needs a maintainer approve-and-run for first-time contributors). It extends the #3059 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3122 +/- ##
==========================================
- Coverage 96.55% 96.48% -0.07%
==========================================
Files 160 160
Lines 17285 17293 +8
==========================================
- Hits 16689 16686 -3
- Misses 596 607 +11 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Hey @mahimn01, thanks for this PR and contributing :). Will review it shortly as this is something we should definitely add, but it would be also great if you could create an issue for this first. Thanks! |
jakubchlapek
left a comment
There was a problem hiding this comment.
Hey @mahimn01, I've taken an initial look at the code, it looks fine, thanks. The ope clarification makes sense to me. One thing that we would definitely need to add is analogous support to all the percentage metrics (e.g. ape, mape, sape, smape, wmape, ope, arre, marre).
Before reviewing further I would like to discuss the fill semantics. For the scaled metrics the user could pass in the insample metric leading to the zero_division issue in the first place. Here as we don't supply that, there is no issue of matching the periodic series with forecasts. To align this to the existing solution I think what would make more sense than returning nans is returning a perfect forecast (e.g. 0/0 for MAPE returns 0% error). This would then depend on the metric and best score, so we can think on how to structure this best (maybe collapse it with the _safe_scaled_divide as the logic will be quite similar I think.) Let me know what you think on this :)
| q=q, | ||
| ) | ||
| y_max, y_min = np.nanmax(y_true, axis=TIME_AX), np.nanmin(y_true, axis=TIME_AX) | ||
| if not (y_max > y_min).all(): |
There was a problem hiding this comment.
why was this removed?
|
|
||
| def test_ope_accepts_negative_sum(self, caplog): | ||
| """OPE must accept ``actual_series`` whose sum is negative (e.g. | ||
| financial return series). Only an exact zero sum triggers the | ||
| zero-division handling. | ||
|
|
||
| The previous ``y_true_sum > 0`` guard incorrectly rejected this | ||
| valid input. | ||
| """ | ||
| # mean ~ 0 but sum strictly negative | ||
| neg_sum_values = np.array([ | ||
| 1.0, | ||
| -2.0, | ||
| 1.0, | ||
| -3.0, | ||
| 1.0, | ||
| -2.0, | ||
| 1.0, | ||
| -3.0, | ||
| 1.0, | ||
| -2.0, | ||
| ]).reshape(-1, 1) | ||
| actual = TimeSeries.from_values(neg_sum_values) | ||
| pred = TimeSeries.from_values(neg_sum_values + 0.1) | ||
|
|
||
| caplog.clear() | ||
| with caplog.at_level(logging.WARNING): | ||
| result = metrics.ope(actual, pred) | ||
| assert "denominator" not in caplog.text | ||
| assert np.isfinite(result) | ||
|
|
There was a problem hiding this comment.
i think it would be nicer to instead create a helper for negative series and ship this check as part of the test_ope() test (similarly to how it's structured there)
|
Thanks @jakubchlapek, this makes sense — best-score for 0/0 is the right call, and it actually fixes a latent bug I'd missed: with NaN-everywhere, a genuinely perfect forecast on a degenerate series scores NaN ( I'll collapse On scope (full breakdown in the issue): One small flag: Inline replies:
I've opened #3132 to track the design. Thanks for the thoughtful review! |
Checklist before merging this PR:
Mirrors the design of #3059 for the percentage / range-based metric family. Adopts the breaking-change 🔴 marker convention discussed in #3080.
Summary
Percentage and range-based metrics (
wmape,ope,arre,marre,coefficient_of_variation) previously either raised a hardValueErroror silently returnednan/infwhen their denominator was zero. This made batch evaluation pipelines brittle for constant or all-zero components.This PR extends the
zero_divisionparameter pattern introduced in #3059 to these five metrics:"warn"(default) — returnsnp.nanand emits a warning."raise"— preserves the legacyValueError.A new
_safe_pct_dividehelper sits next to_safe_scaled_divideindarts/metrics/utils.py. The two differ only in fill semantics: percentage metrics multiply the ratio by 100, so a1.0fill for the 0/0 case (the scaled-metric "on par with naive") would surface as100 %error and be misleading — hencenp.naninstead.Two adjacent bugs are fixed in the same change:
opepreviously checkedsum > 0and rejectedactual_serieswith a strictly negative sum (e.g. financial return series). The check is nowsum != 0via the helper.wmape's docstring claimedValueError if actual_series contains some zeros, but the implementation divides bysum(|y_true|)and only the all-zero case ever triggered any path. Docstring corrected.Behavior change
"warn"np.nanand logs a warning (previously: silent NaN/inf, or hardValueError)"raise"ValueError(legacy behavior preserved as opt-in)actual_seriesinopePer the discussion in #3080, the CHANGELOG entry for the parameter addition carries the breaking-change 🔴 marker because the default behavior flips from raising to warning.
Test plan
test_pct_metrics_zero_divisioncovering all 5 metrics: warn-mode returns NaN with warning, raise-mode raisesValueError, invalid string rejected, healthy inputs unaffected.test_ope_accepts_negative_sumverifying thesum != 0fix using a series with a strictly negative sum.test_ope_zeroandtest_arreto opt into legacy behavior withzero_division="raise".uv run pytest darts/tests/metrics/test_metrics.py→ 461 passed (was 455, +6 new).uv run pre-commit run --files <changed>→ all hooks pass (ruff, ruff-format, trailing whitespace, end-of-file).Other Information
Files changed (+208/-31):
darts/metrics/utils.py—_safe_pct_dividehelperdarts/metrics/metrics.py— 5 metrics extendeddarts/tests/metrics/test_metrics.py— 6 new tests + 2 legacy-test updatesCHANGELOG.md— entries under**Fixed**(the bug fixes) and**Improved**(the new parameter, with 🔴 marker)