log_gamma diagnostic tests failing frequently

In #575, `test_calibration_log_gamma_end_to_end` (implemented in #522) failed with the following error:

> =========================== short test summary info ===========================
> FAILED tests/test_diagnostics/test_diagnostics_metrics.py::test_calibration_log_gamma_end_to_end - assert not np.int64(87) <= np.float64(100.0)
> +  where np.int64(87) = <function sum at 0x000001FC8D648470>([np.True_, np.True_, np.True_, np.False_, np.True_, np.True_, ...])
> +    where <function sum at 0x000001FC8D648470> = np.sum
> !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!
> = 1 failed, 128 passed, 85 skipped, 308 deselected, 10105 warnings in 282.46s (0:04:42) =
> Error: Process completed with exit code 1.

Briefly skimming through the latest test workflow history brought up another test failure in #572:

> =========================== short test summary info ============================
> FAILED tests/test_diagnostics/test_diagnostics_metrics.py::test_calibration_log_gamma_end_to_end - assert np.float64(87.0) <= np.int64(86)
> +  where np.int64(86) = <function sum at 0x7fa95488bd30>([np.True_, np.True_, np.True_, np.True_, np.True_, np.True_, ...])
> +    where <function sum at 0x7fa95488bd30> = np.sum
> !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
> = 1 failed, 128 passed, 85 skipped, 308 deselected, 10187 warnings in 155.43s (0:02:35) =
> Error: Process completed with exit code 1.

and one in #564:

> =========================== short test summary info ============================
> FAILED tests/test_diagnostics/test_diagnostics_metrics.py::test_calibration_log_gamma_end_to_end - assert not 90 <= 100.0
> +  where 90 = <function sum at 0x7ff3aea32f70>([True, True, False, False, True, True, ...])
> +    where <function sum at 0x7ff3aea32f70> = np.sum
> !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
> = 1 failed, 128 passed, 85 skipped, 308 deselected, 18 warnings in 212.72s (0:03:32) =
> Error: Process completed with exit code 1.

This might hint at either overly conservative/brittle test settings or some unexpected behavior of the metric itself. @daniel-habermann @paul-buerkner @stefanradev93
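To help tell these two explanations apart, it might be useful to estimate how often a count-based assertion would fail purely by chance. Below is a rough sketch, not the actual test logic: it assumes (hypothetically) that the summed quantity behaves like a binomial count over `n = 100` independent checks with per-check success probability `p`, and that the test asserts `count >= 87`. The bound 87 is borrowed from the #572 log only for illustration; `n`, `p`, and the helper names `binom_cdf` and `flake_rate` are made up.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p); 0.0 if k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def flake_rate(lower_bound: int, n: int, p: float) -> float:
    """Probability that an assertion `count >= lower_bound` fails by chance.

    Assumes (hypothetically) that `count` is Binomial(n, p).
    """
    return binom_cdf(lower_bound - 1, n, p)


if __name__ == "__main__":
    # Hypothetical settings: 100 checks, lower bound 87 (taken from the #572 log).
    for p in (0.90, 0.93, 0.95):
        rate = flake_rate(lower_bound=87, n=100, p=p)
        print(f"p={p:.2f}: P(count < 87) = {rate:.4f}")
```

If the chance failure rate under plausible settings turns out to be non-negligible, the thresholds are probably too tight; if it is tiny, the repeated failures would point more towards the metric itself.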

log_gamma diagnostic tests failing frequently #576
