log_gamma diagnostic tests failing frequently #576

@elseml

In #575, test_calibration_log_gamma_end_to_end (implemented in #522) failed with the following error:

=========================== short test summary info ===========================
FAILED tests/test_diagnostics/test_diagnostics_metrics.py::test_calibration_log_gamma_end_to_end - assert not np.int64(87) <= np.float64(100.0)

  • where np.int64(87) = <function sum at 0x000001FC8D648470>([np.True_, np.True_, np.True_, np.False_, np.True_, np.True_, ...])
  • where <function sum at 0x000001FC8D648470> = np.sum
    !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!
    = 1 failed, 128 passed, 85 skipped, 308 deselected, 10105 warnings in 282.46s (0:04:42) =
    Error: Process completed with exit code 1.

A quick skim of the recent test workflow history turned up another failure of the same test in #572:

=========================== short test summary info ============================
FAILED tests/test_diagnostics/test_diagnostics_metrics.py::test_calibration_log_gamma_end_to_end - assert np.float64(87.0) <= np.int64(86)

  • where np.int64(86) = <function sum at 0x7fa95488bd30>([np.True_, np.True_, np.True_, np.True_, np.True_, np.True_, ...])
  • where <function sum at 0x7fa95488bd30> = np.sum
    !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
    = 1 failed, 128 passed, 85 skipped, 308 deselected, 10187 warnings in 155.43s (0:02:35) =
    Error: Process completed with exit code 1.

and one in #564:

=========================== short test summary info ============================
FAILED tests/test_diagnostics/test_diagnostics_metrics.py::test_calibration_log_gamma_end_to_end - assert not 90 <= 100.0

  • where 90 = <function sum at 0x7ff3aea32f70>([True, True, False, False, True, True, ...])
  • where <function sum at 0x7ff3aea32f70> = np.sum
    !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
    = 1 failed, 128 passed, 85 skipped, 308 deselected, 18 warnings in 212.72s (0:03:32) =
    Error: Process completed with exit code 1.

These repeated failures hint at either overly conservative or brittle test settings, or at some unexpected behavior of the metric itself. @daniel-habermann @paul-buerkner @stefanradev93
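To get a feel for how tight the settings would have to be to produce failures this often, a rough back-of-the-envelope sketch may help. It assumes (hypothetically, since the test internals are not shown here) that the test counts how many of `n` independent checks pass and asserts that the count reaches a fixed `threshold`. The function names and the values of `n`, `p`, and `threshold` are illustrative only and are not taken from the actual test.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def failure_rate(n: int, p: float, threshold: int) -> float:
    """Chance that fewer than `threshold` of `n` independent checks pass.

    Models a hypothetical assertion of the form `assert count >= threshold`.
    """
    return binom_cdf(threshold - 1, n, p)


def prob_any_failure(rate: float, runs: int) -> float:
    """Chance of at least one spurious failure across `runs` independent CI runs."""
    return 1.0 - (1.0 - rate) ** runs


if __name__ == "__main__":
    # Illustrative numbers only: 100 checks, each passing with probability 0.9,
    # and an assertion that at least 87 of them pass.
    rate = failure_rate(n=100, p=0.9, threshold=87)
    print(f"per-run failure rate: {rate:.3f}")
    print(f"at least one failure in 10 runs: {prob_any_failure(rate, 10):.3f}")
```

If the real per-run rate turned out to be more than a fraction of a percent, loosening the threshold or fixing the random seed would be the obvious candidates; if it were tiny, the metric itself would be the more likely suspect.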

Labels: unit tests (A new set of tests needs to be added.)
