Context
PR #185 made privileged M-series interop jobs more tolerant of transient containerlab failures by retrying each topology in .github/actions/run-interop-test. The action emits warning annotations when a retry succeeds, but there is no durable per-job summary that shows how often retries are being consumed across runs.
Expected direction
Add lightweight retry-rate telemetry for interop jobs so we can distinguish real stability improvements from flake masking. This should stay CI-local and low-risk: job summaries, uploaded JSON/Markdown artifacts, or workflow annotations are enough; no external metrics service is required for the first slice.
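As a rough illustration of what "retry rate" could mean here, the sketch below computes the share of jobs in which a retry absorbed a failure. It assumes each job has already produced a small record; the `retry_absorbed` key and the `retry_rate` helper are hypothetical names, not part of the existing action.

```python
from typing import Iterable, Mapping


def retry_rate(records: Iterable[Mapping[str, object]]) -> float:
    """Return the fraction of job records where a retry absorbed a failure.

    Each record is assumed (hypothetically) to carry a boolean
    "retry_absorbed" field. An empty input yields 0.0.
    """
    records = list(records)
    if not records:
        return 0.0
    absorbed = sum(1 for record in records if record["retry_absorbed"])
    return absorbed / len(records)
```

A rate that stays flat or rises while the jobs stay green would suggest flakes are being masked rather than fixed; a falling rate would point to a real stability improvement.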
Acceptance criteria
Each interop job records topology label, attempts used, final result, and whether a retry absorbed a transient failure.
The workflow summary exposes retry counts in a way reviewers can see from the Actions UI.
Failed jobs still surface the first failing attempt clearly enough for debugging.
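One possible shape for the per-job record is sketched below. It is only a sketch under stated assumptions: the field names, the `build_record` and `write_outputs` helpers, and the idea of passing in one error message per attempt are all hypothetical. The only GitHub-provided piece is the `GITHUB_STEP_SUMMARY` environment variable, which points at a file whose Markdown content appears in the job summary.

```python
import json
import os
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class RetryRecord:
    topology: str                 # topology label
    attempts_used: int            # number of attempts that ran
    final_result: str             # "success" or "failure"
    retry_absorbed: bool          # True if a later attempt passed after a failure
    first_failure: Optional[str]  # message from the first failing attempt, if any


def build_record(topology: str, attempt_errors: List[Optional[str]]) -> RetryRecord:
    """Build a record from per-attempt outcomes (hypothetical input shape).

    attempt_errors holds one entry per attempt, in order: None for a passing
    attempt, or an error message for a failing one.
    """
    if not attempt_errors:
        raise ValueError("at least one attempt is required")
    passed = attempt_errors[-1] is None
    first_failure = next((e for e in attempt_errors if e is not None), None)
    return RetryRecord(
        topology=topology,
        attempts_used=len(attempt_errors),
        final_result="success" if passed else "failure",
        retry_absorbed=passed and first_failure is not None,
        first_failure=first_failure,
    )


def summary_line(record: RetryRecord) -> str:
    """Render one Markdown bullet for the job summary."""
    line = "- `{}`: {} after {} attempt(s)".format(
        record.topology, record.final_result, record.attempts_used
    )
    if record.retry_absorbed:
        line += ", retry absorbed a transient failure"
    if record.first_failure is not None:
        line += "; first failure: {}".format(record.first_failure)
    return line


def write_outputs(record: RetryRecord, json_path: str) -> None:
    """Write the JSON artifact and, when running in Actions, the job summary."""
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(asdict(record), fh, indent=2)
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(summary_line(record) + "\n")
```

Keeping the first failing attempt's message in the record means a failed job still points at the original error rather than only the last retry.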
Related
.github/actions/run-interop-test/action.yml