Rework the way we currently detect regressions in build time metrics #40076

Open

Description

#36108 was an attempt to detect regressions in native builds by failing when certain metrics fall outside a given range. Unfortunately, this doesn't seem to work well in practice. The main reason seems to be that multiple PRs gradually increase the metrics without hitting the threshold, and then a new PR that happens to increase them a bit more triggers a failure. Although that PR might not be responsible for the total increase that led to hitting the threshold, it is the one that gets blocked.
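To make the failure mode concrete, here is a minimal sketch with made-up numbers. The baseline, the threshold, and the per-PR increases are all hypothetical and are not taken from the Quarkus CI; the point is only that a fixed bound blames whichever PR happens to cross it.

```python
# Hypothetical values: native image size in MB.
BASELINE_MB = 100.0
THRESHOLD_MB = 105.0  # assumed fixed upper bound used by the check

# Assumed size increase contributed by each merged PR, in merge order.
pr_increases_mb = [1.5, 1.5, 1.5, 0.6]


def first_blocked_pr(baseline, threshold, increases):
    """Return (index, own_increase, total_increase) for the first PR that
    pushes the metric above the threshold, or None if none does."""
    current = baseline
    for index, increase in enumerate(increases):
        current += increase
        if current > threshold:
            return index, increase, current - baseline
    return None


result = first_blocked_pr(BASELINE_MB, THRESHOLD_MB, pr_increases_mb)
if result is not None:
    index, own, total = result
    print(f"PR #{index} is blocked: it added {own:.1f} MB "
          f"of a total {total:.1f} MB increase")
```

In this example the last PR contributes the smallest increase of the four, yet it is the only one the check rejects.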

Implementation ideas

An idea we had within the Mandrel team (cc @Karm @jerboaa), and that we are working towards, is the following.

We would like to start collecting data from Quarkus CI runs (initially from runs on main, and later probably from PRs as well). This will allow us to observe the change over time (as shown in #39674 (comment)) instead of only noticing it when we hit a threshold.

Next, we would ideally like to feed this data to a tool with anomaly detection (possibly https://horreum.hyperfoil.io/) in order to get automated alerts when something seems wrong. Such an alert could:

  1. Create a generic GH issue when the metrics have crossed a threshold relative to the last known "good state"
  2. Create a PR-specific issue or comment in an open PR if it appears to be causing a sudden increase in the metrics we are interested in.
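The two kinds of alert could be sketched roughly as follows. This is plain Python for illustration only, not Horreum's API or its change detection model; the function names, the 5% and 3% limits, and the sample series are all assumptions.

```python
def cumulative_drift(values, baseline):
    """Relative change of the latest value against the last known good value."""
    return (values[-1] - baseline) / baseline


def sudden_increases(values, max_step=0.03):
    """Indices of runs whose value rose by more than max_step (relative)
    compared with the run immediately before them."""
    flagged = []
    for i in range(1, len(values)):
        step = (values[i] - values[i - 1]) / values[i - 1]
        if step > max_step:
            flagged.append(i)
    return flagged


# Hypothetical metric series (e.g. image size in MB per CI run on main).
gradual = [100.0, 101.0, 102.0, 103.0, 104.0, 105.5]
sudden = [100.0, 100.2, 99.9, 104.5, 104.6]

MAX_DRIFT = 0.05  # assumed limit for opening a generic issue

for name, series in (("gradual", gradual), ("sudden", sudden)):
    drift = cumulative_drift(series, baseline=series[0])
    jumps = sudden_increases(series)
    if jumps:
        print(f"{name}: comment on the change(s) behind run(s) {jumps}")
    elif drift > MAX_DRIFT:
        print(f"{name}: open a generic issue, drift is {drift:.1%}")
```

With these inputs the gradual series has no single step large enough to blame on one change, so it maps to the generic issue, while the sudden series points at one specific run. Comparing only against the previous run is sensitive to noise; a real setup would more likely compare against a window of earlier runs.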

Related PRs:

  1. Introduce RunnerInfo to ImageStats Karm/collector#23
  2. Upload native build statistics #39784