CI/CD conventions for metrics #1111

Open
christophe-kamphaus-jemmic opened this issue Jun 2, 2024 · 4 comments
Labels
area:cicd enhancement New feature or request

Comments

@christophe-kamphaus-jemmic
Contributor

christophe-kamphaus-jemmic commented Jun 2, 2024

Area(s)

area:cicd

Is your change request related to a problem? Please describe.

This issue is for discussing attributes specific to metrics, as part of the CI/CD Working Group and the Semantic Conventions WG.
A challenge specific to metrics is time series cardinality, which grows quickly when a CI/CD system reports metrics for individual builds.

Describe the solution you'd like

Following #1075 (and adjusting the vocabulary below to align with it), we should define metric attributes for:

  • duration of pipelineRuns (by status, pipeline)
  • count of pipelineRuns (by status, pipeline)
  • count of agents
  • queue length of pending pipelineRuns
  • duration for how long a pipelineRun is in the queue before starting execution
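
To make the list above concrete, here is a minimal in-memory sketch of what a CI/CD system would have to track to expose these five metrics. It uses plain Python rather than the OpenTelemetry SDK, and every name in it (`PipelineMetrics`, `enqueue`, `queue_wait`, and so on) is hypothetical, not a proposed convention.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class PipelineMetrics:
    """Toy recorder for the aggregate metrics listed above (hypothetical names)."""

    # (pipeline, status) -> number of finished runs
    run_count: Counter = field(default_factory=Counter)
    # (pipeline, status) -> list of run durations in seconds
    run_duration: dict = field(default_factory=lambda: defaultdict(list))
    # pipeline -> list of seconds spent waiting in the queue
    queue_wait: dict = field(default_factory=lambda: defaultdict(list))
    # run_id -> (pipeline, timestamp); internal bookkeeping only, never exported
    _queued: dict = field(default_factory=dict)
    _running: dict = field(default_factory=dict)
    # number of agents currently registered
    agent_count: int = 0

    def enqueue(self, run_id, pipeline, now):
        self._queued[run_id] = (pipeline, now)

    def start(self, run_id, now):
        pipeline, queued_at = self._queued.pop(run_id)
        self.queue_wait[pipeline].append(now - queued_at)
        self._running[run_id] = (pipeline, now)

    def finish(self, run_id, status, now):
        pipeline, started_at = self._running.pop(run_id)
        self.run_count[(pipeline, status)] += 1
        self.run_duration[(pipeline, status)].append(now - started_at)

    @property
    def queue_length(self):
        """Number of pipelineRuns still waiting to start."""
        return len(self._queued)
```

Note that the run ID is only used for bookkeeping here; the exported values are keyed by pipeline and status, which keeps the number of time series bounded.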

Additionally, it should be possible to opt in to metrics specific to a particular pipelineRun.
These could be metrics about the agent that executes a pipelineRun, the OS, network, JVM, the number of failed/total tests …
We need to specify the attribute that links these metrics to the pipelineRun, e.g. pipeline.run.id

Metrics specific to a pipelineRun are of high cardinality. We should document this as a warning and give guidance on how these metrics can be encoded efficiently in the OTel protocol, i.e. by using resource attributes instead of metric attributes wherever possible.
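
As a rough illustration of the encoding point, the sketch below compares two payloads that only loosely mimic the OTLP layout (a resource with attributes, containing metrics, containing data points with attributes). The payload shape is simplified, and the metric names and helper functions are hypothetical. It counts how often the run ID has to be written in each case.

```python
# Hypothetical per-run metrics; the names are placeholders, not conventions.
METRICS = ["cpu", "memory", "network"]


def run_id_on_points(run_id):
    """Run ID repeated as a metric attribute on every data point."""
    return [{
        "resource": {},
        "metrics": [
            {"name": name,
             "points": [{"attributes": {"pipeline.run.id": run_id}, "value": 1.0}]}
            for name in METRICS
        ],
    }]


def run_id_on_resource(run_id):
    """Run ID written once as a resource attribute."""
    return [{
        "resource": {"pipeline.run.id": run_id},
        "metrics": [
            {"name": name, "points": [{"attributes": {}, "value": 1.0}]}
            for name in METRICS
        ],
    }]


def attribute_entries(payload):
    """Count attribute key/value pairs carried by the payload."""
    total = 0
    for resource_metrics in payload:
        total += len(resource_metrics["resource"])
        for metric in resource_metrics["metrics"]:
            for point in metric["points"]:
                total += len(point["attributes"])
    return total
```

With three metrics and one data point each, the first layout carries the run ID three times and the second carries it once; the gap widens with every additional metric or data point reported for the same run.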

Describe alternatives you've considered

Span metrics could be used for the duration and count of pipelineRuns; however, this relies on the pipelineRuns having completed.
This is a limitation inherent in using traces to represent pipelineRuns: a span can only be sent once it is complete.
Because of this limitation, it could be preferable for the CI/CD system to expose metrics about the duration, count and status of pipelineRuns directly. These metrics could then also account for in-progress builds.
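
A small sketch of the difference, assuming runs are plain dicts with hypothetical `start`, `end` and `status` keys: anything derived from finished spans cannot see a run whose `end` is still unset, while a metric computed by the CI/CD system itself can report it as in progress.

```python
def visible_to_span_metrics(runs):
    """Runs that span-derived metrics can see: only those that have ended."""
    return [run for run in runs if run["end"] is not None]


def runs_by_status(runs, now):
    """Count runs per status, including unfinished ones.

    Returns the counts and the elapsed seconds of each in-progress run.
    """
    counts = {}
    in_progress_elapsed = []
    for run in runs:
        if run["end"] is None:
            status = "in_progress"
            in_progress_elapsed.append(now - run["start"])
        else:
            status = run["status"]
        counts[status] = counts.get(status, 0) + 1
    return counts, in_progress_elapsed
```

The `in_progress` status value is an assumption made for this sketch; the issue does not define a status vocabulary.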

Additional context

CI/CD metrics were discussed at the SemConv users meeting at KubeCon in March 2024.
High cardinality was highlighted as an issue for per-build metrics.
Notes on how to deal with cardinality:

  • Could we use Exemplars? We could link to the build trace from some metrics.
    This added information might make it easier to identify pipelineRuns that need investigation.
  • Using a resource attribute for the build ID is fine for the OTel protocol,
    but backends (e.g. Prometheus) would still have the cardinality issue when storing the time series
    (metric and resource attributes would be flattened into time series labels).
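
The second note can be sketched as follows, assuming a backend that simply merges resource and data point attributes into one label set per series. This is a deliberate simplification and not how any particular exporter works; `to_series_labels` is a hypothetical helper.

```python
def to_series_labels(metric_name, resource_attrs, point_attrs):
    """Flatten resource and data point attributes into one series identity."""
    labels = {"__name__": metric_name}
    labels.update(resource_attrs)
    labels.update(point_attrs)
    return tuple(sorted(labels.items()))


def series_for_runs(run_ids, run_id_on_resource):
    """Distinct series produced for one metric across the given runs."""
    series = set()
    for run_id in run_ids:
        if run_id_on_resource:
            resource, point = {"pipeline.run.id": run_id}, {"pipeline": "build"}
        else:
            resource, point = {}, {"pipeline": "build", "pipeline.run.id": run_id}
        series.add(to_series_labels("cpu", resource, point))
    return series
```

Under this assumption both placements yield exactly the same set of series, one per run, so moving the build ID to the resource helps on the wire but not in storage.
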
@christophe-kamphaus-jemmic added the enhancement, experts needed and triage:needs-triage labels on Jun 2, 2024
@joaopgrassi removed the experts needed and triage:needs-triage labels on Jul 9, 2024
@christophe-kamphaus-jemmic
Contributor Author

We can use label area:cicd instead of area:new.

@adrielp
Contributor

adrielp commented Jul 29, 2024

I currently have a pull request open to change the metrics in the Git Provider Receiver component of the OTel Collector to better match the new conventions set in the registry. I think it can provide useful implementation context for this conversation.

@christophe-kamphaus-jemmic
Contributor Author

christophe-kamphaus-jemmic commented Aug 29, 2024

Let's create additional issues for the separate concerns of metrics:

  • vcs metrics
  • metrics related to job queues
  • metrics related to individual builds (high cardinality issue)

I.e. let's have smaller PRs to address them separately.

@christophe-kamphaus-jemmic
Contributor Author

@adrielp Can we use #1184 for point 3, "metrics related to individual builds (high cardinality issue)", perhaps by renaming that issue, or should this be a separate issue?


3 participants