@nugaon nugaon commented Jul 27, 2025

This PR adds Prometheus metrics that monitor worker wait times during chunk sampling in the `ReserveSample` function. Each worker goroutine now tracks the time it spends waiting between processing chunks and computes wait-time statistics, which are reported through a new SamplingWorkerStats gauge. To avoid a Prometheus cardinality explosion, the implementation reports only summary statistics when a worker terminates instead of emitting a metric per observation. These statistics help identify bottlenecks in the sampling pipeline.

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

Screenshots (if appropriate):

@nugaon nugaon requested a review from gacevicljubisa July 27, 2025 10:24
@nugaon nugaon changed the base branch from master to feat/sampling-opts July 28, 2025 12:35
@nugaon nugaon linked an issue Jul 29, 2025 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Optimize ReserveSample Function
