[ENH]: add metric for component queue depth & change dispatcher queue depth metric buckets #5261
Add Component and Dispatcher Queue Depth Metrics with Dynamic Histogram Bucketing

This PR introduces a new metric to track the queue depth of each component, allowing observability into how many messages are waiting for processing per component. Additionally, it modifies the dispatcher queue metrics so that the associated OpenTelemetry histograms use dynamically-calculated bucket boundaries based on the relevant configuration values (e.g., max queue sizes). This enhancement is intended to facilitate debugging of component and dispatcher queuing issues. The changes were validated visually in Grafana using Tilt.

Key Changes
• Introduced an …

Affected Areas
• rust/system/src/execution/dispatcher.rs

This summary was automatically generated by @propel-code-bot
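To make the idea of config-derived bucket boundaries concrete, here is a minimal sketch of how boundaries could be computed from a maximum queue size. It is not the code from this PR: the function name `queue_depth_boundaries`, the `bucket_count` parameter, and the choice of evenly spaced buckets are all assumptions made for illustration. The resulting vector is the kind of value one would hand to a histogram's explicit bucket configuration.

```rust
/// Hypothetical helper (not from the PR): evenly spaced histogram bucket
/// boundaries from 0 up to `max_queue_size`, inclusive.
fn queue_depth_boundaries(max_queue_size: usize, bucket_count: usize) -> Vec<f64> {
    // Guard against a zero bucket count so the division below is safe.
    let bucket_count = bucket_count.max(1);
    let step = max_queue_size as f64 / bucket_count as f64;

    // Queue depths are whole numbers, so round each boundary to an integer.
    let mut boundaries: Vec<f64> = (0..=bucket_count)
        .map(|i| (i as f64 * step).round())
        .collect();

    // For small queues several boundaries can round to the same value;
    // the list is sorted, so dedup() removes the repeats.
    boundaries.dedup();
    boundaries
}

fn main() {
    // Assumed config value: a queue that holds at most 1000 messages.
    let boundaries = queue_depth_boundaries(1000, 10);
    println!("{:?}", boundaries);
}
```

With boundaries tied to the configured maximum, a queue that is nearly full lands in the top buckets regardless of how large the queue is configured to be, which is what makes the histogram readable across deployments with different settings.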
… depth metric buckets (#5261)
Co-authored-by: Max Isom <codetheweb@users.noreply.github.com>
Description of changes
This adds a metric for the queue depth of each component (the number of messages waiting to be handled by that component) and updates the dispatcher queue depth metrics to use histogram bucket boundaries derived from the current config.
This will help debug component/dispatcher queuing issues.
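As a rough illustration of what "queue depth of a component" means, the sketch below wraps a bounded standard-library channel with a shared counter of messages that have been sent but not yet received. This is an assumption-laden example, not the implementation in this PR: `TrackedSender`, `TrackedReceiver`, and `tracked_channel` are hypothetical names, and the real component system may use a different channel type. The depth returned by `try_send` is the value one would record into the histogram.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;

/// Hypothetical sender half that counts messages waiting in the queue.
struct TrackedSender<T> {
    inner: SyncSender<T>,
    depth: Arc<AtomicUsize>,
}

/// Hypothetical receiver half sharing the same depth counter.
struct TrackedReceiver<T> {
    inner: Receiver<T>,
    depth: Arc<AtomicUsize>,
}

fn tracked_channel<T>(capacity: usize) -> (TrackedSender<T>, TrackedReceiver<T>) {
    let (tx, rx) = sync_channel(capacity);
    let depth = Arc::new(AtomicUsize::new(0));
    (
        TrackedSender { inner: tx, depth: Arc::clone(&depth) },
        TrackedReceiver { inner: rx, depth },
    )
}

impl<T> TrackedSender<T> {
    /// Enqueue a message and return the queue depth after the send.
    fn try_send(&self, msg: T) -> Result<usize, TrySendError<T>> {
        // Increment first so a fast receiver can never drive the counter below zero.
        let depth = self.depth.fetch_add(1, Ordering::Relaxed) + 1;
        match self.inner.try_send(msg) {
            Ok(()) => Ok(depth),
            Err(e) => {
                // The message was not enqueued, so undo the increment.
                self.depth.fetch_sub(1, Ordering::Relaxed);
                Err(e)
            }
        }
    }
}

impl<T> TrackedReceiver<T> {
    /// Take the next waiting message, if any, and decrement the depth.
    fn try_recv(&self) -> Option<T> {
        let msg = self.inner.try_recv().ok()?;
        self.depth.fetch_sub(1, Ordering::Relaxed);
        Some(msg)
    }

    /// Number of messages currently waiting to be handled.
    fn depth(&self) -> usize {
        self.depth.load(Ordering::Relaxed)
    }
}

fn main() {
    let (tx, rx) = tracked_channel::<u32>(2);
    tx.try_send(1).unwrap();
    tx.try_send(2).unwrap();
    // The queue holds at most 2 messages, so a third send is rejected.
    assert!(tx.try_send(3).is_err());
    println!("depth after sends: {}", rx.depth());
    rx.try_recv();
    println!("depth after one receive: {}", rx.depth());
}
```

Incrementing before the send and rolling back on failure keeps the counter from briefly underflowing when a receiver drains a message before the sender has updated it.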
Test plan
How are these changes tested?
I validated that the metrics show up in Grafana when running with Tilt and that the values look reasonable.
Migration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?