
CA-411679: Runstate metrics return data over 100% #6493


Merged
2 commits merged on Jun 5, 2025


@BengangY BengangY commented Jun 3, 2025

To handle deviations in CPU rates, Derive values that exceed the maximum by up to 5% are capped at the maximum; values that exceed it by more than 5% are marked as unknown. This logic is specific to Derive data sources because they represent rates derived from differences over time, which can occasionally exceed the expected bounds due to measurement inaccuracies.
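The rule described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code from this PR; the function name `clamp_derive`, the constant `TOLERANCE`, the use of NaN to stand for "unknown", and the assumption of a positive maximum are all illustrative choices.

```python
import math

# Assumption: 5% overshoot allowance, taken from the PR description.
TOLERANCE = 0.05


def clamp_derive(value: float, maximum: float, tolerance: float = TOLERANCE) -> float:
    """Apply the capping rule to a Derive-style rate (hypothetical helper).

    Assumes `maximum` is positive and that NaN represents "unknown".
    """
    # Values that are already unknown or within bounds pass through unchanged.
    if math.isnan(value) or value <= maximum:
        return value
    # A small overshoot (up to `tolerance` above the maximum) is treated as
    # measurement noise and capped at the maximum.
    if value <= maximum * (1.0 + tolerance):
        return maximum
    # Anything larger is considered unreliable and marked as unknown.
    return math.nan
```

For example, with a maximum of 1.0 (100% CPU), a reading of 1.03 would be reported as 1.0, while a reading of 1.2 would be reported as unknown.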

BengangY added 2 commits June 3, 2025 04:32
Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
To handle deviations in CPU rates, Derive values exceeding the maximum
by up to 5% are capped at the maximum; others are marked as unknown.
This logic is specific to Derive data sources because they represent
rates derived from differences over time, which can occasionally
exceed expected bounds due to measurement inaccuracies.

Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>

@last-genius last-genius left a comment

While the bulk of the inaccuracies should have already been addressed, there can potentially still be some small measurement inaccuracies (especially at high load) or floating-point related errors. 5% does seem more or less proper as a cutoff point for these.

Maybe we could, at least for some time, log something every time such a cutoff occurs?

BengangY commented Jun 3, 2025

> While the bulk of the inaccuracies should have already been addressed, there can potentially still be some small measurement inaccuracies (especially at high load) or floating-point related errors. 5% does seem more or less proper as a cutoff point for these.
>
> Maybe we could, at least for some time, log something every time such a cutoff occurs?

Good idea. I saw that there is a stats.ml in the rrd lib for timing statistics, but it does not seem to be used much. Maybe we can use it to collect statistics on these inaccuracies.
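As a rough illustration of the logging idea discussed here, the sketch below records each cutoff and keeps a running count. It is a Python sketch under stated assumptions, not a use of stats.ml or any existing rrd lib API: the logger name, the `cutoff_counts` counter, and the `clamp_and_record` helper are all hypothetical.

```python
import logging
import math
from collections import Counter

# Hypothetical logger name and counter; neither exists in the project.
logger = logging.getLogger("rrd.derive_clamp")
cutoff_counts = Counter()


def clamp_and_record(name: str, value: float, maximum: float,
                     tolerance: float = 0.05) -> float:
    """Cap or invalidate an out-of-range rate and record that it happened.

    Assumes `maximum` is positive and that NaN represents "unknown".
    """
    if math.isnan(value) or value <= maximum:
        return value
    overshoot_pct = (value - maximum) / maximum * 100.0
    if value <= maximum * (1.0 + tolerance):
        # Small overshoot: cap at the maximum and count the event.
        cutoff_counts["capped"] += 1
        logger.warning("%s: %f exceeds max %f by %.2f%%; capped",
                       name, value, maximum, overshoot_pct)
        return maximum
    # Large overshoot: mark as unknown and count the event.
    cutoff_counts["unknown"] += 1
    logger.warning("%s: %f exceeds max %f by %.2f%%; marked unknown",
                   name, value, maximum, overshoot_pct)
    return math.nan
```

Keeping a counter alongside the log lines would make it possible to judge later whether 5% is a sensible cutoff without reading every log entry.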

@BengangY BengangY marked this pull request as ready for review June 5, 2025 01:02
@BengangY BengangY added this pull request to the merge queue Jun 5, 2025
Merged via the queue into xapi-project:master with commit 7ad7f88 Jun 5, 2025
17 checks passed
gangj added a commit to gangj/xen-api that referenced this pull request Jun 13, 2025
Follow the fix here:
xapi-project#6493

Signed-off-by: Gang Ji <gang.ji@cloud.com>
gangj added a commit to gangj/xen-api that referenced this pull request Jun 13, 2025
Follow the fix here:
xapi-project#6493

Signed-off-by: Gang Ji <gang.ji@cloud.com>