rpc: mismatch in rpc-metrics all versus individual #28619
Let's go. `NewTimer` (used by the summary) returns

```go
&StandardTimer{
	histogram: NewHistogram(NewExpDecaySample(1028, 0.015)),
	meter:     NewMeter(),
}
```

The individual ones:
So it seems that they are using the same basic building blocks.
I can't say where the difference comes from. I think you need to investigate your data more; for example, aside from those top 5, are there any other extreme outliers which affect the "all" meter?
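One thing worth ruling out is sampling noise: the `ExpDecaySample(1028, 0.015)` above keeps only 1028 samples, so its percentiles are estimates. A rough sketch of how much that sampling can distort P95 (a simplified uniform reservoir, without the exponential time decay the real sample applies, and with made-up latency numbers):

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// percentile returns the empirical p-quantile of vals (0 <= p <= 1).
func percentile(vals []float64, p float64) float64 {
	s := append([]float64(nil), vals...)
	sort.Float64s(s)
	return s[int(p*float64(len(s)-1))]
}

// sampledVsTrueP95 simulates a latency stream and compares its true
// P95 against the P95 of a 1028-entry uniform reservoir sample.
func sampledVsTrueP95() (truth, sampled float64) {
	rng := rand.New(rand.NewSource(1))

	// 100k simulated RPC latencies (ms): mostly fast, 2% slow outliers.
	stream := make([]float64, 100000)
	for i := range stream {
		if rng.Float64() < 0.02 {
			stream[i] = 200 + 100*rng.Float64()
		} else {
			stream[i] = 1 + rng.Float64()
		}
	}

	// Uniform reservoir of 1028 entries -- the same size that
	// go-metrics' ExpDecaySample keeps (simplified: no time decay).
	reservoir := make([]float64, 0, 1028)
	for i, v := range stream {
		if len(reservoir) < cap(reservoir) {
			reservoir = append(reservoir, v)
		} else if j := rng.Intn(i + 1); j < cap(reservoir) {
			reservoir[j] = v
		}
	}

	return percentile(stream, 0.95), percentile(reservoir, 0.95)
}

func main() {
	truth, sampled := sampledVsTrueP95()
	fmt.Printf("true P95 = %.2f ms, sampled P95 = %.2f ms\n", truth, sampled)
}
```

The sampled P95 lands close to the true one, so a 1028-entry reservoir alone cannot plausibly explain a 100-1000x gap.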
I have only ~10 eth methods with P95 > 0, and no extreme outliers either, since all the others have lower values. My guess is that one of these two metrics is not showing correct results. I checked these metrics on different networks and all show the same behaviour: the "all" P95 is 100-1000x higher than the highest per-method P95 (e.g. Ethereum mainnet, Sepolia, Polygon, ...)
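This suspicion is well founded: with correct summaries, the combined P95 cannot sit far above the largest per-method P95, because any sample above that level is by definition in the top 5% of its own method, and those samples make up at most ~5% of the combined stream. A sketch with made-up per-method latency shapes (method names and numbers are illustrative only):

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// p95 returns the empirical 95th percentile of vals.
func p95(vals []float64) float64 {
	s := append([]float64(nil), vals...)
	sort.Float64s(s)
	return s[int(0.95*float64(len(s)-1))]
}

// unionVsPerMethodP95 builds hypothetical per-method latency samples
// and returns the union's P95 and the largest per-method P95.
func unionVsPerMethodP95() (unionP95, maxMethodP95 float64) {
	rng := rand.New(rand.NewSource(42))

	// Made-up latency distributions (ms) for a few methods.
	methods := map[string]func() float64{
		"eth_call":        func() float64 { return 2 + 8*rng.Float64() },
		"eth_getBalance":  func() float64 { return 1 + 2*rng.Float64() },
		"eth_getLogs":     func() float64 { return 20 + 200*rng.Float64() },
		"eth_blockNumber": func() float64 { return 0.5 + rng.Float64() },
	}

	var all []float64
	for name, draw := range methods {
		vals := make([]float64, 2000)
		for i := range vals {
			vals[i] = draw()
		}
		p := p95(vals)
		if p > maxMethodP95 {
			maxMethodP95 = p
		}
		fmt.Printf("%-16s P95 = %6.1f ms\n", name, p)
		all = append(all, vals...)
	}
	return p95(all), maxMethodP95
}

func main() {
	u, m := unionVsPerMethodP95()
	fmt.Printf("union P95 = %.1f ms, max per-method P95 = %.1f ms\n", u, m)
}
```

However the per-method shapes are chosen, the union's P95 stays at or below the largest per-method P95, which is why a 100-1000x overshoot points at a bug rather than at the data.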
I think you're only showing the top 5 methods that succeed. Could you do

There will always be a small mismatch between the two numbers, since we do the time.Since computation twice at different points, but the difference should not be this big.
I thought that as well, but most of the time (checked now) the

Exactly as you said. A small difference is fine, but a difference this big means something is not working correctly. Here are the current results for my Ethereum Mainnet node (just to be sure it's not something related to the Polygon fork).

Difference is ~30x. 🤔
Could you get the raw metrics report from geth? I don't know what happens there, with the
100% sure everything is ok on the Prometheus side. Here are the raw metrics from the very same node. I did
Thanks for this; you are right, and I've found the bug now.
There is something we don't get regarding the Summary metric `rpc_duration_all` and the same metric per method. For an easier glance, we created 2 recording rules so we can get percentiles per method by searching for a specific label:
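Recording rules along these lines could do this (a sketch only — the rule names and label matchers here are hypothetical, not the ones from this report; geth exports timers as summaries with a `quantile` label):

```yaml
groups:
  - name: rpc_percentiles
    rules:
      # P95 across all methods, straight from the summary series.
      - record: rpc:duration_all:p95
        expr: rpc_duration_all{quantile="0.95"}
      # P95 per method: select the per-method summary series by name
      # and copy the method name into a queryable label.
      - record: rpc:duration_method:p95
        expr: >
          label_replace(
            {__name__=~"rpc_duration_eth_.+", quantile="0.95"},
            "method", "$1", "__name__", "rpc_duration_(.+)"
          )
```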
So now, when we compare P95 for all methods:

and P95 for the top 5 methods:

see the results. The difference is huge. That's not possible if summaries work correctly; those top-5 P95 values should be much, much closer to `rpc_duration_all`. Can someone explain this behaviour?