-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
balancer/rls: Add picker metrics #7484
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7484 +/- ##
==========================================
- Coverage 81.58% 81.50% -0.08%
==========================================
Files 357 357
Lines 27243 27294 +51
==========================================
+ Hits 22227 22247 +20
- Misses 3820 3831 +11
- Partials 1196 1216 +20
|
Also, please add a more descriptive release note entry. |
balancer/rls/picker.go
Outdated
pr := "fail" | ||
if _, ok := status.FromError(err); ok { | ||
pr = "drop" | ||
} | ||
return res, func() { | ||
targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr) | ||
}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making a helper since this logic is used in two places now:
// errToPickResult is a helper function which converts the error value returned
// by Pick() to a string that represents the pick result.
func errToPickResult(err error) string {
if err == nil {
return "complete"
}
if errors.Is(err, balancer.ErrNoSubConnAvailable) {
return "queue"
}
if _, ok := status.FromError(err); ok {
return "drop"
}
return "fail"
}
And we need to handle the balancer.ErrNoSubConnAvailable
case in your code, don't we? Otherwise, we will wrongly mark picks that a child policy wanted to be queued as "fail".
And if the decision is to not record a metric for queued picks, we need to make sure that we ignore that explicitly after we handle the balancer.ErrNoSubConnAvailable
case, as in the helper I described above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh for some reason I thought the balancer.ErrNoSubConnAvailable was only possible to be emitted from this RLS Layer. I forgot this could emit from the child as well in these error cases. I will try your helper out.
Also, how do you plan to test these changes? |
Thanks for the pass; great suggestions. I plan on testing this with unit tests for more specific sceanrios/counters, and e2e with RLS deployed as a top level balancer of channel, with a OpenTelemetry component configured. I can then verify OpenTelemetry eventually emits the expected metrics atoms for the RLS Balancer metrics. |
balancer/rls/picker.go
Outdated
rf := func() { | ||
targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr) | ||
} | ||
if pr == "queue" { | ||
// Don't record metrics for queued Picks. | ||
rf = func() {} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and down below, I would pull in the check for queued picks inside the callback:
rf := func() {
if pr == "queue" {
// Don't record metrics for queued Picks.
return
}
targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}
The above approach would mean that for completed and failed picks, we would incur the extra cost of running a conditional if pr == "queue" { ... }
, and branch statements can be expensive with CPU pipelining. I'm not too concerned about it though. I prefer the way the code looks when you pull in it. But if you are @dfawley thinks the cost is not worth the ergonomics, I'm fine with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doug said he doesn't mind either, so I switched to this.
This PR adds the implementation for RLS metrics.
RELEASE NOTES: