Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add viz stat-inbound and viz stat-outbound commands #12994

Merged
merged 6 commits into from
Aug 29, 2024
Merged

Conversation

adleong
Copy link
Member

@adleong adleong commented Aug 26, 2024

We add two new commands to the linkerd viz extension: linkerd viz stat-inbound and linkerd viz stat-outbound. These commands are meant as replacements for the linkerd viz stat. The linkerd viz stat command provides stats when ServiceProfiles are used whereas the new commands provide stats when xRoute resources are used. Either command can be used when no xRoute or ServiceProfile is used but the new commands include several improvements:

  • Inbound and outbound stats are clearly separated into different commands rather than being contextual based on flag combinations
  • Route level and backend level stats are displayed together in a tree-view in linkerd viz stat-outbound to easily see the effects of retries, timeouts, and traffic splitting
> linkerd viz stat-outbound -n schlep deploy                  
NAME         SERVICE      ROUTE           TYPE       BACKEND    SUCCESS   RPS  LATENCY_P50  LATENCY_P95  LATENCY_P99  TIMEOUTS  RETRIES  
client-http  schlep:80    schlep-default  HTTPRoute             100.00%  1.00         31ms        387ms        478ms     0.00%    6.25%  
                          └───────────────────────►  schlep:80   93.75%  1.07         16ms         88ms         98ms     1.56%           
client-grpc  schlep:8080  schlep-default  GRPCRoute              98.31%  0.98         36ms        425ms        485ms     0.00%    0.00%  
                          ├───────────────────────►  fail:8080   96.88%  0.53         12ms         24ms         25ms     0.00%           
                          └───────────────────────►  good:8080  100.00%  0.45         25ms         95ms         99ms     0.00%
> linkerd viz stat-inbound -n schlep deploy
NAME         SERVER          ROUTE      TYPE  SUCCESS   RPS  LATENCY_P50  LATENCY_P95  LATENCY_P99  
client-grpc  [default]:4191  [default]        100.00%  0.10          2ms          3ms          3ms  
client-grpc  [default]:4191  probe            100.00%  0.20          0ms          1ms          1ms  
client-http  [default]:4191  [default]        100.00%  0.10          2ms          2ms          2ms  
client-http  [default]:4191  probe            100.00%  0.20          0ms          1ms          1ms  
server-fail  [default]:4191  probe            100.00%  0.20          0ms          1ms          1ms  
server-fail  [default]:4191  [default]        100.00%  0.10          2ms          2ms          2ms  
server-fail  [default]:8080  [default]         94.87%  1.30          0ms          1ms          1ms  
server-good  [default]:4191  [default]        100.00%  0.10          0ms          1ms          1ms  
server-good  [default]:4191  probe            100.00%  0.20          0ms          1ms          1ms  
server-good  [default]:8080  [default]        100.00%  0.73          8ms         92ms         98ms  
server-slow  [default]:4191  [default]        100.00%  0.10          0ms          1ms          1ms  
server-slow  [default]:4191  probe            100.00%  0.20          0ms          1ms          1ms

Unlike the linkerd viz stat command, these commands query prometheus directly rather than going through the intermediary of the metrics-api. If prometheus is enabled in linkerd-viz, these commands will use a port-forward to connect to that prometheus instance. If an external prometheus is configured, these commands will attempt to use that prometheus URL; however note that the prometheus URL must be reachable from where the CLI is executed for this to work. This can be overridden by a --prometheusURL flag.

Json and table output are both supported.

Signed-off-by: Alex Leong <alex@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
@adleong adleong requested a review from a team as a code owner August 26, 2024 23:10
Signed-off-by: Alex Leong <alex@buoyant.io>
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a big fan of this change. UX vastly improved and the code reads very well too!

I think most of my comments below for stat-inbound also apply to stat-outbound.

cli/cmd/root.go Outdated Show resolved Hide resolved
viz/pkg/api/api.go Outdated Show resolved Hide resolved
}
for quantile, resultsChan := range results {
go func(quantile string) {
defer close(resultsChan)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to pin resultsChan?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently not since all the quantile channels do get populated with this code. I think because channels are always passed by reference, the contents of the loop variable are already a reference (instead of being a reference to the loop variable).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think that being a reference doesn't matter for how closures use those vars, but we likely are just benefiting from go 1.22's fix around this type of issue, so nothing to see here :-)

viz/cmd/stat-inbound.go Outdated Show resolved Hide resolved
Comment on lines +89 to +97
* cronjobs
* daemonsets
* deployments
* namespaces
* jobs
* pods
* replicasets
* replicationcontrollers
* statefulsets`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Authority is still supported, I think it's worth mentioning it here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... technically this works but it kind of breaks the mental model of centering around concrete workloads. I'm tempted to explicitly make this not work. Is there a compelling use case for it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I know of...

Signed-off-by: Alex Leong <alex@buoyant.io>
Signed-off-by: Alex Leong <alex@buoyant.io>
@adleong adleong merged commit 366ab94 into main Aug 29, 2024
39 checks passed
@adleong adleong deleted the alex/in-n-out-bound branch August 29, 2024 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants