linkerd viz stat-outbound reports incorrect latencies #13483

Open
@kflynn

What is the issue?

linkerd viz stat-outbound seems to report latencies that are many times larger than actual latencies.

How can it be reproduced?

Start with a fresh cluster, then run:

linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd viz install | kubectl apply -f -
linkerd check

kubectl create ns faces
kubectl annotate ns/faces linkerd.io/inject=enabled

helm install faces -n faces \
     oci://ghcr.io/buoyantio/faces-chart --version 2.0.0-rc.2 \
     --set gui.serviceType=LoadBalancer \
     --set face.errorFraction=0 \
     --set backend.errorFraction=0 \
     --set backend.delayBuckets="1000"

kubectl rollout status -n faces deploy

At this point, we need access to the faces-gui Service in the faces namespace. Though I can access it directly via its external IP in my setup, I'll write this repro using kubectl port-forward.

kubectl port-forward -n faces svc/faces-gui 8080:80

Using --set backend.delayBuckets="1000" when installing Faces means that the backend workloads (smiley and color) will delay every call by 1000ms. We can verify this with curl:

curl 'http://localhost:8080/face/center/?row=2&col=2' | jq

This will take about a second, and its output will include a latency element that will be around 1000ms. We can check this a few times:

for i in 1 2 3 4 5 6 7 8 9 10; do
    curl -s 'http://localhost:8080/face/center/?row=2&col=2' | jq .latency
done

This will take about 10 seconds, and should show a stack of numbers around 1000.
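To make the baseline easier to compare against the stat-outbound output, the loop can be piped into a small helper that averages the reported values. This is only a sketch: mean_latency is a hypothetical helper name, and it assumes jq .latency prints one bare number per line, as described above.

```shell
# mean_latency (hypothetical helper): read one number per line on stdin
# and print how many samples were seen and their arithmetic mean.
mean_latency() {
    awk '{ sum += $1; n++ }
         END { if (n) printf "n=%d mean=%.0fms\n", n, sum / n }'
}

# Usage against the port-forwarded Faces GUI (same URL as the loop above):
#
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#       curl -s 'http://localhost:8080/face/center/?row=2&col=2' | jq .latency
#   done | mean_latency
```

With the 1000ms delay configured, the mean should land close to 1000ms, which gives a single number to hold up against the percentiles that stat-outbound reports.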

Go ahead and open a web browser to http://localhost:8080 and you'll see the Faces GUI, which we'll use as a traffic generator. Then run

watch linkerd viz stat-outbound -n faces deploy/face

and you'll see something like this (after possibly giving it a chance to warm up):

NAME  SERVICE    ROUTE      TYPE  BACKEND    SUCCESS    RPS  LATENCY_P50  LATENCY_P95  LATENCY_P99  TIMEOUTS  RETRIES
face  color:80   [default]                   100.00%   8.00       5500ms       9550ms       9910ms     0.00%    0.00%
                 └─────────────►  color:80   100.00%   8.00       5500ms       9550ms       9910ms     0.00%
face  smiley:80  [default]                   100.00%   8.00       5500ms       9550ms       9910ms     0.00%    0.00%
                 └─────────────►  smiley:80  100.00%   8.00       5500ms       9550ms       9910ms     0.00%

Ignore the 100% success rate for color (it's a gRPC service and we have no GRPCRoutes at present) and look at the latencies. All of these numbers should be right at 1000ms, but they're roughly 5 to 10 times that.

Even more interesting, if we switch the backend latencies to 100ms:

kubectl set env -n faces deploy/smiley DELAY_BUCKETS=100
kubectl set env -n faces deploy/color DELAY_BUCKETS=100
kubectl rollout status -n faces deploy

then the for loop above will run in about one second and show numbers right around 100, but linkerd viz stat-outbound -n faces deploy/face will (after it settles down) show things like this:

NAME  SERVICE    ROUTE      TYPE  BACKEND    SUCCESS    RPS  LATENCY_P50  LATENCY_P95  LATENCY_P99  TIMEOUTS  RETRIES
face  smiley:80  [default]                   100.00%   8.00        275ms        478ms        496ms     0.00%    0.00%
                 └─────────────►  smiley:80  100.00%   8.00        175ms        242ms        248ms     0.00%
face  color:80   [default]                   100.00%   8.00        275ms        478ms        496ms     0.00%    0.00%
                 └─────────────►  color:80   100.00%   8.00        175ms        242ms        248ms     0.00%

which is even weirder -- why the distinction between the different rows?
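One observation that may help whoever picks this up: the reported percentiles in both tables match what linear interpolation inside a single wide histogram bucket would produce. The sketch below is purely arithmetic. The bucket bounds (1000 to 10000ms, 50 to 500ms, and 100 to 250ms) are assumptions inferred from the numbers above, not taken from the Linkerd source, and bucket_quantile is a hypothetical helper name.

```shell
# bucket_quantile LOWER UPPER Q (hypothetical helper)
# Linearly interpolate quantile Q inside one histogram bucket (LOWER, UPPER],
# assuming every observation landed in that bucket.
bucket_quantile() {
    awk -v lo="$1" -v hi="$2" -v q="$3" \
        'BEGIN { printf "%.1f\n", lo + (hi - lo) * q }'
}

# Bucket bounds in ms are ASSUMED (inferred from the tables in this issue).
for q in 0.50 0.95 0.99; do
    printf 'q=%s  1000ms run (1000,10000]: %s  100ms route (50,500]: %s  100ms backend (100,250]: %s\n' \
        "$q" \
        "$(bucket_quantile 1000 10000 "$q")" \
        "$(bucket_quantile 50 500 "$q")" \
        "$(bucket_quantile 100 250 "$q")"
done
```

Under these assumed bounds the interpolation gives 5500, 9550 and 9910 for the 1000ms run, 275, 477.5 and 495.5 for the route rows of the 100ms run, and 175, 242.5 and 248.5 for the backend rows. Those agree with the tables after rounding to whole milliseconds. If that reading is right, the route and backend rows would differ only because their histograms use different bucket boundaries, and the real problem would be bucket resolution rather than measurement. This is a guess from the output alone.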

Logs, error output, etc

See above.

output of linkerd check -o short

:; linkerd check -o short
Status check results are √

Environment

MacOS 15.1.1
k3d version 5.7.4
k3s version 1.30.4-k3s1
Linkerd edge-24.11.8
Faces 2.0.0-rc.2 😇

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

maybe
