Description
What is the issue?
linkerd viz stat-outbound seems to report latencies that are many times larger than the actual latencies.
How can it be reproduced?
Get a new cluster, then
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd viz install | kubectl apply -f -
linkerd check
kubectl create ns faces
kubectl annotate ns/faces linkerd.io/inject=enabled
helm install faces -n faces \
  oci://ghcr.io/buoyantio/faces-chart --version 2.0.0-rc.2 \
  --set gui.serviceType=LoadBalancer \
  --set face.errorFraction=0 \
  --set backend.errorFraction=0 \
  --set backend.delayBuckets="1000"
kubectl rollout status -n faces deploy
At this point, we need access to the faces-gui Service in the faces namespace. Though I can access it directly via its external IP in my setup, I'll write this repro using kubectl port-forward.
kubectl port-forward -n faces svc/faces-gui 8080:80
Using --set backend.delayBuckets=1000 when installing Faces means that the backend workloads (smiley and color) will always delay every call by 1000ms. We can verify this with curl:
curl 'http://localhost:8080/face/center/?row=2&col=2' | jq
This will take about a second, and its output will include a latency
element that will be around 1000ms. We can check this a few times:
for i in 1 2 3 4 5 6 7 8 9 10; do
curl -s 'http://localhost:8080/face/center/?row=2&col=2' | jq .latency
done
This will take about 10 seconds, and should show a stack of numbers around 1000.
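To put a number on "around 1000", the loop's output can be summarized rather than eyeballed. This is a minimal sketch: `summarize_latencies` is a hypothetical helper name (not part of Faces or Linkerd), and it assumes one numeric latency in milliseconds per input line, as printed by `jq .latency`.

```shell
# summarize_latencies: read one latency (ms) per line on stdin and print the
# sample count, minimum, mean, and maximum. Hypothetical helper for this repro.
summarize_latencies() {
  LC_ALL=C awk '
    NF == 0 { next }                  # skip blank lines
    {
      v = $1 + 0                      # force numeric interpretation
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v
      n++
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "n=%d min=%.1f mean=%.1f max=%.1f\n", n, min, sum / n, max
    }
  '
}
```

Piping the `for` loop above into `summarize_latencies` should then print a single line whose min, mean, and max all sit close to 1000.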
Go ahead and open a web browser to http://localhost:8080 and you'll see the Faces GUI, which we'll use as a traffic generator. Then run
watch linkerd viz stat-outbound -n faces deploy/face
and you'll see something like this (after possibly giving it a chance to warm up):
NAME  SERVICE    ROUTE      TYPE  BACKEND    SUCCESS   RPS  LATENCY_P50  LATENCY_P95  LATENCY_P99  TIMEOUTS  RETRIES
face  color:80   [default]                   100.00%  8.00       5500ms       9550ms       9910ms     0.00%    0.00%
                 └─────────────►  color:80   100.00%  8.00       5500ms       9550ms       9910ms     0.00%
face  smiley:80  [default]                   100.00%  8.00       5500ms       9550ms       9910ms     0.00%    0.00%
                 └─────────────►  smiley:80  100.00%  8.00       5500ms       9550ms       9910ms     0.00%
Ignore the 100% success rate for color (it's a gRPC service and we have no GRPCRoutes at present) and look at the latencies. All of these numbers should be right at 1000ms, but they're not.
Even more interesting, if we switch the backend latencies to 100ms:
kubectl set env -n faces deploy/smiley DELAY_BUCKETS=100
kubectl set env -n faces deploy/color DELAY_BUCKETS=100
kubectl rollout status -n faces deploy
then the for loop above will run in about one second and show numbers right around 100, but linkerd viz stat-outbound -n faces deploy/face will (after it settles down) show things like this:
NAME  SERVICE    ROUTE      TYPE  BACKEND    SUCCESS   RPS  LATENCY_P50  LATENCY_P95  LATENCY_P99  TIMEOUTS  RETRIES
face  smiley:80  [default]                   100.00%  8.00        275ms        478ms        496ms     0.00%    0.00%
                 └─────────────►  smiley:80  100.00%  8.00        175ms        242ms        248ms     0.00%
face  color:80   [default]                   100.00%  8.00        275ms        478ms        496ms     0.00%    0.00%
                 └─────────────►  color:80   100.00%  8.00        175ms        242ms        248ms     0.00%
which is even weirder -- why do the route rows and the backend rows report different latencies?
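One observation that may help triage, offered as an unconfirmed hypothesis: the reported numbers are what you would get if the percentiles were estimated from bucketed histograms with linear interpolation inside a single wide bucket (the way Prometheus's histogram_quantile interpolates). The sketch below just does that arithmetic. The `interp` helper and the bucket edges (1000..10000ms, 50..500ms, 100..250ms) are assumptions picked because they fit the output above; they are not taken from Linkerd's source or configuration.

```shell
# interp LO HI Q: linearly interpolate quantile Q inside one histogram bucket
# spanning LO..HI milliseconds, assuming every sample landed in that bucket.
# Hypothetical helper; the bucket edges used below are guesses, not Linkerd's.
interp() {
  LC_ALL=C awk -v lo="$1" -v hi="$2" -v q="$3" \
    'BEGIN { printf "%.1f\n", lo + q * (hi - lo) }'
}

for q in 0.50 0.95 0.99; do
  echo "q=$q  1000..10000ms: $(interp 1000 10000 "$q")  50..500ms: $(interp 50 500 "$q")  100..250ms: $(interp 100 250 "$q")"
done
```

This yields 5500.0 / 9550.0 / 9910.0 for the first bucket, 275.0 / 477.5 / 495.5 for the second, and 175.0 / 242.5 / 248.5 for the third, which line up (after rounding) with the 1000ms table, the route rows, and the backend rows of the 100ms table respectively. If that is what is happening, the route and backend rows would differ only because their histograms use different bucket edges.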
Logs, error output, etc
See above.
output of linkerd check -o short
:; linkerd check -o short
Status check results are √
Environment
MacOS 15.1.1
k3d version 5.7.4
k3s version 1.30.4-k3s1
Linkerd edge-24.11.8
Faces 2.0.0-rc.2 😇
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
maybe