fix(grafana): use osm_request_duration_ms for latency graphs #4297
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main    #4297      +/-   ##
==========================================
- Coverage   69.16%   69.11%   -0.05%
==========================================
  Files         211      211
  Lines       14251    14251
==========================================
- Hits         9856     9849       -7
- Misses       4347     4354       +7
  Partials       48       48
Flags with carried forward coverage won't be shown.
@jaellio did you already manually test this?
Adding @eduser25 to comment in case this was intentionally removed. I recollect that a few metrics were removed to avoid blowing up the Prometheus storage.
as far as I recollect most of the
@snehachhabria That seems to be the case based on commit dc51517.
If memory serves me, this was really heavy on mem consumption per pod on prom.
Description:
The envoy_cluster_upstream_rq_time metric was referenced in the pre-configured OSM Grafana dashboards, but was not being scraped by Prometheus, which resulted in graphs with no data.

This PR replaces envoy_cluster_upstream_rq_time with the existing osm_request_duration_ms SMI metric to display latency in the mesh. Currently, the latency graphs are present in the pod to service, service to service, and workload to service dashboards. Unlike envoy_cluster_upstream_rq_time, the osm_request_duration_ms metric does not capture the source or destination service. Therefore, the latency graphs no longer fit on the dashboards that allow the user to specify a source service or see the latencies labeled with the Envoy cluster name (which includes the destination service name).

This PR removes the latency graphs from the pod to service, service to service, and workload to service dashboards and creates a new dashboard for workload to workload metrics. Additionally, to improve clarity, "Source" is added to the appropriate dashboard variables.
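For illustration, a workload-to-workload latency panel backed by the SMI histogram could use a PromQL query along these lines. This is a sketch, not taken from the PR: the label names (source_pod, destination_pod), the _bucket suffix, and the $source_pod Grafana template variable are assumptions based on common Prometheus histogram and Grafana conventions.

```
histogram_quantile(
  0.99,
  sum by (le, destination_pod) (
    rate(osm_request_duration_ms_bucket{source_pod="$source_pod"}[5m])
  )
)
```

Because the histogram carries workload/pod labels rather than service labels, queries like this aggregate per destination workload, which is why the graphs moved to a workload-to-workload dashboard.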
Note:
An earlier version of this PR added envoy_cluster_upstream_rq_time to the Prometheus ConfigMap. The discussion surrounding this initial change can be found below.
OSM Workload to Workload Metrics (New): (dashboard screenshot)
OSM Workload to Service Metrics: (dashboard screenshot)
OSM Pod to Service Metrics: (dashboard screenshot)
OSM Service to Service Metrics: (dashboard screenshot)
Testing done:
Latency graphs that depended on the envoy_cluster_upstream_rq_time_bucket histogram rendered as expected with the osm_request_duration_ms histogram. The functionality of the variables on the new dashboard was also verified.
Affected area:
Please answer the following questions with yes/no.
Does this change contain code from or inspired by another project? No
Is this a breaking change? No