
Update steps for setting up metrics on openshift, focusing on single … #953

Merged
merged 2 commits into main on Oct 24, 2024

Conversation

david-martin
Contributor

…cluster

  • combine service monitors for operators for easier deployment
  • update openshift install steps to reference this file and explain the setup better
  • remove reference to multi cluster thanos setup to avoid confusion about user workload thanos-querier and custom thanos
  • add missing metrics service proxy for scraping important metrics directly from the gateway pod

…cluster

Signed-off-by: David Martin <davmarti@redhat.com>

codecov bot commented Oct 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.75%. Comparing base (63f1d28) to head (27feb31).
Report is 29 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #953      +/-   ##
==========================================
- Coverage   81.49%   78.75%   -2.75%     
==========================================
  Files         102      113      +11     
  Lines        7177     9558    +2381     
==========================================
+ Hits         5849     7527    +1678     
- Misses        898     1620     +722     
+ Partials      430      411      -19     
```
| Flag | Coverage Δ |
|------|------------|
| bare-k8s-integration | 8.95% <ø> (+0.05%) ⬆️ |
| controllers-integration | 67.62% <ø> (+2.29%) ⬆️ |
| envoygateway-integration | 46.99% <ø> (-3.31%) ⬇️ |
| gatewayapi-integration | 12.88% <ø> (-1.53%) ⬇️ |
| istio-integration | 48.07% <ø> (-5.45%) ⬇️ |
| unit | 27.81% <ø> (-0.53%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|------------|------------|
| api/v1beta1 (u) | 90.00% <ø> (-0.91%) ⬇️ |
| api/v1beta2 (u) | ∅ <ø> (∅) |
| pkg/common (u) | 87.67% <88.88%> (-0.47%) ⬇️ |
| pkg/istio (u) | 58.57% <72.05%> (-12.95%) ⬇️ |
| pkg/log (u) | 93.18% <ø> (-1.56%) ⬇️ |
| pkg/reconcilers (u) | ∅ <ø> (∅) |
| pkg/rlptools (u) | ∅ <ø> (∅) |
| controllers (i) | 81.93% <84.07%> (-1.13%) ⬇️ |

see 72 files with indirect coverage changes


```bash
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/main/config/observability/openshift/kube-state-metrics.yaml
kubectl apply -k https://github.com/Kuadrant/gateway-api-state-metrics?ref=main
```

```diff
- To enable request metrics in Istio, you must create a `telemetry` resource as follows:
+ To enable request metrics in Istio and scrape them, create the following resources:
```

```bash
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/main/config/observability/openshift/telemetry.yaml
```


This is not strictly necessary to have request metrics in Istio; those are enabled by default. This Telemetry configuration adds the request path as a label to the request metrics, which is not always desirable, as it is a high-cardinality label that can flood your Prometheus instance if you have a big API. For example, each resource in an API would generate a different Prometheus time series. We should probably warn about this at least.
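For context, a Telemetry resource that adds the request path as a metric label looks roughly like the sketch below; the resource name, namespace, and metric selection are illustrative assumptions, and the actual `telemetry.yaml` referenced above may differ.

```yaml
# Illustrative sketch: add request.path as a label (tag) on Istio's standard
# request count metric. request.path is high cardinality, so apply with care.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: namespace-metrics   # assumed name
  namespace: istio-system   # assumed namespace
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: REQUEST_COUNT
      tagOverrides:
        request_path:
          value: request.path
```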

Contributor Author


Good point.
I'll split this out and explain it better with a warning.

Comment on lines 172 to 203
```bash
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-gateway
  namespace: ${gatewayNS}
spec:
  selector:
    matchLabels:
      istio.io/gateway-name: ${gatewayName}
  endpoints:
  - port: metrics
    path: /stats/prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-metrics-proxy
  namespace: ${gatewayNS}
  labels:
    istio.io/gateway-name: ${gatewayName}
spec:
  selector:
    istio.io/gateway-name: ${gatewayName}
  ports:
  - name: metrics
    protocol: TCP
    port: 15020
    targetPort: 15020
EOF
```
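As a quick sanity check (not part of the doc changes; it assumes `curl` is available locally and that the gateway pod carries the `istio.io/gateway-name` label used above), you can port-forward the metrics proxy Service and confirm the stats are served on the path the ServiceMonitor scrapes:

```bash
# Forward the metrics proxy Service locally, then fetch a sample of the metrics
# that Prometheus will scrape via the ServiceMonitor above.
kubectl -n ${gatewayNS} port-forward service/ingress-metrics-proxy 15020:15020 &
sleep 2
curl -s http://localhost:15020/stats/prometheus | head
```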

This can be done using a single PodMonitor that targets all your Gateways, because Istio annotates the gateway pods with the port where the metrics are served. It's slightly convoluted, but it has the advantage of targeting all gateways in the namespace with a single PodMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-proxies-monitor
spec:
  selector:
    matchExpressions:
      - key: istio-prometheus-ignore
        operator: DoesNotExist
  podMetricsEndpoints:
    - path: /stats/prometheus
      interval: 30s
      relabelings:
        - action: keep
          sourceLabels: ["__meta_kubernetes_pod_container_name"]
          regex: "istio-proxy"
        - action: keep
          sourceLabels:
            ["__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape"]
        - action: replace
          regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
          replacement: "[$2]:$1"
          sourceLabels:
            [
              "__meta_kubernetes_pod_annotation_prometheus_io_port",
              "__meta_kubernetes_pod_ip",
            ]
          targetLabel: "__address__"
        - action: replace
          regex: (\d+);((([0-9]+?)(\.|$)){4})
          replacement: "$2:$1"
          sourceLabels:
            [
              "__meta_kubernetes_pod_annotation_prometheus_io_port",
              "__meta_kubernetes_pod_ip",
            ]
          targetLabel: "__address__"
        - action: labeldrop
          regex: "__meta_kubernetes_pod_label_(.+)"
        - sourceLabels: ["__meta_kubernetes_namespace"]
          action: replace
          targetLabel: namespace
        - sourceLabels: ["__meta_kubernetes_pod_name"]
          action: replace
          targetLabel: pod_name
```

I recall getting this from the Istio documentation, but I can't find it now ...
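If the above is saved to a file, applying it in the Gateway namespace would look something like this (the file name is hypothetical; with user workload monitoring the PodMonitor is picked up per namespace, so create it wherever your gateway pods run):

```bash
# Hypothetical file name for the PodMonitor above; create it in each namespace
# whose istio-proxy pods you want scraped.
kubectl apply -n ${gatewayNS} -f istio-proxies-podmonitor.yaml
```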

Contributor Author


I have a similar looking 'additionalScrapeConfig' here, but it doesn't work with user workload monitoring on OpenShift due to restrictions on what can be configured.

If this single PodMonitor approach works with UWM, I think that would be more robust than the Service/ServiceMonitor approach.


Yes, I have tested this in OCP with UWM and it works as expected.
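For anyone reproducing this: user workload monitoring itself is enabled with the standard OpenShift ConfigMap shown below (a documented OpenShift step, not something this PR adds):

```yaml
# Standard OpenShift way to turn on user workload monitoring (UWM), which is what
# picks up PodMonitor/ServiceMonitor resources like the ones discussed in this thread.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
```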


I should note that I tested it using sail-operator to install Istio, but it should work the same for other Istio install methods.


@roivaz left a comment


Just a couple of comments, looks good overall.


There is one more metrics configuration that needs to be applied so that all relevant metrics are scraped.
That configuration depends on where you deploy your Gateway.
The steps to configure it are detailed in the follow-on 'Secure, protect, and connect' guide.
Contributor


Maybe we should provide a link to Secure, protect, and connect.

Contributor Author


I thought about it, and held off as the link is at the end of the guide as a follow-on.
However, there's no harm in linking here too for easier navigation.

doc/install/install-openshift.md

For Grafana installation details, see [installing Grafana on OpenShift](https://cloud.redhat.com/experts/o11y/ocp-grafana/). When installed, you must [set up a data source to the thanos-querier route in the OpenShift cluster](https://docs.openshift.com/container-platform/4.15/observability/monitoring/accessing-third-party-monitoring-apis.html#accessing-metrics-from-outside-cluster_accessing-monitoring-apis-by-using-the-cli).
Contributor


Thanos datasource setup is also described in the `installing Grafana on OpenShift` guide.

Contributor Author


Good catch.
I think I'll call that out here, but keep the 2nd link as well, as it's a more detailed and official way of accessing thanos-querier.
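For reference, the thanos-querier route that the Grafana data source points at can be looked up as shown below (assuming the default openshift-monitoring namespace; a service account token with the appropriate RBAC is still needed for authentication, as described in the linked docs):

```bash
# Print the thanos-querier route host; use https://<host> as the Grafana data source URL.
oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}'
```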

Signed-off-by: David Martin <davmarti@redhat.com>
@david-martin david-martin merged commit 28ba551 into main Oct 24, 2024
30 of 32 checks passed