
Update steps for setting up metrics on openshift, focusing on single … #953

Merged
merged 2 commits into main on Oct 24, 2024

Conversation

david-martin
Contributor

…cluster

  • combine service monitors for operators for easier deployment
  • update openshift install steps to reference this file and explain the setup better
  • remove reference to multi cluster thanos setup to avoid confusion about user workload thanos-querier and custom thanos
  • add missing metrics service proxy for scraping important metrics directly from the gateway pod

…cluster

Signed-off-by: David Martin <davmarti@redhat.com>

codecov bot commented Oct 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.75%. Comparing base (63f1d28) to head (27feb31).
Report is 29 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #953      +/-   ##
==========================================
- Coverage   81.49%   78.75%   -2.75%     
==========================================
  Files         102      113      +11     
  Lines        7177     9558    +2381     
==========================================
+ Hits         5849     7527    +1678     
- Misses        898     1620     +722     
+ Partials      430      411      -19     
```
| Flag | Coverage Δ |
|------|------------|
| bare-k8s-integration | 8.95% <ø> (+0.05%) ⬆️ |
| controllers-integration | 67.62% <ø> (+2.29%) ⬆️ |
| envoygateway-integration | 46.99% <ø> (-3.31%) ⬇️ |
| gatewayapi-integration | 12.88% <ø> (-1.53%) ⬇️ |
| istio-integration | 48.07% <ø> (-5.45%) ⬇️ |
| unit | 27.81% <ø> (-0.53%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|------------|------------|
| api/v1beta1 (u) | 90.00% <ø> (-0.91%) ⬇️ |
| api/v1beta2 (u) | ∅ <ø> (∅) |
| pkg/common (u) | 87.67% <88.88%> (-0.47%) ⬇️ |
| pkg/istio (u) | 58.57% <72.05%> (-12.95%) ⬇️ |
| pkg/log (u) | 93.18% <ø> (-1.56%) ⬇️ |
| pkg/reconcilers (u) | ∅ <ø> (∅) |
| pkg/rlptools (u) | ∅ <ø> (∅) |
| controllers (i) | 81.93% <84.07%> (-1.13%) ⬇️ |

see 72 files with indirect coverage changes


```bash
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/main/config/observability/openshift/kube-state-metrics.yaml
kubectl apply -k https://github.com/Kuadrant/gateway-api-state-metrics?ref=main
```

```diff
- To enable request metrics in Istio, you must create a `telemetry` resource as follows:
+ To enable request metrics in Istio and scrape them, create the following resources:
```

```bash
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/main/config/observability/openshift/telemetry.yaml
```


This is not strictly necessary to have request metrics in Istio; those are enabled by default. This Telemetry configuration adds the request path as a label to the request metrics, which is not always desirable, as it is a high-cardinality label that can flood your Prometheus instance if you have a big API. For example, each resource in an API would generate a different Prometheus time series. We should probably warn about this at least.
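For context, a Telemetry resource that adds the request path as a metric label looks roughly like the sketch below; the resource name, namespace, and metric selection are illustrative assumptions, and the actual `telemetry.yaml` referenced above may differ.

```yaml
# Illustrative sketch: add request.path as a label (tag) on Istio's standard
# request count metric. request.path is high cardinality, so apply with care.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: namespace-metrics   # assumed name
  namespace: istio-system   # assumed namespace
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - match:
        metric: REQUEST_COUNT
      tagOverrides:
        request_path:
          value: request.path
```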

Contributor Author


Good point.
I'll split this out and explain it better with a warning.

Comment on lines 172 to 203
```bash
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-gateway
  namespace: ${gatewayNS}
spec:
  selector:
    matchLabels:
      istio.io/gateway-name: ${gatewayName}
  endpoints:
  - port: metrics
    path: /stats/prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-metrics-proxy
  namespace: ${gatewayNS}
  labels:
    istio.io/gateway-name: ${gatewayName}
spec:
  selector:
    istio.io/gateway-name: ${gatewayName}
  ports:
  - name: metrics
    protocol: TCP
    port: 15020
    targetPort: 15020
EOF
```
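As a quick sanity check (not part of the doc changes; it assumes `curl` is available locally and that the gateway pod carries the `istio.io/gateway-name` label used above), you can port-forward the metrics proxy Service and confirm the stats are served on the path the ServiceMonitor scrapes:

```bash
# Forward the metrics proxy Service locally, then fetch a sample of the metrics
# that Prometheus will scrape via the ServiceMonitor above.
kubectl -n ${gatewayNS} port-forward service/ingress-metrics-proxy 15020:15020 &
sleep 2
curl -s http://localhost:15020/stats/prometheus | head
```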

This can be done using a single PodMonitor that targets all your Gateways, because Istio annotates the gateway pods with the port where the metrics are served. It's slightly convoluted, but it has the advantage of targeting all gateways in the namespace with a single PodMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-proxies-monitor
spec:
  selector:
    matchExpressions:
      - key: istio-prometheus-ignore
        operator: DoesNotExist
  podMetricsEndpoints:
    - path: /stats/prometheus
      interval: 30s
      relabelings:
        - action: keep
          sourceLabels: ["__meta_kubernetes_pod_container_name"]
          regex: "istio-proxy"
        - action: keep
          sourceLabels:
            ["__meta_kubernetes_pod_annotationpresent_prometheus_io_scrape"]
        - action: replace
          regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
          replacement: "[$2]:$1"
          sourceLabels:
            [
              "__meta_kubernetes_pod_annotation_prometheus_io_port",
              "__meta_kubernetes_pod_ip",
            ]
          targetLabel: "__address__"
        - action: replace
          regex: (\d+);((([0-9]+?)(\.|$)){4})
          replacement: "$2:$1"
          sourceLabels:
            [
              "__meta_kubernetes_pod_annotation_prometheus_io_port",
              "__meta_kubernetes_pod_ip",
            ]
          targetLabel: "__address__"
        - action: labeldrop
          regex: "__meta_kubernetes_pod_label_(.+)"
        - sourceLabels: ["__meta_kubernetes_namespace"]
          action: replace
          targetLabel: namespace
        - sourceLabels: ["__meta_kubernetes_pod_name"]
          action: replace
          targetLabel: pod_name
```

I recall getting this from the Istio documentation, but I can't find it now ...
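If the above is saved to a file, applying it in the Gateway namespace would look something like this (the file name is hypothetical; with user workload monitoring the PodMonitor is picked up per namespace, so create it wherever your gateway pods run):

```bash
# Hypothetical file name for the PodMonitor above; create it in each namespace
# whose istio-proxy pods you want scraped.
kubectl apply -n ${gatewayNS} -f istio-proxies-podmonitor.yaml
```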

Contributor Author


I have a similar looking 'additionalScrapeConfig' here, but it doesn't work with user workload monitoring on OpenShift due to restrictions on what can be configured.

If this single PodMonitor approach works with UWM, I think that would be more robust than the Service/ServiceMonitor approach.


Yes, I have tested this in OCP with UWM and it works as expected.
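For anyone reproducing this: user workload monitoring itself is enabled with the standard OpenShift ConfigMap shown below (a documented OpenShift step, not something this PR adds):

```yaml
# Standard OpenShift way to turn on user workload monitoring (UWM), which is what
# picks up PodMonitor/ServiceMonitor resources like the ones discussed in this thread.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
```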


I should note that I tested it using sail-operator to install Istio, but it should work the same for other Istio install methods.


@roivaz left a comment


Just a couple of comments, looks good overall.


There is one more metrics configuration that needs to be applied so that all relevant metrics are scraped.
That configuration depends on where you deploy your Gateway.
The steps to configure it are detailed in the follow-on 'Secure, protect, and connect' guide.
Contributor


Maybe we should provide a link to Secure, protect, and connect.

Contributor Author


I thought about it, and held off as the link is at the end of the guide as a follow-on.
However, there's no harm in linking here too for easier navigation.

doc/install/install-openshift.md

For Grafana installation details, see [installing Grafana on OpenShift](https://cloud.redhat.com/experts/o11y/ocp-grafana/). When installed, you must [set up a data source to the thanos-querier route in the OpenShift cluster](https://docs.openshift.com/container-platform/4.15/observability/monitoring/accessing-third-party-monitoring-apis.html#accessing-metrics-from-outside-cluster_accessing-monitoring-apis-by-using-the-cli).
Contributor


Thanos datasource setup is also described in the `installing Grafana on OpenShift` guide.

Contributor Author


Good catch.
I think I'll call that out here, but keep the 2nd link as well, as it's a more detailed and official way of accessing thanos-querier.
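For reference, the thanos-querier route that the Grafana data source points at can be looked up as shown below (assuming the default openshift-monitoring namespace; a service account token with the appropriate RBAC is still needed for authentication, as described in the linked docs):

```bash
# Print the thanos-querier route host; use https://<host> as the Grafana data source URL.
oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}'
```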

Signed-off-by: David Martin <davmarti@redhat.com>
@david-martin david-martin merged commit 28ba551 into main Oct 24, 2024
30 of 32 checks passed