
Ensure Platform Prometheus targets are protected #30014

Open: hongkailiu wants to merge 3 commits into main from servicemonitor

hongkailiu
Member

@hongkailiu hongkailiu commented Jul 23, 2025

The new test checks (for OpenShift components only)

  • that each service monitor has an authorization configuration, and
  • that each active Prometheus target denies requests without authorization.

The test fails if either condition is not satisfied.

In addition, the environment variable MONITORING_AUTH_TEST_NAMESPACE can be used to restrict
validation to the resources in a single namespace.
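
The namespace restriction described above could look roughly like the following. This is a minimal sketch, not the code from this pull request: `filterNamespaces` is a hypothetical helper, and only the environment variable name comes from the description.

```go
package main

import (
	"fmt"
	"os"
)

// namespaceEnvVar is the variable named in the PR description.
const namespaceEnvVar = "MONITORING_AUTH_TEST_NAMESPACE"

// filterNamespaces returns every namespace when only is empty, and otherwise
// just the entries equal to only. (Hypothetical helper, not from the PR.)
func filterNamespaces(all []string, only string) []string {
	if only == "" {
		return all
	}
	var out []string
	for _, ns := range all {
		if ns == only {
			out = append(out, ns)
		}
	}
	return out
}

func main() {
	all := []string{"openshift-cluster-version", "openshift-etcd-operator"}
	// With the variable unset, both namespaces are validated.
	fmt.Println(filterNamespaces(all, os.Getenv(namespaceEnvVar)))
}
```

Leaving the variable unset keeps the default behaviour of validating every namespace, so the restriction is opt-in.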

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 23, 2025
@openshift-ci openshift-ci bot requested review from machine424 and slashpai July 23, 2025 20:14
@petr-muller
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller July 24, 2025 09:54
@hongkailiu hongkailiu force-pushed the servicemonitor branch 4 times, most recently from 307a1cc to 1d33862 Compare July 24, 2025 15:59

openshift-trt bot commented Jul 24, 2025

Job Failure Risk Analysis for sha: 1d33862

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (106) are below the historical average (203): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (2139) are below the historical average (4234): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (24) are below the historical average (214): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@hongkailiu
Member Author

Some test results: the failure is expected because https://issues.redhat.com/browse/OCPBUGS-57585 is not fixed yet.

launch 4.19 azure

$ git --no-pager log --pretty=oneline -1
ef1d7b2fb67c5db7aa441e270b5b6792f56112c0 (HEAD -> servicemonitor, hongkailiu/servicemonitor) Ensure ServiceMonitor's endpoints are protected
$ make WHAT=cmd/openshift-tests
$ cat /tmp/osServicePrincipal.json
{}

$ COMPONENT_NAMESPACE=openshift-cluster-version KUBECONFIG=/Users/hongkliu/.kube/config AZURE_AUTH_LOCATION=/tmp/osServicePrincipal.json ./openshift-tests run-test "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization"                
 ...
  Running Suite:  - /Users/hongkliu/repo/openshift/origin
  =======================================================
  Random Seed: 1753732342 - will randomize all specs

  Will run 1 of 1 specs
  ------------------------------
  [sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization
  github.com/openshift/origin/test/extended/prometheus/prometheus.go:72
    STEP: Creating a kubernetes client @ 07/28/25 15:52:23.353
  I0728 15:52:23.353887   21800 discovery.go:214] Invalidating discovery information
    STEP: verifying all service monitors are configured with authorization @ 07/28/25 15:52:23.426
  I0728 15:52:23.468686 21800 prometheus.go:92] service monitor openshift-cluster-version/cluster-version-operator has authorization
    STEP: verifying all targets returns 401 or 403 without authorization @ 07/28/25 15:52:23.468
  I0728 15:52:24.250271 21800 builder.go:121] Running '/Users/hongkliu/bin/kubectl --server=https://api.ci-ln-j1glv1b-1d09d.ci.azure.devcluster.openshift.com:6443 --kubeconfig=/Users/hongkliu/.kube/config --namespace=openshift-monitoring exec prometheus-k8s-0 -- /bin/sh -x -c curl -k -s -o /dev/null -w '%{http_code}' "https://10.0.0.5:9099/metrics"'
  I0728 15:52:25.138655 21800 builder.go:146] stderr: "+ curl -k -s -o /dev/null -w '%{http_code}' https://10.0.0.5:9099/metrics\n"
  I0728 15:52:25.138757 21800 builder.go:147] stdout: "200"
    [FAILED] in [It] - github.com/openshift/origin/test/extended/prometheus/prometheus.go:119 @ 07/28/25 15:52:25.139
  • [FAILED] [1.799 seconds]
  [sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] [It] should not be accessible without authorization
  github.com/openshift/origin/test/extended/prometheus/prometheus.go:72

    [FAILED] Expected
        <[]error | len:1, cap:1>: [
            <*fmt.wrapError | 0x1400789e0c0>{
                msg: "the scaple url https://10.0.0.5:9099/metrics for namespace openshift-cluster-version is accessible without authorization: last response from server was not in [401 403]: 200",
                err: <*errors.errorString | 0x140079f81b0>{
                    s: "last response from server was not in [401 403]: 200",
                },
            },
        ]
    to be empty
    In [It] at: github.com/openshift/origin/test/extended/prometheus/prometheus.go:119 @ 07/28/25 15:52:25.139
  ------------------------------

  Summarizing 1 Failure:
    [FAIL] [sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] [It] should not be accessible without authorization
    github.com/openshift/origin/test/extended/prometheus/prometheus.go:119

  Ran 1 of 1 Specs in 1.799 seconds
  FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
[
  {
    "name": "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization",
    "lifecycle": "blocking",
    "duration": 1798,
    "startTime": "2025-07-28 19:52:23.342847 UTC",
    "endTime": "2025-07-28 19:52:25.141598 UTC",
    "result": "failed",
    "output": "  STEP: Creating a kubernetes client @ 07/28/25 15:52:23.353\n  STEP: verifying all service monitors are configured with authorization @ 07/28/25 15:52:23.426\nI0728 15:52:23.468686 21800 prometheus.go:92] service monitor openshift-cluster-version/cluster-version-operator has authorization\n  STEP: verifying all targets returns 401 or 403 without authorization @ 07/28/25 15:52:23.468\nI0728 15:52:24.250271 21800 builder.go:121] Running '/Users/hongkliu/bin/kubectl --server=https://api.ci-ln-j1glv1b-1d09d.ci.azure.devcluster.openshift.com:6443 --kubeconfig=/Users/hongkliu/.kube/config --namespace=openshift-monitoring exec prometheus-k8s-0 -- /bin/sh -x -c curl -k -s -o /dev/null -w '%{http_code}' \"https://10.0.0.5:9099/metrics\"'\nI0728 15:52:25.138655 21800 builder.go:146] stderr: \"+ curl -k -s -o /dev/null -w '%{http_code}' https://10.0.0.5:9099/metrics\\n\"\nI0728 15:52:25.138757 21800 builder.go:147] stdout: \"200\"\n  [FAILED] in [It] - github.com/openshift/origin/test/extended/prometheus/prometheus.go:119 @ 07/28/25 15:52:25.139\n",
    "error": "fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:119]: Expected\n    \u003c[]error | len:1, cap:1\u003e: [\n        \u003c*fmt.wrapError | 0x1400789e0c0\u003e{\n            msg: \"the scaple url https://10.0.0.5:9099/metrics for namespace openshift-cluster-version is accessible without authorization: last response from server was not in [401 403]: 200\",\n            err: \u003c*errors.errorString | 0x140079f81b0\u003e{\n                s: \"last response from server was not in [401 403]: 200\",\n            },\n        },\n    ]\nto be empty"
  }
]Error: 1 tests failed
error: 1 tests failed
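
The check that failed in the run above can be sketched in plain Go. This is an illustration under assumptions, not the test's implementation: the log shows the real test running curl inside the prometheus-k8s-0 pod, whereas this sketch probes a local stand-in server directly. `probe` and `validateStatus` are hypothetical names, and the error text only approximates the message in the log.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// validateStatus accepts only 401 and 403; any other code means the target
// answered a request that carried no credentials. (Hypothetical helper.)
func validateStatus(url, namespace string, code int) error {
	if code == http.StatusUnauthorized || code == http.StatusForbidden {
		return nil
	}
	inner := fmt.Errorf("last response from server was not in [401 403]: %d", code)
	return fmt.Errorf("the scrape url %s for namespace %s is accessible without authorization: %w", url, namespace, inner)
}

// probe sends an unauthenticated GET and returns the HTTP status code.
func probe(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Stand-in for a protected metrics endpoint: reject requests without credentials.
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	// srv.Client() trusts the test server's certificate.
	url := srv.URL + "/metrics"
	code, err := probe(srv.Client(), url)
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("status:", code)
	fmt.Println("verdict:", validateStatus(url, "example-namespace", code))
}
```

In the failing run, the target at 10.0.0.5:9099 returned 200, which is the case `validateStatus` reports as an error.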

@hongkailiu hongkailiu changed the title [wip]Ensure ServiceMonitor's endpoints are protected Ensure ServiceMonitor's endpoints are protected Jul 28, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 28, 2025
@juzhao

juzhao commented Jul 29, 2025

FYI: see the bugs for endpoints without authorization (except CVO) in https://issues.redhat.com/browse/MON-4304

@juzhao

juzhao commented Jul 29, 2025

/retest-required


openshift-trt bot commented Jul 29, 2025

Job Failure Risk Analysis for sha: ef1d7b2

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive Medium
Job run should complete before timeout
This test has passed 92.31% of 13 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:hidden Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws Procedure:none SecurityMode:default Topology:ha Upgrade:micro-downgrade] in the last week.

@juzhao

juzhao commented Jul 29, 2025

/test verify


openshift-trt bot commented Jul 29, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: b105e70

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-agnostic-ovn-cmd High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-kube-apiserver-rollout High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-serial High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-proxy High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-azure High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-gcp-ovn High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-hypershift-conformance High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
(...showing 20 of 34 rows)

New tests seen in this PR at sha: b105e70

  • "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" [Total: 34, Pass: 0, Fail: 34, Flake: 0]

Contributor

openshift-ci bot commented Jul 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongkailiu
Once this PR has been reviewed and has the lgtm label, please assign simonpasquier for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-trt bot commented Jul 30, 2025

Job Failure Risk Analysis for sha: ba8f912

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive Medium
Job run should complete before timeout
This test has passed 92.86% of 14 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:hidden Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws Procedure:none SecurityMode:default Topology:ha Upgrade:micro-downgrade] in the last week.
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (2146) are below the historical average (4013): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: ba8f912

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-agnostic-ovn-cmd High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-kube-apiserver-rollout High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-serial High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-proxy High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-azure High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-1of2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-gcp-ovn High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-1of2 High - "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
(...showing 20 of 38 rows)

New tests seen in this PR at sha: ba8f912

  • "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization [Suite:openshift/conformance/parallel]" [Total: 38, Pass: 0, Fail: 38, Flake: 0]

@hongkailiu
Member Author

hongkailiu commented Jul 30, 2025

Interesting, I need to figure out how these are protected:

{  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:138]: Expected
    <[]error | len:4, cap:4>: [
        <*errors.errorString | 0xc002c39550>{
            s: "service monitor openshift-cluster-node-tuning-operator/node-tuning-operator has no authorization",
        },
        <*errors.errorString | 0xc002c395b0>{
            s: "service monitor openshift-cluster-storage-operator/cluster-storage-operator has no authorization",
        },
        <*errors.errorString | 0xc002c39970>{
            s: "service monitor openshift-etcd-operator/etcd has no authorization",
        },
        <*errors.errorString | 0xc002c399c0>{
            s: "service monitor openshift-etcd-operator/etcd-minimal has no authorization",
        },
    ]
to be empty}

The first two (the only ones I checked) returned 403.

My current guess:

https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/client-cert-scraping.md

A servicemonitor grows a way to specify using this client cert. Technically this is optional. If you find it too difficult, just always use the client-cert for scraping since the secrets never leave disk. Those targets which don't support it will simply ignore the client-cert.

Can they be protected without any configuration on the service monitor?

Edit:

servicemonitor.spec.endpoints.tlsConfig has to be configured in that case.
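
A rough sketch of a configuration check that also accepts a client certificate in tlsConfig as authorization follows. The struct types are simplified stand-ins invented for this example, not the prometheus-operator API types, and `endpointHasAuth` and `checkServiceMonitors` are hypothetical helpers. Only the error wording is taken from the failure output above.

```go
package main

import "fmt"

// Simplified stand-ins for the ServiceMonitor fields this sketch looks at.
type tlsConfig struct {
	CertFile string
	KeyFile  string
}

type endpoint struct {
	BearerTokenFile string     // token-based authorization, if any
	TLSConfig       *tlsConfig // client certificate settings, if any
}

type serviceMonitor struct {
	Namespace string
	Name      string
	Endpoints []endpoint
}

// endpointHasAuth treats either a bearer token or a client certificate
// (cert and key both set) as an authorization configuration.
func endpointHasAuth(e endpoint) bool {
	if e.BearerTokenFile != "" {
		return true
	}
	return e.TLSConfig != nil && e.TLSConfig.CertFile != "" && e.TLSConfig.KeyFile != ""
}

// checkServiceMonitors returns one error per service monitor that has at
// least one endpoint without an authorization configuration.
func checkServiceMonitors(sms []serviceMonitor) []error {
	var errs []error
	for _, sm := range sms {
		for _, e := range sm.Endpoints {
			if !endpointHasAuth(e) {
				errs = append(errs, fmt.Errorf("service monitor %s/%s has no authorization", sm.Namespace, sm.Name))
				break
			}
		}
	}
	return errs
}

func main() {
	sms := []serviceMonitor{
		// Placeholder paths, for illustration only.
		{Namespace: "ns-a", Name: "with-client-cert", Endpoints: []endpoint{
			{TLSConfig: &tlsConfig{CertFile: "/path/to/tls.crt", KeyFile: "/path/to/tls.key"}},
		}},
		{Namespace: "ns-b", Name: "unprotected", Endpoints: []endpoint{{}}},
	}
	for _, err := range checkServiceMonitors(sms) {
		fmt.Println(err)
	}
}
```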

@hongkailiu hongkailiu force-pushed the servicemonitor branch 2 times, most recently from dee0844 to f68c4ec Compare July 30, 2025 12:01
@hongkailiu hongkailiu force-pushed the servicemonitor branch 3 times, most recently from 7dda1a8 to 9ccab35 Compare August 5, 2025 20:23
@hongkailiu
Member Author

Let's add that target/namespace to the ones to ignore and open an OCPBUGS to the team to make sure the appropriate code is returned.

https://issues.redhat.com/browse/OCPBUGS-60159

Ignored the namespace openshift-cluster-csi-drivers.
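
One way to keep such an exception visible is to pair each ignored namespace with the bug that tracks it. The following is a sketch only: `ignoredNamespaces` and `skipReason` are hypothetical names, and the pull request may implement the exclusion differently.

```go
package main

import "fmt"

// ignoredNamespaces maps each skipped namespace to the bug tracking the fix,
// so the exception can be removed once the bug is resolved.
var ignoredNamespaces = map[string]string{
	"openshift-cluster-csi-drivers": "https://issues.redhat.com/browse/OCPBUGS-60159",
}

// skipReason reports whether a namespace is ignored and, if so, why.
func skipReason(namespace string) (string, bool) {
	bug, ok := ignoredNamespaces[namespace]
	return bug, ok
}

func main() {
	for _, ns := range []string{"openshift-cluster-csi-drivers", "openshift-cluster-version"} {
		if bug, skip := skipReason(ns); skip {
			fmt.Printf("skipping %s, tracked in %s\n", ns, bug)
			continue
		}
		fmt.Printf("checking %s\n", ns)
	}
}
```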


openshift-trt bot commented Aug 6, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 9ccab35

  • "[sig-instrumentation][Late] Platform Prometheus targets [apigroup:image.openshift.io] should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" [Total: 4, Pass: 4, Fail: 0, Flake: 0]

@hongkailiu hongkailiu force-pushed the servicemonitor branch 2 times, most recently from 8312214 to cd59271 Compare August 6, 2025 01:19

openshift-trt bot commented Aug 6, 2025

Job Failure Risk Analysis for sha: cd59271

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2 Medium
[sig-node] Pods Extended Pod Container lifecycle evicted pods should be terminal [Suite:openshift/conformance/parallel] [Suite:k8s]
This test has passed 95.15% of 2060 runs on release 4.20 [Overall] in the last week.

Open Bugs
Evicted pods should be terminal test flakes too often

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: cd59271

  • "[sig-instrumentation][Late] Platform Prometheus targets [apigroup:image.openshift.io] should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" [Total: 6, Pass: 6, Fail: 0, Flake: 0]


openshift-trt bot commented Aug 6, 2025

Job Failure Risk Analysis for sha: cd59271

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2 Medium
[sig-node] Pods Extended Pod Container lifecycle evicted pods should be terminal [Suite:openshift/conformance/parallel] [Suite:k8s]
This test has passed 95.05% of 1959 runs on release 4.20 [Overall] in the last week.

Open Bugs
Evicted pods should be terminal test flakes too often

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: cd59271

  • "[sig-instrumentation][Late] Platform Prometheus targets [apigroup:image.openshift.io] should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" [Total: 7, Pass: 7, Fail: 0, Flake: 0]


openshift-trt bot commented Aug 6, 2025

Job Failure Risk Analysis for sha: cd59271

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2 Medium
[sig-node] Pods Extended Pod Container lifecycle evicted pods should be terminal [Suite:openshift/conformance/parallel] [Suite:k8s]
This test has passed 95.05% of 1959 runs on release 4.20 [Overall] in the last week.

Open Bugs
Evicted pods should be terminal test flakes too often

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: cd59271

  • "[sig-instrumentation][Late] Platform Prometheus targets [apigroup:image.openshift.io] should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" [Total: 8, Pass: 8, Fail: 0, Flake: 0]

}
})

g.It("should not be accessible without auth [Serial]", func() {
Contributor

nit: as long as no disruptive test is running in parallel, I don't see why we need Serial here.

Member Author

I started the pull request with Parallel but hit an error like this in CI.
I no longer see the error after switching to Serial.

I would like to get the case covered and avoid CI noise at the beginning.
After some iterations, we can always switch it to Parallel if needed.

@hongkailiu hongkailiu force-pushed the servicemonitor branch 2 times, most recently from f270eb5 to 3016340 Compare August 8, 2025 15:15

openshift-trt bot commented Aug 8, 2025

Job Failure Risk Analysis for sha: ec95d6e

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (31) are below the historical average (327): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade IncompleteTests

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: ec95d6e

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial High - "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" is a new test that failed 1 time(s) against the current commit

New tests seen in this PR at sha: ec95d6e

  • "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" [Total: 6, Pass: 5, Fail: 1, Flake: 0]

@hongkailiu
Member Author

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/30014/pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial/1953890592245157888 failed with

unable to get the prometheus-k8s route in the openshift-monitoring namespace: routes.route.openshift.io "prometheus-k8s" not found

/test e2e-aws-ovn-microshift-serial


openshift-trt bot commented Aug 9, 2025

Job Failure Risk Analysis for sha: ec95d6e

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (31) are below the historical average (346): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: ec95d6e

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial High - "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" is a new test that was not present in all runs against the current commit, and also failed 1 time(s).

New tests seen in this PR at sha: ec95d6e

  • "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" [Total: 11, Pass: 10, Fail: 1, Flake: 0]

openshift-trt bot commented Aug 9, 2025

Job Failure Risk Analysis for sha: ec95d6e

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (31) are below the historical average (346): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: ec95d6e

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial High - "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" is a new test that failed 2 time(s) against the current commit

New tests seen in this PR at sha: ec95d6e

  • "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" [Total: 11, Pass: 9, Fail: 2, Flake: 0]

@juzhao

juzhao commented Aug 11, 2025

/retest-required

Contributor

openshift-ci bot commented Aug 11, 2025

@hongkailiu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-disruptive | 81a4b61 | link | false | /test e2e-gcp-disruptive |
| ci/prow/e2e-openstack-serial | 81a4b61 | link | false | /test e2e-openstack-serial |
| ci/prow/e2e-azure-ovn-upgrade | 81a4b61 | link | false | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-gcp-fips-serial-1of2 | 81a4b61 | link | false | /test e2e-gcp-fips-serial-1of2 |
| ci/prow/e2e-azure-ovn-etcd-scaling | 81a4b61 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-etcd-scaling | 81a4b61 | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-gcp-ovn-etcd-scaling | 81a4b61 | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | 81a4b61 | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | 81a4b61 | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-gcp-fips-serial-2of2 | 81a4b61 | link | false | /test e2e-gcp-fips-serial-2of2 |
| ci/prow/e2e-gcp-ovn-rt-upgrade | ec95d6e | link | false | /test e2e-gcp-ovn-rt-upgrade |
| ci/prow/e2e-aws-proxy | ec95d6e | link | false | /test e2e-aws-proxy |
| ci/prow/e2e-openstack-ovn | ec95d6e | link | false | /test e2e-openstack-ovn |
| ci/prow/okd-scos-e2e-aws-ovn | ec95d6e | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-metal-ipi-virtualmedia | ec95d6e | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-hypershift-conformance | ec95d6e | link | false | /test e2e-hypershift-conformance |
| ci/prow/e2e-aws-ovn-single-node-serial | ec95d6e | link | false | /test e2e-aws-ovn-single-node-serial |
| ci/prow/e2e-azure | ec95d6e | link | false | /test e2e-azure |
| ci/prow/e2e-aws-disruptive | ec95d6e | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-gcp-ovn-techpreview-serial-2of2 | ec95d6e | link | false | /test e2e-gcp-ovn-techpreview-serial-2of2 |
| ci/prow/e2e-aws-ovn-single-node-upgrade | ec95d6e | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | ec95d6e | link | false | /test e2e-aws-ovn-kube-apiserver-rollout |
| ci/prow/e2e-gcp-ovn-upgrade | ec95d6e | link | true | /test e2e-gcp-ovn-upgrade |
| ci/prow/e2e-aws-ovn-serial-2of2 | ec95d6e | link | true | /test e2e-aws-ovn-serial-2of2 |
| ci/prow/e2e-aws-ovn-microshift-serial | ec95d6e | link | true | /test e2e-aws-ovn-microshift-serial |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-trt bot commented Aug 11, 2025

Job Failure Risk Analysis for sha: ec95d6e

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (31) are below the historical average (361): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: ec95d6e

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial High - "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" is a new test that failed 3 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 High - "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" is a new test that was not present in all runs against the current commit.

New tests seen in this PR at sha: ec95d6e

  • "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" [Total: 13, Pass: 10, Fail: 3, Flake: 0]

openshift-trt bot commented Aug 11, 2025

Job Failure Risk Analysis for sha: ec95d6e

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (31) are below the historical average (361): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: ec95d6e

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial High - "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" is a new test that failed 3 time(s) against the current commit

New tests seen in this PR at sha: ec95d6e

  • "[sig-instrumentation][Late] Platform Prometheus targets should not be accessible without auth [Serial] [Suite:openshift/conformance/serial]" [Total: 13, Pass: 10, Fail: 3, Flake: 0]


5 participants