Ensure Platform Prometheus targets are protected #30014
base: main
/cc |
307a1cc
to
1d33862
Compare
Job Failure Risk Analysis for sha: 1d33862
|
1d33862
to
ef1d7b2
Compare
Some test results: the failure is expected because https://issues.redhat.com/browse/OCPBUGS-57585 is not fixed yet.
$ git --no-pager log --pretty=oneline -1
ef1d7b2fb67c5db7aa441e270b5b6792f56112c0 (HEAD -> servicemonitor, hongkailiu/servicemonitor) Ensure ServiceMonitor's endpoints are protected
$ make WHAT=cmd/openshift-tests
$ cat /tmp/osServicePrincipal.json
{}
$ COMPONENT_NAMESPACE=openshift-cluster-version KUBECONFIG=/Users/hongkliu/.kube/config AZURE_AUTH_LOCATION=/tmp/osServicePrincipal.json ./openshift-tests run-test "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization"
...
Running Suite: - /Users/hongkliu/repo/openshift/origin
=======================================================
Random Seed: 1753732342 - will randomize all specs
Will run 1 of 1 specs
------------------------------
[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization
github.com/openshift/origin/test/extended/prometheus/prometheus.go:72
STEP: Creating a kubernetes client @ 07/28/25 15:52:23.353
I0728 15:52:23.353887 21800 discovery.go:214] Invalidating discovery information
STEP: verifying all service monitors are configured with authorization @ 07/28/25 15:52:23.426
I0728 15:52:23.468686 21800 prometheus.go:92] service monitor openshift-cluster-version/cluster-version-operator has authorization
STEP: verifying all targets returns 401 or 403 without authorization @ 07/28/25 15:52:23.468
I0728 15:52:24.250271 21800 builder.go:121] Running '/Users/hongkliu/bin/kubectl --server=https://api.ci-ln-j1glv1b-1d09d.ci.azure.devcluster.openshift.com:6443 --kubeconfig=/Users/hongkliu/.kube/config --namespace=openshift-monitoring exec prometheus-k8s-0 -- /bin/sh -x -c curl -k -s -o /dev/null -w '%{http_code}' "https://10.0.0.5:9099/metrics"'
I0728 15:52:25.138655 21800 builder.go:146] stderr: "+ curl -k -s -o /dev/null -w '%{http_code}' https://10.0.0.5:9099/metrics\n"
I0728 15:52:25.138757 21800 builder.go:147] stdout: "200"
[FAILED] in [It] - github.com/openshift/origin/test/extended/prometheus/prometheus.go:119 @ 07/28/25 15:52:25.139
• [FAILED] [1.799 seconds]
[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] [It] should not be accessible without authorization
github.com/openshift/origin/test/extended/prometheus/prometheus.go:72
[FAILED] Expected
<[]error | len:1, cap:1>: [
<*fmt.wrapError | 0x1400789e0c0>{
msg: "the scaple url https://10.0.0.5:9099/metrics for namespace openshift-cluster-version is accessible without authorization: last response from server was not in [401 403]: 200",
err: <*errors.errorString | 0x140079f81b0>{
s: "last response from server was not in [401 403]: 200",
},
},
]
to be empty
In [It] at: github.com/openshift/origin/test/extended/prometheus/prometheus.go:119 @ 07/28/25 15:52:25.139
------------------------------
Summarizing 1 Failure:
[FAIL] [sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] [It] should not be accessible without authorization
github.com/openshift/origin/test/extended/prometheus/prometheus.go:119
Ran 1 of 1 Specs in 1.799 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
[
{
"name": "[sig-instrumentation][Late] OpenShift service monitors [apigroup:image.openshift.io] should not be accessible without authorization",
"lifecycle": "blocking",
"duration": 1798,
"startTime": "2025-07-28 19:52:23.342847 UTC",
"endTime": "2025-07-28 19:52:25.141598 UTC",
"result": "failed",
"output": " STEP: Creating a kubernetes client @ 07/28/25 15:52:23.353\n STEP: verifying all service monitors are configured with authorization @ 07/28/25 15:52:23.426\nI0728 15:52:23.468686 21800 prometheus.go:92] service monitor openshift-cluster-version/cluster-version-operator has authorization\n STEP: verifying all targets returns 401 or 403 without authorization @ 07/28/25 15:52:23.468\nI0728 15:52:24.250271 21800 builder.go:121] Running '/Users/hongkliu/bin/kubectl --server=https://api.ci-ln-j1glv1b-1d09d.ci.azure.devcluster.openshift.com:6443 --kubeconfig=/Users/hongkliu/.kube/config --namespace=openshift-monitoring exec prometheus-k8s-0 -- /bin/sh -x -c curl -k -s -o /dev/null -w '%{http_code}' \"https://10.0.0.5:9099/metrics\"'\nI0728 15:52:25.138655 21800 builder.go:146] stderr: \"+ curl -k -s -o /dev/null -w '%{http_code}' https://10.0.0.5:9099/metrics\\n\"\nI0728 15:52:25.138757 21800 builder.go:147] stdout: \"200\"\n [FAILED] in [It] - github.com/openshift/origin/test/extended/prometheus/prometheus.go:119 @ 07/28/25 15:52:25.139\n",
"error": "fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:119]: Expected\n \u003c[]error | len:1, cap:1\u003e: [\n \u003c*fmt.wrapError | 0x1400789e0c0\u003e{\n msg: \"the scaple url https://10.0.0.5:9099/metrics for namespace openshift-cluster-version is accessible without authorization: last response from server was not in [401 403]: 200\",\n err: \u003c*errors.errorString | 0x140079f81b0\u003e{\n s: \"last response from server was not in [401 403]: 200\",\n },\n },\n ]\nto be empty"
}
]Error: 1 tests failed
error: 1 tests failed |
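For readers following the log above: the failure comes from a status-code check on each scrape target. Below is a minimal sketch of that kind of check, assuming a hypothetical helper `checkProtected` (this is not the code in prometheus.go; only the accepted codes 401 and 403 are taken from the log).

```go
package main

import "fmt"

// allowedCodes are the status codes the test output treats as "protected".
var allowedCodes = []int{401, 403}

// checkProtected is a hypothetical helper for illustration only.
// It returns nil when an unauthenticated request was rejected with one of
// allowedCodes, and an error describing the exposure otherwise.
func checkProtected(url string, code int) error {
	for _, allowed := range allowedCodes {
		if code == allowed {
			return nil
		}
	}
	return fmt.Errorf("%s is accessible without authorization: last response from server was not in %v: %d",
		url, allowedCodes, code)
}

func main() {
	// The URL is the one from the log; the codes are sample inputs.
	for _, code := range []int{200, 401, 403} {
		err := checkProtected("https://10.0.0.5:9099/metrics", code)
		fmt.Printf("code=%d err=%v\n", code, err)
	}
}
```

With this shape, the 200 returned by the CVO endpoint is the only input that yields an error, which matches the single entry in the `[]error` slice reported above.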
FYI: see the bugs for endpoints without authorization (except CVO) in https://issues.redhat.com/browse/MON-4304 |
/retest-required |
Job Failure Risk Analysis for sha: ef1d7b2
|
/test verify |
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: b105e70
New tests seen in this PR at sha: b105e70
|
b105e70
to
c8ea5a6
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: hongkailiu The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
c8ea5a6
to
ba8f912
Compare
Job Failure Risk Analysis for sha: ba8f912
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: ba8f912
New tests seen in this PR at sha: ba8f912
|
Interesting, I need to figure out how these are protected:
They (checked the first two) returned 403. My current guess:
They can be protected without any configuration on the service monitor? Edit:
|
dee0844
to
f68c4ec
Compare
7dda1a8
to
9ccab35
Compare
https://issues.redhat.com/browse/OCPBUGS-60159 Ignored the namespace |
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: 9ccab35
|
8312214
to
cd59271
Compare
Job Failure Risk Analysis for sha: cd59271
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: cd59271
|
}
})

g.It("should not be accessible without auth [Serial]", func() {
nit: as long as no disruptive test is running in parallel, I don't see why we need the Serial here.
I started the pull with Parallel but hit an error like this in CI.
I no longer see the error after switching to Serial.
I would like to get the case covered and avoid noise from CI at the beginning.
After some iterations, we can always turn it into Parallel if needed.
f270eb5
to
3016340
Compare
3016340
to
ec95d6e
Compare
Job Failure Risk Analysis for sha: ec95d6e
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: ec95d6e
New tests seen in this PR at sha: ec95d6e
|
/test e2e-aws-ovn-microshift-serial |
/retest-required |
@hongkailiu: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
The new test checks (only for OpenShift components) that each service monitor has an authorization configuration, and that each target returns 401 or 403 when accessed without authorization.
If either check is not satisfied, the test fails.
Moreover, the environment variable MONITORING_AUTH_TEST_NAMESPACE can be used to focus on validating the resources from a single namespace.
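
To illustrate the namespace focus, here is a minimal sketch of how such a filter could look. The type `monitorRef` and the function `filterByNamespace` are hypothetical names for this example, and treating an unset variable as "check every namespace" is an assumption, not something this PR states.

```go
package main

import (
	"fmt"
	"os"
)

// monitorRef is a hypothetical stand-in for a service monitor's identity.
type monitorRef struct {
	Namespace string
	Name      string
}

// filterByNamespace keeps only the monitors in ns.
// Assumption: an empty ns means no filtering, so all monitors are kept.
func filterByNamespace(refs []monitorRef, ns string) []monitorRef {
	if ns == "" {
		return refs
	}
	var out []monitorRef
	for _, r := range refs {
		if r.Namespace == ns {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Sample data; the first entry is the monitor named in the test log,
	// the second is made up for the example.
	refs := []monitorRef{
		{Namespace: "openshift-cluster-version", Name: "cluster-version-operator"},
		{Namespace: "openshift-monitoring", Name: "example-monitor"},
	}
	ns := os.Getenv("MONITORING_AUTH_TEST_NAMESPACE")
	for _, r := range filterByNamespace(refs, ns) {
		fmt.Printf("%s/%s\n", r.Namespace, r.Name)
	}
}
```

Running it with `MONITORING_AUTH_TEST_NAMESPACE=openshift-cluster-version` set would limit the output to monitors in that namespace.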