
Revert PAO and later changes #330


Merged
merged 2 commits, Mar 29, 2022

Conversation

stbenjam
Member

This PR reverts both #326 and #322. As an unexplained increase in watches is causing upwards of 10% of jobs to fail, the org policy is to revert first and base any fix on an unrevert of this PR.

/assign @deads2k

Some of the increase was expected, but we're seeing odd behaviors, such as multiple watches lasting only milliseconds:

16:13:51 [ WATCH][1m14.563923s] [200] /apis/performance.openshift.io/v2/performanceprofiles?allowWatchBookmarks=true&resourceVersion=75288&timeoutSeconds=488&watch=true  [system:serviceaccount:openshift-cluster-node-tuning-operator:cluster-node-tuning-operator]
15:05 [ WATCH][   115.857ms] [200] /apis/performance.openshift.io/v2/performanceprofiles?allowWatchBookmarks=true&resourceVersion=75588&timeoutSeconds=522&watch=true  [system:serviceaccount:openshift-cluster-node-tuning-operator:cluster-node-tuning-operator]
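As a rough illustration (not the actual cluster-debug-tools code), the duration token in these access-log lines can be parsed to flag the millisecond-scale watches. A sketch assuming the `[ WATCH][<duration>]` layout shown above:

```python
import re

# Matches the duration inside "[ WATCH][<duration>]", e.g. "1m14.563923s"
# or "   115.857ms". This mirrors the log excerpt above; it is a sketch,
# not the cluster-debug-tools implementation.
WATCH_RE = re.compile(r"\[ WATCH\]\[\s*(?:(\d+)m)?([\d.]+)(ms|s)\]")

def watch_seconds(line):
    """Return the watch duration in seconds, or None for non-WATCH lines."""
    m = WATCH_RE.search(line)
    if not m:
        return None
    minutes = int(m.group(1) or 0)
    value = float(m.group(2))
    seconds = value / 1000 if m.group(3) == "ms" else value
    return minutes * 60 + seconds

def short_watches(lines, threshold=1.0):
    """Keep only lines whose watch terminated in under `threshold` seconds."""
    return [l for l in lines
            if (d := watch_seconds(l)) is not None and d < threshold]
```

Run over the audit excerpt, the second line above (115.857ms) would be flagged while the first (1m14s) would not.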

Watch counts

From yesterday (3/28)

$ ~/git/cluster-debug-tools/kubectl-dev_tool audit -f $AUDIT_LOGS -otop --verb=watch --by resource --user=system:serviceaccount:openshift-cluster-node-tuning-operator:cluster-node-tuning-operator
had 6230 line read failures
count: 345, first: 2022-03-28T12:44:14-04:00, last: 2022-03-28T14:36:01-04:00, duration: 1h51m46.864891s
58x                  performance.openshift.io/v2/performanceprofiles
51x                  tuned.openshift.io/profiles
48x                  tuned.openshift.io/tuneds
35x                  machineconfiguration.openshift.io/v1/machineconfigpools
32x                  machineconfiguration.openshift.io/v1/machineconfigs
27x                  v1/nodes
17x                  tuned.openshift.io/v1/profiles
17x                  tuned.openshift.io/v1/tuneds
16x                  machineconfiguration.openshift.io/v1/kubeletconfigs

From last week (3/23)

count: 88, first: 2022-03-23T11:25:54-04:00, last: 2022-03-23T13:02:38-04:00, duration: 1h36m44.257099s
14x                  machineconfiguration.openshift.io/v1/machineconfigs
14x                  tuned.openshift.io/tuneds
13x                  v1/nodes
13x                  tuned.openshift.io/profiles
12x                  machineconfiguration.openshift.io/v1/machineconfigpools
11x                  config.openshift.io/v1/clusteroperators
11x                  apps/daemonsets
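To make the comparison concrete, here's a quick sketch of diffing the two per-resource counts above (a subset of the listed resources, for brevity; `watch_delta` is a made-up helper, not part of any tool):

```python
# Subset of the per-resource watch counts reported above.
yesterday = {  # 3/28
    "performance.openshift.io/v2/performanceprofiles": 58,
    "tuned.openshift.io/profiles": 51,
    "tuned.openshift.io/tuneds": 48,
    "machineconfiguration.openshift.io/v1/machineconfigpools": 35,
    "v1/nodes": 27,
}
last_week = {  # 3/23
    "tuned.openshift.io/tuneds": 14,
    "tuned.openshift.io/profiles": 13,
    "machineconfiguration.openshift.io/v1/machineconfigpools": 12,
    "v1/nodes": 13,
}

def watch_delta(before, after):
    """Per-resource change in watch count; a resource absent on one side counts as 0."""
    return {r: after.get(r, 0) - before.get(r, 0)
            for r in sorted(set(before) | set(after))}

growth = watch_delta(last_week, yesterday)
# performanceprofiles goes from 0 to 58; the tuned watches roughly quadruple.
```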

Additional info:

Last week the tuned watches were scoped to the openshift-cluster-node-tuning-operator namespace; today there are watches on openshift-cluster-node-tuning-operator, on openshift-performance-addon-operator, and across all namespaces. It's possible this is entirely explicable, but it looks markedly different, and it's strange to watch both across all namespaces and in specific namespaces.
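One way to check the scoping claim mechanically: namespaced Kubernetes API requests carry a `/namespaces/<ns>/` path segment, while a watch across all namespaces omits it. A hypothetical classifier over audit-log requestURIs (not part of any existing tool):

```python
import re

# Namespaced Kubernetes API requests contain "/namespaces/<ns>/" in the path;
# an all-namespaces watch of the same resource omits that segment.
NS_RE = re.compile(r"/namespaces/([^/]+)/")

def watch_scope(request_uri):
    """Return the namespace a request is scoped to, or '<all namespaces>'."""
    m = NS_RE.search(request_uri)
    return m.group(1) if m else "<all namespaces>"
```

Grouping the tuned watch requests by `watch_scope` would show whether the same resource really is being watched at both scopes simultaneously.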

@openshift-ci
Contributor

openshift-ci bot commented Mar 29, 2022

@stbenjam: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-pao 126fdf8 link true /test e2e-gcp-pao
ci/prow/e2e-aws-operator 126fdf8 link true /test e2e-aws-operator
ci/prow/e2e-aws 126fdf8 link true /test e2e-aws
ci/prow/e2e-upgrade 126fdf8 link true /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@deads2k
Contributor

deads2k commented Mar 29, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 29, 2022
@jmencak
Contributor

jmencak commented Mar 29, 2022

/approve

@openshift-ci
Contributor

openshift-ci bot commented Mar 29, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 29, 2022
@deads2k
Contributor

deads2k commented Mar 29, 2022

/payload 4.11 ci blocking

Need non-AWS jobs because AWS has a pruning problem. On a green install and upgrade I will merge.

@openshift-ci
Contributor

openshift-ci bot commented Mar 29, 2022

@deads2k: trigger 5 jobs of type blocking for the ci release of OCP 4.11

  • periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade
  • periodic-ci-openshift-release-master-ci-4.11-e2e-gcp-upgrade
  • periodic-ci-openshift-release-master-ci-4.11-e2e-aws-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/fb6d7aa0-af60-11ec-93ee-4cd3bfa7f070-0

@deads2k
Contributor

deads2k commented Mar 29, 2022

This PR reverts the latest commits in order, so we'll revert here.

@deads2k deads2k merged commit e2375c3 into openshift:master Mar 29, 2022