Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pleg: enable tests on file changes #33463

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oxxenix
Copy link

@oxxenix oxxenix commented Sep 12, 2024

Fix: Enable Automatic Test Triggering for PLEG Component
Currently, tests are not triggered automatically as seen in PR #126843.

  • Added configuration to trigger tests automatically when files in the PLEG component are modified.
  • Updated run_if_changed to include the PLEG path: pkg/kubelet/pleg.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 12, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: oxxenix
Once this PR has been reviewed and has the lgtm label, please assign priyankasaggu11929 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Welcome @oxxenix!

It looks like this is your first PR to kubernetes/test-infra 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/test-infra has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @oxxenix. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 12, 2024
@k8s-ci-robot k8s-ci-robot added area/config Issues or PRs related to code in /config area/jobs sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Sep 12, 2024
@oxxenix oxxenix marked this pull request as draft September 12, 2024 09:52
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 12, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 12, 2024
@oxxenix oxxenix marked this pull request as ready for review September 12, 2024 10:00
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 12, 2024
@oxxenix
Copy link
Author

oxxenix commented Sep 12, 2024

/cc @bart0sh, @pohly

Signed-off-by: Oksana Baranova <oksana.baranova@intel.com>
@kannon92
Copy link
Contributor

Hmm. Why are you doing this?

@kannon92
Copy link
Contributor

You can trigger pleg jobs manually and I hope we have them running in the periodic.

@aojea
Copy link
Member

aojea commented Sep 12, 2024

/hold

we are not going to add automatic presubmits for alpha features, that does not scale well,

Please follow Kevin suggestions, it is ok to setup periodic jobs, but we need to keep a high bar on presubmits

Thanks

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2024
@kannon92
Copy link
Contributor

@aojea One minor correction. We have a high bar for required presubmits. Please feel free to add presubmits for alpha features but we should not require them to run on changes.

@bart0sh
Copy link
Contributor

bart0sh commented Sep 12, 2024

@kannon92

You can trigger pleg jobs manually and I hope we have them running in the periodic.

Does this mean that we're ok not to test PLEG changes automatically because this is an Alpha feature? So, we basically allow to break CI and prefer to investigate breakages later, when periodic job breaks, correct?

@bart0sh
Copy link
Contributor

bart0sh commented Sep 12, 2024

@aojea

we are not going to add automatic presubmits for alpha features, that does not scale well,

There are alpha features that are tested this way as far as I can see.

@aojea
Copy link
Member

aojea commented Sep 13, 2024

@aojea

we are not going to add automatic presubmits for alpha features, that does not scale well,

There are alpha features that are tested this way as far as I can see.

well , that I don't agree with, specially for alphas like EventedPLEG and Pod Vertical Autoscaling that are very brittle, as a developer don't want to see my PR blocked for a job like this.

As an EventedPLEG or Pod Vertical Autoscaling I can run my presubmit manually or locally, no need to make the entire project to run the tests of my feature

@aojea
Copy link
Member

aojea commented Sep 13, 2024

Does this mean that we're ok not to test PLEG changes automatically because this is an Alpha feature? So, we basically allow to break CI and prefer to investigate breakages later, when periodic job breaks, correct?

That statement is not fair, evented PLEG has been broken since always, and has downgraded in 1.30, there was a bug in 1.27 reporting the problem and nobody checked at it until I put a job that consistenty failing on that problem, there are periodic jobs and manual jobs and users reporting issues ..., let's don't use the automation on presubmits as excuse,

@pohly
Copy link
Contributor

pohly commented Sep 13, 2024

as a developer don't want to see my PR blocked for a job like this

But it wouldn't be blocked, would it? The pre-submit jobs for alpha features have to be optional (only run in some cases) and must not be required (don't block merging when they fail).

What can be a problem is the resulting confusion when some optional job gets triggered by someone who happened to make some minor change in a file that is being watched and then the job fails. That developer then typically won't know whether it's really okay to ignore the failure.

I think whoever owns that job is on the hook for ensuring that it runs reliably, despite the alpha feature. This disqualifies pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e which has known issues.

@bart0sh
Copy link
Contributor

bart0sh commented Sep 13, 2024

@aojea

As an EventedPLEG or Pod Vertical Autoscaling I can run my presubmit manually or locally, no need to make the entire project to run the tests of my feature

To be honest, I'm having a hard time to understand the general idea. So far I was under impression that CI jobs aim to guard the project from breakages by running certain checks when PR changes certain part of the codebase. Can you explain what's the idea of not running for example DRA(???) jobs on DRA code changes? To free some resources on CI infra?

@aojea
Copy link
Member

aojea commented Sep 13, 2024

To be honest, I'm having a hard time to understand the general idea. So far I was under impression that CI jobs aim to guard the project from breakages by running certain checks when PR changes certain part of the codebase. Can you explain what's the idea of not running for example DRA(???) jobs on DRA code changes? To free some resources on CI infra?

Yes, sorry, I didn't explain myself correctly, my apologies.

We need to follow some strategy that is sustainable and incentivize the right behaviors.

Adding everything as presubmit has some undesired consequences:, it affects ALL developers in the project, not only the ones working in that specific feature.
It also incentivize the wrong behaviors, as people may think: "as a feature owner I don't need to check my feature state because if something breaks the CI will block it". This second part is the one that we need to avoid, because feature X can break because someone breaks it or because and external dependency, and this makes a problem of one group a problem of everybody.

In this example, if I somehow edit some header in pleg and the job is red, should I merge with the job failing or should I debug an alpha feature that I don't know what is about? is warning about the job is red something will trigger some reaction? we have proof this does not work, jobs and tests red for weeks and months that end , EventedPLEG is the better example

To summarize, a strategy for testing that is sustainable

There are two type of jobs, presubmit and periodic.
Presubmit can be: manually triggered, by regex or on every change.
Presubmit avoid to break existing code, presubmit are expensive and impact developer experience
Periodics can detect code breaks and send notifications, it just requires the feature owners to monitor and attend to the alerts, this is the part I see people prefer to put jobs as presubmit so "community" owns the problem
In addition, Authors must ensure the code is tested and Reviewers and Approvers must ensure the code is tested

EDIT

to be clear, I'm scoping this to alpha features ... on enabled by default or other environmental features the topic is more complicated and I agree run_if_changed is appropiate. on same cases

@bart0sh
Copy link
Contributor

bart0sh commented Sep 13, 2024

@aojea Thank you for the explanations. Much appreciated!

In this example, if I somehow edit some header in pleg and the job is red, should I merge with the job failing or should I debug an alpha feature that I don't know what is about?

It depends on if you care or not. You can either ignore it as these check is not required or let the feature developer know (create issue, ping on slack, etc) about a breakage. However, generally it's a useful info and doesn't block you, I believe.

Presubmit can be: manually triggered, by regex or on every change.

That's the main purpose of this PR as far as I understand - to trigger PLEG CI job when PLEG codebase is changed, isn't it? This is similar to other alpha jobs, e.g. DRA and InPlacePodVerticalScaling.

The alternative you're suggesting is that all people changing certain code area should know which presubmits to trigger manually if they care, correct? How they can find out the list of presubmits to trigger, especially if their changes affect a lot of areas?

presubmit are expensive

Only if they're triggered for unrelated codebase changes, I think. It's not the case for this particular PR.

@aojea
Copy link
Member

aojea commented Sep 13, 2024

The alternative you're suggesting is that all people changing certain code area should know which presubmits to trigger manually if they care, correct? How they can find out the list of presubmits to trigger, especially if their changes affect a lot of areas?

I expect the reviewer to know this, and the approver to double check it, though it is not always what happens :(

presubmit are expensive

Only if they're triggered for unrelated codebase changes, I think. It's not the case for this particular PR.

I recognize I don't have a good answer Ed, if you are saying people working on this feature is going to be accountable and react faster, I really don't have much objection, per example, with DRA we all know things are addressed timely ...

my main concern is that we end again in a situation with hundreds of jobs broken and nobody looking at them, at the end is a question of community and trust

Does these jobs have an alert set to warn when they are failing?

I will remove my hold once I see this jobs alert a group of people and/or slack channel, that will give me clear signal the job will not be left orphan

@bart0sh
Copy link
Contributor

bart0sh commented Sep 13, 2024

@aojea I don't have a good answer either. It depends on PLEG devs which way to choose.
@harche @haircommander WDYT?

@bart0sh
Copy link
Contributor

bart0sh commented Sep 13, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 13, 2024
@harche
Copy link
Contributor

harche commented Sep 13, 2024

@aojea I don't have a good answer either. It depends on PLEG devs which way to choose. @harche @haircommander WDYT?

LGTM

@bart0sh
Copy link
Contributor

bart0sh commented Sep 13, 2024

@aojea

I will remove my hold once I see this jobs alert a group of people and/or slack channel

There is kubernetes-sig-node-test-failures group, with 53 subscribers. I think @harche and @haircommander are subscribed. Is that enough to remove your hold?

@aojea
Copy link
Member

aojea commented Sep 13, 2024

There is kubernetes-sig-node-test-failures group, with 53 subscribers. I think @harche and @haircommander are subscribed. Is that enough to remove your hold?

/hold cancel

fair enough, thanks for the healthy discussion

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 13, 2024
@kannon92
Copy link
Contributor

So my only issue with approving this is I don't really know what the sig-node status is on this feature.

Is anyone working on getting this out of alpha?

@bart0sh
Copy link
Contributor

bart0sh commented Sep 13, 2024

@kannon92 I can see that PLEG is proposed for consideration on the 1.32 KEP planning board.. @harche are we going to work on it in 1.32 timeframe?

@aojea
Copy link
Member

aojea commented Sep 14, 2024

EventedPLEG feature is alpha and is disabled on alpha jobs because it break the clusters, that is why I was specially hesitant to add these jobs as presubmit

@harche
Copy link
Contributor

harche commented Sep 16, 2024

@harche are we going to work on it in 1.32 timeframe?

I am running short on time due to other more critical commitments, just checking with @pacoxu, @oxxenix and @hshiina if they have some bandwidth in 1.32 for this.

@pacoxu
Copy link
Member

pacoxu commented Sep 18, 2024

I am running short on time due to other more critical commitments, just checking with @pacoxu, @oxxenix and @hshiina if they have some bandwidth in 1.32 for this.

Some Evented PLEG bugfix PRs(kubernetes/kubernetes#122778, kubernetes/kubernetes#124953, kubernetes/kubernetes#126415, kubernetes/kubernetes#127195) are pending on review or approval. We may need approvers' help in this case.

BTW, after a sidecar regression fix(kubernetes/kubernetes#126543), there is a new issue(kubernetes/kubernetes#127312) there and I think we can give it a try to fix it in v1.32. Again, this may need a confirmation from SIG node leads.

@aojea
Copy link
Member

aojea commented Sep 18, 2024

I still think you can achieve the same with periodics, there is no need to block all the project on this alpha feature, specially this one that has 5 pending fixes and a large record of instability

@pacoxu
Copy link
Member

pacoxu commented Sep 18, 2024

I still think you can achieve the same with periodics, there is no need to block all the project on this alpha feature, specially this one that has 5 pending fixes and a large record of instability

Agree with @aojea.
And I prefer to do this after EventedPLEG is promoted to BETA or at least blockers are fixed.

@bart0sh
Copy link
Contributor

bart0sh commented Sep 18, 2024

I agree as well, and I'm ok to close this PR, considering the above mentioned reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/config Issues or PRs related to code in /config area/jobs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
Status: PRs Waiting on Author
Development

Successfully merging this pull request may close these issues.

8 participants