
KEP-4781: Fix inconsistent container start and ready state after kubelet restart #4784

Open · wants to merge 4 commits into `master`
Conversation

@pololowww commented Aug 9, 2024

  • One-line PR description: Add a new KEP to fix the inconsistent container start and ready state after a kubelet restart

@k8s-ci-robot (Contributor):

@pololowww: GitHub didn't allow me to request PR reviews from the following users: chenk008.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

  • One-line PR description: Adding new KEP to fix inconsistent container ready state after kubelet restart

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

linux-foundation-easycla bot commented Aug 9, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Aug 9, 2024
@k8s-ci-robot (Contributor):

Welcome @pololowww!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 9, 2024
@k8s-ci-robot (Contributor):

Hi @pololowww. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 9, 2024
@pacoxu (Member) commented Aug 9, 2024:

/cc @bart0sh @SergeyKanzhelev
/assign @mrunalp

@pacoxu (Member) left a comment:

/cc @matthyx

@bart0sh (Contributor) commented Aug 9, 2024:

Thank you for your PR. Please sign the CLA to proceed further.

> ##### Changes in probe
>
> We will update the `UpdatePodStatus` function to:

Contributor:

Maybe you can also properly set `started` for a container (ref. kubernetes/kubernetes#115553).

> We will update the `UpdatePodStatus` function to:
>
> 1. Track containers with readiness probes.
> 2. Preserve the previous ready state for containers with readiness probes that haven't run yet after kubelet restart.
Contributor:

I wonder how you plan to persist the ready states.

Member:

From the API server: read the state of the Pods as the initial state for the probe managers.

Author:

Yes, the ready state will be retrieved from `pod.Status.Conditions`.

@pololowww (Author):

I have modified the details of my KEP to support inheriting the Start and Ready states, and reconsidered the scenarios where the kubelet startup time exceeds the period. This is already updated in the new commit.
Generally, when containerStartTime is before the kubelet's start time, we default to inheriting the previous container Start status, and the Ready status is retrieved from the API server. If there are any unreasonable aspects to consider, please let me know :-)

@pololowww pololowww changed the title KEP-4781: Fix inconsistent container ready state after kubelet restart KEP-4781: Fix inconsistent container start and ready state after kubelet restart Aug 14, 2024
keps/sig-node/4781-kublet-restart-pod-status/README.md (outdated, resolved)
> ## Drawbacks
>
> <!--
> Why should this KEP _not_ be implemented?

Contributor:

Hypothetically, if an edge case occurs where a pod that was ready transitions to an unready state when kubelet restarts, and we choose to treat it as still ready, the service's endpoints would still include this pod. This could lead to network traffic being directed to a pod that has encountered an issue.

This KEP is not without risks. We should articulate its drawbacks here and explain why, despite these drawbacks, we are still choosing to proceed with it.

Author:

Thanks for your suggestions :-) The risks and drawbacks are already mentioned in my new commit. We also plan to trigger the probe immediately to reduce the risks.
By the way, this is my first time writing a KEP and I am not sure how detailed it should be.


> ##### Changes in kubelet_pods.go
>
> We will update the `convertToAPIContainerStatuses` function, which is responsible for calculating the API-recognizable `v1.ContainerStatus` from the old and current container states. Our modification will preserve the Started status from the oldStatus when the container's start time is earlier than the kubelet's start time. This allows the subsequent `UpdatePodStatus` to inherit the Started state.
Contributor:

How can we obtain the startup time of the kubelet? Could you briefly explain?

@pololowww (Author), Aug 15, 2024:

Actually, the startup time of the kubelet's probeManager is more accurate (`worker.probeManager.start`). This is already modified in the new commit :)

> extending the production code to implement this enhancement.
> -->
>
> - `<package>`: `<date>` - `<test coverage>`
Contributor:

You should add your test plan. I assume you've already implemented a POC, so this should be easy for you.

Contributor:

This still applies.

Member:

+1, this feature will definitely need strong test coverage and may not be straightforward to test. We should start planning that early.

@ffromani (Contributor):

/cc

@songminglong:
/cc

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Aug 23, 2024
@thockin (Member) left a comment:

I am super excited to see this finally being addressed. THANK YOU!


> ##### Changes in kubelet_pods.go
>
> We will update the `convertToAPIContainerStatuses` function, which is responsible for calculating the API-recognizable `v1.ContainerStatus` from the old and current container states. Our modification will preserve the Started status from the oldStatus when the container's start time is earlier than the kubelet's start time. This allows the subsequent `UpdatePodStatus` to inherit the Started state.
Member:

I am a little confused about the use of time as an indicator here, but I admit this is not an area of the code I am familiar with.

The way I think about this:

When kubelet starts up, it reads the list of intended pods (and their last known state) from apiserver and it reads the list of actual pods from the CRI.

If it finds a pod in the API that is not running, it starts the pod (yes, I know there's nuance here, especially around state) and updates status.

If it finds a running pod that the API does not have (yes, include mirror pods in that) it kills the pod and cleans up.

If it finds a running pod which corresponds to a pod in the API, it should try to resume the state-machine where the last state indicates. If it was Ready, it stays ready until the probes fail. If it was Unready, it stays unready until the probes succeed. This is the main issue at hand in this KEP, right?

Where does comparing timestamps come in?

Author:

Your understanding is correct. The comparison of timestamps occurs before checking all containers. It mainly serves two purposes:

  1. to narrow down the comparison scope
  2. to avoid cases where the previous container has been restarted. If the container's start time is later than the kubelet's start time, we will no longer retain the previous state.

Member:

This is still unclear to me, and I want to make sure the spec is as clear as possible.

  1. to narrow down the comparison scope

What does this mean?

  2. to avoid cases where the previous container has been restarted. If the container's start time is later than the kubelet's start time, we will no longer retain the previous state.

Let me see if I can construct the case you're worried about, please tell me if I am missing it.

t0: container is running and Ready
t1: kubelet goes down
t2: container crashes
t3: CRI impl restarts the container (which is, practically, NotReady)
t4: kubelet comes back up and finds the container running
t5: kubelet does not yet have probe results, so it leaves the container as Ready
t6: eventually the probe shows the container as NotReady

Is that right? If so, I am really not very worried about it.

Referencing myself (sorry) in kubernetes/kubernetes#100277 (comment) : In a case like this the pod stayed in service a little longer than it should have. This is not a huge deal and happens any time there is a "surprise" change (e.g. the node crashed).

Additionally, the pod start time isn't the only relevant indicator. That timeline could just as easily be:

t0: container is running and Ready
t1: kubelet goes down
t2: container becomes unready (probes would fail, but app does not crash)
t3: kubelet comes back up and finds the container running
t4: kubelet does not yet have probe results, so it leaves the container as Ready
t5: eventually the probe shows the container as NotReady

There's no timestamp you can compare (I think?) that detects that situation, and the whole reason I think we need to change this behavior is because "assuming the worst" (setting it to NotReady) is way more expensive and problematic than "wait and see".

Upon a kubelet restart, we should probably prioritize "run probes ASAP".

TL;DR - I think that kubelet should only set the status when it actually HAS information, not as a side-effect of anything else. If you don't KNOW a pod is unready (by probes or some other explicit mechanism) then you shouldn't set it unready.

Author:

Sorry for not replying recently :( When updating the pod status, we iterate through all the containers within the pod to identify the containers whose states need to be inherited. Therefore, directly skipping containers with a later start time narrows down the comparison scope.
Regarding the first scenario you described and your considerations, I think your understanding is reasonable. It's indeed not a big issue if the pod's state is retained slightly longer.
In the second scenario, we plan to immediately trigger the probes and let them determine the final state of the pod.

In summary, I think the comparison of the start times can be kept for the purpose of narrowing down the scope. Perhaps I shouldn't have emphasized this point in the KEP. What do you think?

@pacoxu (Member) commented Sep 20, 2024:

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 20, 2024
@haircommander (Contributor):

Can you also add a prod-readiness file in https://github.com/kubernetes/enhancements/tree/master/keps/prod-readiness/sig-node named 4781.yaml and choose a PRR reviewer to do PRR review (similar to https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/sig-node/127.yaml, but you only need alpha). Don't worry too much about which reviewer you choose, as they'll internally load balance.

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pololowww
Once this PR has been reviewed and has the lgtm label, please ask for approval from mrunalp and additionally assign soltysh for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pololowww pololowww closed this Oct 4, 2024
@SergeyKanzhelev (Member):

@pololowww did you mean to close this PR?

@pololowww (Author):

> @pololowww did you mean to close this PR?

@SergeyKanzhelev No, I just wanted to close a comment and may have made a mistake... Can we reopen it? I actually need to merge this PR.

@pololowww pololowww reopened this Oct 5, 2024
@HirazawaUi (Contributor):

@pololowww Could you please fix the failing test? You just need to run `make update-toc` locally and submit the updated content.

@BenTheElder (Member) left a comment:

[PRR Shadow]

> extending the production code to implement this enhancement.
> -->
>
> - `<package>`: `<date>` - `<test coverage>`
Member:

+1, this feature will definitely need strong test coverage and may not be straightforward to test. We should start planning that early.

> - [ ] Feature gate (also fill in values in `kep.yaml`)
>   - Feature gate name: `PodUnreadyOnKubeletRestart`
>   - Components depending on the feature gate: `kubelet`
> - [ ] Other
Member:

delete?


> Due to the use of a feature gate, the feature can be disabled by setting the gate to false.
>
> ###### What happens if we reenable the feature if it was previously rolled back?
Member:

Answer this one?

> ### Rollout, Upgrade and Rollback Planning
>
> <!--
> This section must be completed when targeting beta to a release.
Member:

While it must be completed only by Beta, I'd encourage thinking about beta/GA requirements sooner than later.

> [existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
> -->
>
> - [ ] Feature gate (also fill in values in `kep.yaml`)
Contributor:

Suggested change (mark the feature gate checkbox as checked):

> - [x] Feature gate (also fill in values in `kep.yaml`)

> feature gate after having objects written with the new field) are also critical.
> You can take a look at one potential example of such test in:
> https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
> -->
Contributor:

Let's add something here. At the very minimum, for this feature, we should include tests that verify the behavior with the feature turned off.

@jpbetz (Contributor) commented Oct 10, 2024:

Quick reminder that enhancement freeze is today and there is outstanding Production Readiness Review feedback that would need to be addressed for this to be approved.

Labels:
- cncf-cla: yes (the PR's author has signed the CNCF CLA)
- kind/kep (categorizes KEP tracking issues and PRs modifying the KEP directory)
- ok-to-test (a non-member PR verified by an org member as safe to test)
- sig/apps (relevant to SIG Apps)
- sig/node (relevant to SIG Node)
- size/XL (changes 500-999 lines, ignoring generated files)
Projects: In Progress