WIP: DNM: Hack on kep 3085 #3501
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: thockin. The full list of commands accepted by this bot can be found here.
controller for managing pods that refer to PVCs associated with node local
storage (e.g. Rook-Ceph) may decide to recreate PVCs (based on a specified PVC
template in the custom resource the controller is managing) if the sandbox
creation is repeatedly failing to complete, indicated by the new `PodHasNetwork`
Seems like we should have a status for the sandbox overall instead of specifically "has network" ?
I see #3087 (review), but it still seems like the motivation is knowing if the sandbox is ready, and because the sandbox is somewhat nebulous, I think it's wrong to couple directly to "has network" when we're really looking for "CRI finished sandbox setup"?
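(For concreteness, a controller consuming the proposed condition would do something along these lines. This is only a sketch against the existing core/v1 types; the `podHasNetworkCondition` constant and the `sandboxReady` helper are made-up names, and the condition type string is the one the KEP proposes.)

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// podHasNetworkCondition is the condition type proposed by the KEP.
const podHasNetworkCondition corev1.PodConditionType = "PodHasNetwork"

// sandboxReady reports whether sandbox creation has completed for the pod,
// as signalled by the PodHasNetwork condition having status True.
func sandboxReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == podHasNetworkCondition {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}

func main() {
	pod := &corev1.Pod{
		Status: corev1.PodStatus{
			Conditions: []corev1.PodCondition{
				{Type: podHasNetworkCondition, Status: corev1.ConditionTrue},
			},
		},
	}
	fmt.Println(sandboxReady(pod)) // prints: true
}
```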
around pod sandbox creation latency, surfacing the `PodHasNetwork` condition in
pod status will allow a metric collection service to directly access the
relevant data without requiring the ability to collect and parse OpenTelemetry
traces. As mentioned in the User Stories, popular community managed services
But if the user is looking for more granular data on timings, surely a common tracing standard is the answer?
How granular would we eventually go with conditions vs tracing?
`PodHasNetwork` condition reporting `true`) but the pod's containers fail to
become ready. Further details of this are covered in a [user story](#story-2-consuming-podhasnetwork-condition-in-a-controller) below.

When a pod's sandbox no longer exists, the `status` of the `PodHasNetwork` condition
When I see `PodHasNetwork`, my first thought was that this is an "is the network working right now?" type indicator. Should this be named something like `PodNetworkConfigured`?
runtime and associated runtime handlers, CNI plugins, etc. The duration between
`lastTransitionTime` field of the `PodHasNetwork` condition (with `status` set
to `true` for a pod for the first time) and the existing `PodScheduled`
condition will allow metrics collection services to compute total latency of all
I'm concerned about using transition times of different conditions to measure duration of operations. For phases that must be performed in series and that have a guaranteed stable ordering, this would seem okay, but it seems like network configuration could happen in parallel with other steps, maybe as an optimization in the future. Also, in the future, if there needs to be some step after network it looks like it would mess up metrics collection (i.e. if X happens after network configuration, you could end up measuring the latency of network configuration + latency of X).
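(For illustration, the SLI the quoted text describes would be computed roughly as follows. This sketch reuses the imports and the `podHasNetworkCondition` constant from the earlier snippet plus the standard `time` package; `sandboxSetupLatency` is a made-up helper, and, per the concern above, the number is only meaningful while network setup remains the final, serially ordered step of sandbox creation.)

```go
// sandboxSetupLatency returns the duration between the PodScheduled and
// PodHasNetwork conditions first reporting True, and false if either
// condition is missing or not yet True.
func sandboxSetupLatency(pod *corev1.Pod) (time.Duration, bool) {
	var scheduled, hasNetwork *corev1.PodCondition
	for i := range pod.Status.Conditions {
		c := &pod.Status.Conditions[i]
		switch c.Type {
		case corev1.PodScheduled:
			scheduled = c
		case podHasNetworkCondition:
			hasNetwork = c
		}
	}
	if scheduled == nil || hasNetwork == nil ||
		scheduled.Status != corev1.ConditionTrue ||
		hasNetwork.Status != corev1.ConditionTrue {
		return 0, false
	}
	return hasNetwork.LastTransitionTime.Sub(scheduled.LastTransitionTime.Time), true
}
```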
This also prevents other controllers from acting on sandbox status as
mentioned in User Stories section.

### Report sandbox creation stages using Kubelet tracing
Would something along these lines be viable? If we continued to add monitoring of pod sandbox latencies for other things over time, how far can we go? If we want more granular monitoring, would we be okay with having 10 or even 50 conditions? My hunch is that some type of tracing approach, or some other way to export metrics data scales much better.
runtime and associated runtime handlers, CNI plugins, etc. The duration between
`lastTransitionTime` field of the `PodHasNetwork` condition (with `status` set
to `true` for a pod for the first time) and the existing `PodScheduled`
condition will allow metrics collection services to compute total latency of all
Are we proposing any in-tree metrics collection? Or is the intention to make it possible for 3rd-party metrics collection of network configuration? (I'm assuming the latter, but wanted to double check.)
Hi, I am sorry I didn't see this one before it went in for alpha. I have some real reservations about this at multiple layers.
First, this makes an assumption that I don't think necessarily holds - that network is the last step in setup. You're calling it "has network" but you MEAN "sandbox is done". Why are we mis-naming it?
Second, it's possible that we may yet support multi-network pods in Kube. There's a discussion going in sig-net. If/when that lands, does this now mean "has all networks" or "has one network"?
Third, I am not a fan of Conditions, in general, and this feels like a bit of a force-fit. All through reading I kept asking "why not use a new field which exists FOR THIS PURPOSE?" Conditions are supposed to be orthogonal - not related to each other. Describing them in a sequence means you have defined a state machine, and that makes a TERRIBLE API.
I really like the main idea (or at least what I interpret as the main idea) - publish information about how long pod startup took and which components ate how much of that time. This feels like a partial answer at best.
Stepping back, we have finalizers and we have readiness gates, which are kind-of opposite ends, but they both cover time while the pod is running. This KEP is covering time before it is running, but the pattern - "a set of arbitrary check-boxes need to be checked" - seems to hold. Can we / should we embrace that?
For example, imagine we kept a list of "steps" with timestamps. Similar to conditions but actually designed for this, no baggage. kubelet could look at the pod at admission and build up a list of the steps it knows (which volumes, which networks, etc). It could record the start and stop times (or just deltas) for each step. This does not assume linearity or concurrency - that's up to kubelet to manage. It's also extensible - so new components could add steps without breaking the published state machine (which is what this really becomes, as-is).
Lastly - I am still confused about "Initialized". Does that mean "ready to run non-init-containers"? It sounds like it is muddy. Would be nice to have a diagram showing the actual timeline, e.g.
|--<scheduled>-|-<admitted>-|-<setup begun>-|-<components...>-|-<setup complete>-|-<initcontainers begun>-|-<initcontainers done>-|-<running>-|-<deleted>-|-<finalizers>--|
So I do not think this KEP should proceed beyond alpha as it is written; at least I feel like I need to understand a whole lot better why these tradeoffs are correct. I am happy to get on a meeting some time (kubecon?) and discuss.
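(To make the "list of steps with timestamps" idea from the comment above concrete, a hypothetical status field could look roughly like this. None of these types exist in the Kubernetes API; all names are illustrative, and the usual `metav1` import from `k8s.io/apimachinery/pkg/apis/meta/v1` is assumed.)

```go
// PodSetupStep is a purely hypothetical record of one piece of pre-start
// work (volume mount, network attachment, image pull, ...).
type PodSetupStep struct {
	// Name identifies the step, e.g. "volume:data" or "network:default".
	// Components performing the work would choose their own names.
	Name string `json:"name"`
	// StartedAt and FinishedAt bracket the step. Steps may overlap, so no
	// ordering or linearity is implied by the list.
	StartedAt  metav1.Time  `json:"startedAt,omitempty"`
	FinishedAt *metav1.Time `json:"finishedAt,omitempty"`
}

// PodSetupStatus would hold the steps kubelet discovered at admission and
// recorded as it performed them.
type PodSetupStatus struct {
	Steps []PodSetupStep `json:"steps,omitempty"`
}
```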
container runtime handlers.

Conclusion of the creation of the pod sandbox is marked by the presence of a
sandbox with networking configured. A new dedicated condition marking this -
Why is "has network" necessarily the last step? Is that part of a contract I do not know about?
What I mean is: It seems like theres a list (or graph?) of things that have to happen as a pod starts. That list may not be knowable a priori (e.g. a "special" RuntimeClass might maybe have extra steps that a "regular one wouldn't") so I am immediately wary of this sort of statement.
`lastTransitionTime` field of the `PodHasNetwork` condition (with `status` set
to `true` for a pod for the first time) and the existing `PodScheduled`
condition will allow metrics collection services to compute total latency of all
the components involved in pod sandbox creation as an SLI. Cluster operators can
I really like the idea of creating a way to measure the contribution of each "step" to the total startup time of a pod. That said, are all of the steps guaranteed to be linear? Or can they happen in parallel (e.g. I can pull an image while I mount volumes and configure network all at the same time)? How do we represent that?
Timestamps around completion of pod sandbox creation may be surfaced as a
dedicated field in the pod status rather than a pod condition. However, since
the successful creation of pod sandbox is essentially a "milestone" in the life
of a pod (similar to Scheduled, Ready, etc), pod conditions are the ideal place
I have argued in the past that "Ready" is a terrible use of conditions, so I have a problem with using "we did that" as an excuse to do it again. Past mistakes are still mistakes.
Each infrastructural plugin that Kubelet calls out to (in the process of setting
up a pod sandbox) can mark start and completion timestamps on the pod as
conditions. This approach would be similar to how readiness gates work today.
I'm not quite buying this - if kubelet is the internal "switchboard" for running messages between plugins, that feels like a purely internal thing - don't make these plugins THEMSELVES update the "pre-readiness gates", make kubelet do it.
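(For comparison, this is roughly the shape readiness gates have today; a setup-time analogue would presumably look similar, with kubelet rather than each plugin patching the corresponding conditions into pod status, as the comment above suggests. The sketch reuses the `corev1` import from the first snippet, and the condition type string is made up.)

```go
// examplePodWithReadinessGate shows today's readiness-gate shape: the pod
// spec names an extra condition type, and something outside the container
// runtime patches the matching condition into pod status.
func examplePodWithReadinessGate() *corev1.Pod {
	return &corev1.Pod{
		Spec: corev1.PodSpec{
			ReadinessGates: []corev1.PodReadinessGate{
				// "example.com/network-configured" is an illustrative,
				// made-up condition type.
				{ConditionType: corev1.PodConditionType("example.com/network-configured")},
			},
		},
	}
}
```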
@thockin: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
FTR: I am not expecting dramatic action on 1.26, given timing, but this CAN NOT proceed to beta without discussion.
ack, @thockin. Appreciate the review and the thoughts. I will DM to find a time to meet (virtually or during Kubecon). I really like the idea of
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
@ddebroy what did we decide to do about this?
I didn't get to review 3085 and there is no good way to add comments to a merged KEP, so I am trying this approach.