Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set the kubelet --cgroup-driver flag conditionally at kubeadm init time for Docker #844

Closed
luxas opened this issue May 22, 2018 · 20 comments · Fixed by kubernetes/kubernetes#64347
Assignees
Labels
area/releasing area/upgrades kind/feature Categorizes issue or PR as related to a new feature. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.
Milestone

Comments

@luxas
Copy link
Member

luxas commented May 22, 2018

When kubernetes/kubernetes#63887 merges, we should set the kubelet's --cgroup-driver flag to either systemd or cgroupfs conditionally at kubeadm init time, in cmd/kubeadm/app/phases/kubelet/flags.go.

This instead of doing it statically in the deb/rpm packages.
xref related issue: #822

PR:
kubernetes/kubernetes#64347

@luxas luxas added this to the v1.11 milestone May 22, 2018
@luxas luxas added kind/enhancement priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/releasing area/upgrades labels May 22, 2018
@neolit123
Copy link
Member

i really wanted to see a solution where they make the default kubelet --cgroup-driver mode auto, by default instead of cgroupfs. looking at the source code of the kubelet that does that validation to match the drivers it seemed quite possible, but maybe there is a catch.

this would avoid kubeadm setting the driver explicitly, given the kubelet knows what driver to use.

@luxas
Copy link
Member Author

luxas commented May 22, 2018

WDYT @kubernetes/sig-node-feature-requests about the suggestion above? cc @mtaufen

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/feature Categorizes issue or PR as related to a new feature. labels May 22, 2018
@yujuhong
Copy link

/cc @filbranden

@filbranden
Copy link

@neolit123 I meant to work on something like that... But talking to @Random-Liu we noticed it was hard to support that since the Runtime (CRI) doesn't need to be up when kubelet starts and we would like to avoid blocking on the Runtime for kubelet startup...

We discussed a few alternative proposals, but we have nothing to show yet.

One possible alternative is to make the CRI the slave and kubelet the master for that attribute. (If kubelet sends the Runtime a systemd cgroup path, then Runtime also uses the systemd cgroup driver and vice-versa.) We might need some changes in libcontainer to do that... It should be feasible.

@neolit123
Copy link
Member

@filbranden

One possible alternative is to make the CRI the slave and kubelet the master for that attribute. (If kubelet sends the Runtime a systemd cgroup path, then Runtime also uses the systemd cgroup driver and vice-versa.) We might need some changes in libcontainer to do that... It should be feasible.

it's a complicated change and a long-term one, but i guess it would work!

I meant to work on something like that... But talking to @Random-Liu we noticed it was hard to support that since the Runtime (CRI) doesn't need to be up when kubelet starts and we would like to avoid blocking on the Runtime for kubelet startup...

We discussed a few alternative proposals, but we have nothing to show yet.

you have probably considered this already, but wouldn't the following be the easiest short-term solution that doesn't modify the current kubelet / CRI (master-on-slave) setup:

  • add the auto parameter to kubelet's cgroup-driver argument
  • if the CRI is running take it's cgroup driver
  • if the CRI isn't running use cgroupfs

@filbranden
Copy link

The current way the code is laid out makes it tricky...

Even with docker-shim, it turns out there is code that depends on the cgroup driver setting that runs before the first connection to docker-shim is made... So it's actually quite a big refactor. That's when I started talking to @Random-Liu about this and we looked at the other shortcomings of the approach.

Agreed, having CRI runtimes do both cgroup drivers (and have the request from kubelet decide) is a long term project, it's gonna take a while...

@neolit123
Copy link
Member

@filbranden
understood, i was almost sure that there is a catch here, otherwise auto would have easily made it as the default option. thank you for the detailed explaination!

@luxas
Copy link
Member Author

luxas commented May 22, 2018

@neolit123 let's go with this approach for the moment then
@filbranden do you have an issue in k/k to track the auto feature so we can track the progress there and eventually switch over?

@neolit123
Copy link
Member

@luxas what do you have in mind for the detection of the CRI driver at kubeadm runtime?

something like this would fetch it:

docker info | grep -i cgroupfs

but it's a command line and only for docker...

@luxas
Copy link
Member Author

luxas commented May 22, 2018

We expect the CRI to be running at the time kubeadm is running. Can you query the CRI runtime for that information for the general case or just query docker info or similar for the special case?

@neolit123
Copy link
Member

neolit123 commented May 22, 2018

Can you query the CRI runtime for that information for the general case

should that be done with crictl?

or just query docker info or similar for the special case?

could you elaborate on general case vs special case.
i'm sorry for the silly questions.

@filbranden
Copy link

do you have an issue in k/k to track the auto feature so we can track the progress there and eventually switch over?

No I don't, I never really filed one, I was trying to get a commit at least to a prototype stage before doing that, and given I found it wasn't really feasible, I didn't move forward...

Can you query the CRI runtime for that information for the general case?

That's also part of the problem... CRI doesn't really have an official way to expose that either...

IIRC, for containerd, if you query it using the Status CRI call (crictl status) and set Verbose: true then you get a string containing a JSON that has the cgroup driver somewhere in there...

When I checked cri-o, it doesn't have anything like that...

So if you want to query it, you might need to add some more support in CRI to export that... (Perhaps the Status request is the appropriate place... just might need an explicit proto field to export that, and update existing implementations.)

@neolit123
Copy link
Member

@filbranden

thanks for the details! i will have a look at this tomorrow:

https://github.com/filbranden/community/blob/009f067c6bc106d062a416d0a1537de5ffa8c023/contributors/design-proposals/node/cgroup-driver-kubelet-and-runtime.md

That's also part of the problem... CRI doesn't really have an official way to expose that either...
When I checked cri-o, it doesn't have anything like that...

So if you want to query it, you might need to add some more support in CRI to export that... (Perhaps the Status request is the appropriate place... just might need an explicit proto field to export that, and update existing implementations.)

oh, well. i'm not exactly sure how we can proceed with this for 1.11.
@luxas what do you think about the outlined complication?

@luxas
Copy link
Member Author

luxas commented May 23, 2018

hmm, let's do this only for docker then. So if the CRISocket is dockershim, check docker ps output or whatever to determine what to use. We really need this for docker in the general case, otherwise all our rpm-based installations fail.
@neolit123 seems legit to do this?

@stealthybox
Copy link
Member

stealthybox commented May 23, 2018

available since 1.13

$ docker system info --format '{{.CgroupDriver}}'
cgroupfs

@neolit123
Copy link
Member

@luxas
seems good.

@stealthybox
thanks for the shortcut.
i will parse the docker info output as i don't want to deprecate 1.11 and 1.12 because of that.

@luxas luxas added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label May 29, 2018
@neolit123
Copy link
Member

NOTE: keeping this issue for the rest of the CRI.

@neolit123 neolit123 removed this from the v1.11 milestone May 30, 2018
@neolit123 neolit123 added this to the v1.12 milestone May 30, 2018
@neolit123 neolit123 added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels May 30, 2018
@luxas
Copy link
Member Author

luxas commented May 30, 2018

@neolit123 Would you be okay with keeping this in the v1.11 milestone to track what's needed for v1.11 and then open a new to track the rest of the CRI work in order to not change purpose midterm?

@neolit123
Copy link
Member

@luxas ok!

@neolit123 neolit123 modified the milestones: v1.12, v1.11 May 30, 2018
@neolit123 neolit123 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels May 30, 2018
@neolit123 neolit123 changed the title Set the kubelet --cgroup-driver flag conditionally at kubeadm init time Set the kubelet --cgroup-driver flag conditionally at kubeadm init time for Docker May 30, 2018
jingax10 pushed a commit to jingax10/kubernetes that referenced this issue May 31, 2018
Automatic merge from submit-queue (batch tested with PRs 64338, 64219, 64486, 64495, 64347). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubeadm: conditionally set the kubelet cgroup driver for Docker

**What this PR does / why we need it**:
Add a new utility file - util/cgroupdriver.go.
Currently it only contains the public function GetCgroupDriverDocker().
The function uses 'docker info' to obtain the cgroup driver
for Docker.

On a later stage this file can contain more methods for different
CRI.

Use GetCgroupDriverDocker() in phases/kubelet/flags.go
to conditionally set the 'cgroup-driver' argument. On error
print a warning and don't set the argument value.

Add unit tests in cgroupdriver_test.go.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#844

**Special notes for your reviewer**:
NONE

**Release note**:

```release-note
kubeadm: conditionally set the kubelet cgroup driver for Docker
```

/area kubeadm
/kind enhancement
@kubernetes/sig-cluster-lifecycle-pr-reviews 
@luxas
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/releasing area/upgrades kind/feature Categorizes issue or PR as related to a new feature. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants