hyperkube: kube-proxy 1.16.11 does not work on CentOS7 #92250

Closed
dmitry-irtegov opened this issue Jun 18, 2020 · 13 comments
Labels
area/release-eng Issues or PRs related to the Release Engineering subproject kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/release Categorizes an issue or PR as relevant to SIG Release. triage/unresolved Indicates an issue that can not or will not be resolved.

Comments

@dmitry-irtegov

What happened:
We install k8s 1.16.11 on a CentOS 7.7.1908 (Core) node using a custom installer.
The kubelet and CNI binaries are extracted from the k8s.gcr.io/hyperkube-amd64:v1.16.11 image.
Everything works except kube-proxy (and, obviously, anything that depends on it).
It spams the log with this message:

E0618 09:42:42.811741       7 proxier.go:1418] Failed to execute iptables-restore: exit status 4 (iptables-restore v1.8.2 (nf_tables): 
line 86: CHAIN_USER_DEL failed (Device or resource busy): chain KUBE-SEP-ARQWXEG2ZEM2P3WE
)

Service IPs are not reachable from this node.
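
For triage, the backend used by the containerized iptables-restore can be read from the version banner in the log line above. A minimal Python sketch, assuming the banner keeps the `vX.Y.Z (mode)` shape shown in that log; the name `parse_iptables_banner` is hypothetical, and treating a banner without a mode as a pre-1.8 binary is an assumption:

```python
import re

# Matches a banner such as "iptables-restore v1.8.2 (nf_tables)".
# The parenthesised backend is optional so that older banners still parse.
BANNER = re.compile(
    r"iptables(?:-restore|-save)?\s+v(\d+(?:\.\d+)+)(?:\s+\(([\w-]+)\))?"
)


def parse_iptables_banner(text):
    """Return (version, mode) found in a log line, or None if there is no banner.

    mode is None when the banner has no parenthesised backend, which this
    sketch assumes means a pre-1.8 binary without the nft backend.
    """
    match = BANNER.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


line = (
    "E0618 09:42:42.811741       7 proxier.go:1418] Failed to execute "
    "iptables-restore: exit status 4 (iptables-restore v1.8.2 (nf_tables): "
)
print(parse_iptables_banner(line))  # ('1.8.2', 'nf_tables')
```

A mode of `nf_tables` on a host whose own rules live in the legacy tables is the mismatch described in this report.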

What you expected to happen:
kube-proxy works.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
This does NOT happen with 1.16.11 on Ubuntu 18.04, or with 1.16.10 on CentOS 7 using the same OS image.
It looks like iptables native/legacy mode detection is somehow broken on CentOS 7.

Environment:

  • Kubernetes version (use kubectl version): v1.16.11
  • Cloud provider or hardware configuration: vSphere, 4 CPU / 8192 MB RAM
  • OS (e.g: cat /etc/os-release): CentOS Linux release 7.7.1908 (Core)
  • Kernel (e.g. uname -a): 3.10.0-1062.9.1.el7.x86_64 #1 SMP
  • Install tools: custom
  • Network plugin and version (if this is a network-related bug):
  • Others:
@dmitry-irtegov dmitry-irtegov added the kind/bug Categorizes issue or PR as related to a bug. label Jun 18, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 18, 2020
@athenabot

/sig network

These SIGs are my best guesses for this issue. Please comment /remove-sig <name> if I am incorrect about one.

🤖 I am a bot run by vllry. 👩‍🔬

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 18, 2020
@justaugustus
Member

/sig release
/area release-eng
/priority critical-urgent
cc: @kubernetes/sig-network-bugs @kubernetes/release-engineering

@k8s-ci-robot k8s-ci-robot added sig/release Categorizes an issue or PR as relevant to SIG Release. area/release-eng Issues or PRs related to the Release Engineering subproject priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jun 18, 2020
@athenabot

/triage unresolved

Comment /remove-triage unresolved when the issue is assessed and confirmed.

🤖 I am a bot run by vllry. 👩‍🔬

@k8s-ci-robot k8s-ci-robot added the triage/unresolved Indicates an issue that can not or will not be resolved. label Jun 18, 2020
@superseb

What binaries are you extracting? Between v1.16.10 and v1.16.11 (and likewise in 1.17.7 and 1.18.4), the base image for hyperkube was bumped from Debian 9 (stretch) to Debian 10 (buster), and Debian 10 uses nftables by default (https://wiki.debian.org/iptables). Whether this is what affects you still depends on what you are extracting from the hyperkube image.
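
One way to avoid this kind of mismatch is to pick the backend at container start according to which rule set the host already uses. The Python sketch below illustrates that idea only. It assumes the caller has already captured the output of `iptables-legacy-save` and `iptables-nft-save` as strings; the function names and the tie-break toward legacy are assumptions for illustration, not a description of any fix applied to the hyperkube image.

```python
def count_rules(save_output):
    """Count rule lines ("-A ...") in iptables-save style output."""
    return sum(1 for line in save_output.splitlines() if line.startswith("-A "))


def pick_backend(legacy_save, nft_save):
    """Return "nft" or "legacy" depending on which dump holds more rules.

    Ties go to "legacy"; that default is an assumption made for this sketch.
    """
    if count_rules(nft_save) > count_rules(legacy_save):
        return "nft"
    return "legacy"


# Hypothetical dumps: the host has rules only in the legacy tables,
# as would be expected on CentOS 7.
legacy_dump = "*filter\n:INPUT ACCEPT [0:0]\n-A INPUT -j KUBE-FIREWALL\nCOMMIT\n"
nft_dump = "*filter\n:INPUT ACCEPT [0:0]\nCOMMIT\n"
print(pick_backend(legacy_dump, nft_dump))  # legacy
```

Taking the dumps as plain strings keeps the decision logic testable without root access or a particular iptables build.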

@dmitry-irtegov
Author

dmitry-irtegov commented Jun 18, 2020

What binaries are you extracting?

/opt/cni/* and /hyperkube for k8s 1.16.*
/opt/cni/* and /usr/local/bin/kubelet for k8s 1.17 and 1.18

We do NOT extract the iptables binary, and the error is obviously produced by the kube-proxy container, which we do not touch.

@aojea
Member

aojea commented Jun 18, 2020

This seems to be a duplicate of #71305 (comment)

iptables-restore: exit status 4 (iptables-restore v1.8.2 (nf_tables):

@dmitry-irtegov
Author

This is not a duplicate, because CentOS 7 does NOT use nf_tables, and that issue is about machines that do (say, CentOS 8).

@aojea
Member

aojea commented Jun 18, 2020

Well, that's what the output you pasted says 😄 It seems to be using iptables-nft instead of iptables-legacy, which is why I assumed it was the same issue 😅

E0618 09:42:42.811741 7 proxier.go:1418] Failed to execute iptables-restore: exit status 4 (iptables-restore v1.8.2 (nf_tables):

@justaugustus
Member

/assign
I'm attempting a fix in #92354.

@justaugustus
Member

We've published v1.18.5-rc.1, v1.17.8-rc.1, and v1.16.12-rc.1, which include new hyperkube images.

Can you test these and let us know if this resolves your issue?
We're holding off on any patch releases until we get feedback here, so please report back when you can.

cc: @kubernetes/release-engineering

@justaugustus justaugustus changed the title from "kube-proxy 1.16.11 does not work on CentOS7" to "hyperkube: kube-proxy 1.16.11 does not work on CentOS7" on Jun 26, 2020
@justaugustus
Member

k8s.gcr.io/hyperkube:v1.16.12 is live, as part of https://github.com/kubernetes/kubernetes/releases/tag/v1.16.12, so this should be resolved now.

Other patches:
https://github.com/kubernetes/kubernetes/releases/tag/v1.18.5 --> k8s.gcr.io/hyperkube:v1.18.5
https://github.com/kubernetes/kubernetes/releases/tag/v1.17.8 --> k8s.gcr.io/hyperkube:v1.17.8

/close
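
For installers that pin hyperkube tags, the releases named in this thread can be kept as a small lookup. This is a sketch in Python: the table contains only versions mentioned in the comments above, and the helper name `fixed_tag` is hypothetical.

```python
# Patch releases whose hyperkube image moved to a Debian 10 base, mapped to
# the follow-up release that ships a new hyperkube image (per this thread).
FIXED_BY = {
    "v1.16.11": "v1.16.12",
    "v1.17.7": "v1.17.8",
    "v1.18.4": "v1.18.5",
}


def fixed_tag(tag):
    """Return the follow-up tag for an affected release, or None otherwise."""
    normalized = tag if tag.startswith("v") else "v" + tag
    return FIXED_BY.get(normalized)


for tag in ("v1.16.11", "1.17.7", "v1.16.10"):
    print(tag, "->", fixed_tag(tag))
```

Tags outside the table return `None`, since the thread does not say anything about them beyond 1.16.10 working.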

@k8s-ci-robot
Contributor

@justaugustus: Closing this issue.


@dmitry-irtegov
Author

dmitry-irtegov commented Jun 29, 2020

Sorry, tested already on 1.16.12. Yes, it does solve the issue.

Thank you!!!
