Job for k3s.service failed because the control process exited with error code #556

Closed
Aliabbask08 opened this issue Jun 20, 2019 · 46 comments

@Aliabbask08

Hello Team,

I'm trying to run a k3s cluster on a Raspberry Pi following the official docs, but I'm hitting this issue:
● k3s.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2019-06-20 12:18:07 UTC; 4min 13s ago
Docs: https://k3s.io
Process: 1722 ExecStart=/usr/local/bin/k3s server (code=exited, status=1/FAILURE)
Process: 1719 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Process: 1716 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Main PID: 1722 (code=exited, status=1/FAILURE)
CPU: 2.150s

Jun 20 12:18:06 master systemd[1]: k3s.service: Unit entered failed state.
Jun 20 12:18:06 master systemd[1]: k3s.service: Failed with result 'exit-code'.
Jun 20 12:18:07 master systemd[1]: k3s.service: Service hold-off time over, scheduling restart.
Jun 20 12:18:07 master systemd[1]: Stopped Lightweight Kubernetes.
Jun 20 12:18:07 master systemd[1]: k3s.service: Start request repeated too quickly.
Jun 20 12:18:07 master systemd[1]: Failed to start Lightweight Kubernetes.
Jun 20 12:18:07 master systemd[1]: k3s.service: Unit entered failed state.
Jun 20 12:18:07 master systemd[1]: k3s.service: Failed with result 'exit-code'.

@erikwilson
Contributor

Are you able to provide some more information, such as how you are installing and the k3s logs? When using systemd, the logs should be in /var/log/syslog or available via journalctl -u k3s.service.
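(For reference, two ways to pull those logs; a minimal sketch assuming the default systemd/journald setup:)

# last 200 lines from the k3s unit, without the pager
journalctl -u k3s.service --no-pager -n 200

# or, if journald forwards to syslog
grep k3s /var/log/syslog | tail -n 200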

@Aliabbask08
Author

[Screenshot: Screen Shot 2019-06-20 at 7 37 55 PM]

@Aliabbask08
Author

[Screenshot: Screen Shot 2019-06-20 at 7 38 49 PM]
This is journalctl -u k3s.service logs

@erikwilson
Contributor

Thanks! The screenshots are hard to work with. If you can copy & paste the complete line where it says level=fatal msg="starting tls server:", hopefully there is more info where the text is cut off.

@Aliabbask08
Author

@erikwilson I followed the steps below:

  1. Edit /boot/cmdline.txt and add cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory
  2. curl -sfL https://get.k3s.io | sh -
    after this step I'm getting this error
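(For anyone following along: a quick way to verify the cgroup change actually took effect after the reboot; a minimal sketch using k3s's built-in preflight check.)

# the memory cgroup line should show enabled=1 after editing cmdline.txt and rebooting
cat /proc/cgroups | grep memory

# k3s's own check for required kernel options and cgroups
sudo k3s check-config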

@Aliabbask08
Author

Thanks! The screenshots are hard to work with. If you can copy & paste the complete line where it says level=fatal msg="starting tls server:", hopefully there is more info where the text is cut off.

level=fatal msg="starting tls server: Get https://localhost:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: dial tcp [::1]:6444: connect: connection refused"

@tdewitt

tdewitt commented Jun 28, 2019

I just had two freshly installed Raspbian Lite Pi 3 B+ nodes become non-responsive after installing k3s. The installer runs, the service starts, and the nodes die almost immediately. I let them sit overnight in case they were just locked up temporarily, but they're dead. If I reboot, I have a ~30 second window to get in and stop k3s before it becomes non-responsive again.

I've tried with and without cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory appended to cmdline.txt, with no change.

@erikwilson
Contributor

Thanks for the info @Aliabbask08, is it possible to share the output of netstat -tlnp?

Are you able to find any more info from the logs, @tdewitt, or about how it dies? Do kubectl commands work initially but then hang or produce an error?

If it is possible to try out v0.7.0-rc1, I am curious if it helps with the issue at all.
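(For reference, a sketch of both checks; INSTALL_K3S_VERSION is the install script's documented version override:)

# is anything listening on port 6444, the address from the fatal error?
sudo netstat -tlnp | grep 6444

# install a specific k3s release via the official script
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.7.0-rc1 sh -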

@tdewitt

tdewitt commented Jun 28, 2019

The entire node dies about 20s after service startup. I measured from when ansible completes (using setup in contrib now so I can do things concurrently) until I'm no longer receiving ping replies. This is with 0.6.1. I can try with 0.7.0-rc1 in a little while.

Service startup logs here: https://gist.github.com/tdewitt/bb2031446aa9b309e92ec0b7628bf98f

@tdewitt

tdewitt commented Jun 28, 2019

Just tried with 0.7.0-rc1. Same results. The agent node's service seems to be fine; the master dies.

@tdewitt

tdewitt commented Jun 28, 2019

Swap is disabled. It looked OK at first, but it turns out it's not. This is everything before it dies: https://gist.github.com/tdewitt/75e5342f85b3f6f9d0f5ba3af2d1d685

@tdewitt

tdewitt commented Jun 28, 2019

My problem was networking. My local network collides with k3s's default networks. I moved them to a couple of new blocks and all is well. Thanks @erikwilson for helping me work this out.
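(For anyone else hitting the same collision: k3s defaults to 10.42.0.0/16 for pods and 10.43.0.0/16 for services, and both can be overridden at install time. A sketch; the 10.44/10.45 blocks below are placeholders, pick ranges that don't overlap your LAN.)

curl -sfL https://get.k3s.io | sh -s - server --cluster-cidr=10.44.0.0/16 --service-cidr=10.45.0.0/16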

@skarlekar

skarlekar commented Sep 23, 2019

@tdewitt Can you kindly explain what you did to resolve this issue? I am having the same failure while starting up the master. Any pointers will be appreciated.

I have edited /etc/dhcpcd.conf and set the static ip as follows:

sudo cat >> /etc/dhcpcd.conf
interface eth0
static ip_address=192.168.1.52/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1

Error as follows:

● k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Sun 2019-09-22 21:21:09 EDT; 4s ago
     Docs: https://k3s.io
  Process: 19579 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 19580 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 19581 ExecStart=/usr/local/bin/k3s server --write-kubeconfig-mode 644 KillMode=process (code=exited, status=1/FAILURE)
 Main PID: 19581 (code=exited, status=1/FAILURE)

@tdewitt

tdewitt commented Sep 24, 2019 via email

@Southporter

I'm running into the same issue as @Aliabbask08. I'm trying to set up the server on an RPi 2B+ and keep getting the error:

level=fatal msg="starting tls server: Get https://localhost:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: dial tcp [::1]:6444: connect: connection refused"

I'm using version v0.10.0

netstat -tlnp produces the following:
[Screenshot: netstat -tlnp output]

journalctl -u k3s.service:
[Screenshot: journalctl -u k3s.service output]

@Aliabbask08
Author

@ssedrick can you please share the steps you followed?
And what changes did you make in 1. /etc/hosts and 2. /boot/cmdline.txt?

@Southporter

Southporter commented Oct 24, 2019

I had a clean install of Raspbian Buster. From there I ran the instructions on https://www.k3s.io. That didn't work.

I've done a little bit of troubleshooting and followed the k3sup project, https://k3sup.dev, and that worked. It is using 0.9.1.

/etc/hosts:

127.0.0.1	localhost
::1		localhost ip6-localhost ip6-loopback
ff02::1		ip6-allnodes
ff02::2		ip6-allrouters

127.0.1.1	k3s-master

/boot/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=6c586e13-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory

@otto-dev

otto-dev commented Oct 25, 2019

I'm running into what I believe to be the same issue, except that my error message is slightly different. I'm getting

[...] level=fatal msg="starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout"

More context:

Oct 25 13:06:36 alpha1.lan k3s[7877]: I1025 13:06:36.699054    7877 plugins.go:161] Loaded 7 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,RuntimeClass,ResourceQuota.
Oct 25 13:06:37 alpha1.lan k3s[7877]: time="2019-10-25T13:06:37.216363858+01:00" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-cert-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --cluster-signing-key-file=/var/lib/rancher/k3s/server/tls/server-ca.key --kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --leader-elect=false --port=10252 --root-ca-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --secure-port=0 --service-account-private-key-file=/var/lib/rancher/k3s/server/tls/service.key --use-service-account-credentials=true"
Oct 25 13:06:37 alpha1.lan k3s[7877]: time="2019-10-25T13:06:37.228451710+01:00" level=info msg="Running kube-scheduler --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --port=10251 --secure-port=0"
Oct 25 13:06:38 alpha1.lan k3s[7877]: I1025 13:06:38.900959    7877 server.go:143] Version: v1.16.2-k3s.1
Oct 25 13:06:38 alpha1.lan k3s[7877]: I1025 13:06:38.905452    7877 defaults.go:91] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
Oct 25 13:06:39 alpha1.lan k3s[7877]: I1025 13:06:39.029636    7877 controllermanager.go:161] Version: v1.16.2-k3s.1
Oct 25 13:06:39 alpha1.lan k3s[7877]: W1025 13:06:39.049020    7877 authorization.go:47] Authorization is disabled
Oct 25 13:06:39 alpha1.lan k3s[7877]: W1025 13:06:39.051023    7877 authentication.go:79] Authentication is disabled
Oct 25 13:06:39 alpha1.lan k3s[7877]: I1025 13:06:39.054275    7877 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
Oct 25 13:06:39 alpha1.lan k3s[7877]: I1025 13:06:39.191349    7877 deprecated_insecure_serving.go:53] Serving insecurely on [::]:10252
Oct 25 13:06:48 alpha1.lan k3s[7877]: time="2019-10-25T13:06:48.187072448+01:00" level=fatal msg="starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout"
Oct 25 13:06:48 alpha1.lan systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE

Both on Raspberry Pi 3 and Zero W.

@mfriedenhagen

Just another me-too without any further (real) clues.

  • The system is a Raspberry Pi 3+.
  • I tried updating from 0.8.1 to 0.10.0 using systemctl stop k3s.service && curl -sfL https://get.k3s.io | sh -s - server --no-deploy=traefik and got the above message.
  • Then I did a complete k3s-uninstall. That did not help; I got the above message again.
  • Then I just replaced the 0.10.0 executable with 0.8.1. Now k3s is up and running again. I had to reapply my tillerless Helm charts and everything was fine.
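(If anyone needs to pin a version instead of swapping binaries by hand, the install script accepts a version override; a sketch reusing the same server flag as above:)

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.8.1 sh -s - server --no-deploy=traefik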

@otto-dev

otto-dev commented Oct 25, 2019

Can confirm that the issue does not occur with v0.9.1. It's a regression. Maybe we should open a new issue for this?

Tested on RPI B+ and Zero W

@erikwilson
Contributor

erikwilson commented Oct 25, 2019

I am tracking the arm issue in #939; hopefully this is fixed with https://github.com/rancher/k3s/releases/tag/v0.10.1-rc1, and v0.10.1 will be released here shortly.

@erikwilson
Contributor

Sorry, the previously referenced issue was for a segfault; #869 & #970 also have logs for "TLS handshake timeout".

@madyasiwi

Can confirm that the issue does not occur with v0.9.1. It's a regression. Maybe we should open a new issue for this?

Tested on RPI B+ and Zero W

I'm having a similar case with Armbian on an Orange Pi One/PC.

@manhluong

Got similar error on Archlinux, Raspberry PI 3B+, latest version of k3s (v0.10.2).

Uninstalled and reinstalled v0.9.1 with:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.9.1 sh -

k3s service started with no issues.

@Rambou

Rambou commented Nov 2, 2019

Got similar error on Archlinux, Raspberry PI 3B+, latest version of k3s (v0.10.2).

Uninstalled and reinstalled v0.9.1 with:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.9.1 sh -

k3s service started with no issues.

I've just tested v0.9.1 with a Raspberry Pi 3B and it works too! 0.10.2 fails.

@gvanderberg

Downgrading to k3s version 0.9.1 worked for me too.

Running on RPi 3B+ with OS:

Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster

The error I got on versions 0.10.2 and 0.10.0 was: starting tls server: Get https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout

@xiaods
Contributor

xiaods commented Nov 10, 2019

Running on RPi 3B+ with OS:

$ k3s --version
k3s version v0.10.2 (8833bfd9)

I set up an air-gapped environment where networking is completely disabled, so I had to add an address and a default route:

sudo ip -c address add 192.168.123.123/24 dev eno1
sudo ip route add default via 192.168.123.1

Then I ran sudo k3s server, and it finally raised the below error:

https://127.0.0.1:6444/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions: net/http: TLS handshake timeout

I don't know the reason.

@b0nete

b0nete commented Nov 13, 2019

Got similar error on Archlinux, Raspberry PI 3B+, latest version of k3s (v0.10.2).

Uninstalled and reinstalled v0.9.1 with:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.9.1 sh -

k3s service started with no issues.

Same here, also with a Raspberry Pi 3B+ and Arch ARM.
I uninstalled k3s 0.10.2, installed 0.9.1, and now it works!

@drestauri

I had a similar error that I was working through for hours, and it turned out I needed to update this file on my agent node:
/etc/systemd/system/k3s-agent.service.env
Replace the token (even if it appears to be the same) and make sure K3S_URL matches your server node's IP address and port, including the https:// prefix. For example:
K3S_TOKEN=<your_token>
K3S_URL=https://<server_ip>:6443
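(For completeness, a rough sketch of where that token comes from and how to apply the change, assuming a default install:)

# on the server node: print the join token to copy into K3S_TOKEN
sudo cat /var/lib/rancher/k3s/server/node-token

# on the agent node: restart the agent after editing k3s-agent.service.env
sudo systemctl restart k3s-agent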

@alepee

alepee commented Nov 15, 2020

Just ran into the same issue with k3s version v1.19.3+k3s3 (0e4fbfef) on an RPi 4 2GB with a fresh Raspberry Pi OS install. I got it up and running by enabling cgroups via the cmdline.txt flags:

cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory
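(Note for others: /boot/cmdline.txt must stay a single line, so the flags are appended to the end of the existing line, e.g. this sketch:)

# append the cgroup flags to the single kernel command line, then reboot
sudo sed -i '$ s/$/ cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory/' /boot/cmdline.txt
sudo reboot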

@mfriedenhagen

I had problems with Raspbian "Buster" because it updated the kernel to major version 5. Going back to Linux kernel 4 fixed this for me.

@alepee

alepee commented Nov 17, 2020

@mfriedenhagen could it be linked to the iptables/nftables issue?
https://rancher.com/docs/k3s/latest/en/advanced/#enabling-legacy-iptables-on-raspbian-buster
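(Roughly, the steps that page gives for switching Raspbian Buster back to legacy iptables, as a sketch:)

sudo iptables -F
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo reboot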

@mfriedenhagen

Hello @alepee, indeed, thanks for the link. I think I will give this a try together with kernel 5.

@mfriedenhagen

Hm, I already tried this back in the day.

  update-alternatives --query iptables
  update-alternatives --set iptables /usr/sbin/iptables-legacy
  iptables --version
  k3s check-config

but it did not work with the new kernel back then.
What I had to do was restore the kernel back to "Linux raspberrypi 4.19.118-v7+ #1311 SMP Mon Apr 27 14:21:24 BST 2020 armv7l GNU/Linux" by running:
rpi-update e1050e94821a70b2e4c72b318d6c6c968552e9a2

@mfriedenhagen

  • I just gave it another try and ran apt-get update ; apt-get -y dist-upgrade.
  • Before rebooting, I ran systemctl stop k3s.service ; /usr/local/bin/k3s-killall.sh to really stop everything.
  • Now I am on "Linux raspberrypi 5.4.72-v7+ #1356 SMP Thu Oct 22 13:56:54 BST 2020 armv7l GNU/Linux" and everything seems to be up and running.
  • So basically, my issue may have been a different problem; sorry for the noise.
  • Thanks again @alepee.

@johnrkriter

I ran the following and it resulted in a successful deployment:

  • Pi 4B 4Gb
  • fresh raspbian install based on 2020-08-20-raspios-buster-armhf-lite
  • Full upgrade to all system components
  • modify /boot/cmdline.txt to add cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1
  • execute curl -sfL https://get.k3s.io | sh - and install version v1.19.3+k3s3
  • installation worked fine and resulted in a working master, same as @alepee

@brandond
Member

brandond commented Dec 5, 2020

Closing due to age. Anyone experiencing similar problems should open a new issue and fill out the template.

@brandond brandond closed this as completed Dec 5, 2020
@cianiandreadev

I don't know if it will help, but I fixed this issue by adding these at the end of /boot/firmware/cmdline.txt:

[...]cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory

@quangthe

quangthe commented Jul 2, 2021

Got similar error on Archlinux, Raspberry PI 3B+, latest version of k3s (v0.10.2).

Uninstalled and reinstalled v0.9.1 with:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.9.1 sh -

k3s service started with no issues.

Thanks, it saved me a day!

@brandond
Member

brandond commented Jul 2, 2021

That's a really old version of k3s, I wouldn't recommend using it.

@quangthe

quangthe commented Jul 3, 2021

That's a really old version of k3s, I wouldn't recommend using it.

Luckily, after uninstalling v0.9.1 and trying the latest version again, it now works! Thanks @brandond

@tylerharpool

@quangthe Do you have any idea why we need to install v0.9.1 before the latest version to get it to work? Surely this is a bug.

@brandond
Member

brandond commented Aug 25, 2021

They didn't have to install the old version first... they're saying that the new version worked for them where the old one did not, and that they uninstalled the old version before trying the new version.

@tylerharpool

tylerharpool commented Sep 13, 2021

@brandond Thank you for correcting my misunderstanding. I was able to get it working by upgrading:

Installed:
  kernel-4.18.0-305.17.1.el8_4.x86_64          kernel-core-4.18.0-305.17.1.el8_4.x86_64 
  kernel-modules-4.18.0-305.17.1.el8_4.x86_64 
Removed:
  kernel-4.18.0-240.10.1.el8_3.x86_64          kernel-core-4.18.0-240.10.1.el8_3.x86_64 
  kernel-modules-4.18.0-240.10.1.el8_3.x86_64 

I also updated the SELinux policy on RHEL 8:
selinux-policy-3.14.3-67.el8_4.1.noarch

@doormat18

I had a similar issue on RHEL 9 running on AWS and fixed it with this:

systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
reboot

@brandond
Member

This is covered here: https://docs.k3s.io/installation/requirements?os=rhel#operating-systems

But in general, please do not bump years-old issues with unrelated comments.

@k3s-io k3s-io locked and limited conversation to collaborators Jul 25, 2024