
Build ISO minikube image for ARM (aarch64) #9228

Closed
afbjorklund opened this issue Sep 12, 2020 · 35 comments · Fixed by #13762
Labels
area/guest-vm General configuration issues with the minikube guest VM co/kvm2-driver KVM2 driver related issues kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@afbjorklund
Collaborator

If we want to run any hypervisor driver on arm64, we need a new "minikube.iso" that works on the architecture.

Currently the image is full of amd64 binaries that we download from other places, so it cannot simply be rebuilt from source for another architecture.

Buildroot does support the ARM architectures (armv7, arm64).

For instance the Raspberry Pi OS still uses 32-bit by default...
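For reference, picking the target architecture in buildroot is mostly a matter of the top-level config symbols. A minimal sketch for arm64 (the cortex-a53 tuning is an assumption, matching what QEMU's virt board is usually run with):

# buildroot target selection for 64-bit ARM (CPU variant is an assumption)
BR2_aarch64=y
BR2_cortex_a53=y

The real work is in the packages that currently hardcode amd64 downloads.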

@afbjorklund afbjorklund added kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. area/guest-vm General configuration issues with the minikube guest VM labels Sep 12, 2020
@afbjorklund
Collaborator Author

Needed for #6159 and #9224

@afbjorklund
Collaborator Author

afbjorklund commented Nov 15, 2020

Looked into this a bit, and there is no syslinux support on the arm64 platform - so it is unlikely to be an ISO-9660.

BR2_TARGET_ROOTFS_ISO9660=y
BR2_TARGET_SYSLINUX=y

It would work the same way, though it would be more like a "minikube.img". The nearest config is qemu_aarch64_virt_defconfig:

BR2_TARGET_ROOTFS_EXT2=y
BR2_TARGET_ROOTFS_EXT2_4=y

So far I have used the Raspberry Pi, which has a custom bootloader and a custom config.

Other real hardware uses "Das U-Boot", but we only need something for a VM...

BR2_TARGET_ROOTFS_CPIO=y
BR2_TARGET_ROOTFS_CPIO_GZIP=y

@afbjorklund
Collaborator Author

Here is how it starts QEMU by default:

amd64

make qemu_x86_64_defconfig world
qemu-system-x86_64 -M pc -kernel output/images/bzImage -drive file=output/images/rootfs.ext2,if=virtio,format=raw -append "rootwait root=/dev/vda console=tty1 console=ttyS0" -serial stdio -net nic,model=virtio -net user

  • output/images/bzImage
  • output/images/rootfs.ext2

arm64

make qemu_aarch64_virt_defconfig world
qemu-system-aarch64 -M virt -cpu cortex-a53 -nographic -smp 1 -kernel output/images/Image -append "rootwait root=/dev/vda console=ttyAMA0" -netdev user,id=eth0 -device virtio-net-device,netdev=eth0 -drive file=output/images/rootfs.ext4,if=none,format=raw,id=hd0 -device virtio-blk-device,drive=hd0

  • output/images/Image
  • output/images/rootfs.ext4

So what is needed is a nice way to bundle the kernel (Image) and the rootfs (initrd.cpio.gz) into one disk image.

The buildroot "genimage" script could help with this, perhaps. Then we just need some simple bootloader for it...

@afbjorklund
Collaborator Author

afbjorklund commented Nov 15, 2020

Using grub2 with efi seems to be the simplest, since it has built-in support (unlike syslinux or gummiboot).

boot/grub2/readme.txt

qemu-grub

amd64

board/pc
configs/pc_x86_64_efi_defconfig

qemu-system-x86_64 \
	-M pc \
	-bios </path/to/OVMF_CODE.fd> \
	-drive file=output/images/disk.img,if=virtio,format=raw \
	-net nic,model=virtio \
	-net user

ovmf: /usr/share/OVMF/OVMF_CODE.fd

arm64

board/aarch64-efi/
configs/aarch64_efi_defconfig

qemu-system-aarch64 \
	-M virt \
	-cpu cortex-a57 \
	-m 512 \
	-nographic \
	-bios </path/to/QEMU_EFI.fd> \
	-drive file=output/images/disk.img,if=none,format=raw,id=hd0 \
	-device virtio-blk-device,drive=hd0 \
	-netdev user,id=eth0 \
	-device virtio-net-device,netdev=eth0

qemu-efi-aarch64: /usr/share/qemu-efi-aarch64/QEMU_EFI.fd

For now we will continue without a root partition, since minikube assumes that it will be running from tmpfs.
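For the record, the buildroot options behind the grub2/EFI approach are roughly these (a sketch; check boot/grub2/Config.in for the exact symbols in the buildroot version in use):

BR2_TARGET_GRUB2=y
# amd64 board:
BR2_TARGET_GRUB2_X86_64_EFI=y
# arm64 board:
BR2_TARGET_GRUB2_ARM64_EFI=y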

@afbjorklund afbjorklund self-assigned this Nov 15, 2020
@afbjorklund
Collaborator Author

afbjorklund commented Nov 15, 2020

Here are the reference board sizes:

4.9M	output/images/bzImage
88M	output/images/rootfs.ext2
93M	output/images/disk-amd64.img
20M	output/images/Image
42M	output/images/rootfs.ext4
63M	output/images/disk-arm64.img

Will put them up as a separate project.

See: https://github.com/afbjorklund/minimal-buildroot


PS. They both use ext4; they just have different filenames when generated (and then a symlink between them).
That comes in handy when having both in the same file structure, just as with the compressed/uncompressed kernel...

For the real minikube OS, we have the files in an initrd, so they will go on a "boot" partition instead of the "root" partition.
Then the init will create a tmpfs (needed for containers), copy everything from the rootfs, and do a switch_root.

Device                        Start    End Sectors  Size Type
output/images/disk-amd64.img1    64  32831   32768   16M EFI System
output/images/disk-amd64.img2 32832 278591  245760  120M Linux root (x86)
Device                        Boot Start    End Sectors  Size Id Type
output/images/disk-arm64.img1          1  65536   65536   32M ef EFI (FAT-12/16/32)
output/images/disk-arm64.img2      65537 475136  409600  200M 83 Linux

@bluestealth
Contributor

@afbjorklund It is possible to create an ISO image that boots on arm64 and on (hybrid BIOS/UEFI) amd64, if you want to maintain backwards compatibility and not require UEFI firmware on Intel. I have this working in a branch.
I don't have access to any arm64 boards that support KVM, so I have only been able to test this in QEMU so far. https://github.com/kubernetes/minikube/compare/master...bluestealth:arm64?expand=1
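For anyone curious, a hybrid BIOS/UEFI ISO is usually produced with a single xorriso invocation along these lines (a rough sketch; the staging directory, isohdpfx.bin path, and efiboot.img name are assumptions, not the exact command from the branch):

# BIOS boot via isolinux (El Torito + isohybrid MBR), plus a UEFI El Torito entry
xorriso -as mkisofs -o minikube.iso \
  -isohybrid-mbr isolinux/isohdpfx.bin \
  -b isolinux/isolinux.bin -c isolinux/boot.cat \
  -no-emul-boot -boot-load-size 4 -boot-info-table \
  -eltorito-alt-boot -e EFI/efiboot.img -no-emul-boot \
  -isohybrid-gpt-basdat \
  iso_root/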

@afbjorklund
Collaborator Author

@bluestealth : nice, it seems like you have already gotten started on it. I saw that Debian has a live CD for arm64 as well, so it should be possible. That will probably make it easier to interface with libmachine, but it seems like your kvm2 driver needed some hacks in it...
I will take a look at your branch later; so far it seems that replacing isolinux with grub is pretty much all that is needed for booting. I'm just using the Raspberry Pi at the moment, but I'm trying to get something different together for the arm64 hackathon later this week.

@bluestealth
Contributor

bluestealth commented Nov 18, 2020

@afbjorklund Yes, I used the Debian documentation to get it working, which is really good. Some of my KVM hacks were to get it working with QEMU; I have even more hacks in another branch to allow minikube to work as a client to foreign architectures, which is kind of a mess.

@afbjorklund
Collaborator Author

afbjorklund commented Nov 18, 2020

I put the client-to-remote-server part in a different issue (#9593); it would still be useful, but I think we will handle it separately.

@afbjorklund
Collaborator Author

Will see if I can add an ISO target to the "hello world"... Then I'll look more at the other changes; join Slack (k8s/cncf) to chat.

@tstromberg
Contributor

I'd love to try this on my Raspberry Pi. What all is it going to take, as far as we know today?

Here's my naive assumption:

  • Update Makefiles to allow a difference between architecture outputs
  • Plumb target arch from Makefile to buildroot scripts (see the sketch at the end of this comment)
  • Figure out the boot story
  • Change some of our packages that download binaries to either not run at all, or to grab the correct architecture

@bluestealth - it looks like you've poured a lot of work into your fork -- what's left before we can start playing with it?
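As a strawman for the Makefile plumbing mentioned above, something pattern-rule based could work (target, defconfig, and output names here are hypothetical; recipe lines need real tabs):

# hypothetical sketch: one ISO per architecture, each with its own buildroot output dir
ISO_ARCHES := x86_64 aarch64

out/minikube-%.iso:
	mkdir -p out/buildroot-$*
	$(MAKE) -C buildroot O=$(CURDIR)/out/buildroot-$* minikube_$*_defconfig
	$(MAKE) -C buildroot O=$(CURDIR)/out/buildroot-$*
	cp out/buildroot-$*/images/boot.iso $@

iso: $(foreach arch,$(ISO_ARCHES),out/minikube-$(arch).iso)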

@afbjorklund
Collaborator Author

afbjorklund commented Jan 15, 2021

Figure out the boot story

We played with this during KubeCon, and there were no problems as long as you shifted over to GRUB (from isolinux).

It was possible to use that (efi) for amd64 as well, if we wanted to have the same bootloader for both architectures...

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 15, 2021
@medyagh medyagh added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels May 3, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@troian

troian commented Oct 28, 2021

Any chance to use even a beta version of the image? M1 MacBook Pros are out and it would be good to use the Parallels driver with minikube.

@spowelljr spowelljr modified the milestones: 1.24.0, 1.25.0-candidate Nov 5, 2021
@klaases
Contributor

klaases commented Nov 17, 2021

/assign

@spowelljr spowelljr added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Dec 29, 2021
@afbjorklund afbjorklund self-assigned this Feb 22, 2022
@sharifelgamal
Collaborator

sharifelgamal commented Apr 11, 2022

This will be the issue where I track all our combined ISO progress, for both x86_64 EFI and aarch64.

The PR where I'm testing (and where the change eventually should end up) is #13762.

Things that have already happened:

  • Refactored the deploy/iso/minikube-iso directory to account for two different ISOs and for clarity
  • Modified each iso board config to use grub2 instead of isolinux (to support UEFI bootloading)
  • For aarch64, removed sysdig kernel module so things will, uh, build
  • Added a CHANGELOG to the ISO for future auditing
  • Updated the Makefile to allow building both ISOs, and made all associated automation work with the new ISOs
  • Fixed the KVM2 and Hyperkit driver code to work with the new ISOs

Things that still need to happen:

  • Figure out if AppArmor is an issue for x86_64 (workaround currently is to suppress AppArmor)
  • Figure out why QEMU can't find the ISO's initrd
  • Test the aarch64 ISO with the new QEMU2 minikube driver (Add the qemu2 driver to the minikube registry #13639)
  • Make sure the KVM2 driver reads the nvram variable location for OVMF from config rather than going with the default (workaround is to create a symbolic link with the expected file name; see the sketch below)
  • Figure out KVM2 networking issue
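For the nvram workaround mentioned above, the idea is just to make the path that libvirt/the driver expects point at wherever the distro actually installs the OVMF files; both paths below are assumptions and will differ per distro:

# hypothetical example: the driver expects OVMF under /usr/share/OVMF, but the distro ships it elsewhere
sudo ln -s /usr/share/edk2/ovmf/OVMF_VARS.fd /usr/share/OVMF/OVMF_VARS.fd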

@sharifelgamal
Collaborator

Current initrd failure:
[screenshot: initrd failure during boot, 2022-04-11]

@sharifelgamal
Collaborator

The above error was fixed by giving QEMU a larger memory allocation. The regulatory db error is spurious.

@sharifelgamal
Collaborator

The networking problem for the amd64 KVM2 driver was an issue with the machine I was testing on rather than with our configuration. I can verify the ISO works properly. The AppArmor issue remains, which seems to stem from the version of libvirt we are using and its incompatibility with UEFI.

@sharifelgamal
Collaborator

The workaround for the AppArmor issue is to disable AppArmor as a security driver for libvirt. This is extremely not recommended for actual use, but for my current debugging it's been useful.

Modify /etc/libvirt/qemu.conf (which needs root), uncomment the line that starts with security_driver = and make it security_driver = "none". Again, this is extremely inadvisable as a permanent solution.

After the config file is saved, restart libvirt: sudo systemctl restart libvirtd. Now AppArmor will be disabled for libvirt and you can go on your merry way.

For the record, we plan on fixing this issue (which I believe is related to the version of libvirt we're using).
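Putting those steps in one place (a sketch; the sed assumes the security_driver line exists but is commented out):

# debugging only: disable the AppArmor security driver for libvirt, then restart it
sudo sed -i 's/^#\?security_driver = .*/security_driver = "none"/' /etc/libvirt/qemu.conf
sudo systemctl restart libvirtd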

@sharifelgamal
Collaborator

Current error with Hyper-V:
[screenshot: Hyper-V boot error, 2022-04-14]

@sharifelgamal
Collaborator

Testing the arm64 ISO on QEMU showed that the arm64 ISO rootfs isn't getting some of the systemd files copied in. That's what I'm currently looking into.

@sharifelgamal
Collaborator

The arm64 ISO with QEMU is now booting properly, but kubeadm is crashing.

It looks like all the appropriate k8s docker images are loaded properly but are never started, so none of the essential pods can boot.

@sharifelgamal
Collaborator

🎉 QEMU on M1 mac (almost) works!

  • coredns and storage-provisioner aren't starting properly (coredns is never marked ready and sp crashes repeatedly)
  • curl isn't getting installed, which at least breaks some of our health checks for GCR; this is confirmed by the ISO build logs skipping the libcurl package

relevant logging:

Run-time dependency libcurl found: NO (tried pkgconfig and cmake)
         disabled features: libcryptsetup, PAM, pwquality, p11kit, libfido2, AUDIT, IMA, AppArmor, SELinux, SECCOMP, SMACK, zlib, xz, zstd, lz4, bzip2, ACL, gcrypt, qrencode, microhttpd, gnutls, libcurl, idn, initrd, compat-mutable-uid-boundaries, libidn2, libidn, libiptc, elfutils, binfmt, repart, vconsole, quotacheck, tmpfiles, environment.d, sysusers, firstboot, randomseed, backlight, rfkill, xdg-autostart, logind, machined, portabled, userdb, homed, importd, hostnamed, timedated, timesyncd, localed, networkd, resolve, DNS-over-TLS(gnutls), coredump, pstore, oomd, polkit, legacy pkla, efi, gnu-efi, xkbcommon, pcre2, blkid, dbus, glib, nss-myhostname, nss-mymachines, nss-resolve, nss-systemd, hwdb, tpm, man pages, html pages, man page indices, SysV compat, utmp, ldconfig, hibernate, adm group, wheel group, gshadow, debug hashmap, debug mmap cache, debug siphash, valgrind, trace logging, install tests, kernel-install
-- Using bundled curl in '/home/selgamal/minikube/out/buildroot/output-x86_64/build/falco-module-0.31.1/curl-prefix/src/curl'
-- Using SSL for curl in '--with-ssl=/home/selgamal/minikube/out/buildroot/output-x86_64/build/falco-module-0.31.1/openssl-prefix/src/openssl/target'
jenkins@45283 minikube % kc get po -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS     AGE
kube-system   coredns-64897985d-pzfdq            0/1     Running   0            48s
kube-system   etcd-minikube                      1/1     Running   0            54s
kube-system   kube-apiserver-minikube            1/1     Running   0            54s
kube-system   kube-controller-manager-minikube   1/1     Running   0            54s
kube-system   kube-proxy-kc7p5                   1/1     Running   0            49s
kube-system   kube-scheduler-minikube            1/1     Running   0            71s
kube-system   storage-provisioner                1/1     Running   1 (4s ago)   44s

coredns logs:

jenkins@45283 minikube % kc logs coredns-64897985d-pzfdq -n kube-system
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/arm64, go1.17.1, 13a9191
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
[INFO] plugin/reload: Running configuration MD5 = 2fa1404210fc2611e23b3bbda829bdce
[INFO] Reloading complete
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes,kubernetes"
...

storage-prov logs:

jenkins@45283 minikube % kc logs storage-provisioner -n kube-system
I0502 21:56:51.967643       1 storage_provisioner.go:116] Initializing the minikube storage provisioner...
F0502 21:57:22.010952       1 main.go:39] error getting server version: Get "https://10.96.0.1:443/version?timeout=32s": dial tcp 10.96.0.1:443: i/o timeout

@medyagh
Member

medyagh commented May 3, 2022

I tried with this PR and the ISO Sharif gave me, and I also see the coredns pod saying

https://storage.googleapis.com/minikube-builds/iso/testing/minikube-arm64.iso

  Warning  Unhealthy  5s (x9 over 62s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

In kc logs of coredns it shows the same thing:

$ kc logs coredns-64897985d-f95sj -n kube-system
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration MD5 = 2fa1404210fc2611e23b3bbda829bdce
CoreDNS-1.8.6
linux/arm64, go1.17.1, 13a9191
[INFO] plugin/ready: Still waiting on: "kubernetes"

The storage provisioner uses client-go to get a k8s client, but the IP it gets is 10.96.0.1:443

$ kc logs storage-provisioner -n kube-system
I0503 21:46:04.800198       1 storage_provisioner.go:116] Initializing the minikube storage provisioner...
F0503 21:46:34.915366       1 main.go:39] error getting server version: Get "https://10.96.0.1:443/version?timeout=32s": dial tcp 10.96.0.1:443: i/o timeout

In the pods I see the IP is set to "10.0.2.15" (I don't know if that is supposed to be the same or not)

$ kc get pods -o wide -A
NAMESPACE     NAME                               READY   STATUS             RESTARTS        AGE   IP           NODE       NOMINATED NODE   READINESS GATES
kube-system   coredns-64897985d-wfwhs            0/1     Running            0               14m   172.17.0.3   minikube   <none>           <none>
kube-system   etcd-minikube                      1/1     Running            0               14m   10.0.2.15    minikube   <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running            0               14m   10.0.2.15    minikube   <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running            1 (14m ago)     14m   10.0.2.15    minikube   <none>           <none>
kube-system   kube-proxy-89pbz                   1/1     Running            0               14m   10.0.2.15    minikube   <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running            0               14m   10.0.2.15    minikube   <none>           <none>
kube-system   storage-provisioner                0/1     CrashLoopBackOff   6 (3m58s ago)   13m   10.0.2.15    minikube   <none>           <none>

The IP that the client-go is trying to hit is the cluster ip that I also see in the services.

$ kubectl get service --all-namespaces
NAMESPACE     NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  17m
kube-system   kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   17m

and I confirmed that I cannot manually hit the service URL either, as this command times out:

$ curl -vvv 10.96.0.1:443
* Failed to connect to 10.96.0.1 port 443 after 31434 ms: Connection timed out
curl: (28) Failed to connect to 10.96.0.1 port 443 after 31434 ms: Connection timed out

@medyagh
Member

medyagh commented May 6, 2022

@josedonizetti suggested using tcpdump to see whether coredns is going through the apiserver.
Here are the tcpdumps - any experts who would like to analyze them?

I tried this

$ sudo docker run -it --rm --net container:k8s_kube-apiserver_kube-apiserver-minikube_kube-system_5bcf1fcc4bb4871e78ea95adc9b14d79_0 nicolaka/netshoot tcpdump -i eth0 -s 0 -Xvv tcp port 443
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:33:10.163669 IP (tos 0x0, ttl 63, id 34220, offset 0, flags [DF], proto TCP (6), length 60)
    control-plane.minikube.internal.60846 > 10.96.0.1.443: Flags [S], cksum 0xc3f3 (correct), seq 1536750669, win 64240, options [mss 1460,sackOK,TS val 571281743 ecr 0,nop,wscale 7], length 0
	0x0000:  4500 003c 85ac 4000 3f06 9fa0 0a00 020f  E..<..@.?.......
	0x0010:  0a60 0001 edae 01bb 5b98 f44d 0000 0000  .`......[..M....
	0x0020:  a002 faf0 c3f3 0000 0204 05b4 0402 080a  ................
	0x0030:  220d 114f 0000 0000 0103 0307            "..O........
22:33:18.013246 IP (tos 0x0, ttl 64, id 1799, offset 0, flags [none], proto TCP (6), length 40)
    10.96.0.1.443 > control-plane.minikube.internal.60516: Flags [R.], cksum 0x4ba1 (correct), seq 0, ack 1898507895, win 65535, length 0
	0x0000:  4500 0028 0707 0000 4006 5d5a 0a60 0001  E..(....@.]Z.`..
	0x0010:  0a00 020f 01bb ec64 0000 0000 7128 ee77  .......d....q(.w
	0x0020:  5014 ffff 4ba1 0000                      P...K...
22:33:18.354614 IP (tos 0x0, ttl 63, id 34221, offset 0, flags [DF], proto TCP (6), length 60)
    control-plane.minikube.internal.60846 > 10.96.0.1.443: Flags [S], cksum 0xa3f3 (correct), seq 1536750669, win 64240, options [mss 1460,sackOK,TS val 571289935 ecr 0,nop,wscale 7], length 0
	0x0000:  4500 003c 85ad 4000 3f06 9f9f 0a00 020f  E..<..@.?.......
	0x0010:  0a60 0001 edae 01bb 5b98 f44d 0000 0000  .`......[..M....
	0x0020:  a002 faf0 a3f3 0000 0204 05b4 0402 080a  ................
	0x0030:  220d 314f 0000 0000 0103 0307            ".1O........

Here are the IPs I get from the -o wide:

$ kc get pods -A -o wide
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE   IP           NODE       NOMINATED NODE   READINESS GATES
kube-system   coredns-64897985d-29fqn            0/1     Running   0             15m   172.17.0.2   minikube   <none>           <none>
kube-system   etcd-minikube                      1/1     Running   0             15m   10.0.2.15    minikube   <none>           <none>
kube-system   kube-apiserver-minikube            1/1     Running   0             15m   10.0.2.15    minikube   <none>           <none>
kube-system   kube-controller-manager-minikube   1/1     Running   1 (15m ago)   15m   10.0.2.15    minikube   <none>           <none>
kube-system   kube-proxy-kkwsm                   1/1     Running   0             15m   10.0.2.15    minikube   <none>           <none>
kube-system   kube-scheduler-minikube            1/1     Running   0             15m   10.0.2.15    minikube   <none>           <none>

I did tcpdump on the coredns container and I get these "weird" incorrect things:

$ sudo docker run -it --rm --net container:k8s_coredns_coredns-64897985d-29fqn_kube-system_3befc83e-6731-49f9-a443-286c9f827efc_0 nicolaka/netshoot tcpdump -i eth0 -s 0 -Xvv tcp         
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:37:20.362448 IP (tos 0x0, ttl 64, id 2232, offset 0, flags [DF], proto TCP (6), length 60)
    172.17.0.1.58566 > coredns-64897985d-29fqn.8181: Flags [S], cksum 0x5854 (incorrect -> 0x02a6), seq 3358621795, win 64240, options [mss 1460,sackOK,TS val 356421557 ecr 0,nop,wscale 7], length 0
	0x0000:  4500 003c 08b8 4000 4006 d9de ac11 0001  E..<..@.@.......
	0x0010:  ac11 0002 e4c6 1ff5 c830 8063 0000 0000  .........0.c....
	0x0020:  a002 faf0 5854 0000 0204 05b4 0402 080a  ....XT..........
	0x0030:  153e 8fb5 0000 0000 0103 0307            .>..........
22:37:20.362798 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    coredns-64897985d-29fqn.8181 > 172.17.0.1.58566: Flags [S.], cksum 0x5854 (incorrect -> 0x5514), seq 1453321003, ack 3358621796, win 65160, options [mss 1460,sackOK,TS val 2819211283 ecr 356421557,nop,wscale 7], length 0
	0x0000:  4500 003c 0000 4000 4006 e296 ac11 0002  E..<..@.@.......
	0x0010:  ac11 0001 1ff5 e4c6 569f eb2b c830 8064  ........V..+.0.d
	0x0020:  a012 fe88 5854 0000 0204 05b4 0402 080a  ....XT..........
	0x0030:  a809 c013 153e 8fb5 0103 0307            .....>......
22:37:20.363054 IP (tos 0x0, ttl 64, id 2233, offset 0, flags [DF], proto TCP (6), length 52)
    172.17.0.1.58566 > coredns-64897985d-29fqn.8181: Flags [.], cksum 0x584c (incorrect -> 0x8072), seq 1, ack 1, win 502, options [nop,nop,TS val 356421558 ecr 2819211283], length 0
	0x0000:  4500 0034 08b9 4000 4006 d9e5 ac11 0001  E..4..@.@.......
	0x0010:  ac11 0002 e4c6 1ff5 c830 8064 569f eb2c  .........0.dV..,
	0x0020:  8010 01f6 584c 0000 0101 080a 153e 8fb6  ....XL.......>..
	0x0030:  a809 c013                                ....
22:37:20.372246 IP (tos 0x0, ttl 64, id 2234, offset 0, flags [DF], proto TCP (6), length 159)
    172.17.0.1.58566 > coredns-64897985d-29fqn.8181: Flags [P.], cksum 0x58b7 (incorrect -> 0xbc1c), seq 1:108, ack 1, win 502, options [nop,nop,TS val 356421567 ecr 2819211283], length 107
	0x0000:  4500 009f 08ba 4000 4006 d979 ac11 0001  E.....@.@..y....
	0x0010:  ac11 0002 e4c6 1ff5 c830 8064 569f eb2c  .........0.dV..,
	0x0020:  8018 01f6 58b7 0000 0101 080a 153e 8fbf  ....X........>..
	0x0030:  a809 c013 4745 5420 2f72 6561 6479 2048  ....GET./ready.H
	0x0040:  5454 502f 312e 310d 0a48 6f73 743a 2031  TTP/1.1..Host:.1
	0x0050:  3732 2e31 372e 302e 323a 3831 3831 0d0a  72.17.0.2:8181..
	0x0060:  5573 6572 2d41 6765 6e74 3a20 6b75 6265  User-Agent:.kube
	0x0070:  2d70 726f 6265 2f31 2e32 330d 0a41 6363  -probe/1.23..Acc
	0x0080:  6570 743a 202a 2f2a 0d0a 436f 6e6e 6563  ept:.*/*..Connec
	0x0090:  7469 6f6e 3a20 636c 6f73 650d 0a0d 0a    tion:.close....
22:37:20.372331 IP (tos 0x0, ttl 64, id 6376, offset 0, flags [DF], proto TCP (6), length 52)
    coredns-64897985d-29fqn.8181 > 172.17.0.1.58566: Flags [.], cksum 0x584c (incorrect -> 0x7fed), seq 1, ack 108, win 509, options [nop,nop,TS val 2819211293 ecr 356421567], length 0
	0x0000:  4500 0034 18e8 4000 4006 c9b6 ac11 0002  E..4..@.@.......
	0x0010:  ac11 0001 1ff5 e4c6 569f eb2c c830 80cf  ........V..,.0..
	0x0020:  8010 01fd 584c 0000 0101 080a a809 c01d  ....XL..........
	0x0030:  153e 8fbf

@prezha
Contributor

prezha commented May 6, 2022

tl;dr: I think that the problem is not with coredns; rather, the problem is with iptables - that is, the complete lack of k8s-related rules.

details:

I've modified the coredns deployment to use the latest image, so that it's a bit more specific about the error (should remember to revert this back to what it was):

jenkins@45276 ~ % kubectl logs coredns-6cb8949c44-stzx7 -n kube-system
...
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.96.0.1:443/version": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
...
jenkins@45276 ~ % netstat -an -p tcp
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  207.254.73.159.49822   10.96.0.1.443          SYN_SENT
tcp4       0      0  207.254.73.159.49821   10.96.0.1.443          SYN_SENT
tcp4       0      0  207.254.73.159.49820   10.96.0.1.443          SYN_SENT
tcp4       0      0  207.254.73.159.49819   10.96.0.1.443          SYN_SENT
...
jenkins@45276 ~ % kubectl get service -A
NAMESPACE     NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  139m
kube-system   kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   138m
jenkins@45276 ~ % curl 10.96.0.1:443/api/
curl: (28) Failed to connect to 10.96.0.1 port 443 after 75010 ms: Operation timed out
jenkins@45276 ~ % kubectl get pod -A -o wide
NAMESPACE     NAME                               READY   STATUS             RESTARTS         AGE    IP           NODE       NOMINATED NODE   READINESS GATES
...
kube-system   kube-apiserver-minikube            1/1     Running            0                137m   10.0.2.15    minikube   <none>           <none>
...

OK, let's see what's going on from the inside:

jenkins@45276 minikube % out/minikube ssh -p minikube
$ curl https://10.96.0.1:443/api/
curl: (7) Failed to connect to 10.96.0.1 port 443 after 74964 ms: Connection refused

It would not reach the API server via the cluster IP, although the API server itself actually works:

$ curl https://10.0.2.15:8443/api/
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/api/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

Let's check the iptables rules - indeed, they are missing:

$ sudo iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 278 packets, 16636 bytes)
 pkts bytes target     prot opt in     out     source               destination
  278 16636 KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
    8   352 DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 5 packets, 220 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 14622 packets, 877K bytes)
 pkts bytes target     prot opt in     out     source               destination
14622  877K KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
 9879  593K DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 14622 packets, 877K bytes)
 pkts bytes target     prot opt in     out     source               destination
14895  894K KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
  274 16476 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0

Chain KUBE-MARK-DROP (0 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-POSTROUTING (1 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-PROXY-CANARY (0 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-SERVICES (2 references)
 pkts bytes target     prot opt in     out     source               destination

It should have something like this (taken from a working non-arm instance):

...
Chain KUBE-SVC-NPX46M4PTMTKRN6Y (1 references)
 pkts bytes target     prot opt in     out     source               destination
    8   480 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
    8   480 KUBE-SEP-L7HYKAEHQQQYLMRM  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/kubernetes:https */

Chain KUBE-SVC-TCOU7JCQXEZGVUNU (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 KUBE-MARK-MASQ  udp  --  *      *      !10.244.0.0/16        10.96.0.10           /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
    0     0 KUBE-SEP-SNPTLXDNVSPZ5ND2  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/kube-dns:dns */
$ sudo iptables -t nat -nvL |grep 443
    8   480 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* default/kubernetes:https */ tcp to:192.168.39.2:8443
    8   480 KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  *      *       0.0.0.0/0            10.96.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
    8   480 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.96.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
...

The iptables rules should be set up by kube-proxy, so let's check its logs - indeed:

jenkins@45276 ~ % kubectl logs -n kube-system kube-proxy-xfp27       
I0506 19:32:58.156054       1 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs"
I0506 19:32:58.186441       1 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_rr"
I0506 19:32:58.216124       1 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_wrr"
I0506 19:32:58.252251       1 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_sh"
I0506 19:32:58.654905       1 node.go:163] Successfully retrieved node IP: 10.0.2.15
I0506 19:32:58.656001       1 server_others.go:138] "Detected node IP" address="10.0.2.15"
I0506 19:32:58.656810       1 server_others.go:561] "Unknown proxy mode, assuming iptables proxy" proxyMode=""
I0506 19:32:59.447871       1 server_others.go:206] "Using iptables Proxier"
I0506 19:32:59.449008       1 server_others.go:213] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0506 19:32:59.449169       1 server_others.go:214] "Creating dualStackProxier for iptables"
I0506 19:32:59.450253       1 server_others.go:491] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0506 19:32:59.456021       1 server.go:656] "Version info" version="v1.23.5"
I0506 19:32:59.491902       1 config.go:317] "Starting service config controller"
I0506 19:32:59.492149       1 shared_informer.go:240] Waiting for caches to sync for service config
I0506 19:32:59.507128       1 config.go:226] "Starting endpoint slice config controller"
I0506 19:32:59.507292       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0506 19:32:59.650807       1 shared_informer.go:247] Caches are synced for service config 
I0506 19:32:59.708401       1 shared_informer.go:247] Caches are synced for endpoint slice config 
E0506 19:33:00.778723       1 proxier.go:1527] "Failed to execute iptables-restore" err="exit status 2 (iptables-restore v1.8.7 (legacy): Couldn't load match `mark':No such file or directory\n\nError occurred at line: 10\nTry `iptables-restore -h' or 'iptables-restore --help' for more information.\n)"
I0506 19:33:00.780238       1 proxier.go:832] "Sync failed" retryingTime="30s"
...

Now, I'm not sure which mode would work for kube-proxy on the arm arch, as, according to https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/, there isn't one:

--proxy-mode ProxyMode
Which proxy mode to use: 'iptables' (Linux-only), 'ipvs' (Linux-only), 'kernelspace' (Windows-only), or 'userspace' (Linux/Windows, deprecated). The default value is 'iptables' on Linux and 'userspace' on Windows. This parameter is ignored if a config file is specified by --config.

I think we just need to understand why kube-proxy is unable to restore the rules - the specific error (from the above logs) is: Couldn't load match `mark': No such file or directory
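One way to confirm that from inside the guest is to check whether the kernel actually provides the mark match at all (this sketch assumes the guest exposes its config via /proc/config.gz):

# is the xt_mark match built in or available as a module?
zcat /proc/config.gz | grep -E 'CONFIG_NETFILTER_XT_MARK|CONFIG_NF_NAT'
# direct probe: this fails with the same "Couldn't load match `mark'" error if support is missing
sudo iptables -t mangle -A OUTPUT -m mark --mark 1 -j RETURN && \
  sudo iptables -t mangle -D OUTPUT -m mark --mark 1 -j RETURN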

@sharifelgamal
Collaborator

An update here: the arm64 ISO Linux kernel config was missing a whole BOATLOAD of networking modules. I have no explanation for why this happened. We're currently testing out a fix.
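For reference, the missing pieces are the netfilter/bridge options that kube-proxy and the container runtime rely on; a non-exhaustive sketch of the kind of symbols involved (the authoritative list is the board's kernel config fragment):

CONFIG_NETFILTER_XT_MARK=y
CONFIG_NETFILTER_XT_MATCH_COMMENT=y
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_BRIDGE_NETFILTER=y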
