Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deploy our custom coredns addon #17008

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

prezha
Copy link
Contributor

@prezha prezha commented Aug 7, 2023

fixes: #17007

instead of using the embedded kubeadm coredns manifests, then scale down from 2 to 1 replica and finally patching the configmap to inject {"host.minikube.internal": hostIP} record, which in turn, will also cause coredns to reload, because the reload plugin is used by default

we can skip the default coredns deployment and deploy our own variant
ref: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/

Note: Customizing the CoreDNS deployment of kubeadm is currently not supported.
You must manually patch the kube-system/coredns ConfigMap and recreate the CoreDNS Pods after that.
Alternatively, you can skip the default CoreDNS deployment and deploy your own variant.
For more details on that see Using init phases with kubeadm.

this pr creates custom coredns addon based on the "original" one that kubeadm deploys by default and, luckily, it's not changed too frequently, so it should not be too hard to maintain the sync

ref: https://github.com/coredns/deployment/tree/master/kubernetes#kubernetes---deprecated

For deploying coredns, use the coredns helm chart, or the yaml templates maintained by kubeadm.

note 1: --disable-optimizations minikube flag will disable this behaviour and will fallback to using default kubeadm coredns manifests

note 2: once kubernetes/kubeadm#1318 is merged, we'll have an option to handle upgrades as well, should we need it

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 7, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: prezha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2023
@prezha
Copy link
Contributor Author

prezha commented Aug 7, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Aug 7, 2023
@prezha prezha marked this pull request as draft August 7, 2023 01:43
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 7, 2023
@minikube-pr-bot

This comment has been minimized.

@minikube-pr-bot

This comment has been minimized.

Copy link
Member

@medyagh medyagh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a very interesting PR ! couple requests

1- can you plz make sure if user passes
"--disable-optimizations" they will get the Vanilla Kubernetes experience (the real coredns with 2 replica)

2- can we see time to k8s benchmark Before and After this PR on your machine?

@afbjorklund
Copy link
Collaborator

It would be good to have these as options to kubeadm eventually, the replicas and records.

@medyagh
Copy link
Member

medyagh commented Aug 9, 2023

It would be good to have these as options to kubeadm eventually, the replicas and records.

I think it would be a good option to contribution to kubeadm as well maybe after we test it out in minikube

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 13, 2023
@prezha
Copy link
Contributor Author

prezha commented Aug 13, 2023

@medyagh sure:


1- can you plz make sure if user passes "--disable-optimizations" they will get the Vanilla Kubernetes experience (the real coredns with 2 replica)

$ minikube version
minikube version: v1.31.1
commit: b6fce457997a5e4ffde962751bf6400835724517-dirty
$ minikube start --disable-optimizations
😄  minikube v1.31.1 on Opensuse-Tumbleweed
✨  Automatically selected the docker driver. Other choices: kvm2, qemu2, virtualbox, ssh
📌  Using Docker driver with root privileges
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🔥  Creating docker container (CPUs=2, Memory=15800MB) ...
🐳  Preparing Kubernetes v1.27.3 on Docker 24.0.5 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
$ kubectl --context minikube get all -A -owide
NAMESPACE     NAME                                   READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
kube-system   pod/coredns-5d78c9869d-bf8mj           1/1     Running   0          13s   10.244.0.3     minikube   <none>           <none>
kube-system   pod/coredns-5d78c9869d-zv2s2           1/1     Running   0          13s   10.244.0.2     minikube   <none>           <none>
kube-system   pod/etcd-minikube                      1/1     Running   0          27s   192.168.49.2   minikube   <none>           <none>
kube-system   pod/kube-apiserver-minikube            1/1     Running   0          29s   192.168.49.2   minikube   <none>           <none>
kube-system   pod/kube-controller-manager-minikube   1/1     Running   0          28s   192.168.49.2   minikube   <none>           <none>
kube-system   pod/kube-proxy-xjc2t                   1/1     Running   0          13s   192.168.49.2   minikube   <none>           <none>
kube-system   pod/kube-scheduler-minikube            1/1     Running   0          26s   192.168.49.2   minikube   <none>           <none>
kube-system   pod/storage-provisioner                1/1     Running   0          25s   192.168.49.2   minikube   <none>           <none>

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  28s   <none>
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   26s   k8s-app=kube-dns

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE   CONTAINERS   IMAGES                               SELECTOR
kube-system   daemonset.apps/kube-proxy   1         1         1       1            1           kubernetes.io/os=linux   26s   kube-proxy   registry.k8s.io/kube-proxy:v1.27.3   k8s-app=kube-proxy

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                    SELECTOR
kube-system   deployment.apps/coredns   2/2     2            2           26s   coredns      registry.k8s.io/coredns/coredns:v1.10.1   k8s-app=kube-dns

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                    SELECTOR
kube-system   replicaset.apps/coredns-5d78c9869d   2         2         2       13s   coredns      registry.k8s.io/coredns/coredns:v1.10.1   k8s-app=kube-dns,pod-template-hash=5d78c9869d
$ kubectl --context minikube get configmap/coredns -n kube-system -oyaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        hosts {
           192.168.49.1 host.minikube.internal
           fallthrough
        }
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2023-08-13T15:41:59Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "255"
  uid: 350ec8ae-a8cc-4c83-9c4b-48673bbbf0f5

2- can we see time to k8s benchmark Before and After this PR on your machine?

i've run three full time-to-k8s cycles for both the current minikube version as well as this pr - results:

before - current release v1.31.1:

v1.31.1 startup avg total avg delayed count
1 22.34 54.73 4
2 24.07 57.14 4
3 22.90 55.18 4
overall 23.10 55.68 12

after - this pr [8de8137]:

pr startup avg total avg delayed count
1 22.65 38.94 0
2 23.32 59.46 5
3 22.27 46.47 2
overall 22.75 48.29 7

details:
logs & stats.tar.gz

@minikube-pr-bot

This comment has been minimized.

@prezha prezha marked this pull request as ready for review August 13, 2023 17:02
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 13, 2023
@minikube-pr-bot

This comment has been minimized.

@medyagh
Copy link
Member

medyagh commented Sep 20, 2023

@prezha the PR commenter bechnmarking doesnt not seem to show much difference
#17008 (comment)

how much does this make minikube faster?


const (
// CoreDNSService is the CoreDNS Service manifest
CoreDNSService = `
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prezha one problem I feel like might happen, how do we generate this yaml file and with future kubernetes versions is it gonna change this yaml and we will need to remember to regenrate it ? is there an automated way of generating this to future proof it ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also if user chooses an older kubernetes we wanna make sure this yaml works there too or we could use the Old way for Non-Default Kubernets version (the latest that runs by default)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@medyagh i thought about and accounted for that:

based on the history of changes, in the past 2+ years, there were no frequent or significant changes to the kubeadm's embedded coredns deployment manifest, which is well over our target extended support policy of last 6 minor versions
based on this, we do not have an automated way to update coredns manifest atm, but i intentionally made it in diff-friendly format so it would be straight-forward finding any differences and upgrade if/when needed in the future - namely:

this would be the default behaviour, and should a user want/need to use the old way - they can do so just by using the --disable-optimizations flag

last but not least, as a potential next step, with this "separation", we could relatively easily enable users to provide their own coredns manifest (we'd still need eg, to patch it to inject ip address of host.minikube.internal that's only known during the runtime)


const (
// CoreDNSService is the CoreDNS Service manifest
CoreDNSService = `
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also if user chooses an older kubernetes we wanna make sure this yaml works there too or we could use the Old way for Non-Default Kubernets version (the latest that runs by default)

@prezha
Copy link
Contributor Author

prezha commented Sep 20, 2023

@prezha the PR commenter bechnmarking doesnt not seem to show much difference #17008 (comment)

how much does this make minikube faster?

hey @medyagh, i didn't expect to see any noticeable speed improvement, as the interventions on the originally deployed (embedded) coredns config we are currently doing are usually of sub-second duration (eg, scaling replica count down or patching hosts block in configmap to include the host.minikube.internal record)
what i do expect is improvement in overall reliability and reduction in flakiness by avoiding these changes - coredns restarts while other cluster components are initialising and/or in tests

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2024
@prezha
Copy link
Contributor Author

prezha commented Jan 21, 2024

i think this is still under consideration

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2024
@minikube-pr-bot

This comment has been minimized.

@minikube-pr-bot

This comment has been minimized.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@prezha prezha marked this pull request as draft March 7, 2024 22:33
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 7, 2024
@minikube-pr-bot

This comment has been minimized.

@minikube-pr-bot

This comment has been minimized.

@minikube-pr-bot

This comment has been minimized.

@minikube-pr-bot

This comment has been minimized.

@minikube-pr-bot
Copy link

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17008) |
+----------------+----------+---------------------+
| minikube start | 50.8s    | 51.7s               |
| enable ingress | 25.6s    | 25.2s               |
+----------------+----------+---------------------+

Times for minikube start: 47.9s 51.4s 52.4s 52.7s 49.7s
Times for minikube (PR 17008) start: 50.4s 52.5s 49.4s 52.6s 53.9s

Times for minikube ingress: 24.1s 26.2s 24.1s 26.6s 27.1s
Times for minikube (PR 17008) ingress: 22.6s 26.1s 24.7s 26.6s 26.1s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17008) |
+----------------+----------+---------------------+
| minikube start | 23.9s    | 23.0s               |
| enable ingress | 20.4s    | 21.0s               |
+----------------+----------+---------------------+

Times for minikube ingress: 21.8s 19.8s 20.3s 20.9s 19.3s
Times for minikube (PR 17008) ingress: 20.8s 24.3s 19.8s 20.3s 19.8s

Times for minikube start: 22.2s 22.7s 25.6s 22.6s 26.5s
Times for minikube (PR 17008) start: 22.9s 21.9s 21.7s 22.8s 25.6s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17008) |
+----------------+----------+---------------------+
| minikube start | 23.2s    | 22.9s               |
| enable ingress | 38.2s    | 31.3s               |
+----------------+----------+---------------------+

Times for minikube start: 23.5s 21.6s 22.3s 23.9s 24.8s
Times for minikube (PR 17008) start: 23.3s 20.6s 24.7s 21.2s 24.8s

Times for minikube ingress: 35.8s 30.8s 46.8s 30.8s 46.8s
Times for minikube (PR 17008) ingress: 30.3s 31.3s 31.3s 32.3s 31.3s

@minikube-pr-bot
Copy link

These are the flake rates of all failed tests.

Environment Failed Tests Flake Rate (%)
KVM_Linux_crio TestFunctional/serial/ExtraConfig (gopogh) 0.00 (chart)
KVM_Linux TestStartStop/group/default-k8s-diff-port/serial/SecondStart (gopogh) 0.00 (chart)
KVM_Linux TestStartStop/group/embed-certs/serial/SecondStart (gopogh) 0.00 (chart)
KVM_Linux_containerd TestNoKubernetes/serial/StartNoArgs (gopogh) 1.25 (chart)
Docker_Linux_containerd_arm64 TestStartStop/group/old-k8s-version/serial/SecondStart (gopogh) 12.88 (chart)
QEMU_macOS TestImageBuild/serial/Setup (gopogh) 19.64 (chart)
QEMU_macOS TestJSONOutput/pause/Command (gopogh) 19.64 (chart)
QEMU_macOS TestJSONOutput/start/Command (gopogh) 19.64 (chart)
QEMU_macOS TestJSONOutput/unpause/Command (gopogh) 19.64 (chart)
Docker_Windows TestAddons/Setup (gopogh) 21.01 (chart)
Docker_Windows TestBinaryMirror (gopogh) 21.01 (chart)
Docker_Windows TestDownloadOnlyKic (gopogh) 21.01 (chart)
Docker_Windows TestDownloadOnly/v1.28.4/json-events (gopogh) 21.01 (chart)
Docker_Windows TestDownloadOnly/v1.28.4/kubectl (gopogh) 21.01 (chart)
Docker_Windows TestDownloadOnly/v1.28.4/preload-exists (gopogh) 21.01 (chart)
Docker_Windows TestDownloadOnly/v1.29.0-rc.2/json-events (gopogh) 21.01 (chart)
Docker_Windows TestDownloadOnly/v1.29.0-rc.2/kubectl (gopogh) 21.01 (chart)
Docker_Windows TestDownloadOnly/v1.29.0-rc.2/preload-exists (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/CertSync (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/CpCmd (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/DockerEnv/powershell (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/DryRun (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/FileSync (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageBuild (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageListJson (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageListShort (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageListTable (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageListYaml (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageLoadDaemon (gopogh) 21.01 (chart)
Docker_Windows TestFunctional/parallel/ImageCommands/ImageLoadFromFile (gopogh) 21.01 (chart)
More tests... Continued...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 25, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CoreDNS addon handling grew too complex
6 participants