Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: AKS node pool KubeletConfig #2781

Merged

Conversation

jackfrancis
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds the AKS node pool feature "KubeletConfig", based on this:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2761

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

enable AKS node pool KubeletConfig

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 5, 2022
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 5, 2022
@jackfrancis jackfrancis changed the title feat: AKS node pool KubeletConfig WIP feat: AKS node pool KubeletConfig Nov 5, 2022
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2022
@jackfrancis jackfrancis added the area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type label Nov 7, 2022
@jackfrancis jackfrancis force-pushed the aks-pool-kubeletconfig branch 2 times, most recently from 6606fed to daf25f7 Compare November 8, 2022 00:46
@jackfrancis
Copy link
Contributor Author

/retest

@jackfrancis jackfrancis force-pushed the aks-pool-kubeletconfig branch 2 times, most recently from 29d04c5 to 1426f3c Compare November 15, 2022 23:14
@jackfrancis jackfrancis force-pushed the aks-pool-kubeletconfig branch 2 times, most recently from d99e068 to 75a9cce Compare November 16, 2022 23:30
Copy link
Member

@nawazkh nawazkh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought below.
Also, thanks for putting this together!

@@ -303,3 +307,41 @@ func (s *ManagedMachinePoolScope) GetCAPIMachinePoolAnnotation(key string) (succ
val, ok := s.MachinePool.Annotations[key]
return ok, val
}

func getKubeletConfig(k *infrav1exp.KubeletConfig) *agentpools.KubeletConfig {
Copy link
Member

@nawazkh nawazkh Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, Is there a way we can pull out this code to a common, shared lib so that we don't need to maintain three implementations of the similar definition going ahead?
I am pointing to the getKubletConfig func tagged here, at azure/converters/managedagentpool.go:55, and azure/services/agentpools/spec.go:286

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each transformation is actually slightly different! @nojnhuh may have some ideas, I'm confident generics would help us but that would require a farily significant refactor (I think)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I've been aiming to just get this changeset to a functional place, definitely open to suggestions about how to do these struct transformations in less painful way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each transformation is actually slightly different!

Ah yes!

if len(k.AllowedUnsafeSysctls) > 0 {
  ret.AllowedUnsafeSysctls = &k.AllowedUnsafeSysctls
 }

vs

if k.AllowedUnsafeSysctls != nil {
  ret.AllowedUnsafeSysctls = k.AllowedUnsafeSysctls
}

Sorry, I didn't catch this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, some follow-up, this translation wasn't required at all, not sure what I did wrong the first time around!

cc @nojnhuh

@jackfrancis jackfrancis added this to the v1.7 milestone Nov 17, 2022
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 29, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 29, 2022
@jackfrancis jackfrancis changed the title WIP feat: AKS node pool KubeletConfig feat: AKS node pool KubeletConfig Nov 29, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 29, 2022
@zioproto
Copy link
Contributor

I am testing this PR but I have some issues:

I am testing with make tilt-up and then I had to update the crd:

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/cluster-api-provider-azure/8723cdd0418778ac846205aec0d73f470235023c/config/crd/bases/infrastructure.cluster.x-k8s.io_azuremanagedmachinepools.yaml

when I try to apply the AzureManagedMachinePool I get the following error:

 k apply -f - <<EOF
heredoc> ---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedMachinePool
metadata:
  name: pool0
  namespace: default
spec:
  enableNodePublicIP: false
  mode: System
  osDiskSizeGB: 30
  sku: Standard_DC4s_v2
  additionalTags:
    owner: "saverio"
  kubeletConfig:
    cpuManagerPolicy: "static"
    cpuCfsQuota: true
    cpuCfsQuotaPeriod: "110ms"
    imageGcHighThreshold: 70
    imageGcLowThreshold: 50
    topologyManagerPolicy: "best-effort"
    allowedUnsafeSysctls:
      - "net.*"
      - "kernel.msg*"
    failSwapOn: false
    containerLogMaxSizeMB: 500
    containerLogMaxFiles: 50
    podMaxPids: 2048
heredoc> EOF
The AzureManagedMachinePool "pool0" is invalid: spec.kubeletConfig.allowedUnsafeSysctls: Unsupported value: []interface {}{"net.*", "kernel.msg*"}: supported values: "kernel.shm*", "kernel.msg*", "kernel.sem", "fs.mqueue.*", "net.*"

I got the kubeletConfig values from the example in this same PR:

kubeletConfig:
cpuManagerPolicy: "static"
cpuCfsQuota: true
cpuCfsQuotaPeriod: "110ms"
imageGcHighThreshold: 70
imageGcLowThreshold: 50
topologyManagerPolicy: "best-effort"
allowedUnsafeSysctls:
- "net.*"
- "kernel.msg*"
failSwapOn: false
containerLogMaxSizeMB: 500
containerLogMaxFiles: 50
podMaxPids: 2048

Not sure why the yaml produces this difference:

  • Unsupported value: []interface {}{"net.*", "kernel.msg*"}
  • supported values: "kernel.shm*", "kernel.msg*", "kernel.sem", "fs.mqueue.*", "net.*"

@zioproto
Copy link
Contributor

Not sure if this is related:
kubernetes-sigs/controller-tools#316
I need someone with more experience on the validation code to have a look.

It seems the root cause is in the enum being an array ?

enum:
- kernel.shm*
- kernel.msg*
- kernel.sem
- fs.mqueue.*
- net.*
items:
type: string
type: array

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 30, 2022
@jackfrancis jackfrancis force-pushed the aks-pool-kubeletconfig branch 2 times, most recently from 7b870b7 to db6242b Compare December 1, 2022 19:58
@jackfrancis
Copy link
Contributor Author

/assign @nawazkh

@nawazkh can you take another review look at this, thank you!

Copy link
Member

@nawazkh nawazkh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some thoughts. Thanks for updating the flavors yamls, made my testing much easier!

docs/book/src/topics/managedcluster.md Outdated Show resolved Hide resolved
@nawazkh

This comment was marked as outdated.

Copy link
Member

@nawazkh nawazkh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 🚀
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2022
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 3, 2022
@nawazkh
Copy link
Member

nawazkh commented Dec 5, 2022

Tested it again, works as expected.

$ az aks nodepool show -g nhk-cluster-4 --cluster-name aks-24111  -n pool1 -o json | jq .kubeletConfig
{
  "allowedUnsafeSysctls": [
    "net.*",
    "kernel.msg*"
  ],
  "containerLogMaxFiles": 50,
  "containerLogMaxSizeMb": 500,
  "cpuCfsQuota": true,
  "cpuCfsQuotaPeriod": "110ms",
  "cpuManagerPolicy": "static",
  "failSwapOn": false,
  "imageGcHighThreshold": 70,
  "imageGcLowThreshold": 50,
  "podMaxPids": 2048,
  "topologyManagerPolicy": "best-effort"
}

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 5, 2022
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 6, 2022
@jackfrancis
Copy link
Contributor Author

@nawazkh for final lgtm after rebase, thank you!

@nawazkh
Copy link
Member

nawazkh commented Dec 6, 2022

/lgtm
🚀

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 6, 2022
@jackfrancis
Copy link
Contributor Author

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 6, 2022
@@ -109,3 +109,17 @@ spec:
enableNodePublicIP: false
scaleSetPriority: Regular
name: pool1
kubeletConfig:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these values match the AKS default or best practices? If not, what's the intention behind including them in the AKS flavor reference template?

(just curious, not blocking)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question:

  1. prefer a value that isn't the AKS default
  2. so long as that value isn't too interesting (and could thus interfere with other features and/or E2E tests)

Basically: wanting to make sure that we engage the actual feature in some way (for AKS to actually build a "configured" node pool) without providing a materially different operational outcome.

Are we concerned that the non-tested "reference" template is getting too interesting? Or do we prefer to expose as much configuration as possible as a feature reference? We can always simplify these configurations for the reference templates and triage them to the test/ci templates only...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we concerned that the non-tested "reference" template is getting too interesting? Or do we prefer to expose as much configuration as possible as a feature reference?

Yes that's my main concern as far as I know some users are actually using that reference template as base and then applying their own kustomizations to it so I would consider those templates we publish as assets to be "best practice" examples and therefore we should avoid using them to configure random values for testing purposes as it may not be clear to a user what is there because it's needed for proper configuration vs. what was added solely for testing purposes

We can always simplify these configurations for the reference templates and triage them to the test/ci templates only

That's seems reasonable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we feel about doing that independent of this PR (cleaning up the reference AKS template so it only has the minimum, required configuration)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine (I see it's already merged) but let's add that to the milestone to make sure we do it before the template gets published as part of the release

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is what we want: #2913

@jackfrancis
Copy link
Contributor Author

/retest

1 similar comment
@jackfrancis
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot merged commit f9cfcb3 into kubernetes-sigs:main Dec 7, 2022
@jackfrancis jackfrancis deleted the aks-pool-kubeletconfig branch December 7, 2022 19:01
@jackfrancis jackfrancis mentioned this pull request Feb 28, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable optional configuration for AKS nodepools: PodMaxPids
5 participants