Use kustomize for deployment
supported overlays:
- no-webhook - deploy without webhook
- certmanager - deploy with webhook to k8s cluster where
certmanager is available
- openshift - deploy with webhook to the Openshift cluster

Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
ykulazhenkov committed Sep 28, 2023
1 parent bf17352 commit 3a0c4ce
Showing 33 changed files with 712 additions and 325 deletions.
75 changes: 52 additions & 23 deletions README.md
@@ -32,6 +32,16 @@ NVIDIA IPAM plugin consists of 3 main components:
A Kubernetes (K8s) controller that watches IPPool CRs in a predefined namespace.
It then assigns each node, via the IPPool status, a cluster-unique range of IPs from the defined IP pools.

#### Validation webhook

ipam-controller implements a validation webhook for the IPPool resource.
The webhook can prevent the creation of IPPool resources with invalid configurations.
A supported X.509 certificate management system must be available in the cluster to enable the webhook.
Currently supported systems are [certmanager](https://cert-manager.io/) and
[OpenShift certificate management](https://docs.openshift.com/container-platform/4.13/security/certificates/service-serving-certificate.html).

Activation of the validation webhook is optional. Check the [Deployment](#deployment) section for details.
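
As a rough sketch of how the matching deployment overlay could be chosen automatically, the shell functions below probe the cluster for a certificate management system. The names `has_crd` and `pick_overlay` are hypothetical, and detecting the two systems by the presence of the `clusterversions.config.openshift.io` and `certificates.cert-manager.io` CRDs is an assumption, not something this project documents.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named CRD exists in the current cluster.
has_crd() {
  kubectl get crd "$1" >/dev/null 2>&1
}

# Hypothetical helper: prints the name of the overlay that fits the cluster.
# Assumption: OpenShift is recognized by its ClusterVersion CRD and
# cert-manager by its Certificate CRD; anything else gets no webhook.
pick_overlay() {
  if has_crd clusterversions.config.openshift.io; then
    echo openshift
  elif has_crd certificates.cert-manager.io; then
    echo certmanager
  else
    echo no-webhook
  fi
}
```

Calling `pick_overlay` prints one of `openshift`, `certmanager`, or `no-webhook`. OpenShift is checked first because such a cluster may also have cert-manager installed.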

### ipam-node

The daemon is responsible for:
@@ -144,48 +154,50 @@ ipam-controller accepts configuration using command line flags and IPPools CRs.
```text
Logging flags:
--log-flush-frequency duration
--log-flush-frequency duration
Maximum number of seconds between log flushes (default 5s)
--log-json-info-buffer-size quantity
[Alpha] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The
size can be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi). Enable the LoggingAlphaOptions feature
gate to use this.
--log-json-split-stream
[Alpha] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout. Enable the LoggingAlphaOptions feature gate
to use this.
--logging-format string
--log-json-info-buffer-size quantity
[Alpha] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The size can
be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi). Enable the LoggingAlphaOptions feature gate to use this.
--log-json-split-stream
[Alpha] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout. Enable the LoggingAlphaOptions feature gate to use
this.
--logging-format string
Sets the log format. Permitted formats: "json" (gated by LoggingBetaOptions), "text". (default "text")
-v, --v Level
-v, --v Level
number for the log level verbosity
--vmodule pattern=N,...
--vmodule pattern=N,...
comma-separated list of pattern=N settings for file-filtered logging (only works for text log format)
Common flags:
--feature-gates mapStringBool
--feature-gates mapStringBool
A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
AllAlpha=true|false (ALPHA - default=false)
AllBeta=true|false (BETA - default=false)
ContextualLogging=true|false (ALPHA - default=false)
LoggingAlphaOptions=true|false (ALPHA - default=false)
LoggingBetaOptions=true|false (BETA - default=true)
--version
--version
print binary version and exit
Controller flags:
--health-probe-bind-address string
--health-probe-bind-address string
The address the probe endpoint binds to. (default ":8081")
--kubeconfig string
--ippools-namespace string
The name of the namespace to watch for IPPools CRs (default "kube-system")
--kubeconfig string
Paths to a kubeconfig. Only required if out-of-cluster.
--leader-elect
--leader-elect
Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.
--leader-elect-namespace string
--leader-elect-namespace string
Determines the namespace in which the leader election resource will be created. (default "kube-system")
--metrics-bind-address string
--metrics-bind-address string
The address the metric endpoint binds to. (default ":8080")
--ippools-namespace string
The name of the namespace to watch for IPPools CRs. (default "kube-system")
--webhook
Enable validating webhook server as a part of the controller
```

#### IPPool CR
@@ -331,11 +343,28 @@ interface should have two IP addresses: one IPv4 and one IPv6. (default: network

### Deploy IPAM plugin

> _NOTE:_ This command will deploy latest dev build with default configuration
> _NOTE:_ These commands will deploy the latest dev build with the default configuration

The plugin can be deployed with kustomize.

Supported overlays are:

`no-webhook` - deploy without webhook

```shell
kubectl kustomize https://github.com/mellanox/nvidia-k8s-ipam/deploy/overlays/no-webhook?ref=main | kubectl apply -f -
```

`certmanager` - deploy with the webhook to a Kubernetes cluster where certmanager is available

```shell
kubectl kustomize https://github.com/mellanox/nvidia-k8s-ipam/deploy/overlays/certmanager?ref=main | kubectl apply -f -
```

`openshift` - deploy with the webhook to an OpenShift cluster

```shell
kubectl apply -f https://raw.githubusercontent.com/Mellanox/nvidia-k8s-ipam/main/deploy/crds/nv-ipam.nvidia.com_ippools.yaml
kubectl apply -f https://raw.githubusercontent.com/Mellanox/nvidia-k8s-ipam/main/deploy/nv-ipam.yaml
kubectl kustomize https://github.com/mellanox/nvidia-k8s-ipam/deploy/overlays/openshift?ref=main | kubectl apply -f -
```
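
The overlay commands differ only in the overlay name, so the URL can be built by a small helper. This is a minimal sketch: `overlay_url` is a hypothetical name, and the optional second argument relies on the `ref` query parameter of kustomize remote targets accepting a branch, tag, or commit.

```shell
#!/bin/sh
# Hypothetical helper: prints the kustomize remote URL for a supported overlay.
# $1 - overlay name (no-webhook, certmanager or openshift)
# $2 - optional Git ref, defaults to "main"
overlay_url() {
  overlay="$1"
  ref="${2:-main}"
  case "$overlay" in
    no-webhook|certmanager|openshift)
      printf 'https://github.com/mellanox/nvidia-k8s-ipam/deploy/overlays/%s?ref=%s\n' "$overlay" "$ref"
      ;;
    *)
      # Reject anything that is not one of the three overlays listed above.
      echo "unknown overlay: $overlay" >&2
      return 1
      ;;
  esac
}
```

For example, `kubectl kustomize "$(overlay_url no-webhook)"` renders the manifests without applying them, which allows a review before piping the output to `kubectl apply -f -`.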

### Create IPPool CR
2 changes: 1 addition & 1 deletion deploy/crds/kustomization.yaml
@@ -1,2 +1,2 @@
resources:
- nv-ipam.nvidia.com_ippools.yaml
- nv-ipam.nvidia.com_ippools.yaml
25 changes: 25 additions & 0 deletions deploy/manifests/certmanager/certificate.yaml
@@ -0,0 +1,25 @@
# The following manifests contain a self-signed issuer CR and a certificate CR.
# More documentation can be found at https://docs.cert-manager.io
# WARNING: Targets CertManager v1.0. Check https://cert-manager.io/docs/installation/upgrading/ for breaking changes.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: system
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serving-cert # this name should match the one that appears in kustomizeconfig.yaml
  namespace: system
spec:
  # $(SERVICE_NAME) and $(SERVICE_NAMESPACE) will be substituted by kustomize
  dnsNames:
    - $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc
    - $(SERVICE_NAME).$(SERVICE_NAMESPACE).svc.cluster.local
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: nv-ipam-webhook-server-cert # this secret will not be prefixed, since it's not managed by kustomize
5 changes: 5 additions & 0 deletions deploy/manifests/certmanager/kustomization.yaml
@@ -0,0 +1,5 @@
resources:
- certificate.yaml

configurations:
- kustomizeconfig.yaml
16 changes: 16 additions & 0 deletions deploy/manifests/certmanager/kustomizeconfig.yaml
@@ -0,0 +1,16 @@
# This configuration is for teaching kustomize how to update name ref and var substitution
nameReference:
  - kind: Issuer
    group: cert-manager.io
    fieldSpecs:
      - kind: Certificate
        group: cert-manager.io
        path: spec/issuerRef/name

varReference:
  - kind: Certificate
    group: cert-manager.io
    path: spec/commonName
  - kind: Certificate
    group: cert-manager.io
    path: spec/dnsNames
99 changes: 99 additions & 0 deletions deploy/manifests/controller/deployment.yaml
@@ -0,0 +1,99 @@
kind: Deployment
apiVersion: apps/v1
metadata:
  name: controller
  namespace: system
  annotations:
    kubernetes.io/description: |
      This deployment launches the nv-ipam controller for nv-ipam.
spec:
  strategy:
    type: RollingUpdate
  replicas: 1
  selector:
    matchLabels:
      name: controller
  template:
    metadata:
      labels:
        name: controller
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: controller
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: name
                      operator: In
                      values:
                        - controller
                topologyKey: "kubernetes.io/hostname"
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - ""
            - weight: 1
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - ""
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: controller
          image: ghcr.io/mellanox/nvidia-k8s-ipam:latest
          imagePullPolicy: IfNotPresent
          command: [ "/ipam-controller" ]
          args:
            - --leader-elect=true
            - --leader-elect-namespace=$(POD_NAMESPACE)
            - --ippools-namespace=$(POD_NAMESPACE)
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - "ALL"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8081
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8081
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 300Mi
          ports:
            - containerPort: 9443
              name: webhook-server
              protocol: TCP
5 changes: 5 additions & 0 deletions deploy/manifests/controller/kustomization.yaml
@@ -0,0 +1,5 @@
resources:
- deployment.yaml
- role.yaml
- role_binding.yaml
- service_account.yaml
60 changes: 60 additions & 0 deletions deploy/manifests/controller/role.yaml
@@ -0,0 +1,60 @@
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: controller
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - patch
      - update
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - list
      - watch
      - delete
  - apiGroups:
      - nv-ipam.nvidia.com
    resources:
      - ippools
    verbs:
      - get
      - list
      - watch
      - create
  - apiGroups:
      - nv-ipam.nvidia.com
    resources:
      - ippools/status
    verbs:
      - get
      - update
      - patch
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
12 changes: 12 additions & 0 deletions deploy/manifests/controller/role_binding.yaml
@@ -0,0 +1,12 @@
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: controller
subjects:
  - kind: ServiceAccount
    name: controller
    namespace: system
5 changes: 5 additions & 0 deletions deploy/manifests/controller/service_account.yaml
@@ -0,0 +1,5 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: controller
  namespace: system