generated from onedr0p/cluster-template
feat(helm): update gpu-operator ( v24.6.2 → v24.9.0 ) #544
Open: renovate wants to merge 1 commit into main from renovate/gpu-operator-24.x
--- HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator
+++ HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator
@@ -52,27 +52,12 @@
- update
- patch
- delete
- apiGroups:
- ''
resources:
- - events
- - pods
- - pods/eviction
- - services
- verbs:
- - create
- - get
- - list
- - watch
- - update
- - patch
- - delete
-- apiGroups:
- - ''
- resources:
- nodes
verbs:
- get
- list
- watch
- update
@@ -86,39 +71,32 @@
- list
- create
- watch
- update
- patch
- apiGroups:
+ - ''
+ resources:
+ - pods
+ - pods/eviction
+ verbs:
+ - create
+ - get
+ - list
+ - watch
+ - update
+ - patch
+ - delete
+- apiGroups:
- apps
resources:
- daemonsets
verbs:
- get
- list
- watch
-- apiGroups:
- - apps
- resources:
- - controllerrevisions
- verbs:
- - get
- - list
- - watch
-- apiGroups:
- - monitoring.coreos.com
- resources:
- - servicemonitors
- - prometheusrules
- verbs:
- - get
- - list
- - create
- - watch
- - update
- - delete
- apiGroups:
- nvidia.com
resources:
- clusterpolicies
- clusterpolicies/finalizers
- clusterpolicies/status
@@ -141,24 +119,12 @@
verbs:
- get
- list
- watch
- create
- apiGroups:
- - coordination.k8s.io
- resources:
- - leases
- verbs:
- - get
- - list
- - watch
- - create
- - update
- - patch
- - delete
-- apiGroups:
- node.k8s.io
resources:
- runtimeclasses
verbs:
- get
- list
--- HelmRelease: gpu-operator/gpu-operator Role: gpu-operator/gpu-operator
+++ HelmRelease: gpu-operator/gpu-operator Role: gpu-operator/gpu-operator
@@ -22,12 +22,20 @@
- update
- patch
- delete
- apiGroups:
- apps
resources:
+ - controllerrevisions
+ verbs:
+ - get
+ - list
+ - watch
+- apiGroups:
+ - apps
+ resources:
- daemonsets
verbs:
- create
- get
- list
- watch
@@ -35,17 +43,47 @@
- patch
- delete
- apiGroups:
- ''
resources:
- configmaps
+ - endpoints
+ - events
+ - pods
+ - pods/eviction
- secrets
+ - services
+ - services/finalizers
- serviceaccounts
verbs:
- create
- get
- list
- watch
- update
- patch
- delete
+- apiGroups:
+ - coordination.k8s.io
+ resources:
+ - leases
+ verbs:
+ - get
+ - list
+ - watch
+ - create
+ - update
+ - patch
+ - delete
+- apiGroups:
+ - monitoring.coreos.com
+ resources:
+ - servicemonitors
+ - prometheusrules
+ verbs:
+ - get
+ - list
+ - create
+ - watch
+ - update
+ - delete
--- HelmRelease: gpu-operator/gpu-operator Deployment: gpu-operator/gpu-operator
+++ HelmRelease: gpu-operator/gpu-operator Deployment: gpu-operator/gpu-operator
@@ -44,13 +44,13 @@
value: ''
- name: OPERATOR_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: DRIVER_MANAGER_IMAGE
- value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.6.10
+ value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.7.0
volumeMounts:
- name: host-os-release
mountPath: /host-etc/os-release
readOnly: true
livenessProbe:
httpGet:
--- HelmRelease: gpu-operator/gpu-operator ClusterPolicy: gpu-operator/cluster-policy
+++ HelmRelease: gpu-operator/gpu-operator ClusterPolicy: gpu-operator/cluster-policy
@@ -15,30 +15,30 @@
operator:
defaultRuntime: docker
runtimeClass: nvidia
initContainer:
repository: nvcr.io/nvidia
image: cuda
- version: 12.6.1-base-ubi8
+ version: 12.6.2-base-ubi9
imagePullPolicy: IfNotPresent
daemonsets:
labels:
- helm.sh/chart: gpu-operator-v24.6.2
+ helm.sh/chart: gpu-operator-v24.9.0
app.kubernetes.io/managed-by: gpu-operator
tolerations:
- effect: NoSchedule
key: nvidia.com/gpu
operator: Exists
priorityClassName: system-node-critical
updateStrategy: RollingUpdate
rollingUpdate:
maxUnavailable: '1'
validator:
repository: nvcr.io/nvidia/cloud-native
image: gpu-operator-validator
- version: v24.6.2
+ version: v24.9.0
imagePullPolicy: IfNotPresent
plugin:
env:
- name: WITH_WORKLOAD
value: 'false'
mig:
@@ -52,26 +52,26 @@
enabled: false
useNvidiaDriverCRD: false
useOpenKernelModules: false
usePrecompiled: false
repository: nvcr.io/nvidia
image: driver
- version: 550.90.07
+ version: 550.127.05
imagePullPolicy: IfNotPresent
startupProbe:
failureThreshold: 120
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 60
rdma:
enabled: false
useHostMofed: false
manager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.6.10
+ version: v0.7.0
imagePullPolicy: IfNotPresent
env:
- name: ENABLE_GPU_POD_EVICTION
value: 'true'
- name: ENABLE_AUTO_DRAIN
value: 'false'
@@ -113,13 +113,13 @@
enabled: false
image: vgpu-manager
imagePullPolicy: IfNotPresent
driverManager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.6.10
+ version: v0.7.0
imagePullPolicy: IfNotPresent
env:
- name: ENABLE_GPU_POD_EVICTION
value: 'false'
- name: ENABLE_AUTO_DRAIN
value: 'false'
@@ -138,35 +138,35 @@
url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-535.86.10-snp
name: kata-nvidia-gpu-snp
nodeSelector:
nvidia.com/cc.capable: 'true'
repository: nvcr.io/nvidia/cloud-native
image: k8s-kata-manager
- version: v0.2.1
+ version: v0.2.2
imagePullPolicy: IfNotPresent
vfioManager:
enabled: true
repository: nvcr.io/nvidia
image: cuda
- version: 12.6.1-base-ubi8
+ version: 12.6.2-base-ubi9
imagePullPolicy: IfNotPresent
driverManager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.6.10
+ version: v0.7.0
imagePullPolicy: IfNotPresent
env:
- name: ENABLE_GPU_POD_EVICTION
value: 'false'
- name: ENABLE_AUTO_DRAIN
value: 'false'
vgpuDeviceManager:
enabled: true
repository: nvcr.io/nvidia/cloud-native
image: vgpu-device-manager
- version: v0.2.7
+ version: v0.2.8
imagePullPolicy: IfNotPresent
config:
default: default
name: ''
ccManager:
enabled: false
@@ -189,13 +189,13 @@
value: none
installDir: /var/nvidia
devicePlugin:
enabled: true
repository: nvcr.io/nvidia
image: k8s-device-plugin
- version: v0.16.2-ubi8
+ version: v0.17.0-ubi9
imagePullPolicy: IfNotPresent
env:
- name: PASS_DEVICE_SPECS
value: 'true'
- name: FAIL_ON_INIT_ERROR
value: 'true'
@@ -211,19 +211,19 @@
name: time-slicing-config-all
default: any
dcgm:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: dcgm
- version: 3.3.7-1-ubuntu22.04
+ version: 3.3.8-1-ubuntu22.04
imagePullPolicy: IfNotPresent
dcgmExporter:
enabled: true
repository: nvcr.io/nvidia/k8s
image: dcgm-exporter
- version: 3.3.7-3.5.0-ubuntu22.04
+ version: 3.3.8-3.6.0-ubuntu22.04
imagePullPolicy: IfNotPresent
env:
- name: DCGM_EXPORTER_LISTEN
value: :9400
- name: DCGM_EXPORTER_KUBERNETES
value: 'true'
@@ -236,24 +236,24 @@
interval: 15s
relabelings: []
gfd:
enabled: true
repository: nvcr.io/nvidia
image: k8s-device-plugin
- version: v0.16.2-ubi8
+ version: v0.17.0-ubi9
imagePullPolicy: IfNotPresent
env:
- name: GFD_SLEEP_INTERVAL
value: 60s
- name: GFD_FAIL_ON_INIT_ERROR
value: 'true'
migManager:
enabled: true
repository: nvcr.io/nvidia/cloud-native
image: k8s-mig-manager
- version: v0.8.0-ubuntu20.04
+ version: v0.10.0-ubuntu20.04
imagePullPolicy: IfNotPresent
env:
- name: WITH_REBOOT
value: 'false'
config:
name: null
@@ -261,24 +261,24 @@
gpuClientsConfig:
name: ''
nodeStatusExporter:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: gpu-operator-validator
- version: v24.6.2
+ version: v24.9.0
imagePullPolicy: IfNotPresent
gdrcopy:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: gdrdrv
- version: v2.4.1-1
+ version: v2.4.1-2
imagePullPolicy: IfNotPresent
sandboxWorkloads:
enabled: false
defaultWorkload: container
sandboxDevicePlugin:
enabled: true
repository: nvcr.io/nvidia
image: kubevirt-gpu-device-plugin
- version: v1.2.9
+ version: v1.2.10
imagePullPolicy: IfNotPresent
--- HelmRelease: gpu-operator/gpu-operator ServiceAccount: gpu-operator/gpu-operator-upgrade-crd-hook-sa
+++ HelmRelease: gpu-operator/gpu-operator ServiceAccount: gpu-operator/gpu-operator-upgrade-crd-hook-sa
@@ -0,0 +1,10 @@
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+ name: gpu-operator-upgrade-crd-hook-sa
+ annotations:
+ helm.sh/hook: pre-upgrade
+ helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+ helm.sh/hook-weight: '0'
+
--- HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator-upgrade-crd-hook-role
+++ HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator-upgrade-crd-hook-role
@@ -0,0 +1,22 @@
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+ name: gpu-operator-upgrade-crd-hook-role
+ annotations:
+ helm.sh/hook: pre-upgrade
+ helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+ helm.sh/hook-weight: '0'
+rules:
+- apiGroups:
+ - apiextensions.k8s.io
+ resources:
+ - customresourcedefinitions
+ verbs:
+ - create
+ - get
+ - list
+ - watch
+ - patch
+ - update
+
--- HelmRelease: gpu-operator/gpu-operator ClusterRoleBinding: gpu-operator/gpu-operator-upgrade-crd-hook-binding
+++ HelmRelease: gpu-operator/gpu-operator ClusterRoleBinding: gpu-operator/gpu-operator-upgrade-crd-hook-binding
@@ -0,0 +1,18 @@
+---
+kind: ClusterRoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+ name: gpu-operator-upgrade-crd-hook-binding
+ annotations:
+ helm.sh/hook: pre-upgrade
+ helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+ helm.sh/hook-weight: '0'
+subjects:
+- kind: ServiceAccount
+ name: gpu-operator-upgrade-crd-hook-sa
+ namespace: gpu-operator
+roleRef:
+ kind: ClusterRole
+ name: gpu-operator-upgrade-crd-hook-role
+ apiGroup: rbac.authorization.k8s.io
+
--- HelmRelease: gpu-operator/gpu-operator Job: gpu-operator/gpu-operator-upgrade-crd
+++ HelmRelease: gpu-operator/gpu-operator Job: gpu-operator/gpu-operator-upgrade-crd
@@ -0,0 +1,46 @@
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+ name: gpu-operator-upgrade-crd
+ namespace: gpu-operator
+ annotations:
+ helm.sh/hook: pre-upgrade
+ helm.sh/hook-weight: '1'
+ helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+ labels:
+ app.kubernetes.io/name: gpu-operator
+ app.kubernetes.io/instance: gpu-operator
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/component: gpu-operator
+spec:
+ template:
+ metadata:
+ name: gpu-operator-upgrade-crd
+ labels:
+ app.kubernetes.io/name: gpu-operator
+ app.kubernetes.io/instance: gpu-operator
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/component: gpu-operator
+ spec:
+ serviceAccountName: gpu-operator-upgrade-crd-hook-sa
+ tolerations:
+ - effect: NoSchedule
+ key: node-role.kubernetes.io/master
+ operator: Equal
+ value: ''
+ - effect: NoSchedule
+ key: node-role.kubernetes.io/control-plane
+ operator: Equal
+ value: ''
+ containers:
+ - name: upgrade-crd
+ image: ghcr.io/jfroy/gpu-operator:v24.6.2-ubi8
+ imagePullPolicy: IfNotPresent
+ command:
+ - /bin/sh
+ - -c
+ - |
+ kubectl apply -f /opt/gpu-operator/nvidia.com_clusterpolicies.yaml; kubectl apply -f /opt/gpu-operator/nvidia.com_nvidiadrivers.yaml;
+ restartPolicy: OnFailure
+
--- kubernetes/apps/gpu-operator/gpu-operator/app Kustomization: flux-system/gpu-operator HelmRelease: gpu-operator/gpu-operator
+++ kubernetes/apps/gpu-operator/gpu-operator/app Kustomization: flux-system/gpu-operator HelmRelease: gpu-operator/gpu-operator
@@ -13,13 +13,13 @@
spec:
chart: gpu-operator
sourceRef:
kind: HelmRepository
name: nvidia
namespace: flux-system
- version: v24.6.2
+ version: v24.9.0
driftDetection:
mode: enabled
install:
crds: CreateReplace
disableOpenAPIValidation: true
remediation:
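One detail in the rendered output may be worth checking before merge: the new pre-upgrade hook Job runs `ghcr.io/jfroy/gpu-operator:v24.6.2-ubi8` while the chart and validator move to v24.9.0, so the hook would presumably apply the CRD manifests bundled in the older image. Below is a minimal sketch of one way to sidestep that. It assumes the image tag comes from an override in the HelmRelease values (the values are not shown in this diff) and that Flux is meant to own CRD upgrades. The `upgrade.crds` setting is an assumption that mirrors the `install.crds` setting visible above; `operator.upgradeCRD` is the chart value that toggles the hook.

```yaml
# Sketch only: a fragment of the gpu-operator HelmRelease spec,
# not the repository's actual file.
spec:
  upgrade:
    # Assumption: let Flux create or replace CRDs on upgrade,
    # mirroring install.crds: CreateReplace in the diff above.
    crds: CreateReplace
  values:
    operator:
      # Disables the chart's pre-upgrade CRD hook
      # (ServiceAccount, ClusterRole, ClusterRoleBinding, Job).
      upgradeCRD: false
```

The alternative is to keep the hook and bump the overridden operator image to a tag built from v24.9.0, so the Job applies CRDs that match the chart.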
jfroy force-pushed the main branch 10 times, most recently from 44a8b71 to e2e1ece (November 7, 2024 18:10)
This PR contains the following updates:
gpu-operator: v24.6.2 -> v24.9.0
Release Notes
NVIDIA/gpu-operator (gpu-operator)
v24.9.0: GPU Operator 24.9.0 Release
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/24.9.0/release-notes.html
Configuration
📅 Schedule: Branch creation - "after 10pm every weekday,before 5am every weekday,every weekend" in timezone America/Los_Angeles, Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.