feat(helm)!: Update chart gpu-operator ( v24.9.2 → v25.3.4 ) #2173
Open: samip5-bot wants to merge 1 commit into main from renovate/media-gpu-operator-25.x
--- HelmRelease: gpu-operator/nvidia-gpu-operator ClusterRole: gpu-operator/nvidia-gpu-operator-node-feature-discovery
+++ HelmRelease: gpu-operator/nvidia-gpu-operator ClusterRole: gpu-operator/nvidia-gpu-operator-node-feature-discovery
@@ -5,12 +5,19 @@
name: nvidia-gpu-operator-node-feature-discovery
labels:
app.kubernetes.io/name: node-feature-discovery
app.kubernetes.io/instance: nvidia-gpu-operator
app.kubernetes.io/managed-by: Helm
rules:
+- apiGroups:
+ - ''
+ resources:
+ - namespaces
+ verbs:
+ - watch
+ - list
- apiGroups:
- ''
resources:
- nodes
- nodes/status
verbs:
--- HelmRelease: gpu-operator/nvidia-gpu-operator ClusterRole: gpu-operator/gpu-operator
+++ HelmRelease: gpu-operator/nvidia-gpu-operator ClusterRole: gpu-operator/gpu-operator
@@ -66,30 +66,39 @@
- ''
resources:
- namespaces
verbs:
- get
- list
- - create
- watch
- update
- patch
- apiGroups:
- ''
resources:
- events
- - pods
- - pods/eviction
verbs:
- create
- get
- list
- watch
- - update
- - patch
- delete
+- apiGroups:
+ - ''
+ resources:
+ - pods
+ verbs:
+ - get
+ - list
+ - watch
+- apiGroups:
+ - ''
+ resources:
+ - pods/eviction
+ verbs:
+ - create
- apiGroups:
- apps
resources:
- daemonsets
verbs:
- get
--- HelmRelease: gpu-operator/nvidia-gpu-operator ClusterRoleBinding: gpu-operator/gpu-operator
+++ HelmRelease: gpu-operator/nvidia-gpu-operator ClusterRoleBinding: gpu-operator/gpu-operator
@@ -9,14 +9,11 @@
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: gpu-operator
subjects:
- kind: ServiceAccount
name: gpu-operator
namespace: gpu-operator
-- kind: ServiceAccount
- name: node-feature-discovery
- namespace: gpu-operator
roleRef:
kind: ClusterRole
name: gpu-operator
apiGroup: rbac.authorization.k8s.io
--- HelmRelease: gpu-operator/nvidia-gpu-operator Role: gpu-operator/nvidia-gpu-operator-node-feature-discovery-worker
+++ HelmRelease: gpu-operator/nvidia-gpu-operator Role: gpu-operator/nvidia-gpu-operator-node-feature-discovery-worker
@@ -14,12 +14,13 @@
resources:
- nodefeatures
verbs:
- create
- get
- update
+ - delete
- apiGroups:
- ''
resources:
- pods
verbs:
- get
--- HelmRelease: gpu-operator/nvidia-gpu-operator DaemonSet: gpu-operator/nvidia-gpu-operator-node-feature-discovery-worker
+++ HelmRelease: gpu-operator/nvidia-gpu-operator DaemonSet: gpu-operator/nvidia-gpu-operator-node-feature-discovery-worker
@@ -34,23 +34,23 @@
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
- image: registry.k8s.io/nfd/node-feature-discovery:v0.16.6
+ image: registry.k8s.io/nfd/node-feature-discovery:v0.17.3
imagePullPolicy: IfNotPresent
livenessProbe:
grpc:
port: 8082
initialDelaySeconds: 10
readinessProbe:
- failureThreshold: 10
grpc:
port: 8082
initialDelaySeconds: 5
+ failureThreshold: 10
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
@@ -67,13 +67,12 @@
requests:
cpu: 5m
memory: 64Mi
command:
- nfd-worker
args:
- - -feature-gates=NodeFeatureAPI=true
- -feature-gates=NodeFeatureGroupAPI=false
- -metrics=8081
- -grpc-health=8082
ports:
- containerPort: 8081
name: metrics
@@ -95,15 +94,12 @@
- name: host-lib
mountPath: /host-lib
readOnly: true
- name: host-proc-swaps
mountPath: /host-proc/swaps
readOnly: true
- - name: source-d
- mountPath: /etc/kubernetes/node-feature-discovery/source.d/
- readOnly: true
- name: features-d
mountPath: /etc/kubernetes/node-feature-discovery/features.d/
readOnly: true
- name: nfd-worker-conf
mountPath: /etc/kubernetes/node-feature-discovery
readOnly: true
@@ -123,15 +119,12 @@
- name: host-lib
hostPath:
path: /lib
- name: host-proc-swaps
hostPath:
path: /proc/swaps
- - name: source-d
- hostPath:
- path: /etc/kubernetes/node-feature-discovery/source.d/
- name: features-d
hostPath:
path: /etc/kubernetes/node-feature-discovery/features.d/
- name: nfd-worker-conf
configMap:
name: nvidia-gpu-operator-node-feature-discovery-worker-conf
--- HelmRelease: gpu-operator/nvidia-gpu-operator Deployment: gpu-operator/nvidia-gpu-operator-node-feature-discovery-master
+++ HelmRelease: gpu-operator/nvidia-gpu-operator Deployment: gpu-operator/nvidia-gpu-operator-node-feature-discovery-master
@@ -35,26 +35,26 @@
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
- image: registry.k8s.io/nfd/node-feature-discovery:v0.16.6
+ image: registry.k8s.io/nfd/node-feature-discovery:v0.17.3
imagePullPolicy: IfNotPresent
+ startupProbe:
+ grpc:
+ port: 8082
+ failureThreshold: 30
livenessProbe:
grpc:
port: 8082
- initialDelaySeconds: 10
readinessProbe:
- failureThreshold: 10
grpc:
port: 8082
- initialDelaySeconds: 5
+ failureThreshold: 10
ports:
- - containerPort: 8080
- name: grpc
- containerPort: 8081
name: metrics
- containerPort: 8082
name: health
env:
- name: NODE_NAME
@@ -67,14 +67,13 @@
limits:
memory: 4Gi
requests:
cpu: 100m
memory: 128Mi
args:
- - -crd-controller=true
- - -feature-gates=NodeFeatureAPI=true
+ - -enable-leader-election
- -feature-gates=NodeFeatureGroupAPI=false
- -metrics=8081
- -grpc-health=8082
volumeMounts:
- name: nfd-master-conf
mountPath: /etc/kubernetes/node-feature-discovery
--- HelmRelease: gpu-operator/nvidia-gpu-operator Deployment: gpu-operator/nvidia-gpu-operator-node-feature-discovery-gc
+++ HelmRelease: gpu-operator/nvidia-gpu-operator Deployment: gpu-operator/nvidia-gpu-operator-node-feature-discovery-gc
@@ -28,13 +28,13 @@
dnsPolicy: ClusterFirstWithHostNet
priorityClassName: system-node-critical
securityContext: {}
hostNetwork: false
containers:
- name: gc
- image: registry.k8s.io/nfd/node-feature-discovery:v0.16.6
+ image: registry.k8s.io/nfd/node-feature-discovery:v0.17.3
imagePullPolicy: IfNotPresent
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
--- HelmRelease: gpu-operator/nvidia-gpu-operator Deployment: gpu-operator/gpu-operator
+++ HelmRelease: gpu-operator/nvidia-gpu-operator Deployment: gpu-operator/gpu-operator
@@ -28,13 +28,13 @@
openshift.io/scc: restricted-readonly
spec:
serviceAccountName: gpu-operator
priorityClassName: system-node-critical
containers:
- name: gpu-operator
- image: nvcr.io/nvidia/gpu-operator:v24.9.2
+ image: nvcr.io/nvidia/gpu-operator:v25.3.4
imagePullPolicy: IfNotPresent
command:
- gpu-operator
args:
- --leader-elect
- --zap-time-encoding=epoch
@@ -44,13 +44,13 @@
value: ''
- name: OPERATOR_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: DRIVER_MANAGER_IMAGE
- value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.7.0
+ value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.8.1
volumeMounts:
- name: host-os-release
mountPath: /host-etc/os-release
readOnly: true
livenessProbe:
httpGet:
--- HelmRelease: gpu-operator/nvidia-gpu-operator ClusterPolicy: gpu-operator/cluster-policy
+++ HelmRelease: gpu-operator/nvidia-gpu-operator ClusterPolicy: gpu-operator/cluster-policy
@@ -10,51 +10,47 @@
app.kubernetes.io/component: gpu-operator
spec:
hostPaths:
rootFS: /
driverInstallDir: /run/nvidia/driver
operator:
- defaultRuntime: docker
runtimeClass: nvidia
initContainer:
repository: nvcr.io/nvidia
image: cuda
- version: 12.6.3-base-ubi9
+ version: 13.0.1-base-ubi9
imagePullPolicy: IfNotPresent
daemonsets:
labels:
- helm.sh/chart: gpu-operator-v24.9.2
+ helm.sh/chart: gpu-operator-v25.3.4
app.kubernetes.io/managed-by: gpu-operator
tolerations:
- effect: NoSchedule
key: nvidia.com/gpu
operator: Exists
priorityClassName: system-node-critical
updateStrategy: RollingUpdate
rollingUpdate:
maxUnavailable: '1'
validator:
repository: nvcr.io/nvidia/cloud-native
image: gpu-operator-validator
- version: v24.9.2
- imagePullPolicy: IfNotPresent
- plugin:
- env:
- - name: WITH_WORKLOAD
- value: 'false'
+ version: v25.3.4
+ imagePullPolicy: IfNotPresent
+ plugin: null
mig:
strategy: single
psa:
enabled: false
cdi:
enabled: false
default: false
driver:
enabled: false
useNvidiaDriverCRD: false
- useOpenKernelModules: false
+ kernelModuleType: auto
usePrecompiled: false
repository: registry.skysolutions.fi/library/nvidia
image: driver
version: 550.90.07
imagePullPolicy: IfNotPresent
startupProbe:
@@ -65,27 +61,14 @@
rdma:
enabled: false
useHostMofed: false
manager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.7.0
- imagePullPolicy: IfNotPresent
- env:
- - name: ENABLE_GPU_POD_EVICTION
- value: 'true'
- - name: ENABLE_AUTO_DRAIN
- value: 'false'
- - name: DRAIN_USE_FORCE
- value: 'false'
- - name: DRAIN_POD_SELECTOR_LABEL
- value: ''
- - name: DRAIN_TIMEOUT_SECONDS
- value: 0s
- - name: DRAIN_DELETE_EMPTYDIR_DATA
- value: 'false'
+ version: v0.8.1
+ imagePullPolicy: IfNotPresent
repoConfig:
configMapName: ''
certConfig:
name: ''
licensingConfig:
configMapName: ''
@@ -113,19 +96,14 @@
enabled: false
image: vgpu-manager
imagePullPolicy: IfNotPresent
driverManager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.7.0
- imagePullPolicy: IfNotPresent
- env:
- - name: ENABLE_GPU_POD_EVICTION
- value: 'false'
- - name: ENABLE_AUTO_DRAIN
- value: 'false'
+ version: v0.8.1
+ imagePullPolicy: IfNotPresent
kataManager:
enabled: false
config:
artifactsDir: /opt/nvidia-gpu-operator/artifacts/runtimeclasses
runtimeClasses:
- artifacts:
@@ -138,35 +116,30 @@
url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-535.86.10-snp
name: kata-nvidia-gpu-snp
nodeSelector:
nvidia.com/cc.capable: 'true'
repository: nvcr.io/nvidia/cloud-native
image: k8s-kata-manager
- version: v0.2.2
+ version: v0.2.3
imagePullPolicy: IfNotPresent
vfioManager:
enabled: true
repository: nvcr.io/nvidia
image: cuda
- version: 12.6.3-base-ubi9
+ version: 13.0.1-base-ubi9
imagePullPolicy: IfNotPresent
driverManager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.7.0
- imagePullPolicy: IfNotPresent
- env:
- - name: ENABLE_GPU_POD_EVICTION
- value: 'false'
- - name: ENABLE_AUTO_DRAIN
- value: 'false'
+ version: v0.8.1
+ imagePullPolicy: IfNotPresent
vgpuDeviceManager:
enabled: true
repository: nvcr.io/nvidia/cloud-native
image: vgpu-device-manager
- version: v0.2.8
+ version: v0.4.0
imagePullPolicy: IfNotPresent
config:
default: default
name: ''
ccManager:
enabled: false
@@ -177,13 +150,13 @@
imagePullPolicy: IfNotPresent
env: []
toolkit:
enabled: true
repository: nvcr.io/nvidia/k8s
image: container-toolkit
- version: v1.17.4-ubuntu20.04
+ version: v1.17.8-ubuntu20.04
imagePullPolicy: IfNotPresent
env:
- name: CONTAINERD_CONFIG
value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
- name: CONTAINERD_SOCKET
value: /run/k3s/containerd/containerd.sock
@@ -193,96 +166,70 @@
value: 'true'
installDir: /usr/local/nvidia
devicePlugin:
enabled: true
repository: nvcr.io/nvidia
image: k8s-device-plugin
- version: v0.17.0
- imagePullPolicy: IfNotPresent
- env:
- - name: PASS_DEVICE_SPECS
- value: 'true'
- - name: FAIL_ON_INIT_ERROR
- value: 'true'
- - name: DEVICE_LIST_STRATEGY
- value: envvar
- - name: DEVICE_ID_STRATEGY
- value: uuid
- - name: NVIDIA_VISIBLE_DEVICES
- value: all
- - name: NVIDIA_DRIVER_CAPABILITIES
- value: all
+ version: v0.17.4
+ imagePullPolicy: IfNotPresent
config:
name: time-slicing-config
default: any
dcgm:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: dcgm
- version: 3.3.9-1-ubuntu22.04
+ version: 4.3.1-1-ubuntu22.04
imagePullPolicy: IfNotPresent
dcgmExporter:
enabled: true
repository: nvcr.io/nvidia/k8s
image: dcgm-exporter
- version: 3.3.9-3.6.1-ubuntu22.04
- imagePullPolicy: IfNotPresent
- env:
- - name: DCGM_EXPORTER_LISTEN
- value: :9400
- - name: DCGM_EXPORTER_KUBERNETES
- value: 'true'
- - name: DCGM_EXPORTER_COLLECTORS
- value: /etc/dcgm-exporter/dcp-metrics-included.csv
+ version: 4.3.1-4.4.0-ubuntu22.04
+ imagePullPolicy: IfNotPresent
serviceMonitor:
additionalLabels: {}
enabled: false
honorLabels: false
interval: 15s
relabelings: []
+ service:
+ internalTrafficPolicy: Cluster
gfd:
enabled: true
repository: nvcr.io/nvidia
image: k8s-device-plugin
- version: v0.17.0
- imagePullPolicy: IfNotPresent
- env:
- - name: GFD_SLEEP_INTERVAL
- value: 60s
- - name: GFD_FAIL_ON_INIT_ERROR
- value: 'true'
+ version: v0.17.4
+ imagePullPolicy: IfNotPresent
migManager:
enabled: true
repository: nvcr.io/nvidia/cloud-native
image: k8s-mig-manager
- version: v0.10.0-ubuntu20.04
- imagePullPolicy: IfNotPresent
- env:
- - name: WITH_REBOOT
- value: 'false'
+ version: v0.12.3-ubuntu20.04
+ imagePullPolicy: IfNotPresent
config:
name: null
default: all-disabled
gpuClientsConfig:
name: ''
nodeStatusExporter:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: gpu-operator-validator
- version: v24.9.2
+ version: v25.3.4
imagePullPolicy: IfNotPresent
gdrcopy:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: gdrdrv
- version: v2.4.1-2
+ version: v2.5.1
imagePullPolicy: IfNotPresent
sandboxWorkloads:
enabled: false
defaultWorkload: container
sandboxDevicePlugin:
enabled: true
repository: nvcr.io/nvidia
image: kubevirt-gpu-device-plugin
- version: v1.2.10
+ version: v1.4.0
imagePullPolicy: IfNotPresent
--- HelmRelease: gpu-operator/nvidia-gpu-operator Job: gpu-operator/nvidia-gpu-operator-node-feature-discovery-prune
+++ HelmRelease: gpu-operator/nvidia-gpu-operator Job: gpu-operator/nvidia-gpu-operator-node-feature-discovery-prune
@@ -27,13 +27,13 @@
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
- image: registry.k8s.io/nfd/node-feature-discovery:v0.16.6
+ image: registry.k8s.io/nfd/node-feature-discovery:v0.17.3
imagePullPolicy: IfNotPresent
command:
- nfd-master
args:
- -prune
restartPolicy: Never
--- HelmRelease: gpu-operator/nvidia-gpu-operator Job: gpu-operator/gpu-operator-upgrade-crd
+++ HelmRelease: gpu-operator/nvidia-gpu-operator Job: gpu-operator/gpu-operator-upgrade-crd
@@ -32,13 +32,13 @@
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
operator: Equal
value: ''
containers:
- name: upgrade-crd
- image: nvcr.io/nvidia/gpu-operator:v24.9.2
+ image: nvcr.io/nvidia/gpu-operator:v25.3.4
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- |
kubectl apply -f /opt/gpu-operator/nvidia.com_clusterpolicies.yaml; kubectl apply -f /opt/gpu-operator/nvidia.com_nvidiadrivers.yaml; kubectl apply -f /opt/gpu-operator/nfd-api-crds.yaml;
--- k8s/media/apps/gpu/operator/app Kustomization: flux-system/nvidia-gpu-operator HelmRelease: gpu-operator/nvidia-gpu-operator
+++ k8s/media/apps/gpu/operator/app Kustomization: flux-system/nvidia-gpu-operator HelmRelease: gpu-operator/nvidia-gpu-operator
@@ -12,13 +12,13 @@
spec:
chart: gpu-operator
sourceRef:
kind: HelmRepository
name: nvidia
namespace: flux-system
- version: v24.9.2
+ version: v25.3.4
install:
crds: CreateReplace
remediation:
retries: 3
interval: 15m
maxHistory: 2
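Review note: the rendered ClusterPolicy in this diff no longer lists the chart-default `env` entries for `devicePlugin`, `gfd`, `migManager`, `dcgmExporter`, `validator.plugin`, and the driver managers. The diff alone does not show whether the operator now applies the same defaults internally, so the v25.3.0 release notes are worth checking before merging. If any of these settings matter for this cluster, one option is to pin them explicitly in the HelmRelease values. The fragment below is a minimal sketch under that assumption: the values are copied from the removed lines, and the `<component>.env` key layout is assumed to mirror the existing `toolkit.env` override rather than taken from this PR.

```yaml
# Sketch only: fragment of the nvidia-gpu-operator HelmRelease spec.
# Assumption: each component accepts an `env` list in chart values,
# in the same way the existing toolkit.env override does.
spec:
  values:
    devicePlugin:
      env:
        # Values copied from the lines removed in the ClusterPolicy diff.
        - name: PASS_DEVICE_SPECS
          value: "true"
        - name: FAIL_ON_INIT_ERROR
          value: "true"
        - name: DEVICE_LIST_STRATEGY
          value: envvar
        - name: DEVICE_ID_STRATEGY
          value: uuid
    migManager:
      env:
        # Previously rendered by default; pinned here to keep reboots off.
        - name: WITH_REBOOT
          value: "false"
```

Pinning is only needed for settings whose behavior must not change. Anything left out falls back to whatever the v25.3.4 operator does by default, which this diff does not reveal.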
| datasource | package      | from    | to      |
| ---------- | ------------ | ------- | ------- |
| helm       | gpu-operator | v24.9.2 | v25.3.4 |
This PR contains the following updates:
gpu-operator: v24.9.2 -> v25.3.4
Release Notes
NVIDIA/gpu-operator (gpu-operator)
v25.3.4: GPU Operator 25.3.4 Release
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/25.3.4/release-notes.html
v25.3.3: GPU Operator 25.3.3 Release
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/25.3.3/release-notes.html
v25.3.2: GPU Operator 25.3.2 Release
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/25.3.2/release-notes.html
v25.3.1: GPU Operator 25.3.1 Release
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/25.3.1/release-notes.html
v25.3.0: GPU Operator 25.3.0 Release
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/25.3.0/release-notes.html
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.