This guide assumes that the NVIDIA drivers and nvidia-docker2 have been installed.
Enable the NVIDIA runtime as the default runtime on your node. To do this, edit the Docker daemon config file, which is usually located at /etc/docker/daemon.json:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
If runtimes is not already present, head to the install page of nvidia-docker.
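After editing the file, it can help to confirm that the default runtime is really set before restarting Docker. The following is a minimal sketch and not part of the original instructions: the helper name has_nvidia_default_runtime is hypothetical, it uses a simple pattern match rather than a full JSON parser, and the commented restart command assumes a systemd-managed Docker service.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia-docker): succeed only if the given
# daemon config declares "nvidia" as the default runtime.
# Assumption: the key and its value sit on one line, as in the example above.
has_nvidia_default_runtime() {
  config_file=${1:-/etc/docker/daemon.json}
  grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$config_file"
}

# Typical use on a node (commented out so the sketch has no side effects):
# has_nvidia_default_runtime && sudo systemctl restart docker
# docker info | grep -i 'default runtime'
```

The function only reads the file, so it is safe to run before and after the edit.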
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml
The goal is to include scheduler-policy-config.json in the scheduler configuration (/etc/kubernetes/manifests/kube-scheduler.yaml).
Notice: If your Kubernetes default scheduler is deployed as a static pod, don't edit the yaml file inside /etc/kubernetes/manifests directly. Instead, edit a copy of the yaml file outside the /etc/kubernetes/manifests directory, then copy the edited file into /etc/kubernetes/manifests/. Kubernetes will then update the default static pod from the yaml file automatically.
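The copy-out, edit, copy-back workflow described in the notice can be sketched as two small helpers. This is an illustration only: the function names stage_manifest and publish_manifest are hypothetical, and the directory paths in the commented example are assumptions based on the paths used in this guide.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the copy-out / edit / copy-back workflow.
# Assumption: the caller passes the manifests directory, the file name, and a
# separate working directory; nothing here is specific to kube-scheduler.

# Copy the manifest out of the static pod directory for editing.
stage_manifest() {
  manifest_dir=$1; name=$2; work_dir=$3
  cp "$manifest_dir/$name" "$work_dir/$name"
}

# Copy the edited manifest back into the static pod directory.
publish_manifest() {
  manifest_dir=$1; name=$2; work_dir=$3
  cp "$work_dir/$name" "$manifest_dir/$name"
}

# Example (commented out so the sketch has no side effects):
# work=$(mktemp -d)
# stage_manifest /etc/kubernetes/manifests kube-scheduler.yaml "$work"
# ${EDITOR:-vi} "$work/kube-scheduler.yaml"
# publish_manifest /etc/kubernetes/manifests kube-scheduler.yaml "$work"
```

Keeping the working copy outside the manifests directory means the static pod only sees the finished file, never a half-edited one.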
From Kubernetes v1.23, scheduling policies are no longer supported; scheduler configurations should be used instead.
That means scheduler-policy-config.yaml needs to be included in the scheduler config (/etc/kubernetes/manifests/kube-scheduler.yaml).
Here is the sample of the final modified kube-scheduler.yaml
cd /etc/kubernetes
curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/scheduler-policy-config.yaml
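Add the configuration file parameter to the scheduler arguments:
- --config=/etc/kubernetes/scheduler-policy-config.yaml
Add the volume mount to the scheduler container (under volumeMounts):
- mountPath: /etc/kubernetes/scheduler-policy-config.yaml
  name: scheduler-policy-config
  readOnly: true
Add the matching volume to the Pod spec (under volumes):
- hostPath:
    path: /etc/kubernetes/scheduler-policy-config.yaml
    type: FileOrCreate
  name: scheduler-policy-config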
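The scheduler argument, the mountPath, and the hostPath path above must all name the same file. A quick consistency check can be sketched as follows. The helper name references_config_file is hypothetical, and counting matching lines is only a rough heuristic, not a YAML validation.

```shell
#!/bin/sh
# Hypothetical check: count how many lines of a manifest mention the config
# file path. The scheduler argument, the mountPath, and the hostPath path
# should each reference it, so at least three matching lines are expected.
references_config_file() {
  manifest=$1
  config_path=$2
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the assignment from aborting a script that runs under "set -e".
  count=$(grep -cF -- "$config_path" "$manifest" 2>/dev/null || true)
  [ "${count:-0}" -ge 3 ]
}

# Example (commented out):
# references_config_file ./kube-scheduler.yaml \
#   /etc/kubernetes/scheduler-policy-config.yaml && echo "looks consistent"
```

Running the check against the edited copy before copying it back into the manifests directory catches a path that was updated in only one or two of the three places.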
For Kubernetes versions before v1.23, scheduler-policy-config.json is used instead. Here is the sample of the final modified kube-scheduler.yaml
cd /etc/kubernetes
curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/scheduler-policy-config.json
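Add the policy config file parameter to the scheduler arguments:
- --policy-config-file=/etc/kubernetes/scheduler-policy-config.json
Add the volume mount to the scheduler container (under volumeMounts):
- mountPath: /etc/kubernetes/scheduler-policy-config.json
  name: scheduler-policy-config
  readOnly: true
Add the matching volume to the Pod spec (under volumes):
- hostPath:
    path: /etc/kubernetes/scheduler-policy-config.json
    type: FileOrCreate
  name: scheduler-policy-config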
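Which of the two variants applies depends on the Kubernetes version: v1.23 and later use the scheduler configuration file, earlier versions use the policy file. The selection can be sketched as a small helper. The function name scheduler_flag_for_version is hypothetical, and the sketch assumes a version string of the form [v]MAJOR.MINOR[.PATCH].

```shell
#!/bin/sh
# Hypothetical helper: print the scheduler argument that matches a
# Kubernetes version string such as "v1.22.4" or "1.23.0".
# Assumption: the version has the form [v]MAJOR.MINOR[.PATCH...].
scheduler_flag_for_version() {
  version=${1#v}                 # drop a leading "v" if present
  major=${version%%.*}           # text before the first dot
  rest=${version#*.}             # text after the first dot
  minor=${rest%%.*}              # text before the next dot
  minor=${minor%%[!0-9]*}        # drop any non-digit suffix such as "+"
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 23 ]; }; then
    echo "--config=/etc/kubernetes/scheduler-policy-config.yaml"
  else
    echo "--policy-config-file=/etc/kubernetes/scheduler-policy-config.json"
  fi
}

# Example (commented out):
# scheduler_flag_for_version v1.22.4
```

The printed argument is the line to add to the scheduler arguments in kube-scheduler.yaml; the volume mount and volume entries must use the same file path.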
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
Notice: Remove the default GPU device plugin first. For example, if you are using nvidia-device-plugin, you can delete it with:
kubectl delete ds -n kube-system nvidia-device-plugin-daemonset
You need to add the label "gpushare=true" to every node where you want to install the device plugin, because the device plugin is deployed as a DaemonSet.
kubectl label node <target_node> gpushare=true
For example:
kubectl label node mynode gpushare=true
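When several nodes need the label, the command above can be wrapped in a loop. This is a minimal sketch: the function name label_gpushare_nodes and the KUBECTL override variable are hypothetical conveniences, not part of the project.

```shell
#!/bin/sh
# Hypothetical helper: apply the gpushare=true label to each node name given.
# Assumption: KUBECTL may be overridden (for example with "echo") to preview
# the commands without touching the cluster.
label_gpushare_nodes() {
  for node in "$@"; do
    ${KUBECTL:-kubectl} label node "$node" gpushare=true || return 1
  done
}

# Preview:  KUBECTL=echo label_gpushare_nodes node-a node-b
# Apply:    label_gpushare_nodes node-a node-b
```

The loop stops at the first failure, so a mistyped node name is reported before any later nodes are labeled.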
You can download and install kubectl for Linux:
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.12.1/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/bin/kubectl
cd /usr/bin/
wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
chmod u+x /usr/bin/kubectl-inspect-gpushare
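kubectl discovers plugins by looking for executables on PATH whose names start with kubectl-, so a quick way to confirm the installation is to check that the binary can be found. The helper name gpushare_plugin_installed below is hypothetical; the sketch only checks discoverability and does not contact the cluster.

```shell
#!/bin/sh
# Hypothetical check: succeed if an executable named kubectl-inspect-gpushare
# can be found on PATH, which is where kubectl looks for plugins.
gpushare_plugin_installed() {
  command -v kubectl-inspect-gpushare >/dev/null 2>&1
}

# On a configured node (commented out so the sketch has no side effects):
# gpushare_plugin_installed && kubectl inspect gpushare
```

If the check fails, verify that the file is in a directory listed in PATH and that the executable bit is set.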