Try vgpu-manager for vGPU scheduling and allocation. It fixes some existing problems and adds new features that may interest you.
This project is a wrapper around the NVIDIA driver library. It is a component of gpu-manager, which lets Kubernetes not only run more than one Pod on the same GPU but also guarantee QoS for each Pod. For more details, please refer to our paper here.
IMAGE_FILE=<your image name without version> ./build-img.sh
./find_new_lib.sh /lib/x86_64-linux-gnu/libcuda.so.535.54.03 /lib/x86_64-linux-gnu/libnvidia-ml.so.535.54.03
CUDA 12.2.0 and earlier are supported.
Any GPU architecture after Kepler is supported.
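
The find_new_lib.sh invocation above hard-codes driver version 535.54.03 in both library paths. On a machine with a different driver, the two paths have to be rebuilt with the installed version. The sketch below shows one way to do that. The helper names lib_version and build_find_new_lib_cmd and the LIB_DIR variable are hypothetical and not part of this project, and the default directory is only an assumption taken from the example paths. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of this repository.

# Assumed library directory, taken from the example paths; override as needed.
LIB_DIR="${LIB_DIR:-/lib/x86_64-linux-gnu}"

# Print the version suffix of a versioned shared library path,
# e.g. /lib/x86_64-linux-gnu/libcuda.so.535.54.03 -> 535.54.03
lib_version() {
  name="${1##*/}"                 # drop the directory part
  printf '%s\n' "${name#*.so.}"   # drop everything up to and including ".so."
}

# Print (do not execute) a find_new_lib.sh command line for a driver version.
build_find_new_lib_cmd() {
  printf './find_new_lib.sh %s/libcuda.so.%s %s/libnvidia-ml.so.%s\n' \
    "$LIB_DIR" "$1" "$LIB_DIR" "$1"
}

# Example: reuse the version found in an existing libcuda path.
version="$(lib_version "$LIB_DIR/libcuda.so.535.54.03")"
build_find_new_lib_cmd "$version"
```

Check the printed command against the files that actually exist in your library directory before running it, since both libraries must come from the same driver release.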