Research: show GPU attached to a function #639

Open
alexellis opened this issue Apr 12, 2018 · 40 comments

Comments

@alexellis
Member

Description

Show a GPU attached to an OpenFaaS function in Kubernetes

Background

We have several users using Python for data-science where GPU acceleration is available. From the investigation I've done so far we should be able to make a few minor changes to faas-netes and then be able to mount a GPU into a function.

Tasks

  • List compatible GPUs
  • Write some code to mount a GPU
  • Produce a short list of steps to document how to test the patches/PR
  • Document any specific requirements / limitations

Other notes

GKE has GPUs available pre-configured under Kubernetes - I think this would be the easiest way to test - https://thenewstack.io/getting-started-with-gpus-in-google-kubernetes-engine/

Otherwise you'll need an Nvidia GPU, and the process for configuring your kubelet is not trivial.

Documentation page from Kubernetes:

https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

@LucasRoesler
Member

Per these docs https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

There are two core changes

  1. There is a new schedulable resource, nvidia.com/gpu. This is a required change.
  2. It is possible to have mixed types of GPU in one cluster, so they recommend using node labels and node selectors to ensure that your pod ends up on a node with the specific GPU you are looking for (this is probably optional and very advanced).

The simplest example of a pod using a GPU is provided as

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU

Note the resources.limits. This field has very particular restrictions that are different from CPUs. These are listed as

  • GPUs are only supposed to be specified in the limits section, which means:
  • You can specify GPU limits without specifying requests because Kubernetes will use the limit as the request value by default.
  • You can specify GPU in both limits and requests but these two values must be equal.
  • You cannot specify GPU requests without specifying limits.
  • Containers (and pods) do not share GPUs. There’s no overcommitting of GPUs.
  • Each container can request one or more GPUs. It is not possible to request a fraction of a GPU.
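
The rules above can be sketched as a small validation helper. This is only an illustration: `validate_gpu_resources` is a hypothetical name, not part of Kubernetes or OpenFaaS, and it assumes the GPU quantities arrive as plain Python integers in a `resources` mapping shaped like the pod spec.

```python
GPU_RESOURCE = "nvidia.com/gpu"  # resource name taken from the Kubernetes docs quoted above


def validate_gpu_resources(resources):
    """Return a list of rule violations for one container's `resources` mapping.

    Hypothetical helper for illustration only.
    """
    limits = resources.get("limits") or {}
    requests = resources.get("requests") or {}
    errors = []

    gpu_limit = limits.get(GPU_RESOURCE)
    gpu_request = requests.get(GPU_RESOURCE)

    if gpu_limit is None:
        # Rule: GPU requests cannot be specified without limits.
        if gpu_request is not None:
            errors.append("GPU requests cannot be set without GPU limits")
        return errors

    # Rule: no fractions of a GPU. bool is a subclass of int, so exclude it.
    if isinstance(gpu_limit, bool) or not isinstance(gpu_limit, int) or gpu_limit < 1:
        errors.append("GPU limit must be a whole number of GPUs (1 or more)")

    # Rule: if both are given, request and limit must be equal.
    if gpu_request is not None and gpu_request != gpu_limit:
        errors.append("GPU request must equal the GPU limit")

    return errors
```

A gateway or CLI could run a check like this before creating the Deployment, so that an invalid function spec is rejected early rather than by the Kubernetes API.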

I think most of these changes will be made in the request struct and in the stack file schema https://github.com/openfaas/faas/blob/master/gateway/requests/requests.go#L47 and https://github.com/openfaas/faas-cli/blob/master/stack/schema.go#L50

Modifying the FunctionRequests struct would be the absolute minimum required change.

To support the mixed GPU case, we need to support allowing the developer to specify a nodeSelector, e.g.

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-p100

This would be adding a new option to the http requests and the stack schema.
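
As a rough sketch of what the provider would have to generate, the pod manifest above can be built from three inputs: a GPU count, an image, and an optional node selector. The function name `gpu_pod_manifest` and its parameters are hypothetical and do not reflect the faas-netes code.

```python
import json


def gpu_pod_manifest(name, image, gpus=1, node_selector=None):
    """Build a Pod manifest (as a dict) that requests `gpus` NVIDIA GPUs.

    Hypothetical helper: the field names mirror the YAML example above.
    """
    container = {
        "name": name,
        "image": image,
        # GPUs are specified in limits only; Kubernetes uses the limit as the request.
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
    }
    spec = {"restartPolicy": "OnFailure", "containers": [container]}
    if node_selector:
        # Optional: pin the pod to nodes carrying a specific GPU label.
        spec["nodeSelector"] = dict(node_selector)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        "cuda-vector-add",
        "k8s.gcr.io/cuda-vector-add:v0.1",
        node_selector={"accelerator": "nvidia-tesla-p100"},
    )
    # kubectl accepts JSON as well as YAML.
    print(json.dumps(manifest, indent=2))
```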

@stefanprodan
Contributor

We already cover the node selector via stack constraints.

PS. I think this issue should be moved to faas-netes since it's Kubernetes specific.

@alexellis
Member Author

I would like project research and initiatives to start out here in the FaaS repo for visibility.

Thanks for the comments Lucas.

@dkozlov

dkozlov commented Apr 15, 2018

FYI: https://github.com/dkozlov/openfaas-tensorflow-gpu

@alexellis
Member Author

alexellis commented Apr 16, 2018

That project looks like a useful example.

  • I can’t see a patch for the GPU. Does it “just work”?
  • could you run two different functions using GPU at the same time?
  • is it demonstrably faster on GPU vs. CPU?

@alexellis
Member Author

@dkozlov we had some discussion about this on Slack.. please can you summarize the points for the community?

@dkozlov

dkozlov commented May 9, 2018

Sorry for the late response.

I can’t see a patch for the GPU. Does it “just work”?

Yes, it "just works" after installing nvidia-docker.

could you run two different functions using GPU at the same time?

Yes, I can

is it demonstrably faster on GPU vs. CPU?

It depends on how you utilize your GPU, but in most cases neural networks are demonstrably faster on GPU than on CPU.

@alexellis
Member Author

could you run two different functions using GPU at the same time?
Yes, I can

I'm confused by this comment - I thought we were talking about scheduling constraints on Slack because two Pods cannot use the same GPU at the same time?

@dkozlov

dkozlov commented May 9, 2018

I have found the following problems with native GPU scheduling in Kubernetes:

  • Containers (and pods) do not share GPUs. There’s no overcommitting of GPUs.
  • Each container can request one or more GPUs. It is not possible to request a fraction of a GPU.

As a workaround I have implemented the following:

  • Install only nvidia-docker, do not install k8s-device-plugin
  • Add a label to the GPU nodes (sudo kubectl label nodes node1 node2 label=gpu) and to your OpenFaaS function:
    labels:
       label: gpu
    constraints:
     - "label=gpu"
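
To illustrate how a constraint such as "label=gpu" could end up as a Kubernetes node selector, here is a minimal sketch. It assumes each constraint is a single `key=value` string that maps to one nodeSelector entry; the actual parsing in faas-netes may differ, and `constraints_to_node_selector` is a hypothetical name.

```python
def constraints_to_node_selector(constraints):
    """Turn ["key=value", ...] into a nodeSelector-style dict.

    Hypothetical helper; assumes plain key=value constraints only.
    """
    selector = {}
    for constraint in constraints:
        key, sep, value = constraint.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"expected 'key=value', got {constraint!r}")
        selector[key] = value
    return selector
```

With the workaround above, `constraints_to_node_selector(["label=gpu"])` gives `{"label": "gpu"}`, which matches the label applied with kubectl.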

@dkozlov

dkozlov commented May 9, 2018

I'm confused by this comment - I thought we were talking about scheduling constraints on Slack because two Pods cannot use the same GPU at the same time?

Installing only the NVIDIA drivers, docker and nvidia-docker is enough for Kubernetes to start GPU docker containers, without any device plugin.

Also, I have found two outdated guides for OpenShift that do not support overcommitting of GPUs:
https://blog.openshift.com/use-gpus-openshift-kubernetes/
https://blog.openshift.com/use-gpus-with-device-plugin-in-openshift-3-9/

Some useful information from ClarifAI:
https://clarifai.com/blog/scale-your-gpu-cloud-infrastructure-with-kubernetes

@alexellis
Member Author

alexellis commented May 10, 2018

My question was: "could you run two different functions using [the same] GPU at the same time?" (expecting an answer of no) and you answered "Yes, I can". Are we talking about the same thing? I thought GPUs could only be used by a single container/Pod at a time?

@dkozlov

dkozlov commented May 10, 2018

I can repeat it again: "Yes, it is possible" :). It was possible even in 2016; see the ClarifAI blog post: "allow multiple pods on the same machine to share the same card, even if you know what you’re doing (at least on paper: ask us about this one weird trick to do just that!)". Have you checked the manual at https://github.com/dkozlov/openfaas-tensorflow-gpu?

@dkozlov

dkozlov commented May 10, 2018

My question was: "could you run two different functions using [the same] GPU at the same time?" (expecting an answer of no) and you answered "Yes, I can". Are we talking about the same thing? I thought GPUs could only be used by a single container/Pod at a time?

If you try nvidia-docker, a single GPU can be used by more than one container at a time.

https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#can-i-share-a-gpu-between-multiple-containers

Can I share a GPU between multiple containers?

Yes. This is no different than sharing a GPU between multiple processes outside of containers.
Scheduling and compute preemption vary from one GPU architecture to another (e.g. CTA-level, instruction-level).

Kubernetes GPU support proposal:
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/gpu-support.md

@alexellis According to the issue kubernetes/kubernetes#52757
From @flx42:

By default, kernels from different processes can't run on one GPU simultaneously (concurrency but not parallelism)

So @flx42 means that it is possible to share an NVIDIA device between multiple containers, but only concurrently (not in parallel), by NVIDIA's original design.

https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf

Three Compute Modes are supported via settings accessible in nvidia-smi:

  • PROHIBITED - the GPU is not available for compute applications.
  • EXCLUSIVE_PROCESS - the GPU is assigned to only one process at a time, and individual process threads may submit work to the GPU concurrently.
  • DEFAULT - multiple processes can use the GPU simultaneously. Individual threads of each process may submit work to the GPU simultaneously.

So by default multiple processes can use the GPU simultaneously even without using "Multi-Process Service"
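
To check which mode a node's GPUs are in, something like the following sketch could be used. It shells out to `nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader` and assumes each output line looks like `0, Default`; the exact spelling of the mode names in that output is an assumption, so anything unrecognised is treated as not shareable. The helper names are hypothetical.

```python
import subprocess

# Whether more than one process may use the GPU, per the three modes quoted above.
MULTI_PROCESS_ALLOWED = {
    "DEFAULT": True,
    "EXCLUSIVE_PROCESS": False,
    "PROHIBITED": False,
}


def parse_compute_modes(csv_text):
    """Parse 'index, mode' lines into {gpu_index: MODE_IN_UPPER_CASE}."""
    modes = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, mode = (part.strip() for part in line.split(",", 1))
        modes[int(index)] = mode.upper()
    return modes


def query_compute_modes():
    """Ask nvidia-smi for the compute mode of every GPU on this host."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,compute_mode", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_compute_modes(result.stdout)


def shareable_gpus(modes):
    """Return the indices of GPUs that several processes may use at once."""
    return sorted(i for i, m in modes.items() if MULTI_PROCESS_ALLOWED.get(m, False))
```

On a GPU host, `shareable_gpus(query_compute_modes())` would list the GPUs that several containers could use at once outside of the Kubernetes device plugin model.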

@flx42

flx42 commented May 11, 2018

You are both correct :)

@alexellis

My question was: "could you run two different functions using [the same] GPU at the same time?" (expecting an answer of no) and you answered "Yes, I can". Are we talking about the same thing? I thought GPUs could only be used by a single container/Pod at a time?

This is correct in the scope of Kubernetes, GPU resources are integer values and will belong to a single container. Unless you try to hack around it, that is :)
In the Kubernetes issue linked above, I was trying to pitch the idea of sharing a GPU across all the containers in a single pod.

@dkozlov

So @flx42 means that it is possible to share NVIDIA device between multiple containers but only in concurrency mode by original NVIDIA design.

This is also correct. If you launch containers manually on your machine, you can launch 10 containers accessing the same GPU, no problem. You can also launch 10 processes outside containers, it's not different.

Let's not even talk about Multi Process Service (MPS) for now, you probably want to start with just the upstream GPU support in K8s.
You can find more information in the Volta whitepaper, section VOLTA MULTI-PROCESS SERVICE
http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf

@dkozlov

dkozlov commented May 11, 2018

This is correct in the scope of Kubernetes, GPU resources are integer values and will belong to a single container. Unless you try to hack around it, that is :) In the Kubernetes issue linked above, I was trying to pitch the idea of sharing a GPU across all the containers in a single pod.

@flx42 Which other tricks/hacks could we use to overcommit GPUs (a single GPU used by multiple pods) within Kubernetes? I am asking because OpenFaaS scales by pods; it cannot scale by containers within a single pod.

@flx42

flx42 commented May 11, 2018

I don't think you should try to hack around the official upstream support: that means don't overcommit GPUs.

If you need to run multiple pods for the same function, you will need multiple GPUs.

@dkozlov

dkozlov commented May 13, 2018

FYI: https://github.com/Microsoft/KubeGPU - it seems that Microsoft is trying to solve this problem.

@alexellis
Member Author

@flx42 thanks for your input 👍 I would like to figure out what we need to do in the project to make it easy to consume GPU in a function on GKE or a bare-metal node / VM with nvidia-docker swapped in. If you'd like to collaborate on this we are also talking on Slack.

@flx42

flx42 commented May 14, 2018

I think you should embrace the current upstream support, including its limitations.
If you assume that the cluster is already configured with the NVIDIA device driver, the device plugin and optionally taints/tolerations (see this article), then you can just schedule pods consuming resources of type nvidia.com/gpu.

For the sake of simplicity and to avoid falling into suboptimal scheduling corner cases, I think you should limit the initial implementation to 1 GPU per container. i.e. nvidia.com/gpu: 1
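
One consequence of "no overcommitting" is that the number of function replicas is capped by the number of GPUs in the cluster. A small arithmetic sketch, using a hypothetical `max_gpu_replicas` helper and assuming each pod needs all of its GPUs on one node:

```python
def max_gpu_replicas(gpus_per_node, gpus_per_pod=1):
    """Upper bound on schedulable pods, given {node_name: gpu_count}.

    Hypothetical helper: a pod's GPUs must all come from a single node,
    so leftover GPUs on a node cannot be combined with another node's.
    """
    if gpus_per_pod < 1:
        raise ValueError("gpus_per_pod must be at least 1")
    return sum(count // gpus_per_pod for count in gpus_per_node.values())
```

With nodes `{"node1": 2, "node2": 1}`, one GPU per pod allows 3 replicas, while two GPUs per pod allows only 1 and leaves a GPU idle on each node's remainder. That is one example of the suboptimal corner cases that the 1-GPU-per-container limit avoids.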

@alexellis
Member Author

Would either of you be interested in helping to implement that within the project?

@alexellis
Member Author

alexellis commented Jun 21, 2018

@flx42 GPUs in the cloud are very heavy-weight and expensive. What could I buy to use at home for testing this work and ensuring the GPU support is stable?

Do you or @dkozlov have a good container or some sample code that can verify that it has used or is using a GPU? That would be ideal for our testing and proving that things are working end to end.

@feri

feri commented Jun 21, 2018

I'm working on a patch that will enable scheduling functions in k8s if there is an extended resource exposed, by let's say a suitable device plugin, such as [this]. The work includes changes to faas-netes and faas-cli, and a minor one to the FunctionResources struct that is from faas. Naturally faas-cli and faas patches will not be k8s specific.

@dkozlov

dkozlov commented Jun 21, 2018

Do you or @dkozlov have a good container or some sample code that can verify that it has used or is using a GPU?

@alexellis https://hub.docker.com/r/tensorflow/tensorflow/

nvidia-docker run -it -p 8888:8888 tensorflow/tensorflow:latest-gpu

or, as sample code:

python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
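
For end-to-end testing, the same check could be wrapped in a function handler. The sketch below assumes the handler follows the `handle(req)` convention of the OpenFaaS Python templates and that the image is based on a TensorFlow GPU build; `list_gpu_devices` is a hypothetical helper.

```python
def list_gpu_devices():
    """Return the names of GPU devices TensorFlow can see (empty if none)."""
    try:
        # Same module as in the one-liner above.
        from tensorflow.python.client import device_lib
    except ImportError:
        # TensorFlow is not installed in this image.
        return []
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


def handle(req):
    """Report whether this function's container has a GPU attached."""
    gpus = list_gpu_devices()
    if not gpus:
        return "no GPU visible to TensorFlow"
    return "GPUs visible: " + ", ".join(gpus)
```

Invoking the function on a GPU node and on a CPU-only node should then give different answers, which is a quick way to confirm that the GPU was actually attached.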

What could I buy to use at home for testing this work and ensuring the GPU support is stable?

Open https://developer.nvidia.com/cuda-gpus -> CUDA-Enabled GeForce Products -> Select any GPU by Compute Capability >= 6.1

@alexellis
Member Author

Derek add label: Hacktoberfest

derek bot added the hacktoberfest label Oct 3, 2018

@sberryman
Contributor

Will this work on hosts with >1 GPU? I have a computer with two GTX 1080 Ti cards that I use for training or bulk inference. NVIDIA allows you to pin a docker container to a single GPU via an environment variable: NVIDIA_VISIBLE_DEVICES=0 would restrict that container to the first GPU, while NVIDIA_VISIBLE_DEVICES=1 goes to the GPU with index 1, etc.

@flx42

flx42 commented Oct 25, 2018

@sberryman yes, our device plugin implementation supports multiple GPUs on one node and sets this environment variable accordingly for the container.
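
Inside a function, the variable can be inspected to see which GPUs the container was given. A minimal sketch, assuming the value is either a comma-separated list of indices or UUIDs, or one of the special values `all`, `none` and `void` as I understand them from the NVIDIA container runtime documentation; `visible_gpu_selection` is a hypothetical helper.

```python
import os


def visible_gpu_selection(value):
    """Interpret an NVIDIA_VISIBLE_DEVICES value.

    Returns "all", or a list of device identifiers (indices or UUIDs) as
    strings. An unset, empty, "none" or "void" value gives an empty list.
    """
    if value is None:
        return []
    value = value.strip()
    if value in ("", "none", "void"):
        return []
    if value == "all":
        return "all"
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    print(visible_gpu_selection(os.environ.get("NVIDIA_VISIBLE_DEVICES")))
```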

@alexellis
Member Author

Bump

@vielmetti

Bringing this to @DieterReuter attention, with the Jetson Nano as a target device for experimentation.

@alexellis
Member Author

@johnmccabe rebuilt his kernel to use the GPU in Docker.

@vielmetti

docker run --device=/dev/nvhost-ctrl --device=/dev/nvhost-ctrl-gpu --device=/dev/nvhost-prof-gpu --device=/dev/nvmap --device=/dev/nvhost-gpu --device=/dev/nvhost-as-gpu -v /usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra device_query

This ran successfully, following the instructions from https://github.com/Technica-Corporation/Tegra-Docker

@aimbot31

Hello, is this still an active topic? Thanks for the answers.

@LucasRoesler
Member

@aimbot31 I think the best way to achieve this is through the new Profiles feature in faas-netes: https://docs.openfaas.com/reference/profiles/#use-tolerations-and-affinity-to-separate-workloads

Using profiles exposes the ability to give node affinity to your functions so that they run on nodes with a GPU available

@guptaprakash9

@LucasRoesler I am not able to get the Profiles method to work on GKE. It lets me run my pod on a GPU node fine, but apparently GKE doesn't attach a GPU and make it available to a pod unless it is explicitly requested via limits. Since it's a managed Kubernetes, I cannot configure the default runtime to nvidia-docker on these nodes. Any suggestions?

@LucasRoesler
Member

@guptaprakash9 unfortunately, I don't have a good answer. I can't think of a good way to set that request in OpenFaaS right now.

@alexellis we either need to expand the spec for requests or add a custom Profile field to configure this.

the relevant GKE docs https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#pods_gpus

@alexellis
Member Author

@guptaprakash9 if you or anyone landing here still has interest, we would be open to having this feature a) sponsored b) contributed or just c) written up in detail as a proposal

#639 (comment)

@rajitha1998

Hello everyone, I would like to implement a component for OpenFaaS that helps accelerate functions with a GPU or TPU. This would benefit scientific applications, video frame analysis tasks, etc. Any ideas/suggestions on what kind of component would best fit this project?

@rajitha1998

Found something interesting @alexellis.

@alexellis
Member Author

Thanks for the link @rajitha1998

You could try starting with faasd which is fairly small and hackable, you can configure containerd to use a local GPU or TPU that you have. It's on my list and it's not for lack of interest, but since we are way off the funding target on GitHub Sponsors, and I have no access to such equipment, it's up to the community to invest their own time and resources into this.

@rajitha1998

I am currently a bit busy with my university work. But I will add anything here which helps. Will look into this more when I get time :) @alexellis.

There is a way to create a local Kubernetes cluster with a GPU using Minikube, but a Linux computer is required.

Using the ‘none’ driver seems to be the easy way: https://minikube.sigs.k8s.io/docs/tutorials/nvidia_gpu/

Having a low-end GPU such as the Nvidia 940MX is enough: https://developer.nvidia.com/cuda-gpus
