
using nvidia gpu. #1695

Closed
HyeyeonKoo opened this issue Jan 28, 2022 · 8 comments

Comments

@HyeyeonKoo

HyeyeonKoo commented Jan 28, 2022

Hello, thank you for the nice work.
I am trying to use OpenFaaS with a GPU, but it is not easy.
I would really appreciate any advice.

Environments

  • FaaS-CLI version (faas-cli version): 0.14.1
  • Docker version (docker version): 20.10.12
  • Are you using OpenFaaS on Kubernetes or faasd? Minikube 1.25.1
  • Operating System and version: Ubuntu 20.04
  • Additionally, I installed nvidia-docker following this link, and I made Minikube recognize the GPU in the cluster following this link. The cluster recognizes it well:
$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
NAME                       GPU
[SYSTEM_NAME]         1

It also works well in an nvidia/cuda container.

However, it does not work in an OpenFaaS function. How can I make an OpenFaaS function recognize the GPU? Do I need to edit the Dockerfile?
Please help.

@alexellis
Member

Hi @HyeyeonKoo thanks for your interest in OpenFaaS

You will need to fill out the whole issue template and ideally provide us with a code example too.

https://raw.githubusercontent.com/openfaas/faas/master/.github/ISSUE_TEMPLATE.md

Let me know if you are going to do that or whether you want to close the issue.

If this is for your employer, feel free to reach out about commercial support.

Alex

@HyeyeonKoo
Author

Thank you for the answer. I have filled out the form again.

My actions before raising this issue

I found an issue similar to what I was looking for: #639
In that issue, I found that the GPU is used with OpenFaaS via nvidia-docker.
So I installed nvidia-docker, and the GPU is successfully recognized in Minikube's container.

Expected Behaviour

I want to use the GPU in an OpenFaaS function with PyTorch.

Current Behaviour

The GPU is not available when using PyTorch in an OpenFaaS function.

Are you a GitHub Sponsor (Yes/No?)

Check at: https://github.com/sponsors/openfaas

  • No
    I am not able to support the project yet. I am still learning, and someday I hope to become a sponsor.

List All Possible Solutions and Workarounds

Is there a way to make the GPU available by changing the function's Dockerfile or YAML?

Which Solution Do You Recommend?

Steps to Reproduce (for bugs)

  1. Create the function
$ mkdir gpu-is-available && cd gpu-is-available
$ faas-cli new --lang python3-debian gpu-is-available 
  2. gpu-is-available/requirements.txt
--find-links https://download.pytorch.org/whl/torch_stable.html
torch==1.10.1+cu113
numpy
  3. gpu-is-available/handler.py
import torch

def handle(req):
    available = ""

    if torch.cuda.is_available():
        available = "GPU is available." + "\n" \
                    + "Device count is " + str(torch.cuda.device_count()) + "\n" \
                    + "Device name is " + torch.cuda.get_device_name(0)
    else:
        available = "GPU is not available. Sorry."

    return available
  4. Build, push, and deploy
faas-cli up -f ./gpu-is-available.yml
  5. Invoke => the GPU is not recognized
$ echo | faas-cli invoke gpu-is-available
GPU is not available. Sorry.
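One way to narrow this down from inside the function, without relying on torch at all, is to check the signals the NVIDIA container runtime normally provides when it has attached a GPU: the NVIDIA_VISIBLE_DEVICES environment variable and the /dev/nvidia* device nodes. A minimal stdlib-only sketch (the function name is my own, not part of the template):

```python
import os
from pathlib import Path


def gpu_visibility_report():
    """Gather simple signals about whether NVIDIA devices are
    exposed to this container (no torch required)."""
    return {
        # the NVIDIA container runtime normally injects this variable
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # /dev/nvidia* nodes exist only when the runtime mounted the GPU
        "device_nodes": sorted(p.name for p in Path("/dev").glob("nvidia*")),
    }


if __name__ == "__main__":
    print(gpu_visibility_report())
```

If both fields come back empty inside the function's pod, the container was started without the GPU attached, which points at scheduling/runtime configuration rather than at PyTorch.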

Context

I want to serve deep learning model inference, such as sentiment analysis.

Your Environment

  • FaaS-CLI version (faas-cli version): 0.14.1

  • Docker version (docker version): 20.10.12

  • Are you using OpenFaaS on Kubernetes or faasd? Minikube 1.25.1

  • Operating System and version: Ubuntu 20.04

  • Code example or link to GitHub repo or gist to reproduce problem: demonstrated above.

  • Other diagnostic information / logs from troubleshooting guide
    I installed nvidia-docker following this link, and I made Minikube recognize the GPU in the cluster following this link. The cluster recognizes it well.

@LucasRoesler
Member

@HyeyeonKoo at least one approach to this is partially described here https://docs.openfaas.com/reference/profiles/#use-tolerations-and-affinity-to-separate-workloads

Using the profiles feature you can ensure that the function is running on a node with a GPU.

It might be possible with constraints, but I think the profiles approach is more accurate.
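For what it's worth, attaching a Profile to a function is done through an annotation; a sketch of what that could look like in the function's stack file (the profile name `withgpu` is taken from the docs example, and the image name is a placeholder):

```yaml
# gpu-is-available.yml — sketch: binding the function to the "withgpu" Profile
functions:
  gpu-is-available:
    lang: python3-debian
    handler: ./gpu-is-available
    image: example/gpu-is-available:latest   # placeholder image
    annotations:
      com.openfaas.profile: withgpu
```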

@HyeyeonKoo
Author

Thank you for answering.
I tried to follow the instructions, but ran into some problems.

First, when I created the profile from YAML, an error occurred:

error: error validating "STDIN": error validating data: [ValidationError(Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[1]): unknown field "key" in com.openfaas.v1.Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms, ValidationError(Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[1]): unknown field "operator" in com.openfaas.v1.Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms, ValidationError(Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[1]): unknown field "values" in com.openfaas.v1.Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms]; if you choose to ignore these errors, turn validation off with --validate=false

So I applied it with the --validate=false flag. After that, I found the pod's status is Pending:

NAME                                   READY   STATUS    RESTARTS   AGE
...
gpu-is-available-66b976994c-qdxsv      0/1     Pending   0          24s
...

Could you give me some more advice on this problem, please?

@LucasRoesler
Member

It looks like a validation error; without seeing the Profile YAML you used it is pretty hard to say for sure.

My guess, from rereading the docs, is that it is a YAML indentation error.

The docs have this (which is wrong)

kind: Profile
apiVersion: openfaas.com/v1
metadata:
  name: withgpu
  namespace: openfaas
spec:
    tolerations:
    - key: "gpu"
      operator: "Exists"
      effect: "NoSchedule"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
            - key: gpu
              operator: In
              values:
              - installed

But it should probably be this

kind: Profile
apiVersion: openfaas.com/v1
metadata:
  name: withgpu
  namespace: openfaas
spec:
    tolerations:
    - key: "gpu"
      operator: "Exists"
      effect: "NoSchedule"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: gpu
                operator: In
                values:
                  - installed

It would really help to know the output from kubectl version, how you installed OpenFaaS (via helm or arkade, what flags or values you used, etc.), and finally the Profile you tried to use.

@HyeyeonKoo
Author

Thank you for the advice.

First of all, here is the kubectl version:

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.2", GitCommit:"9d142434e3af351a628bffee3939e64c681afa4d", GitTreeState:"clean", BuildDate:"2022-01-19T17:35:46Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:34:54Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}

And I installed OpenFaaS with this guide, which uses Helm.

I checked this again.
When I simply run a Docker container with docker run -it --rm --gpus all ubuntu nvidia-smi, the GPU is recognized:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 31%   33C    P8    14W / 250W |     15MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1404      G                                       9MiB |
|    0   N/A  N/A      1545      G                                       4MiB |
+-----------------------------------------------------------------------------+

On Minikube it also works well with kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu", following this guide:

NAME                       GPU
####                        1

I created the profile with your advice, and the validation problem is gone. So I built, pushed, and deployed the function again: faas-cli up -f ./gpu-is-available.yml --gateway http://127.0.0.1:31112 --annotation com.openfaas.profile=withgpu
It deployed well. Then I checked whether the GPU is available in the function's container using docker exec -it [container id] /bin/bash and nvidia-smi, but it is not available.

I will keep trying to find a way. If you have any ideas, please let me know.

@LucasRoesler
Member

If I am reading this correctly, it seems like we need to set a resource request on the function:

https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

But this isn't exposed via the OpenFaaS API right now. We might need to sync with @alexellis on this, but he is on vacation until next week.

I see two options:

  1. Add the two required resource fields to the Function spec
  2. Extend the Profiles spec to add a section for Resources
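For reference, whichever option is chosen would ultimately need to produce a Kubernetes-level GPU request like the one on the scheduling-gpus page; a sketch of the Pod spec (names are placeholders):

```yaml
# Sketch of the Pod-level request from the Kubernetes GPU scheduling docs.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example           # placeholder name
spec:
  containers:
    - name: example
      image: nvidia/cuda      # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # GPUs are requested via limits only
```

Without this field in the function's Pod spec, the device plugin never attaches the GPU, which matches the symptom above (the pod schedules onto the GPU node but nvidia-smi is unavailable inside the container).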

@HyeyeonKoo
Author

Thank you very much.
