
using nvidia gpu. #1695

Closed
HyeyeonKoo opened this issue Jan 28, 2022 · 8 comments

Comments

@HyeyeonKoo

HyeyeonKoo commented Jan 28, 2022

Hello, thank you for the nice work.
I am trying to use OpenFaaS with a GPU, but it is not easy.
I would really appreciate any advice.

Environments

  • FaaS-CLI version (faas-cli version): 0.14.1
  • Docker version (docker version): 20.10.12
  • Are you using OpenFaaS on Kubernetes or faasd? Minikube 1.25.1
  • Operating System and version: Ubuntu 20.04
  • Additionally, I installed nvidia-docker following this link, and I made Minikube recognize the GPU in the cluster following this link. The cluster recognizes it well:
$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
NAME                       GPU
[SYSTEM_NAME]         1

It also works well in an nvidia/cuda container.

However, it does not work in an OpenFaaS function. How can I make an OpenFaaS function recognize the GPU? Do I need to edit the Dockerfile?
Please help.

@alexellis
Member

Hi @HyeyeonKoo thanks for your interest in OpenFaaS

You will need to fill out the whole issue template and ideally provide us with a code example too.

https://raw.githubusercontent.com/openfaas/faas/master/.github/ISSUE_TEMPLATE.md

Let me know if you are going to do that or whether you want to close the issue.

If this is for your employer, feel free to reach out about commercial support.

Alex

@HyeyeonKoo
Author

Thank you for the answer. I have filled out the form again.

My actions before raising this issue

I found an issue similar to what I was looking for: #639
In that issue, I found that the GPU is used with OpenFaaS via nvidia-docker.
So I installed nvidia-docker, and the GPU is successfully recognized in Minikube's container.

Expected Behaviour

I want to use the GPU in an OpenFaaS function with PyTorch.

Current Behaviour

The GPU is not available when using PyTorch in an OpenFaaS function.

Are you a GitHub Sponsor (Yes/No?)

Check at: https://github.com/sponsors/openfaas

  • No
    I am not able to support the project yet. I am still learning, and someday I hope to become a sponsor.

List All Possible Solutions and Workarounds

Is there a way to make the GPU available by changing the function's Dockerfile or YAML?

Which Solution Do You Recommend?

Steps to Reproduce (for bugs)

  1. Create the function
$ mkdir gpu-is-available && cd gpu-is-available
$ faas-cli new --lang python3-debian gpu-is-available 
  2. gpu-is-available/requirements.txt
--find-links https://download.pytorch.org/whl/torch_stable.html
torch==1.10.1+cu113
numpy
  3. gpu-is-available/handler.py
import torch

def handle(req):
    available = ""

    if torch.cuda.is_available():
        available = "GPU is available." + "\n" \
                    + "Device count is " + str(torch.cuda.device_count()) + "\n" \
                    + "Device name is " + torch.cuda.get_device_name(0)
    else:
        available = "GPU is not available. Sorry."

    return available
  4. Build, push, and deploy
faas-cli up -f ./gpu-is-available.yml
  5. Invoke => the GPU is not recognized
$ echo | faas-cli invoke gpu-is-available
GPU is not available. Sorry.
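One way to narrow this down from inside the function, without relying on torch at all, is to check the signals the NVIDIA container runtime normally provides when it has attached a GPU: the NVIDIA_VISIBLE_DEVICES environment variable and the /dev/nvidia* device nodes. A minimal stdlib-only sketch (the function name is my own, not part of the template):

```python
import os
from pathlib import Path


def gpu_visibility_report():
    """Gather simple signals about whether NVIDIA devices are
    exposed to this container (no torch required)."""
    return {
        # the NVIDIA container runtime normally injects this variable
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # /dev/nvidia* nodes exist only when the runtime mounted the GPU
        "device_nodes": sorted(p.name for p in Path("/dev").glob("nvidia*")),
    }


if __name__ == "__main__":
    print(gpu_visibility_report())
```

If both fields come back empty inside the function's pod, the container was started without the GPU attached, which points at scheduling/runtime configuration rather than at PyTorch.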

Context

I want to serve deep learning model inference, such as sentiment analysis.

Your Environment

  • FaaS-CLI version (faas-cli version): 0.14.1

  • Docker version (docker version): 20.10.12

  • Are you using OpenFaaS on Kubernetes or faasd? Minikube 1.25.1

  • Operating System and version: Ubuntu 20.04

  • Code example or link to GitHub repo or gist to reproduce problem: demonstrated above.

  • Other diagnostic information / logs from troubleshooting guide
    I installed nvidia-docker following this link, and I made Minikube recognize the GPU in the cluster following this link. The cluster recognizes it well.

@LucasRoesler
Member

@HyeyeonKoo at least one approach to this is partially described here https://docs.openfaas.com/reference/profiles/#use-tolerations-and-affinity-to-separate-workloads

Using the profiles feature you can ensure that the function is running on a node with a GPU.

It might be possible with constraints, but I think the profiles approach is more accurate.
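For what it's worth, attaching a Profile to a function is done through an annotation; a sketch of what that could look like in the function's stack file (the profile name `withgpu` is taken from the docs example, and the image name is a placeholder):

```yaml
# gpu-is-available.yml — sketch: binding the function to the "withgpu" Profile
functions:
  gpu-is-available:
    lang: python3-debian
    handler: ./gpu-is-available
    image: example/gpu-is-available:latest   # placeholder image
    annotations:
      com.openfaas.profile: withgpu
```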

@HyeyeonKoo
Author

Thank you for answering.
I tried to follow the instructions, but ran into some problems.

First, when I created the profile from YAML, an error occurred:

error: error validating "STDIN": error validating data: [ValidationError(Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[1]): unknown field "key" in com.openfaas.v1.Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms, ValidationError(Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[1]): unknown field "operator" in com.openfaas.v1.Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms, ValidationError(Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[1]): unknown field "values" in com.openfaas.v1.Profile.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms]; if you choose to ignore these errors, turn validation off with --validate=false

So I applied it with the --validate=false flag. After that, I found the pod's status is Pending:

NAME                                   READY   STATUS    RESTARTS   AGE
...
gpu-is-available-66b976994c-qdxsv      0/1     Pending   0          24s
...

Could you give me some more advice on this problem, please?

@LucasRoesler
Member

It looks like a validation error; without seeing the Profile YAML you used it is pretty hard to say for sure.

My guess, from rereading the docs, is that it is a YAML indentation error.

The docs have this (which is wrong)

kind: Profile
apiVersion: openfaas.com/v1
metadata:
  name: withgpu
  namespace: openfaas
spec:
    tolerations:
    - key: "gpu"
      operator: "Exists"
      effect: "NoSchedule"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
            - key: gpu
              operator: In
              values:
              - installed

But it should probably be this

kind: Profile
apiVersion: openfaas.com/v1
metadata:
  name: withgpu
  namespace: openfaas
spec:
    tolerations:
    - key: "gpu"
      operator: "Exists"
      effect: "NoSchedule"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: gpu
                operator: In
                values:
                  - installed

It would really help to know the output from kubectl version, how you installed OpenFaaS (via helm or arkade, what flags or values you used, etc.), and finally the Profile you tried to use.

@HyeyeonKoo
Author

Thank you for the advice.

First of all, here is the kubectl version:

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.2", GitCommit:"9d142434e3af351a628bffee3939e64c681afa4d", GitTreeState:"clean", BuildDate:"2022-01-19T17:35:46Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:34:54Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}

And I installed OpenFaaS with this guide, which uses Helm.

I checked this again.
When I simply run a Docker container with docker run -it --rm --gpus all ubuntu nvidia-smi, the GPU is recognized:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 31%   33C    P8    14W / 250W |     15MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1404      G                                       9MiB |
|    0   N/A  N/A      1545      G                                       4MiB |
+-----------------------------------------------------------------------------+

On Minikube it also works well with kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu", following this guide:

NAME                       GPU
####                        1

I created the profile with your advice, and the validation problem is gone. So I built, pushed, and deployed the function again: faas-cli up -f ./gpu-is-available.yml --gateway http://127.0.0.1:31112 --annotation com.openfaas.profile=withgpu
It deployed well. Then I checked whether the GPU is available in the function's container using docker exec -it [container id] /bin/bash and nvidia-smi, but it is not available.

I will keep trying to find a way. If you have any ideas, please let me know.

@LucasRoesler
Member

If I am reading this correctly, it seems like we need to set a resource request on the function:

https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/

But this isn't exposed via the OpenFaaS API right now. We might need to sync with @alexellis on this, but he is on vacation until next week.

I see two options:

  1. Add the two required resource fields to the Function spec
  2. Extend the Profiles spec to add a section for Resources
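For reference, whichever option is chosen would ultimately need to produce a Kubernetes-level GPU request like the one on the scheduling-gpus page; a sketch of the Pod spec (names are placeholders):

```yaml
# Sketch of the Pod-level request from the Kubernetes GPU scheduling docs.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example           # placeholder name
spec:
  containers:
    - name: example
      image: nvidia/cuda      # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # GPUs are requested via limits only
```

Without this field in the function's Pod spec, the device plugin never attaches the GPU, which matches the symptom above (the pod schedules onto the GPU node but nvidia-smi is unavailable inside the container).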

@HyeyeonKoo
Author

Thank you very much.
