
Need GPU (cuda) access while deploying the model #30554

Open
@rishavmandal771

Description

I need assistance with deploying a pre-trained model. I have created a custom score.py file for the deployment. However, the Docker container built on the CPU instance has no access to a GPU, which is a problem when predicting with PyTorch or TensorFlow models because the inputs are converted to tensors and loaded onto the GPU. Can you suggest a solution?
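One way to make the same score.py run on both CPU and GPU instances is to select the device at runtime instead of hard-coding "cuda". A minimal sketch of the two places in the script below that would change (everything else stays the same):

import torch

# Use the GPU only when a driver is actually present; otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# In init(): load the model onto whichever device was selected.
model = mlflow.pytorch.load_model(model_path, map_location=device)

# In run(): move input tensors to the same device instead of calling .to("cuda").
tensor_t = torch.unsqueeze(tensor_t, dim=0).to(device)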

My score.py script -

import json
import logging
import os
from unittest import mock  # used by the commented-out torch.load workaround below

import mlflow
import torch
from pytorch_transformers import BertTokenizer

# original = torch.load


# def load(*args):
#     return torch.load(*args, map_location=torch.device("cpu"), pickle_module=None)


# def init():
#     global model
#     model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "use-case1-model")
#     # "model" is the path of the mlflow artifacts when the model was registered. For automl
#     # models, this is generally "mlflow-model".

#     with mock.patch("torch.load", load):
#         model = mlflow.pyfunc.load_model(model_path)

#     logging.info("Init complete")

def init():
    global model

    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "use-case1-model")

    # Load the checkpoint onto the CPU; the deployment instance has no GPU.
    model = mlflow.pytorch.load_model(model_path, map_location=torch.device("cpu"))
    logging.info("Init complete")


tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def run(data):
    json_data = json.loads(data)

    title = json_data["input_data"]["title"]
    att = json_data["input_data"]["attributes"]

    result = {}

    for i in range(len(title)):
        my_dict = {}
        for j in range(len(att)):
            attr = att[i][j]

            # Tokenize and pad (helpers defined elsewhere in the project).
            t, a = nobert4token(tokenizer, title[i].lower(), attr)
            x = X_padding(t)
            y = tag_padding(a)

            # The .to("cuda") calls below are where the deployment fails:
            # the CPU instance has no NVIDIA driver, so a RuntimeError is raised.
            tensor_a = torch.tensor(y, dtype=torch.int32)
            tensor_a = torch.unsqueeze(tensor_a, dim=0).to("cuda")

            tensor_t = torch.tensor(x, dtype=torch.int32)
            tensor_t = torch.unsqueeze(tensor_t, dim=0).to("cuda")

            output = model([tensor_t, tensor_a])
            predict_list = output.tolist()[0]

            # words_p is produced from predict_list by a decoding helper
            # omitted from this snippet.
            my_dict[attr] = " ".join(words_p)

        result[title[i]] = my_dict

    return result

My invoke script -

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_result.name,
    deployment_name=green_deployment_uc1.name,
    request_file=os.path.join("./dependencies", "sample.json"),
)
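If GPU scoring is genuinely required, the deployment itself must be created on a GPU SKU rather than a CPU one. A minimal sketch with the v2 SDK, reusing the endpoint from the invoke call above; the model, environment, and SKU name here are placeholders for your own:

from azure.ai.ml.entities import CodeConfiguration, ManagedOnlineDeployment

green_deployment_uc1 = ManagedOnlineDeployment(
    name="green",
    endpoint_name=endpoint_result.name,
    model=model,               # your registered model
    environment=env,           # your environment (conda.yaml below)
    code_configuration=CodeConfiguration(
        code="./dependencies", scoring_script="score.py"
    ),
    instance_type="Standard_NC6s_v3",  # a GPU SKU instead of a CPU SKU
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(green_deployment_uc1)

Alternatively, staying on a CPU SKU and making score.py device-agnostic (as sketched above) avoids needing GPU quota at all.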

My conda.yaml -

channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip=22.1.2
  - numpy=1.21.2
  - scikit-learn=0.24.2
  - scipy=1.7.1
  - 'pandas>=1.1,<1.2'
  - pytorch=1.10.0
  - pip:
      - 'inference-schema[numpy-support]==1.5.0'
      - xlrd==2.0.1
      - mlflow==1.26.1
      - azureml-mlflow==1.42.0
      - tqdm==4.63.0
      - pytorch-transformers==1.2.0
      - pytorch-lightning==2.0.2
      - seqeval==1.2.2
      - azureml-inference-server-http==0.8.0
name: model-env
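Note that on a GPU SKU the pinned pytorch=1.10.0 from conda-forge may resolve to a CPU-only build. A CUDA-enabled build can be requested explicitly via the pytorch channel, for example (assuming CUDA 11.3 on the instance):

channels:
  - pytorch
  - conda-forge
dependencies:
  - pytorch=1.10.0
  - cudatoolkit=11.3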

Error that I am getting -

127.0.0.1 - - [29/May/2023:10:03:32 +0000] "GET / HTTP/1.0" 200 7 "-" "kube-probe/1.18"
2023-05-29 10:03:34,291 E [70] azmlinfsrv - Encountered Exception: Traceback (most recent call last):
  File "/azureml-envs/azureml_d587e0800be72e17d773ddca63762cd1/lib/python3.8/site-packages/azureml_inference_server_http/server/user_script.py", line 130, in invoke_run
    run_output = self._wrapped_user_run(**run_parameters, request_headers=dict(request.headers))
  File "/azureml-envs/azureml_d587e0800be72e17d773ddca63762cd1/lib/python3.8/site-packages/azureml_inference_server_http/server/user_script.py", line 154, in <lambda>
    self._wrapped_user_run = lambda request_headers, **kwargs: self._user_run(**kwargs)
  File "/var/azureml-app/dependencies/score.py", line 129, in run
    tensor_a = torch.unsqueeze(tensor_a, dim=0).to("cuda")
  File "/azureml-envs/azureml_d587e0800be72e17d773ddca63762cd1/lib/python3.8/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

The above exception was the direct cause of the following exception:

If you are wondering why I used "model = mlflow.pytorch.load_model(model_path, map_location=torch.device('cpu'))",

please refer to this forum thread - https://learn.microsoft.com/en-us/answers/questions/1291498/facing-problem-while-deploying-model-on-azure-ml-a

Documentation - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-mlflow-models-online-endpoints?view=azureml-api-2&tabs=sdk
