[docs] A complete example of a basic string-processing model and of invoking it via curl #6337

@vadimkantorov

Description

I have the following echo-like model at models/modelA/1/model.py (full code below).

How can I call it from the command line using curl? My attempt below returns HTTP 400:

curl -i -X POST localhost:8000/api/infer/modelA/1 -H "Content-Type: application/octet-stream" -H 'NV-InferRequest:batch_size: 1 input { name: "INPUT0" } output { name: "OUTPUT0" }' --data 'hello'

I think I am not formatting the --data argument correctly, but sending a UTF-8-encoded string should not be very difficult, right?
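For comparison, here is how I would build what I believe a KServe v2 JSON request body would look like for this model. This is only a sketch: the /v2/models/modelA/infer endpoint, the "shape" field, and the "UINT8" datatype spelling are my assumptions, not something I have verified.

```python
import json

# Encode the string as a list of UTF-8 byte values for the UINT8 input tensor
text = 'hello'
data = list(text.encode('utf8'))

payload = {
    "inputs": [
        {"name": "INPUT0", "shape": [len(data)], "datatype": "UINT8", "data": data}
    ]
}
print(json.dumps(payload))

# The resulting JSON could then (presumably) be posted with something like:
#   curl -X POST localhost:8000/v2/models/modelA/infer \
#        -H "Content-Type: application/json" -d @payload.json
```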

Model code:

import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # One variable-length UINT8 tensor in and out, carrying raw UTF-8 bytes
        auto_complete_model_config.add_input({"name": "INPUT0", "data_type": "TYPE_UINT8", "dims": [-1]})
        auto_complete_model_config.add_output({"name": "OUTPUT0", "data_type": "TYPE_UINT8", "dims": [-1]})
        auto_complete_model_config.set_max_batch_size(0)  # no batching dimension
        return auto_complete_model_config

    def execute(self, requests):
        responses = []
        for request in requests:
            # Decode the incoming byte tensor as a UTF-8 string
            in_numpy = pb_utils.get_input_tensor_by_name(request, 'INPUT0').as_numpy()
            in_str = str(bytes(in_numpy), 'utf8')

            # Prefix the string and re-encode it as a UINT8 tensor
            out_str = 'modelA:' + in_str
            out_numpy = np.frombuffer(bytes(out_str, 'utf8'), dtype=np.uint8)
            out_pb = pb_utils.Tensor('OUTPUT0', out_numpy)

            responses.append(pb_utils.InferenceResponse(output_tensors=[out_pb]))
        return responses
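The UTF-8/uint8 round-trip the model relies on can be checked on its own, outside Triton:

```python
import numpy as np

# String -> UINT8 tensor (as the client sends it)
arr = np.frombuffer(bytes('hello', 'utf8'), dtype=np.uint8)

# UINT8 tensor -> string (as the model decodes it)
decoded = str(bytes(arr), 'utf8')
print(decoded)  # hello
```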

Currently I can successfully call it using the Python HTTP client:

import numpy as np
import tritonclient.http as httpclient

triton_client = httpclient.InferenceServerClient("localhost:8000")
model_name = 'modelA'

# Encode the string as a 1-D UINT8 tensor of UTF-8 bytes
input_arr = np.frombuffer(bytes('hello', 'utf8'), dtype=np.uint8)
inputs = [httpclient.InferInput("INPUT0", input_arr.shape, "UINT8")]
inputs[0].set_data_from_numpy(input_arr, binary_data=True)

res = triton_client.infer(model_name=model_name, inputs=inputs)

# Decode the UINT8 output tensor back into a string
output_arr = res.as_numpy('OUTPUT0')
output_str = str(bytes(output_arr), 'utf8')

print(output_str)  # modelA:hello

but I would like to use curl, as it is simpler for demonstration purposes and might also be more convenient for debugging string-processing models.

Metadata

Labels: enhancement (New feature or request), question (Further information is requested)