I have the following echo-like models/modelA/1/model.py. How can I call it from the command line using curl?
curl -i -X POST localhost:8000/api/infer/modelA/1 -H "Content-Type: application/octet-stream" -H 'NV-InferRequest:batch_size: 1 input { name: "INPUT0" } output { name: "OUTPUT0" }' --data 'hello'
gives HTTP 400. I think I am not formatting the --data argument correctly, but it should not be very difficult for a UTF-8-encoded string, right?
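In case it helps frame the question: my understanding (which may be wrong) is that the /api/infer/... path and NV-InferRequest header belong to the older v1 HTTP API, and that a recent Triton would instead expose the KServe v2 JSON endpoint. Under that assumption, I would expect the request to look roughly like the sketch below, where [104, 101, 108, 108, 111] is just list(bytes('hello', 'utf8')):

curl -X POST localhost:8000/v2/models/modelA/versions/1/infer \
     -H "Content-Type: application/json" \
     -d '{
           "inputs": [
             { "name": "INPUT0", "shape": [5], "datatype": "UINT8",
               "data": [104, 101, 108, 108, 111] }
           ],
           "outputs": [ { "name": "OUTPUT0" } ]
         }'

I have not confirmed that this exact request works against the model above, so if the v2 endpoint is indeed the intended way to do this from curl, a confirmation (or a corrected request) would already answer my question.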
Model code:
import json
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Single variable-length UINT8 input and output, batching disabled.
        auto_complete_model_config.add_input({"name": "INPUT0", "data_type": "TYPE_UINT8", "dims": [-1]})
        auto_complete_model_config.add_output({"name": "OUTPUT0", "data_type": "TYPE_UINT8", "dims": [-1]})
        auto_complete_model_config.set_max_batch_size(0)
        return auto_complete_model_config

    def execute(self, requests):
        responses = []
        for request in requests:
            # Decode the incoming UINT8 tensor as a UTF-8 string, prefix it,
            # and re-encode it as a UINT8 tensor for the response.
            in_numpy = pb_utils.get_input_tensor_by_name(request, 'INPUT0').as_numpy()
            in_str = str(bytes(in_numpy), 'utf8')
            out_str = 'modelA:' + in_str
            out_numpy = np.frombuffer(bytes(out_str, 'utf8'), dtype=np.uint8)
            out_pb = pb_utils.Tensor('OUTPUT0', out_numpy)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_pb]))
        return responses
Currently I can call it successfully using the Python client:
import numpy as np
import tritonclient.http as httpclient

triton_client = httpclient.InferenceServerClient("localhost:8000")
model_name = 'modelA'

# Send the UTF-8 bytes of 'hello' as a UINT8 tensor.
input_arr = np.frombuffer(bytes('hello', 'utf8'), dtype=np.uint8)
inputs = [httpclient.InferInput("INPUT0", input_arr.shape, "UINT8")]
inputs[0].set_data_from_numpy(input_arr, binary_data=True)

res = triton_client.infer(model_name=model_name, inputs=inputs)

# Decode the returned UINT8 tensor back into a string.
output_arr = res.as_numpy('OUTPUT0')
output_str = str(bytes(output_arr), 'utf8')
print(output_str)
but I would like to use curl, as it is simpler for demonstration purposes and might also make debugging string-processing models easier.