A Julia KServe client for ML model inference over gRPC. It supports any implementation of the official KServe protocol, including NVIDIA Triton Inference Server. Install the package from the Julia REPL:
using Pkg
Pkg.add("KServeClient")
For this example, we are going to call an image classification model served by NVIDIA Triton with the following config.pbtxt:
name: "example_cnn_classifier"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
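Per this config, the model expects a Float32 input of shape (1, 224, 224) (the leading 1 comes from max_batch_size) and returns 1000 Float32 scores per item. Below is a minimal sketch of shaping data to match; the rand call is a hypothetical stand-in for a real preprocessed 224x224 image:

img = rand(Float32, 224, 224)        # stand-in for a real preprocessed image
batched = reshape(img, 1, 224, 224)  # prepend the batch dimension from max_batch_size
@assert size(batched) == (1, 224, 224)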
To run inference against this model, set up a connection pool or client, define your model inputs, call the model, and choose which outputs to extract.
using KServeClient
# Create the pool with a single connection, effectively the same thing as not using a pool
kscp = KServeClientPool(1, "https://my-grpc-server:8001")
# Define the inputs using native Julia types
input__0 = InferInput("INPUT__0", zeros(Float32, 1, 224, 224))
# Call inference (blocking)
response = ModelInfer(kscp, "example_cnn_classifier", [input__0])
# Get the output
output__0 = InferOutput("OUTPUT__0", response)
@assert size(output__0) == (1, 1000)
@assert eltype(output__0) == Float32
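From here, post-processing is plain Julia. For example, assuming the 1000 output values are class scores (an assumption about this particular model, not part of the client API), the top-1 prediction is an argmax away:

scores = output__0[1, :]  # drop the batch dimension
top1 = argmax(scores)     # index of the highest-scoring class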
In order to achieve maximum throughput you will need to use concurrency. There is currently an upstream issue where non-pinning concurrency (Threads.@spawn) results in connections being dropped, so use @async until it is fixed. If you are using threading, you can place a single @spawn one level above the @async calls to keep the parent task from being pinned to a single Julia thread.
using KServeClient
using Base.Threads: @spawn

# This time, create a connection pool with 8 connections to take advantage of async
kscp = KServeClientPool(8, "https://my-grpc-server:8001")

N = 256
inp = zeros(Float32, N, 224, 224)

@sync begin
    # The @spawn here keeps the parent task from being pinned to a single thread
    @spawn begin
        for i in 1:N
            let i = i  # capture the loop variable for the async closure
                @async begin
                    input__0 = InferInput("INPUT__0", inp[i:i, :, :])
                    response = ModelInfer(kscp, "example_cnn_classifier", [input__0])
                    output__0 = InferOutput("OUTPUT__0", response)
                    # Do something with the output
                end
            end
        end
    end
end
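With N = 256 tasks and 8 connections, every request is queued at once. If you would rather bound the number of in-flight requests to the pool size, one option is a Base.Semaphore from the standard library (a sketch, not a feature of KServeClient):

using Base: Semaphore, acquire, release

sem = Semaphore(8)  # match the pool size so at most 8 requests are in flight
@sync for i in 1:N
    @async begin
        acquire(sem)
        try
            input__0 = InferInput("INPUT__0", inp[i:i, :, :])
            response = ModelInfer(kscp, "example_cnn_classifier", [input__0])
            output__0 = InferOutput("OUTPUT__0", response)
            # Do something with the output
        finally
            release(sem)  # always free the slot, even if the request throws
        end
    end
end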
The connection-dropping issue is caused by a bug in Downloads.jl; see the upstream issue. An open pull request provides a workaround you can apply until it is merged.
It is currently not recommended to use one client for multiple concurrent requests, as requests can randomly hang or the client can get into a bad state; use a connection pool instead. An upstream fix for this is planned.
There is also some instability when using threads, i.e. non-pinning concurrency with more than one Julia thread. For now, stick to @async; an upstream fix for this is planned.