Inference API docs should document the binary request/response format. #2407

Open
@jfmatthews

📚 The doc issue

The Prediction API docs point to the gRPC definition of that interface, but that proto contains no meaningful information about the request or response format: the request is defined as map<string, bytes> and the response is simply bytes, with no informative comments. The given examples do not help much either: the curl commands only cover the case where the input is an image, with no explanation or hints on how to generalize to other use cases. As a result, building API-based client integrations is mostly a matter of trial and error.

At minimum, the proto file should describe the expected binary format: Is it JSON? Some binary format? Is the answer model-dependent? Ideally, this would also come with a link to a reference implementation (and, even more ideally, this reference implementation would be generated by the proto compiler itself; see companion issue #2406).

The following topics should be covered:

  • Binary format for input data/tensors
  • Response format for single-output (e.g., regression) models
  • Response format for multi-output models
  • Request/response format for batched inference
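
To make the four topics above concrete, here is a minimal sketch of the kind of convention the docs could spell out. It is purely illustrative: it assumes a handler that reads UTF-8 JSON from a key named `data` and returns UTF-8 JSON. The key name, the JSON encoding, and the helpers `encode_input` and `decode_response` are all hypothetical, since the proto specifies none of this.

```python
import json
from typing import Any


def encode_input(values: list, key: str = "data") -> dict[str, bytes]:
    """Build a payload shaped like the proto's map<string, bytes> request field.

    Assumption: the handler expects UTF-8 JSON under `key`. Nothing in the
    proto confirms this; a real handler may expect something else entirely.
    """
    return {key: json.dumps(values).encode("utf-8")}


def decode_response(raw: bytes) -> Any:
    """Decode the proto's opaque `bytes` response, assuming it is UTF-8 JSON."""
    return json.loads(raw.decode("utf-8"))


# Single input: one flat feature vector.
single = encode_input([1.0, 2.0, 3.0])

# Batched input (assumed convention): a nested list, one row per example.
batch = encode_input([[1.0, 2.0], [3.0, 4.0]])

# Single-output model (e.g. regression): the response might be a bare number.
single_output = decode_response(b"0.73")

# Multi-output model: the response might be a JSON object with named outputs.
multi_output = decode_response(b'{"label": "cat", "score": 0.9}')
```

Whether the real encoding is JSON, raw tensor bytes, or something model-specific is exactly what the documentation needs to state.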

Suggest a potential alternative/fix

No response

Labels: documentation, grpc
