Description
📚 The doc issue
The Prediction API docs point to the gRPC definition of that interface. But that proto does not contain any meaningful information about the request or response format: the request is defined as `map<string, bytes>`, and the response is simply `bytes`, with no informative comments. Nor are the given examples particularly conducive to understanding - the `curl` commands only cover the case where the input is an image, with no further explanation or hints on how to generalize to other use cases. This makes building API-based client integrations a matter of mostly trial and error.
At minimum, the proto file should describe how the `bytes` fields are expected to be encoded: is it JSON, some binary format, or does the answer depend on the model? Ideally, this would also come with a link to a reference implementation (and even more ideally, this reference implementation would be generated by the proto compiler itself; see companion issue #2406).
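To illustrate the ambiguity, here is a minimal Python sketch of two equally plausible ways a client could fill one entry of a `map<string, bytes>` request. The input name `data`, the sample values, and both encodings are assumptions made up for illustration; nothing in the proto confirms that the server accepts either of them.

```python
import json
import struct

# Hypothetical input: three floats for a model input named "data".
values = [1.0, 2.5, -3.0]

# Guess 1 (assumption): the bytes hold UTF-8 encoded JSON.
as_json = json.dumps(values).encode("utf-8")

# Guess 2 (assumption): the bytes hold raw little-endian float32 values,
# with no header describing shape or dtype.
as_raw = struct.pack("<3f", *values)

# Both dictionaries satisfy the map<string, bytes> schema equally well,
# so the schema alone cannot tell a client which one to send.
request_json = {"data": as_json}
request_raw = {"data": as_raw}

for name, request in (("json", request_json), ("raw", request_raw)):
    payload = request["data"]
    print(f"{name}: {len(payload)} bytes, type={type(payload).__name__}")
```

The same question applies in reverse to the `bytes` response: without documentation, a client cannot know whether to call `json.loads` or `struct.unpack` on it.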
The following topics should be covered:
- Binary format for input data/tensors
- Response format for single-output (e.g., regression) models
- Response format for multi-output models
- Request/response format for batched inference
Suggest a potential alternative/fix
No response