Closed
Description
https://grpc.io/docs/guides/concepts/
https://blog.feathersjs.com/http-vs-websockets-a-performance-comparison-da2533f13a77
- For best performance, the protocol would be customizable in the API spec (REST vs gRPC). A field called `protocol` would be added to the `predictor` section.
- The user is responsible for providing a protobuf file in the API spec as well. Cortex would automatically generate the server files from the protobuf.
- REST and gRPC cannot both be served for the same API.
- Istio doesn't have to be changed - it has native support for HTTP/2. We probably need to change the service mode of APIs to headless mode.
- The protobuf would only support a single method/service - `predict`. Its input/output is defined by the user. We don't provide a default protobuf.
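A minimal protobuf for such an API could look like the following sketch. The service name and the request/response messages are illustrative only - per the point above, the user defines the actual input/output messages:

```protobuf
syntax = "proto3";

// Illustrative service - the user defines the real request/response messages.
service User {
  // The single method Cortex would expose. Bidirectional streaming matches
  // the predictor example below, which iterates over requests and yields
  // responses; a unary signature would also be possible.
  rpc predict (stream PredictRequest) returns (stream PredictResponse);
}

message PredictRequest {
  string input = 1;
}

message PredictResponse {
  string output = 1;
}
```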
Example of a Python predictor for gRPC:
```python
# PythonPredictor provides an implementation of the methods of the service
# generated from the user's protobuf.
class PythonPredictor(user_pb2_grpc.UserServicer):
    def start(self, config):
        self.config = config

    def predict(self, request_iterator, context):
        prev_notes = []
        for new_note in request_iterator:
            for prev_note in prev_notes:
                if prev_note.location == new_note.location:
                    yield prev_note
            prev_notes.append(new_note)
```
We don't necessarily need to (or can't) ask the user to subclass the generated server files - we can wrap this class instead. This way, the class could look like:
```python
class PythonPredictor:
    def __init__(self, config):
        self.config = config

    def predict(self, request_iterator, context):
        prev_notes = []
        for new_note in request_iterator:
            for prev_note in prev_notes:
                if prev_note.location == new_note.location:
                    yield prev_note
            prev_notes.append(new_note)
```
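The wrapping could be done with a small Cortex-side adapter. This is a hypothetical sketch: `wrap_predictor` and `servicer_base` are made-up names, with `servicer_base` standing in for the generated `user_pb2_grpc.UserServicer` class:

```python
def wrap_predictor(predictor, servicer_base):
    """Adapt a plain user predictor class to the generated servicer interface.

    Hypothetical sketch - `servicer_base` stands in for the generated
    user_pb2_grpc.UserServicer, so the user never subclasses generated files.
    """
    class _WrappedServicer(servicer_base):
        # Delegate the single `predict` RPC to the user's implementation.
        def predict(self, request_iterator, context):
            return predictor.predict(request_iterator, context)

    return _WrappedServicer()
```

Because the delegation happens at the servicer level, the user's class keeps a plain `__init__(self, config)` constructor and doesn't need to know gRPC exists.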
Creating a gRPC server is straightforward. We'll need multiple servers, as dictated by the `processes_per_replica` field.
```python
from concurrent import futures

import grpc

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    user_pb2_grpc.add_UserServicer_to_server(PythonPredictor(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```
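For `processes_per_replica`, one option is to fork one process per worker, each running `serve()`. This is a sketch under the assumption that each process binds the same port with gRPC's `grpc.so_reuseport` channel option (enabled by default in grpcio on Linux), letting the kernel distribute connections; `spawn_server_processes` is a made-up helper name:

```python
import multiprocessing

def spawn_server_processes(serve_fn, processes_per_replica):
    """Start `processes_per_replica` copies of the gRPC server.

    Hypothetical sketch: `serve_fn` would be the serve() function above.
    Each process binds the same port, relying on SO_REUSEPORT so the
    kernel load-balances incoming connections across processes.
    """
    processes = [
        multiprocessing.Process(target=serve_fn)
        for _ in range(processes_per_replica)
    ]
    for p in processes:
        p.start()
    return processes
```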
Example API spec:
```yaml
- name: iris-classifier
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
    protocol: grpc # or "rest"
    protobuf: predictor.proto
  compute:
    cpu: 0.2
    mem: 200M
```
To parse proto files in Go: https://github.com/tallstoat/pbparser
Motivation
gRPC has lower latency and higher throughput than REST, and it's already well known within the community (see the links above).
Additional context
Maybe use something like Linkerd for load-balancing gRPC requests (though this may not be needed).