Closed
Description
https://grpc.io/docs/guides/concepts/
https://blog.feathersjs.com/http-vs-websockets-a-performance-comparison-da2533f13a77
- For best performance, the protocol would be customizable in the API spec (REST vs gRPC). A field called `protocol` would be added to the `predictor` section.
- The user is responsible for providing a protobuf file in the API spec as well. Cortex would automatically generate the server files from the protobuf.
- REST and gRPC cannot both be served for the same API.
- Istio doesn't have to be changed - it has native support for HTTP/2. We probably need to change the service mode of APIs to headless mode.
- The protobuf would only support a single method/service - `predict`. Its input/output is defined by the user. We don't provide a default protobuf.
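A minimal protobuf for such an API could look like the following sketch. The service name and the request/response messages are illustrative only - per the point above, the user defines the actual input/output messages:

```protobuf
syntax = "proto3";

// Illustrative service - the user defines the real request/response messages.
service User {
  // The single method Cortex would expose. Bidirectional streaming matches
  // the predictor example below, which iterates over requests and yields
  // responses; a unary signature would also be possible.
  rpc predict (stream PredictRequest) returns (stream PredictResponse);
}

message PredictRequest {
  string input = 1;
}

message PredictResponse {
  string output = 1;
}
```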
Example of a Python predictor for gRPC:
```python
# PythonPredictor provides an implementation of the methods of the service
# generated from the user's protobuf.
class PythonPredictor(user_pb2_grpc.UserServicer):
    def start(self, config):
        self.config = config

    def predict(self, request_iterator, context):
        prev_notes = []
        for new_note in request_iterator:
            for prev_note in prev_notes:
                if prev_note.location == new_note.location:
                    yield prev_note
            prev_notes.append(new_note)
```
We don't necessarily need to (or can't) ask the user to subclass the generated server files - we can wrap this class instead. This way, the class could look like:
```python
class PythonPredictor:
    def __init__(self, config):
        self.config = config

    def predict(self, request_iterator, context):
        prev_notes = []
        for new_note in request_iterator:
            for prev_note in prev_notes:
                if prev_note.location == new_note.location:
                    yield prev_note
            prev_notes.append(new_note)
```
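The wrapping could be done with a small Cortex-side adapter. This is a hypothetical sketch: `wrap_predictor` and `servicer_base` are made-up names, with `servicer_base` standing in for the generated `user_pb2_grpc.UserServicer` class:

```python
def wrap_predictor(predictor, servicer_base):
    """Adapt a plain user predictor class to the generated servicer interface.

    Hypothetical sketch - `servicer_base` stands in for the generated
    user_pb2_grpc.UserServicer, so the user never subclasses generated files.
    """
    class _WrappedServicer(servicer_base):
        # Delegate the single `predict` RPC to the user's implementation.
        def predict(self, request_iterator, context):
            return predictor.predict(request_iterator, context)

    return _WrappedServicer()
```

Because the delegation happens at the servicer level, the user's class keeps a plain `__init__(self, config)` constructor and doesn't need to know gRPC exists.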
Creating a gRPC server is straightforward. We'll need multiple servers, as dictated by the `processes_per_replica` field.
```python
from concurrent import futures

import grpc

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    user_pb2_grpc.add_UserServicer_to_server(PythonPredictor(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```
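For `processes_per_replica`, one option is to fork one process per worker, each running `serve()`. This is a sketch under the assumption that each process binds the same port with gRPC's `grpc.so_reuseport` channel option (enabled by default in grpcio on Linux), letting the kernel distribute connections; `spawn_server_processes` is a made-up helper name:

```python
import multiprocessing

def spawn_server_processes(serve_fn, processes_per_replica):
    """Start `processes_per_replica` copies of the gRPC server.

    Hypothetical sketch: `serve_fn` would be the serve() function above.
    Each process binds the same port, relying on SO_REUSEPORT so the
    kernel load-balances incoming connections across processes.
    """
    processes = [
        multiprocessing.Process(target=serve_fn)
        for _ in range(processes_per_replica)
    ]
    for p in processes:
        p.start()
    return processes
```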
Example API spec:
```yaml
- name: iris-classifier
  kind: RealtimeAPI
  predictor:
    type: python
    path: predictor.py
    protocol: grpc # or "rest"
    protobuf: predictor.proto
  compute:
    cpu: 0.2
    mem: 200M
```
To parse proto files in Go: https://github.com/tallstoat/pbparser
Motivation
gRPC has lower latency and higher throughput than REST, and it's already well known within the community (see the links above).
Additional context
Maybe use something like Linkerd for load-balancing gRPC requests (though this may not be needed).