Description
Implementation details
We could add an additional function to the Predictor API (e.g. get() or handle_get()), which would be called when a GET request is sent to an API endpoint.
- Check whether we enforce the presence of the constructor for the Task API.
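
As a rough illustration of the idea above, the following sketch looks up a per-verb method on a predictor object and calls it if present. It assumes the handle_get() spelling; the names find_handler, dispatch, ExamplePredictor, and the 405 fallback are hypothetical and not part of the existing Predictor API.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Verbs considered in this sketch (assumption: lower-cased HTTP method names).
SUPPORTED_VERBS = ("post", "get", "put", "patch", "delete")


def find_handler(predictor: Any, http_method: str) -> Optional[Callable[..., Any]]:
    """Return predictor.handle_<verb> if it exists and is callable, else None."""
    verb = http_method.lower()
    if verb not in SUPPORTED_VERBS:
        return None
    handler = getattr(predictor, f"handle_{verb}", None)
    return handler if callable(handler) else None


def dispatch(predictor: Any, http_method: str, payload: Any = None) -> Tuple[int, Any]:
    """Call the matching handler, or report 405 when the verb is not implemented."""
    handler = find_handler(predictor, http_method)
    if handler is None:
        return 405, {"error": f"{http_method.upper()} is not supported"}
    return 200, handler(payload)


class ExamplePredictor:
    """Hypothetical predictor that implements POST and GET only."""

    def __init__(self, config: Dict[str, Any]):
        self.labels = config.get("labels", [])

    def handle_post(self, payload):
        # Stand-in for real inference: report how many fields were sent.
        return {"prediction": len(payload or {})}

    def handle_get(self, payload):
        # Inexpensive metadata lookup, the kind of task a GET might serve.
        return {"labels": self.labels}


if __name__ == "__main__":
    predictor = ExamplePredictor({"labels": ["cat", "dog"]})
    print(dispatch(predictor, "GET"))
    print(dispatch(predictor, "DELETE"))
```

Looking the method up by name keeps the predictor free of any registration step, at the cost of only detecting a missing handler when the request arrives.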
Questions
- Which HTTP methods should be supported? (GET/PUT/DELETE etc.)
  - All RESTful verbs: POST/GET/PUT/PATCH/DELETE.
- What is the naming scheme of the methods?
  - "handle_" prefix, followed by the verb name (e.g. handle_get()).
- What about the name of the method for gRPC?
  - "handle"/"run". We can go with "handle", but if it's easy, we should allow the user to specify any number of service methods (time-capped@.25).
- How do we name the methods for Batch/Async?
  - Short-path names: "handle_batch"/"handle_async".
  - Long-path names: from cortex.handler import Realtime. The Realtime class (like the others) will require some methods to be defined. Realtime is an abstract class with default implementations for each verb method.
- If / how would they affect autoscaling? Assuming the user does inexpensive tasks within get(), this could be ignored for autoscaling purposes. Otherwise, it should be counted as an in-flight request.
  - We can either count it as an in-flight request for simplicity, or we can provide decorators that let the user toggle whether it is counted.
  - There's a risk that making GET countable will lead to unreasonable scale-ups. This can be caused by the user's constant API gets (using the CLI/Python client).
  - For now, everything is in the queue and everything is counted as in-flight requests.
- Should the GET request bypass the request queue, and immediately execute in an API thread?
  - No. For simplicity reasons, let's use the same thread pool for all verbs.
- If / how should this be recorded in the metrics, and shown on the metrics dashboard?
  - In Grafana we would still show the same metrics, but only for the main verb (POST). Not sure what we can do about gRPC in this case.
  - We could count the requests (in cortex get) by verb name. We'd need to add the verb as another metric dimension.
  - We can add another field in the "predictor" section (like "tracked_verb") to indicate which verb needs to be tracked (in Grafana and cortex get). A corollary is that counting the in-flight requests for all other verbs no longer seems necessary.
- How will the latency reporting be affected in cortex get?
  - Latencies would be grouped by verb name.
  - Or do nothing if we add the "tracked_verb" field.
  - Let's aggregate them instead (aka do nothing).
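
The answers above can be pictured with a minimal sketch: a base class with a default implementation for each verb method, one shared thread pool for all verbs, every request counted as in-flight, and latencies aggregated rather than split by verb. Everything here is an assumption made for illustration: the Realtime class body, MethodNotAllowed, RequestTracker, and StatusHandler are hypothetical names, and since the notes do not say which methods are mandatory, none is marked abstract.

```python
import threading
import time
from abc import ABC
from concurrent.futures import ThreadPoolExecutor


class MethodNotAllowed(Exception):
    """Raised by the default verb methods (hypothetical error type)."""


class Realtime(ABC):
    """Hypothetical base class: each verb has a default that rejects the request."""

    def handle_post(self, payload=None):
        raise MethodNotAllowed("POST")

    def handle_get(self, payload=None):
        raise MethodNotAllowed("GET")

    def handle_put(self, payload=None):
        raise MethodNotAllowed("PUT")

    def handle_patch(self, payload=None):
        raise MethodNotAllowed("PATCH")

    def handle_delete(self, payload=None):
        raise MethodNotAllowed("DELETE")


class RequestTracker:
    """Counts every request as in-flight and aggregates latency across verbs."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = 0
        self.request_count = 0
        self.total_latency = 0.0

    def run(self, handler, payload=None):
        with self._lock:
            self.in_flight += 1
        start = time.perf_counter()
        try:
            return handler(payload)
        finally:
            elapsed = time.perf_counter() - start
            with self._lock:
                self.in_flight -= 1
                self.request_count += 1
                self.total_latency += elapsed

    def average_latency(self):
        with self._lock:
            if self.request_count == 0:
                return 0.0
            return self.total_latency / self.request_count


class StatusHandler(Realtime):
    """Example user class that overrides GET only."""

    def handle_get(self, payload=None):
        return {"status": "ready"}


if __name__ == "__main__":
    tracker = RequestTracker()
    handler = StatusHandler()
    # One pool serves every verb; GET does not bypass the queue.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(tracker.run, handler.handle_get) for _ in range(4)]
        results = [f.result() for f in futures]
    print(results[0], tracker.request_count, tracker.in_flight)
```

Because the tracker does not know which verb it is timing, adding a verb dimension or a "tracked_verb" filter later would only require passing the verb name into run().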