Preventing or handling of service deadlocks #199
Labels
api-devex
Restate user facing Development model
needs-refinement
Issues that need further investigation
ops-ex
Restate operational experience
semantics
System semantics and behaviour
Problem
With Restate's current model where we execute calls to a service instance sequentially and allow a service to await other services while keeping the lock, users can build applications that deadlock. Examples are the following:
A
andB
which call each other (directly or indirectly) on the same key and await the resultThe underlying problem is that we allow awaiting on a future result while holding the lock of a service instance.
Solution ideas
The current solution ideas fall into two categories: deadlock detection and prevention.
Deadlock detection
The idea would be to detect cyclic dependencies between service instances and to notify the user about it. Since a cyclic dependency can be formed by two service instances calling each other (potentially originating from different partitions), the detection algorithm will probably require some algorithm that retrieves this information from different partitions. This entails that the detection can probably not be a local decision.
Resolving the deadlock via undoing changes
Once a deadlock is detected, Restate needs to provide a way to resolve it. Ideally, the user could select an invocation which the runtime can cancel and whose changes will be rolled back so that the state stays consistent. It is unclear at this point in time how Restate could roll back state changes in the general case (e.g. assume that the service changed some state of another service
x
and that other calls were executed onx
afterwards).Deadlock prevention
If Restate's programming model wouldn't allow the creation of deadlocks, then there would be no need for a deadlock detection algorithm. Potential ideas could be the following:
Directed acyclic call graphs
The problem of deadlocks only arises if there exists a cyclic call graph. Hence, one idea to prevent this from happening is to ask the user to define a hierarchy of services where a service in level
x
is only allowed to call and await services in levely
withy > x
. That way, it would not be possible to create cycles.Such a hierarchy would only be needed for keyed and singleton services since unkeyed services cannot be called again.
How to define a service hierarchy?
One idea for defining the service hierarchy is t ask the user to provide this information when registering a service. She could provide the level as part of the service registration. This would entail that the user has a global understanding of all the services that call her service and the services that her service will call.
Optimistic concurrency control
Since the deadlock situation arises when a service awaits another service while holding a lock, we could prevent it by disallowing awaiting another service while holding the lock. This could be done by introducing a new service type "sequentially executed keyed service" which cannot await. This means that such a service type will only be allowed to send non-blocking background calls. Additionally, calls to a keyed/singleton service instance will be run in parallel (holding no service instance lock anymore) using optimistic concurrency control for resolving conflicts. This means that the user needs to explicitly handle write conflicts by redoing/merging its changes.
The text was updated successfully, but these errors were encountered: