Skip to content

Commit df1d8d0

Browse files
committed
unconditionally mark predictions as cancelled without waiting for cancellation to succeed
while I struggle with #1786, we see queue spikes because not responding to cancellation promptly causes the pod to get restarted. this is a dirty hack to pretend like cancellation works immediately. as soon as we fix the race condition (and possibly any issues with task.cancel() behaving differently from signal handlers), we can drop this. Signed-off-by: technillogue <technillogue@gmail.com>
1 parent 8d834f0 commit df1d8d0

File tree

1 file changed

+12
-1
lines changed

1 file changed

+12
-1
lines changed

python/cog/server/runner.py

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -122,7 +122,9 @@ def __init__(
122122
self._shutdown_event = shutdown_event # __main__ waits for this event
123123

124124
self._upload_url = upload_url
125-
self._predictions: dict[str, tuple[schema.PredictionResponse, PredictionTask]] = {}
125+
self._predictions: dict[
126+
str, tuple[schema.PredictionResponse, PredictionTask]
127+
] = {}
126128
self._predictions_in_flight: set[str] = set()
127129
# it would be lovely to merge these but it's not fully clear how best to handle it
128130
# since idempotent requests can kinda come whenever?
@@ -402,6 +404,15 @@ def cancel(self, prediction_id: str) -> None:
402404
self._events.send(Cancel(prediction_id))
403405
# maybe this should probably check self._semaphore._value == self._concurrent
404406

407+
# HACK: sometimes, Done events may be dropped due to a race condition with logging
408+
# (or possibly because asyncio cancellation is less strict than signal handlers)
409+
# while we're fixing that, here's a bodge so that we always promptly respond
410+
# to cancellation requests so that director doesn't kill us.
411+
#
412+
# if the real Done event comes through, it shouldn't cause any problems
413+
# it will just stay in an ignored queue in the mux
414+
asyncio.create_task(self._mux.write(prediction_id, Done(canceled=True)))
415+
405416
_read_events_task: "Optional[asyncio.Task[None]]" = None
406417

407418
def _start_event_reader(self) -> None:

0 commit comments

Comments
 (0)