Description
Observed behavior
See #606
The _flusher() task stops because InvalidStateError is raised when it calls .set_result() on a future taken from the _flush_queue.
Expected behavior
The flusher task should never stop
Server and client version
client: nats.py 2.6.0
server: 2.10.7
Host environment
The NATS server is running on the Docker image nats:2.10.7-alpine3.18.
The client is a Python Quart application running behind Hypercorn.
Steps to reproduce
I don't have steps to reproduce, but this happens sporadically in production, and I think it could be easy to fix. This is what I think is happening:
- During a reconnect attempt, _attempt_reconnect() calls _flush_pending()
- The _flush_pending() task creates a Future and adds it to the _flush_queue
- _flush_pending() waits on the future
- The reconnect fails and _attempt_reconnect() is cancelled.
- This cancels the _flush_pending() task, which in turn cancels the Future that was created in step 2 (Python cancels the future that a task is awaiting when that task is cancelled).
- Now there is a Future in the _flush_queue that is in cancelled state
- The new reconnect attempt starts a new _flusher() task
- The _flusher() task pulls the cancelled Future out of the queue
- When the _flusher() task calls set_result() on the future, it results in an InvalidStateError exception because the future is already done
- The flusher task aborts
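The sequence above can be demonstrated with plain asyncio, without nats.py. This is a minimal sketch, not the library's code: `wait_for_flush` is a hypothetical stand-in for _flush_pending(), and the queue stands in for _flush_queue.

```python
import asyncio


async def wait_for_flush(queue: asyncio.Queue) -> None:
    # Hypothetical stand-in for _flush_pending(): enqueue a future, then await it.
    fut = asyncio.get_running_loop().create_future()
    await queue.put(fut)
    await fut


async def reproduce() -> str:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(wait_for_flush(queue))
    await asyncio.sleep(0)  # let the task run until it blocks on the future

    task.cancel()  # stand-in for the reconnect attempt being cancelled
    try:
        await task
    except asyncio.CancelledError:
        pass

    # The future is still in the queue, but cancelling the task cancelled it too.
    stale = queue.get_nowait()
    assert stale.cancelled()
    try:
        stale.set_result(None)  # what a flusher would do next
    except asyncio.InvalidStateError:
        return "InvalidStateError"
    return "no error"


outcome = asyncio.run(reproduce())
print(outcome)
```

If the analysis above is right, this is the same exception that ends the _flusher() task.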
Possible fixes would be:
- _flusher() could ignore any cancelled futures in the _flush_queue
- Clear the _flush_queue on reconnect (or create a new queue)
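A rough sketch of the first option, assuming the flusher resolves futures pulled from an asyncio.Queue. `drain_flush_queue` is a hypothetical helper used only for illustration; it is not the actual _flusher() implementation in nats.py.

```python
import asyncio


async def drain_flush_queue(queue: asyncio.Queue) -> int:
    """Resolve every pending future in the queue, skipping finished ones."""
    resolved = 0
    while not queue.empty():
        fut = queue.get_nowait()
        if fut.done():
            # Already cancelled (or otherwise finished) by its waiter; skip it
            # instead of calling set_result(), which would raise.
            continue
        fut.set_result(None)
        resolved += 1
    return resolved


async def demo() -> int:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    stale = loop.create_future()
    stale.cancel()  # simulates the future left behind by a cancelled task
    live = loop.create_future()

    queue.put_nowait(stale)
    queue.put_nowait(live)
    return await drain_flush_queue(queue)


resolved_count = asyncio.run(demo())
print(resolved_count)
```

Checking `fut.done()` rather than only `fut.cancelled()` also covers futures that finished for any other reason. The second option (replacing the queue on reconnect) would avoid stale entries altogether, but any task still waiting on a future in the old queue would then need to be handled separately.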