Flusher task is stopping due to InvalidStateError #624

Open
@debbyglance

Observed behavior

See #606
The _flusher() task stops because InvalidStateError is raised when it calls .set_result() on a future that is already done.

Expected behavior

The flusher task should never stop

Server and client version

client: nats.py 2.6.0
server: 2.10.7

Host environment

NATS server is running on docker image nats:2.10.7-alpine3.18
The client is a Python Quart application running behind Hypercorn.

Steps to reproduce

I don't have steps to reproduce, but this happens sporadically in production, and I think it could be easy to fix. This is what I think is happening:

  1. During a reconnect attempt, _attempt_reconnect() calls _flush_pending().
  2. The _flush_pending() task creates a Future and adds it to the _flush_queue.
  3. _flush_pending() waits on the Future.
  4. The reconnect fails and _attempt_reconnect() is cancelled.
  5. This cancels the _flush_pending() task, which in turn cancels the Future created in step 2 (when a task is cancelled, Python also cancels the future that the task is currently awaiting).
  6. Now there is a Future in the _flush_queue that is in the cancelled state.
  7. The new reconnect attempt starts a new _flusher() task.
  8. The _flusher() task pulls the cancelled Future out of the queue.
  9. When the _flusher() task calls set_result() on the Future, an InvalidStateError is raised because the Future is already done.
  10. The _flusher() task aborts.
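
The mechanism in steps 2 to 9 can be shown with plain asyncio, without nats.py. This is only a minimal sketch: `flush_queue` and `flush_pending` are stand-ins I made up for illustration, not the client's real objects, and the reconnect logic is replaced by a direct `task.cancel()`.

```python
import asyncio


async def reproduce() -> str:
    # Stand-in for the client's _flush_queue (hypothetical, not nats.py code).
    flush_queue: asyncio.Queue = asyncio.Queue()

    async def flush_pending() -> None:
        # Steps 2-3: create a future, enqueue it, and wait on it.
        fut = asyncio.get_running_loop().create_future()
        await flush_queue.put(fut)
        await fut

    task = asyncio.create_task(flush_pending())
    await asyncio.sleep(0)  # let the task run until it blocks on the future

    # Steps 4-5: cancelling the task also cancels the future it is awaiting.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    # Steps 6-8: the cancelled future is still sitting in the queue.
    stale = flush_queue.get_nowait()
    assert stale.cancelled()

    # Step 9: set_result() on a future that is already done raises.
    try:
        stale.set_result(None)
    except asyncio.InvalidStateError:
        return "InvalidStateError"
    return "no error"


result = asyncio.run(reproduce())
print(result)  # InvalidStateError
```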

Possible fixes would be:

  • _flusher() could ignore any cancelled futures in the _flush_queue
  • clear the _flush_queue on reconnect (or create a new queue)
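
A rough sketch of the first option, assuming the flusher only needs to wake the waiter behind each queued future. `flusher_step` and `demo` are hypothetical names used for illustration; the real _flusher() does more work per iteration (such as writing pending data), which is left out here.

```python
import asyncio


async def flusher_step(flush_queue: asyncio.Queue) -> bool:
    """Handle one queued future; return True if a waiter was woken."""
    fut = await flush_queue.get()
    # A future cancelled together with its waiting task is already done,
    # so calling set_result() on it would raise InvalidStateError.
    if fut.done():
        return False
    fut.set_result(None)
    return True


async def demo() -> list:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    stale, live = loop.create_future(), loop.create_future()
    stale.cancel()  # simulates the future left behind by a cancelled reconnect
    queue.put_nowait(stale)
    queue.put_nowait(live)
    return [await flusher_step(queue), await flusher_step(queue), live.done()]


outcome = asyncio.run(demo())
print(outcome)  # [False, True, True]
```

Checking `done()` rather than `cancelled()` also covers a future that already has a result or an exception, so the loop cannot hit InvalidStateError from this call either way.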

Labels: defect (suspected defect such as a bug or regression)