Skip to content

Asyncio Race Condition Leading to Infinite Loop #2001

Open
@TheTechromancer

Description

@TheTechromancer

This is a pyzmq bug

  • This is a pyzmq-specific bug, not an issue of zmq socket behavior. Don't worry if you're not sure! We'll figure it out together.

What pyzmq version?

26.0.3

What libzmq version?

4.3.5

Python version (and how it was installed)

Python 3.9 via apt

OS

Debian

What happened?

Recently I've run into a bug in cpython that directly affects ZMQ. It triggers whenever asyncio debugging is enabled, and the ZMQ future blocks for more than .1 second:

https://github.com/python/cpython/blob/7c2921844f9fa713f93152bf3a569812cee347a0/Lib/asyncio/base_events.py#L2021-L2023

The bug is due to an unintended recursion that happens when repr() is called on an asyncio task. The recursion is caused by ZMQ's future storing references to other futures including itself, which creates a circular reference. However, because each new layer of recursion must iterate over multiple futures, a RecursionError is never reached, and instead it results in a deadlock where the CPU is stuck at 100%:

image

This is mainly a bug in cpython, and was fixed in 3.11. However, 3.10 and earlier are still vulnerable to this bug, and based on the feedback from the cpython issue, the fix will not be back ported to those older versions.

python/cpython#122296

I'm creating this issue so you're aware of it, and so anyone else googling for the issue can find it. This one was a beast to track down, since it only happens when PYTHONASYNCIODEBUG=1 and when the ZMQ future blocks for more than .1 second. Hopefully it's helpful to someone.

Full traceback:
python_traceback.txt

Code to reproduce bug

import asyncio
import time
import functools

async def slow_callback():
    await asyncio.sleep(.1)
    time.sleep(.2)  # Blocking sleep to trigger the warning
    await asyncio.sleep(.1)

async def main():
    task = asyncio.create_task(slow_callback())
    task.add_done_callback(
        functools.partial(print, [task, task])
    )
    await task

if __name__ == "__main__":
    asyncio.run(main(), debug=True)

Traceback, if applicable

No response

More info

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions