Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shutdown_default_executor / wait_for_tstate_lock deadlock (?) after Ctrl+C #111358

Open
xbeastx opened this issue Oct 26, 2023 · 10 comments
Open
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@xbeastx
Copy link

xbeastx commented Oct 26, 2023

Bug report

Bug description:

Blocking function running in thread executor will produce deadlock on shutdown after Ctrl+C pressed.

Here is minimal reproducing example:

import asyncio
import time

def blocking():
    print("sleep start")
    time.sleep(1000)
    print("sleep end")

async def main():
    await asyncio.to_thread(blocking)

asyncio.run(main())

Steps to reproduce:

  1. Run script
  2. Try press Ctrl+C when blocking func sleeps | Nothing produced
  3. If press Ctrl+C again | Will print KeyboardInterrupt exception but script will still hang
  4. If press Ctrl+C again | Will print one more KeyboardInterrupt and will shutdown.

(Tested on 3.9<=python<=3.12 on Ubuntu 22.04 all have same behaviour)

Output:

$ python3.11 test.py 
sleep start
^C^CTraceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/ilya/test.py", line 13, in main
    await asyncio.to_thread(blocking)
  File "/usr/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 123, in run
    raise KeyboardInterrupt()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ilya/test.py", line 15, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.11/asyncio/runners.py", line 189, in run
    with Runner(debug=debug) as runner:
  File "/usr/lib/python3.11/asyncio/runners.py", line 63, in __exit__
    self.close()
  File "/usr/lib/python3.11/asyncio/runners.py", line 73, in close
    loop.run_until_complete(loop.shutdown_default_executor())
  File "/usr/lib/python3.11/asyncio/base_events.py", line 640, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.11/asyncio/base_events.py", line 607, in run_forever
    self._run_once()
  File "/usr/lib/python3.11/asyncio/base_events.py", line 1884, in _run_once
    event_list = self._selector.select(timeout)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/selectors.py", line 468, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/usr/lib/python3.11/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1560, in _shutdown
    atexit_call()
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/usr/lib/python3.11/threading.py", line 1119, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.11/threading.py", line 1139, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt: 

Additional:
Found a blog post https://www.roguelynn.com/words/asyncio-sync-and-threaded/ with looks like similar behavior back in 2019 where someone is trying to workaround this.

CPython versions tested on:

3.9, 3.10, 3.11, 3.12

Operating systems tested on:

Linux

Linked PRs

@xbeastx xbeastx added the type-bug An unexpected behavior, bug, or error label Oct 26, 2023
@github-project-automation github-project-automation bot moved this to Todo in asyncio Nov 24, 2023
@Yaro1
Copy link

Yaro1 commented Nov 29, 2023

Hi, I would like to take that issue!

@gvanrossum
Copy link
Member

@Yaro1 -- You also claimed another issue. Can you choose which one you'd like to tackle first?

@Yaro1
Copy link

Yaro1 commented Dec 1, 2023

@gvanrossum I will take first this one #109564

@ordinary-jamie
Copy link
Contributor

tl;dr: The first "extra" keyboard interrupt is because the loop is waiting for the executor to shutdown. The second "extra" keyboard interrupt is because the Python interpreter will wait for all non-daemon threads to join.


Waiting for ThreadPoolExecutor.shutdown

The initial "hanging" after the first Ctrl+C SIGINT is because the asyncio.runners.Runner waits for the concurrent.futures.ThreadPoolExecutor to shutdown:

loop.run_until_complete(
loop.shutdown_default_executor(constants.THREAD_JOIN_TIMEOUT))

self._default_executor.shutdown(wait=True)

And ThreadPoolExecutor.shutdown(wait=True) will wait for all of its threads to join. This is the code location which is "hanging".

Setting the wait flag to False will allow the loop to finish and exit, and the KeyboardInterrupt will be promptly shown to the user on the first SIGINT. Because the loop is no longer blocked by the executor shutdown, we need one less keyboard interrupt; just one more SIGINT is needed to stop the thread since the Python main thread will still wait for all non-daemon threads to join.

Cannot cancel concurrent.futures.Future

The final stack trace you see is from concurrent.futures.ThreadPoolExecutor, you can run this without asyncio and try to interrupt it to see the same error and hanging behaviour:

import concurrent.futures
import time

with concurrent.futures.ThreadPoolExecutor(1) as pool:
    t = pool.submit(time.sleep, 10)

In the asyncio example, The first SIGINT is indeed caught by the asyncio runner. It tries to cancel the ThreadPoolExecutor created future, but concurrent.futures.Futures cannot actually be cancelled once they have started. You will see the chained asyncio.Future attempt this but the concurrent.futures.Future.cancel() call will return False.

def cancel(self):
"""Cancel the future if possible.
Returns True if the future was cancelled, False otherwise. A future
cannot be cancelled if it is running or has already completed.
"""

@gvanrossum
Copy link
Member

Thanks, @ordinary-jamie! Putting it in terms I can understand:

  • The to_thread() call basically just calls loop.run_in_executor(), which enters the sleep in a thread managed by the loop's default executor. Nothing we do will stop that thread -- it doesn't receive signals, you can't cancel it.
  • The first time the the runner receives a SIGINT, it cancels the main task (in this example, this is running the main() coroutine) and then raises KeyboardInterrupt.
  • The main() coroutine (blocked in await) then receives a CancelledError and it exits.
  • The run() call then enters runner.__exit__() which calls runner.close(). Note that the toplevel __main__ routine isn't finished until __exit__() returns -- this is why we don't get a traceback printed at this point.
  • The runner.close() method calls the coroutine loop.shutdown_default_executor() passing it a timeout of 300 seconds (constants.THREAD_JOIN_TIMEOUT).
  • This is where Jamie's explanation picks up.
  • To shut down an executor, you must call its shutdown() method.
  • Because we (for some reason) would like to wait for the executor to complete (but not more than the given timeout), we must call shutdown() from a thread -- we create a helper thread running the loop._do_shutdown() method.
  • This is where we get in trouble.
  • The executor is still executing our sleep(1000) call, and nothing will stop that. (To illustrate this for myself, I replaced it with a countdown loop that prints the counter once a second. It just keeps ticking.)
  • The loop._do_shutdown() method hangs in executor.shutdown(wait=True) until that counter reaches zero.
  • So now we wait indefinitely -- not 300 seconds as you'd expect.
  • IOW, the timeout passed to shutdown_default_executor() is ineffective.

It appears the author of the code didn't think this through (or a change was made that broke this). A future is created and waited for before we join the thread -- the timeout is only used for joining the thread, but the future doesn't complete until the shutdown() calls exits (i.e., not until the last task already running in the executor completes -- that's the sleep(1000)).

So how do we fix this? It looks like Jamie proposes to change the shutdown() call to pass wait=False, which will cause shutdown() to return immediately, and then everything (other than the sleep(1000) call) will exit immediately. We'll see the tracebacks immediately, and then we'll still hang until the sleep(1000) finishes -- which we can interrupt with a second ^C.

But that doesn't seem to be what the author of the code intended either. If there's a task running in the executor that's almost finished, we just abandon it. The author likely intended to wait up to 300 seconds for running tasks, and then give up. Setting wait=True never gives up, setting it to False gives up immediately.

So what to do? Maybe we should just use

async with asyncio.timeout(timeout):
    await future

and we should probably only wait for the thread if we don't time out (and print the warning whenever we do).

This solution can be backported to 3.12 and 3.11, which have the fancy timeout.

@ordinary-jamie
Copy link
Contributor

It looks like Jamie proposes to change the shutdown() call to pass wait=False

Sorry for the confusion, I intended that to just be a demo of the shutdown blocking the keyboard interrupt from surfacing

IOW, the timeout passed to shutdown_default_executor() is ineffective.
Maybe we should just use [asyncio.timeout]

Agreed! I can work on a fix :)

@gvanrossum
Copy link
Member

@xbeastx If you read the PR you'll see that it doesn't really fix your issue -- it fixes a legitimate bug in the timeout handling, but you'll still need three ^C -- it's just how the exit handling of asyncio.run() works.

@xbeastx
Copy link
Author

xbeastx commented Feb 18, 2024

@gvanrossum

Thank you for the response!
I see... But from the user's point of view it still looks very strange.

Let's imagine a real program that call some blocking function (let's say it calls "input") in thread pool executor, and the user tries to abort it. How could he imagine the logic of "three ctrl+c", even if this in itself is not strange =)

So from my point of view, an asyncio program here should behave in the same way as a non-async one.

import time


def blocking():
    print("sleep start")
    time.sleep(1000)
    print("sleep end")


def main():
    blocking()


if __name__ == '__main__':
    main()

With one Ctrl+C it produces:

$ python3.11 test.py 
sleep start
^CTraceback (most recent call last):
  File ".../test.py", line 15, in <module>
    main()
  File ".../test.py", line 11, in main
    blocking()
  File ".../test.py", line 6, in blocking
    time.sleep(1000)
KeyboardInterrupt

But I would agree that problems is more broader than it and technically asyncio code more equals something like this:

import time
from concurrent.futures import ThreadPoolExecutor


def blocking():
    print("sleep start")
    time.sleep(1000)
    print("sleep end")


def main():
    with ThreadPoolExecutor(max_workers=1) as executor:
        try:
            future = executor.submit(blocking)
            future.result()
        except KeyboardInterrupt:
            print("KeyboardInterrupt")
            raise


if __name__ == '__main__':
    main()

Which leads to twice-ctrl+c logic =)

sleep start
^CKeyboardInterrupt
^CTraceback (most recent call last):
  File ".../demo.py", line 15, in main
    future.result()
  File ".../lib/python3.11/concurrent/futures/_base.py", line 451, in result
    self._condition.wait(timeout)
  File ".../lib/python3.11/threading.py", line 327, in wait
    waiter.acquire()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../demo.py", line 28, in <module>
    main()
  File ".../demo.py", line 12, in main
    with ThreadPoolExecutor(max_workers=1) as executor:
  File ".../lib/python3.11/concurrent/futures/_base.py", line 647, in __exit__
    self.shutdown(wait=True)
  File ".../lib/python3.11/concurrent/futures/thread.py", line 235, in shutdown
    t.join()
  File ".../lib/python3.11/threading.py", line 1119, in join
    self._wait_for_tstate_lock()
  File ".../lib/python3.11/threading.py", line 1139, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

Those the problem is that there is no clean way to cancel the thread task in the threadpool executor?
If I understand correctly, this is all because we cannot kill a thread safely (it can hold some critical sections or other os resources)?

@gvanrossum
Copy link
Member

Such a real program should not use the default executor. The default executor exists because certain C library APIs (notable getaddrinfo) don't have a non-blocking version, so asyncio must run them in a thread. But you're free to use a different executor, and use asyncio's loop.run_in_executor(exe, func, <args>) API with your own executor. Then you can close it however you want.

@gvanrossum
Copy link
Member

And yes, if you want to be able to kill threads, you have to bark up another tree. There have been endless discussions about that but the conclusion is always the same -- it's unsafe, it can get the runtime in an unknown state, so please do something else.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 19, 2024
…ult_executor (pythonGH-115622)

(cherry picked from commit 53d5e67)

Co-authored-by: Jamie Phan <jamie@ordinarylab.dev>
gvanrossum pushed a commit that referenced this issue Feb 19, 2024
…ault_executor (GH-115622) (#115641)

(cherry picked from commit 53d5e67)

Co-authored-by: Jamie Phan <jamie@ordinarylab.dev>
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

5 participants