Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CancelScope(shield=True) protects nonmain tasks from KeyboardInterrupt #1014

Closed
astiob opened this issue Apr 21, 2019 · 4 comments
Closed

CancelScope(shield=True) protects nonmain tasks from KeyboardInterrupt #1014

astiob opened this issue Apr 21, 2019 · 4 comments

Comments

@astiob
Copy link

astiob commented Apr 21, 2019

Using CPython 3.6.5 and trio 0.11.0 on openSUSE Leap 42.3, the following code keeps running foo forever even after multiple keyboard interrupts:

import trio

async def foo():
    try:
        with trio.CancelScope(shield=True):
            i = 0
            while True:
                i += 1
                if i == 1000:
                    await trio.sleep(0)
                    i = 0
    except BaseException as e:
        print('foo got exception: %r' % e)
        raise
    else:
        print('foo exiting normally')

async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(foo)

trio.run(main)

For reference, here’s how it interacts with @trio.hazmat.enable_ki_protection:

  • Adding @trio.hazmat.enable_ki_protection to foo and keeping the shielding changes nothing.
  • Adding @trio.hazmat.enable_ki_protection and removing shielding makes it get a Cancelled exception from await trio.sleep(0).
  • Removing both allows it to get KeyboardInterrupt anywhere or Cancelled from await trio.sleep(0).

In my real use-case this is actually desired behaviour as I want a certain task to run to completion, including all async operations it performs, regardless of any keyboard interrupts; but it seems this behaviour is not documented.

Furthermore, it is inconsistent with the main task (is that the right term?) that is executed by trio.run: replacing trio.run(main) with trio.run(foo) makes foo always get KeyboardInterrupt regardless of the shielding, either anywhere (without @trio.hazmat.enable_ki_protection) or exactly from await trio.sleep(0) (with @trio.hazmat.enable_ki_protection).

I tried to search for this and saw #151, but it’s not clear to me whether it says the same thing about the main task (and implies that this is the intended behaviour in descendant tasks) or that shielded scopes never protect against KeyboardInterrupt (which is clearly not the case).

If this behaviour is intended, it would be good to make it clear in the documentation. All I saw about shielded scopes is that they protect from Cancelled. Meanwhile, @trio.hazmat.enable_ki_protection didn’t make it clear whether the task would actually get a KeyboardInterrupt from await <checkpoint> or just allow a KeyboardInterrupt to be raised after it yields control back to the event loop, in which case it might get cancelled by its parent nursery but be able to shield itself. (However, even if @trio.hazmat.enable_ki_protection is indeed supposed to turn interrupts into cancels, it is still not clear why shielding alone protects me from interrupts.)

@njsmith
Copy link
Member

njsmith commented Apr 21, 2019

The short answer is: there's no special mechanism for converting KeyboardInterrupt into Cancelled, or for making KeyboardInterrupt respect shielding.

However, there is a general mechanism, where if an exception is raised in some task and you don't catch it, then the exception propagation machinery will cancel that task's siblings, because that's the only way it can keep unwinding the stack and propagating the exception. And KeyboardInterrupt is an exception.

It's a special exception because it can pop into existence anywhere and anytime (almost), but once it's raised then it's just a regular exception, and it works just like any other time some code crashes with a random exception.

To give some more details about your specific observations:

By default, the contract is that KeyboardInterrupt can appear anywhere it feels like it, at any time. If that isn't basically acceptable to you, then you should use trio.open_signal_receiver(signal.SIGINT) to designate a single task to listen for control-C and then handle it the way you want.

Assuming you don't do that, then here's how it works currently:

There are internally two different mechanisms that can raise KeyboardInterrupt. One is basically Python's regular mechanism: at any point while Python bytecode is executing, signal handlers can run, and can raise exceptions. The other mechanism is more Trio specific: it can use the cancellation machinery to raise KeyboardInterrupt at a checkpoint.

Currently, the KI protection APIs only affect the first mechanism, that can raise KeyboardInterrupt at any arbitrary bytecode. They probably need some rethinking (#733); they're a pretty crude and low-level API currently, that was only designed to handle the minimum needed for Trio's own requirements.

So when you hit control-C, the flowchart is something like:

Is there a task running right now? Does it have @enable_ki_protection? If the answers are "yes" and "no", respectively, then raise KeyboardInterrupt immediately. This uses the first mechanism.

Otherwise, pick an arbitrary victim task, and try to wake it up via the cancellation machinery. When it wakes up, send it KeyboardInterrupt instead of Cancelled. This ignores shielding (and #151 is for discussion of whether that's a good thing).

In practice, right now Trio always picks the main task (the first one running user code) as its "arbitrary" task, because this simplifies the code. This is perfectly consistent with the contract (KeyboardInterrupt can happen at any place, and the main task is a place, therefore it can happen in the main task). You shouldn't rely on this in particular, but that's why you observed the main task acting differently.

It's pretty messy.

Some general guidelines:

  • Shielding is a very blunt tool. It can make it hard to get your program to shut down without having to forcibly terminate it. That's what it's designed to do, but it's usually only helpful in pretty restricted cases, where you know the operation will complete pretty quickly in any case.

  • KeyboardInterrupt is convenient but super messy. (This isn't a trio fact, it's a general python fact.) If you really care about clean shutdown on control-C, then use open_signal_receiver and skip the whole mess. If you're writing generic library code that has to maintain internal consistency while running inside programs that use KeyboardInterrupt, then @enable_ki_protection can help you patch up the biggest gaping holes, but it's still going to be messy and imperfect; that's just the nature of the beast.

@astiob
Copy link
Author

astiob commented Apr 21, 2019

Thanks for the detailed response!

This explanation generally matches the impression I had from reading the docs. (Although in some aspects it’s clearer than the docs and the docs could possibly be improved: e. g. by adding a note that within an @enable_ki_protection frame, checkpoints may throw KeyboardInterrupt back into this same frame, or it may get thrown elsewhere and the task eventually get cancelled by a nursery instead.)

However, the example code I posted shows that just a shielded cancel scope makes a task always ignore all keyboard interrupts. No matter how much I try, I can’t interrupt it. Following your explanation, normally I’d expect that the interrupt will occasionally happen while the task is sleeping and in that case the nursery will attempt to cancel the task but the shielding will protect it, but most of the time the interrupt will happen while the foo task is running, and then it should get the spontaneous KeyboardInterrupt exception in the usual way as plain synchronous single-threaded Python code does. Indeed, without the shielded scope, I do see the interrupt most often happen within foo code. But with the shielded scope, it never gets interrupted, even when I rerun the script many times and press Ctrl+C.

As for my personal use case for ignoring interrupts, I want to do it only for specific tasks that don’t run all of the time. I do want the rest of the program to be interrupted and the program to quit as soon as all of these specific tasks complete. Before converting to async, I used to launch these tasks in new threads to evade the main thread’s KeyboardInterrupt, which worked well.

If I understand correctly, if I were to use open_signal_receiver or the signal module to set up a signal handler for SIGINT, I would affect the whole process including all the other tasks. To make it interrupt everything other than my uninterruptible tasks, I would need some magic to route the interrupt where it needs to go.

It occurs to me that I could have it explicitly cancel my root nursery or main task, thereby converting the keyboard interrupt into a cancellation, which would then be affected by shields as usual. To do this, I’d need to store my main nursery somewhere and make sure I do nothing outside of this nursery, or use hazmat.current_task() to store the main task, or even walk up the task/nursery hierarchy through hazmat to find the main task from the signal handler.

All in all, this would require extra nontrivial code. You could say that’s the price I deserve to pay for this and maybe you’d be right*, but it would be so much nicer to be able to put the critical code in a context manager and have it all done automatically. And curiously, it seems it is possible currently, even if this is possibly unintended! just by wrapping it in a shielded cancel scope within an explicit nursery.

* And another way to look at it might be that if trio didn’t have special handling for keyboard interrupts, I’d want to put the event loop in a separate thread, catch the interrupt in the main and explicitly cancel the event loop anyway. Or even that this might look better/clearer in the code regardless of trio’s ability to handle interrupts.

@njsmith
Copy link
Member

njsmith commented Apr 21, 2019

However, the example code I posted shows that just a shielded cancel scope makes a task always ignore all keyboard interrupts. No matter how much I try, I can’t interrupt it.

That's weird. I tried running your code on my laptop (CPython 3.6.5, trio 0.11.0, Ubuntu 18.10), and my first two runs I got:

^Cfoo got exception: KeyboardInterrupt()
Traceback (most recent call last):
  File "/tmp/fdsa.py", line 22, in <module>
    trio.run(main)
  File "/home/njs/.user-python3.6/lib/python3.6/site-packages/trio/_core/_run.py", line 1444, in run
    raise runner.main_task_outcome.error
  File "/tmp/fdsa.py", line 20, in main
    nursery.start_soon(foo)
  File "/home/njs/.user-python3.6/lib/python3.6/site-packages/trio/_core/_run.py", line 506, in __aexit__
    raise combined_error_from_nursery
  File "/tmp/fdsa.py", line 9, in foo
    if i == 1000:
  File "/home/njs/.user-python3.6/lib/python3.6/site-packages/trio/_core/_ki.py", line 196, in handler
    raise KeyboardInterrupt
KeyboardInterrupt

and

^C^Cfoo got exception: KeyboardInterrupt()
Traceback (most recent call last):
  File "/tmp/fdsa.py", line 22, in <module>
    trio.run(main)
  File "/home/njs/.user-python3.6/lib/python3.6/site-packages/trio/_core/_run.py", line 1444, in run
    raise runner.main_task_outcome.error
  File "/tmp/fdsa.py", line 20, in main
    nursery.start_soon(foo)
  File "/home/njs/.user-python3.6/lib/python3.6/site-packages/trio/_core/_run.py", line 506, in __aexit__
    raise combined_error_from_nursery
trio.MultiError: KeyboardInterrupt(), KeyboardInterrupt()

Details of embedded exception 1:

  Traceback (most recent call last):
    File "/home/njs/.user-python3.6/lib/python3.6/site-packages/outcome/_async.py", line 19, in capture
      return Value(sync_fn(*args, **kwargs))
    File "/home/njs/.user-python3.6/lib/python3.6/site-packages/trio/_core/_run.py", line 751, in raise_cancel
      raise KeyboardInterrupt
  KeyboardInterrupt

Details of embedded exception 2:

  Traceback (most recent call last):
    File "/tmp/fdsa.py", line 9, in foo
      if i == 1000:
    File "/home/njs/.user-python3.6/lib/python3.6/site-packages/trio/_core/_ki.py", line 196, in handler
      raise KeyboardInterrupt
  KeyboardInterrupt

So it's not 100% deterministic, but out of three control-C's, two of them did interrupt the shielded code – line 196, in handler is what my post called the "first mechanism". And one was delivered to the main task – line 751, in raise_cancel is what my post called the "second mechanism". That one caused the nursery to try to cancel the child task, but the cancellation got stuck on the cancel shield, so the program didn't actually stop until my next control-C caused the child task to exit.

I tried running it a bunch more times, and I never had to hit control-C more than four times to get the program to exit.

Are you sure you're running exactly the code you pasted in the first issue? You don't have any stray @enable_ki_protections accidentally left in, do you?

It occurs to me that I could have it explicitly cancel my root nursery or main task, thereby converting the keyboard interrupt into a cancellation, which would then be affected by shields as usual. To do this, I’d need to store my main nursery somewhere and make sure I do nothing outside of this nursery, or use hazmat.current_task() to store the main task, or even walk up the task/nursery hierarchy through hazmat to find the main task from the signal handler.

If you want to cleanly cancel everything in the program, you can always raise an exception, like SystemExit (or even KeyboardInterrupt). As long as you don't catch it, or only catch it at the top of your program, then it'll cancel everything on its way out.

Of course for a signal handler like this, you'll usually want to set it up at the top of your main function anyway, so it should be easy for it to access your top-level nursery anyway.

async def clean_shutdown_on_control_C():
    with trio.open_signal_receiver(signal.SIGINT) as signal_receiver:
        async for _ in signal_receiver:
            sys.exit("Interrupted")

async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(clean_shutdown_on_control_C)
        # ... main program code here ...
        # and when we're done, cancel the signal handling task
        nursery.cancel_scope.cancel()

You could also use @asynccontextmanager to wrap this up into an async with clean_shutdown_on_control_C(): ...

@astiob
Copy link
Author

astiob commented Apr 21, 2019

OK, I’m very sorry, it looks like I got excited too quickly! I have just retried the exact same test, and now I’m able to kill it after at most several Ctrl+C presses. Weird. I must’ve been very unlucky when I originally tested this. I did double-check that I had the @enable_ki_protection commented out, and I did several attempts… Oh well, now it works as documented and I have no way of figuring out why it didn’t even if it wasn’t just bad luck.

So if I understand correctly, currently @enable_ki_protection combined with a shielded scope achieves the isolation I want, but this is subject to change in future when keyboard interrupts start to get served to tasks other than the main task.

Thanks for the new suggestion! D’oh, of course I can just throw something and let it unwind by itself. That’s simpler than I thought it would be. I’ll take this approach, since I don’t want to rely on the current behaviour described above if it can silently change in the future.

@astiob astiob closed this as completed Apr 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants