Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancelling a fail_after results in TooSlowError, even though the timeout didn't expire #698

Open
joernheissler opened this issue Sep 28, 2018 · 11 comments

Comments

@joernheissler
Copy link

joernheissler commented Sep 28, 2018

I've got some code that uses fail_after and cancels itself, example below.
My assumption was that the context manager just exits without any error. Yet it raises a Cancelled and TooSlowError.
When I don't use fail_after but open_cancel_scope, things work as expected. I can't find anything in the docs that explain the difference. And I can't think of any good reason either.
TooSlowError is especially strange because the timeout wasn't reached.

Version: ce131cd on cpython3.6.6

#!/usr/bin/env python

import trio


async def main_a():
    with trio.CancelScope() as cancel_scope:
        await trio.sleep(1)
        cancel_scope.cancel()
        await trio.sleep(1)

    print('Hello!')


async def main_b():
    with trio.fail_after(10) as cancel_scope:
        await trio.sleep(1)
        cancel_scope.cancel()
        await trio.sleep(1)

    print('Hello!')


trio.run(main_a)
print("---")
trio.run(main_b)

Result is:

Hello!
---
Traceback (most recent call last):
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 116, in fail_at
    yield scope
  File "./bug", line 19, in main_b
    await trio.sleep(1)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 85, in sleep
    await sleep_until(_core.current_time() + seconds)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 66, in sleep_until
    await sleep_forever()
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 51, in sleep_forever
    await _core.wait_task_rescheduled(lambda _: _core.Abort.SUCCEEDED)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_core/_traps.py", line 165, in wait_task_rescheduled
    return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/outcome/_sync.py", line 107, in unwrap
    raise self.error
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/outcome/_async.py", line 19, in capture
    return Value(sync_fn(*args, **kwargs))
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_core/_run.py", line 629, in raise_cancel
    raise exc
trio.Cancelled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./bug", line 26, in <module>
    trio.run(main_b)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_core/_run.py", line 1334, in run
    raise runner.main_task_outcome.error
  File "./bug", line 19, in main_b
    await trio.sleep(1)
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 118, in fail_at
    raise TooSlowError
trio.TooSlowError
@smurfix
Copy link
Contributor

smurfix commented Sep 28, 2018 via email

@joernheissler
Copy link
Author

Thanks!
But is that a fix or is it a workaround? :-)

@belm0
Copy link
Member

belm0 commented Sep 29, 2018

Likely no one has ever thought to manually cancel a fail_after(). Would you describe your use case?

Behavior seems completely in line with docs, though I understand not what you're wanting:

Creates a cancel scope with the given timeout, and raises an error if it is actually cancelled.

This function and move_on_after() are similar in that both create a cancel scope with a given timeout, and if the timeout expires then both will cause Cancelled to be raised within the scope. The difference is that when the Cancelled exception reaches move_on_after(), it’s caught and discarded. When it reaches fail_after(), then it’s caught and TooSlowError is raised in its place.

Raises TooSlowError – if a Cancelled exception is raised in this scope and caught by the context manager

There are two things that can be cancelled: 1) the cancel scope, 2) the notion that an exception should be raised. The cancel() method can only be bound to one, and since it's a method on the CancelScope object it understandably is the former.

Possibly need a cancel_fail_after() method new property (see below).

@belm0
Copy link
Member

belm0 commented Sep 29, 2018

Once we head down this road, I think it should be possible to dynamically transition between plain cancel scope, move_on_after, and fail_after semantics while the cancel scope is in flight. It's already possible to do that with the first two via deadline. There should be an additional raise_on_cancelled read/write property which is a bool, or perhaps an exception object.

@joernheissler
Copy link
Author

Likely no one has ever thought to manually cancel a fail_after(). Would you describe your use case?

In a nutshell: There's a long running task (might run infinitely). When a certain event happens, I want to finish it without error, so I'd cancel it. If the event does not occur within N seconds, I want the task to abort with an error.

That task receives messages from a stream and runs a callback for each message. The callback might then cancel the task.

My code and design are flawed and I'm going to rewrite it without using callbacks. Then the above use case won't exist anymore. I once learned that I need to use state machines and callbacks, unlearning this is hard.

Creates a cancel scope with the given timeout, and raises an error if it is actually cancelled.

I must have missed this. So it looks like the behaviour does match the docs.

Still I find it strange that cancelling one CancelScope causes it to return silently while cancelling another CancelScope causes it to fail with TooSlowError while in fact nothing was too slow.

@njsmith
Copy link
Member

njsmith commented Sep 29, 2018

Huh, interesting case.

So in general, trio doesn't track the reason why a scope was cancelled – a timeout is exactly the same as calling cancel. This is because of you do try to track the difference then there are all kinds of ambiguous cases, e.g. what do you do if it timed out and then someone called cancel as well, or even vice-versa, what if the timeout expires while it's in the process of handling the manual cancel.

And fail_after is an extremely simple wrapper around a cancel scope: https://github.com/python-trio/trio/blob/master/trio/_timeouts.py

But yeah the end result is kind of surprising here!

Some options:

  • Leave the code as is, but document it better

  • Remove fail_* entirely; when people ask about them treat it as an opportunity to teach about the wonders of .cancelled_caught

  • Start tracking cancel scope cancellation reason, maybe just in a minimal way like "was the first call to cancel from a timeout or not"

  • Have fail_after check the current time at the moment when it decides whether to raise TooSlowError, and skip it if the timeout couldn't possibly have expired

@njsmith
Copy link
Member

njsmith commented Sep 29, 2018

Hmm, above message was written this morning and then I forgot to hit post, so the discussion has overtaken it a bit.

I have trouble telling how useful fail_after really is, once you get used to doing things "the trio way". In other systems that don't have nestable cancel scopes, like asyncio, fail_after is kind of the only way to do things, so I partly just threw it in to make new migrants feel more at home. But that's probably not the only use case? It's hard to tell. We're all unlearning the old ways and trying to invent the new ways together :-).

If you're rewriting your code anyway, then maybe we should leave this be for the moment, and just keep this thread in mind in case it comes up again in the future?

@oremanj oremanj changed the title Cancelling a fail_after results in Cancelled exception Cancelling a fail_after results in TooSlowError, even though the timeout didn't expire May 16, 2019
@smurfix
Copy link
Contributor

smurfix commented May 16, 2019

My use case (close enough): I'm connecting to a host and my connection-setup handshake is protected by a fail_after because, well, if that takes too long the host isn't useable. You can imagine what happens when the whole subtask is cancelled for completely unrelated reasons. I don't see an obviously-correct way to handle that case.

So, yes, I'm all for some way to fix this issue, preferably one that doesn't require two additional syscalls to gettimeinfo or similar.

@oremanj
Copy link
Member

oremanj commented May 16, 2019

AFAICT, this issue only applies to cancelling the fail_after scope itself. Cancelling a scope that encloses the fail_after should work fine. The Happy Eyeballs example you mention seems to me like it would be of the latter species.

Are you seeing that cancelling a scope that encloses the fail_after results in a TooSlowError? That would definitely be a bug, and code to reproduce it would be appreciated.

@smurfix
Copy link
Contributor

smurfix commented May 16, 2019

Yes, that's what I'm seeing. I don't have a test case (a trivial attempt to reproduce this failed) but in my code the fail_after's scope is not even saved.

@oremanj
Copy link
Member

oremanj commented May 16, 2019

Are you using a released Trio, or master? There were some recent cancellation changes that could theoretically have bugs... I'm not able to reproduce the outer-cancellation-causes-inner-TooSlowError on master with a simple case either, though.

@Zac-HD Zac-HD added the docs label Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants