Cancelling a `fail_after` results in TooSlowError, even though the timeout didn't expire #698

joernheissler · 2018-09-28T16:28:50Z

I've got some code that uses fail_after and cancels itself, example below.
My assumption was that the context manager just exits without any error. Yet it raises a Cancelled and TooSlowError.
When I don't use fail_after but open_cancel_scope, things work as expected. I can't find anything in the docs that explain the difference. And I can't think of any good reason either.
TooSlowError is especially strange because the timeout wasn't reached.

Version: ce131cd on cpython3.6.6

#!/usr/bin/env python

import trio


async def main_a():
    with trio.CancelScope() as cancel_scope:
        await trio.sleep(1)
        cancel_scope.cancel()
        await trio.sleep(1)

    print('Hello!')


async def main_b():
    with trio.fail_after(10) as cancel_scope:
        await trio.sleep(1)
        cancel_scope.cancel()
        await trio.sleep(1)

    print('Hello!')


trio.run(main_a)
print("---")
trio.run(main_b)

Result is:

Hello!
---
Traceback (most recent call last):
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 116, in fail_at
    yield scope
  File "./bug", line 19, in main_b
    await trio.sleep(1)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 85, in sleep
    await sleep_until(_core.current_time() + seconds)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 66, in sleep_until
    await sleep_forever()
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 51, in sleep_forever
    await _core.wait_task_rescheduled(lambda _: _core.Abort.SUCCEEDED)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_core/_traps.py", line 165, in wait_task_rescheduled
    return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/outcome/_sync.py", line 107, in unwrap
    raise self.error
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/outcome/_async.py", line 19, in capture
    return Value(sync_fn(*args, **kwargs))
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_core/_run.py", line 629, in raise_cancel
    raise exc
trio.Cancelled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./bug", line 26, in <module>
    trio.run(main_b)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_core/_run.py", line 1334, in run
    raise runner.main_task_outcome.error
  File "./bug", line 19, in main_b
    await trio.sleep(1)
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/joern/.venv/trio/lib/python3.6/site-packages/trio/_timeouts.py", line 118, in fail_at
    raise TooSlowError
trio.TooSlowError

The text was updated successfully, but these errors were encountered:

smurfix · 2018-09-28T17:52:31Z

On 28.09.2018 18:28, Jörn Heissler wrote: |TooSlowError| is especially strange because the timeout wasn't reached.

The problem is that `fail_after` cannot discover that it was cancelled manually by yourself instead of the timeout. Fix: create a secondary scope inside the `fail_after`, and cancel that.

…

-- -- Matthias Urlichs

joernheissler · 2018-09-28T20:23:57Z

Thanks!
But is that a fix or is it a workaround? :-)

belm0 · 2018-09-29T07:18:19Z

Likely no one has ever thought to manually cancel a fail_after(). Would you describe your use case?

Behavior seems completely in line with docs, though I understand not what you're wanting:

Creates a cancel scope with the given timeout, and raises an error if it is actually cancelled.

This function and move_on_after() are similar in that both create a cancel scope with a given timeout, and if the timeout expires then both will cause Cancelled to be raised within the scope. The difference is that when the Cancelled exception reaches move_on_after(), it’s caught and discarded. When it reaches fail_after(), then it’s caught and TooSlowError is raised in its place.

Raises TooSlowError – if a Cancelled exception is raised in this scope and caught by the context manager

There are two things that can be cancelled: 1) the cancel scope, 2) the notion that an exception should be raised. The cancel() method can only be bound to one, and since it's a method on the CancelScope object it understandably is the former.

Possibly need a ~~cancel_fail_after() method~~ new property (see below).

belm0 · 2018-09-29T07:30:12Z

Once we head down this road, I think it should be possible to dynamically transition between plain cancel scope, move_on_after, and fail_after semantics while the cancel scope is in flight. It's already possible to do that with the first two via deadline. There should be an additional raise_on_cancelled read/write property which is a bool, or perhaps an exception object.

joernheissler · 2018-09-29T09:22:45Z

Likely no one has ever thought to manually cancel a fail_after(). Would you describe your use case?

In a nutshell: There's a long running task (might run infinitely). When a certain event happens, I want to finish it without error, so I'd cancel it. If the event does not occur within N seconds, I want the task to abort with an error.

That task receives messages from a stream and runs a callback for each message. The callback might then cancel the task.

My code and design are flawed and I'm going to rewrite it without using callbacks. Then the above use case won't exist anymore. I once learned that I need to use state machines and callbacks, unlearning this is hard.

Creates a cancel scope with the given timeout, and raises an error if it is actually cancelled.

I must have missed this. So it looks like the behaviour does match the docs.

Still I find it strange that cancelling one CancelScope causes it to return silently while cancelling another CancelScope causes it to fail with TooSlowError while in fact nothing was too slow.

njsmith · 2018-09-29T09:49:31Z

Huh, interesting case.

So in general, trio doesn't track the reason why a scope was cancelled – a timeout is exactly the same as calling cancel. This is because of you do try to track the difference then there are all kinds of ambiguous cases, e.g. what do you do if it timed out and then someone called cancel as well, or even vice-versa, what if the timeout expires while it's in the process of handling the manual cancel.

And fail_after is an extremely simple wrapper around a cancel scope: https://github.com/python-trio/trio/blob/master/trio/_timeouts.py

But yeah the end result is kind of surprising here!

Some options:

Leave the code as is, but document it better
Remove fail_* entirely; when people ask about them treat it as an opportunity to teach about the wonders of .cancelled_caught
Start tracking cancel scope cancellation reason, maybe just in a minimal way like "was the first call to cancel from a timeout or not"
Have fail_after check the current time at the moment when it decides whether to raise TooSlowError, and skip it if the timeout couldn't possibly have expired

njsmith · 2018-09-29T09:55:30Z

Hmm, above message was written this morning and then I forgot to hit post, so the discussion has overtaken it a bit.

I have trouble telling how useful fail_after really is, once you get used to doing things "the trio way". In other systems that don't have nestable cancel scopes, like asyncio, fail_after is kind of the only way to do things, so I partly just threw it in to make new migrants feel more at home. But that's probably not the only use case? It's hard to tell. We're all unlearning the old ways and trying to invent the new ways together :-).

If you're rewriting your code anyway, then maybe we should leave this be for the moment, and just keep this thread in mind in case it comes up again in the future?

smurfix · 2019-05-16T17:47:24Z

My use case (close enough): I'm connecting to a host and my connection-setup handshake is protected by a fail_after because, well, if that takes too long the host isn't useable. You can imagine what happens when the whole subtask is cancelled for completely unrelated reasons. I don't see an obviously-correct way to handle that case.

So, yes, I'm all for some way to fix this issue, preferably one that doesn't require two additional syscalls to gettimeinfo or similar.

oremanj · 2019-05-16T18:22:30Z

AFAICT, this issue only applies to cancelling the fail_after scope itself. Cancelling a scope that encloses the fail_after should work fine. The Happy Eyeballs example you mention seems to me like it would be of the latter species.

Are you seeing that cancelling a scope that encloses the fail_after results in a TooSlowError? That would definitely be a bug, and code to reproduce it would be appreciated.

smurfix · 2019-05-16T18:39:47Z

Yes, that's what I'm seeing. I don't have a test case (a trivial attempt to reproduce this failed) but in my code the fail_after's scope is not even saved.

oremanj · 2019-05-16T18:45:14Z

Are you using a released Trio, or master? There were some recent cancellation changes that could theoretically have bugs... I'm not able to reproduce the outer-cancellation-causes-inner-TooSlowError on master with a simple case either, though.

oremanj added the cancellation label May 4, 2019

oremanj mentioned this issue May 16, 2019

fail_after raises TooSlow when it's cancelled #1054

Closed

oremanj changed the title ~~Cancelling a fail_after results in Cancelled exception~~ Cancelling a fail_after results in TooSlowError, even though the timeout didn't expire May 16, 2019

Zac-HD added the docs label Mar 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cancelling a `fail_after` results in TooSlowError, even though the timeout didn't expire #698

Cancelling a `fail_after` results in TooSlowError, even though the timeout didn't expire #698

joernheissler commented Sep 28, 2018 •

edited by Zac-HD

Loading

smurfix commented Sep 28, 2018 via email

joernheissler commented Sep 28, 2018

belm0 commented Sep 29, 2018 •

edited

Loading

belm0 commented Sep 29, 2018 •

edited

Loading

joernheissler commented Sep 29, 2018

njsmith commented Sep 29, 2018

njsmith commented Sep 29, 2018

smurfix commented May 16, 2019 •

edited

Loading

oremanj commented May 16, 2019 •

edited

Loading

smurfix commented May 16, 2019

oremanj commented May 16, 2019

Cancelling a fail_after results in TooSlowError, even though the timeout didn't expire #698

Cancelling a fail_after results in TooSlowError, even though the timeout didn't expire #698

Comments

joernheissler commented Sep 28, 2018 • edited by Zac-HD Loading

smurfix commented Sep 28, 2018 via email

joernheissler commented Sep 28, 2018

belm0 commented Sep 29, 2018 • edited Loading

belm0 commented Sep 29, 2018 • edited Loading

joernheissler commented Sep 29, 2018

njsmith commented Sep 29, 2018

njsmith commented Sep 29, 2018

smurfix commented May 16, 2019 • edited Loading

oremanj commented May 16, 2019 • edited Loading

smurfix commented May 16, 2019

oremanj commented May 16, 2019

Cancelling a `fail_after` results in TooSlowError, even though the timeout didn't expire #698

Cancelling a `fail_after` results in TooSlowError, even though the timeout didn't expire #698

joernheissler commented Sep 28, 2018 •

edited by Zac-HD

Loading

belm0 commented Sep 29, 2018 •

edited

Loading

belm0 commented Sep 29, 2018 •

edited

Loading

smurfix commented May 16, 2019 •

edited

Loading

oremanj commented May 16, 2019 •

edited

Loading