Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opening a CancelScope with shield=True also shields sibling tasks on asyncio backend #642

Closed
2 tasks done
DaGenix opened this issue Nov 27, 2023 · 15 comments · Fixed by #648
Closed
2 tasks done

Opening a CancelScope with shield=True also shields sibling tasks on asyncio backend #642

DaGenix opened this issue Nov 27, 2023 · 15 comments · Fixed by #648
Labels
bug Something isn't working

Comments

@DaGenix
Copy link

DaGenix commented Nov 27, 2023

Things to check first

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

AnyIO version

4.1.0

Python version

3.11

What happened?

I want to run an asyncio task from inside of an anyio task group. Anyio enforces level triggered cancelation. But, in this case, I want to run some asyncio code that may not be prepared to deal with level triggered cancellation. So, my plan is to run the asyncio code that wants edge triggered cancellation inside of a shielded CancelScope. And then I want to run a sibling task that just wants for a cancellation signal, and then sends a single cancellation into the asyncio code when that occurs.

The problem that i'm running into, is that when I open a shielded CancelScope on the main task, the sibling task also seems to be shielded when running on the asyncio backend.

How can we reproduce the bug?

When I run this code, it hangs forever:

import anyio


async def func() -> None:
    cancelled = anyio.Event()

    async with anyio.create_task_group() as task_group:
        async def wait_cancel() -> None:
            try:
                await anyio.sleep_forever()
            except anyio.get_cancelled_exc_class():
                cancelled.set()
                raise

        task_group.start_soon(wait_cancel)

        with anyio.CancelScope(shield=True):
            await cancelled.wait()


async def main() -> None:
    async with anyio.create_task_group() as task_group:
        task_group.start_soon(func)
        await anyio.sleep(0.1)
        task_group.cancel_scope.cancel()


if __name__ == "__main__":
    anyio.run(main, backend="asyncio")

However, if you switch it to use the trio backend, it then completes as expected.

Additionally, if you change the function to detect cancellation in the main task and put the shielded CancelScope in the sibling task using the asyncio backend, it also completes as expected:

async def func() -> None:
    cancelled = anyio.Event()

    async with anyio.create_task_group() as task_group:
        async def wait_cancel() -> None:
            with anyio.CancelScope(shield=True):
                await cancelled.wait()

        task_group.start_soon(wait_cancel)

        try:
            await anyio.sleep_forever()
        except anyio.get_cancelled_exc_class():
            cancelled.set()
            raise
@DaGenix DaGenix added the bug Something isn't working label Nov 27, 2023
@DaGenix
Copy link
Author

DaGenix commented Nov 27, 2023

NOTE: I've since figured out a better way to do what I was trying to do here, which works quite well. But, I figured this issue report might still be useful.

@agronholm
Copy link
Owner

NOTE: I've since figured out a better way to do what I was trying to do here, which works quite well. But, I figured this issue report might still be useful.

I'm curious to hear what this solution was. Meanwhile I will investigate this issue.

@agronholm
Copy link
Owner

So this is what happens on asyncio:

  1. The task group's cancel scope in main() is cancelled
  2. While attempting to deliver the cancellation to the two tasks attached to this scope, it sees that the current cancel scope in the second task (the one running func()) is shielded, so it doesn't try to deliver the cancellation

It looks like to me that the way cancellation is propagated on asyncio isn't quite right.

@agronholm
Copy link
Owner

The fix would be propagating cancellation through a hierarchy of cancel scopes, rather than just looking at tasks. But touching the cancel scope mechanism isn't risk free, as it's rather complicated. Rigorous review is therefore needed, and one person (not me) already had a minor burn-out because of the complexity of cancellation issues.

@DaGenix
Copy link
Author

DaGenix commented Nov 27, 2023

@agronholm My solution looks something like (where func is a function I want to run with level triggered cancelation inside of a anyio scope). In my case it just so happens that I want to run the asyncio code in a separate task anyway, if you didn't want to do that, this wouldn't work:

    done = asyncio.Event()
    fut = asyncio.ensure_future(func(*args))
    fut.add_done_callback(lambda _: done.set())

    try:
        await done.wait()
    except asyncio.CancelledError:
        fut.cancel()
        with anyio.CancelScope(shield=True):
            await done.wait()
        raise

    return fut.result()

@DaGenix
Copy link
Author

DaGenix commented Nov 27, 2023

@agronholm Thanks for taking a look! Cancellation is quite tricky and I think anyio does a fantastic job of bringing a model of cancellation that is easier to reason about (trio's model, at least for me) to asyncio code.

@agronholm
Copy link
Owner

agronholm commented Dec 8, 2023

I simplified this into a regression test that currently passes on trio but fails on asyncio:

async def test_cancel_child_task_when_host_is_shielded() -> None:
    # Regression test for #642
    # Tests that cancellation propagates to a child task even if the host task is within
    # a shielded cancel scope.
    cancelled = anyio.Event()

    async def wait_cancel() -> None:
        try:
            await anyio.sleep_forever()
        except anyio.get_cancelled_exc_class():
            cancelled.set()
            raise

    with CancelScope() as parent_scope:
        async with anyio.create_task_group() as task_group:
            task_group.start_soon(wait_cancel)
            await wait_all_tasks_blocked()

            with CancelScope(shield=True), fail_after(1):
                parent_scope.cancel()
                await cancelled.wait()

@agronholm
Copy link
Owner

I was able to make this test pass on asyncio but that made 6 other tests fail.

@agronholm
Copy link
Owner

Jesus...this resulted in another whack-a-mole of bugs. Once I fix one test, two more tests break.

@smurfix
Copy link
Collaborator

smurfix commented Dec 8, 2023

Meh. In case you're not demotivated enough, I'm fairly sure that adding asyncio-native taskgroups into the mix will yield another heap of semi-intractable bugs.

@agronholm
Copy link
Owner

I've made significant progress with this. One problem I discovered and fixed was that, because I'm now moving tasks from parent cancel scopes' task lists to child scopes (and back, once the leaf scope is exited), the task group can no longer rely on the task list on its internal cancel scope, and has to maintain a separate list. Otherwise TaskGroup.__aexit__ will not wait on tasks that have entered their own cancel scopes.

Another problem was that, as cancellation now propagates directly to child cancel scopes, the cancel messages must used the ID of the cancel scope that originated the cancel operation, and I didn't immediately realize this.

These fixes didn't resolve all the test failures, and one of them is even hanging now. Still, the situation does seem to be improving.

@agronholm
Copy link
Owner

I figured out the hanging test. This was about the task_done() callback falsely assuming that the task group's cancel scope would always contain the task in question. This assumption was wrong if the user had called __enter__() on a cancel scope but skipped __exit__().

Only two test failures remain, one of which is just due to all the debugging print() calls I peppered the asyncio backend code with.

@agronholm
Copy link
Owner

Phew, finally got it working. The necessary changes were nontrivial but I'm fairly confident I didn't mess up. I would still need at least one review.

@agronholm
Copy link
Owner

@DaGenix the script you provided works on the fix-642 branch. Would you consider the problem fixed with these changes?

@DaGenix
Copy link
Author

DaGenix commented Dec 13, 2023

@agronholm yup! I did not review the changes themselves, but it certainly seems that it resolves the behavior I reported. I also tested this slightly different case with threads, with which the changes in #648 also resolved. Thanks!

import anyio
import anyio.to_thread
import threading


def blocking_wait_for_cancel(event: threading.Event) -> None:
    event.wait()


async def main() -> None:
    blocking_event = threading.Event()

    with anyio.CancelScope() as external:
        async with anyio.create_task_group() as task_group:
            async def wait_for_cancel() -> None:
                try:
                    await anyio.sleep_forever()
                finally:
                    blocking_event.set()
                    task_group.cancel_scope.cancel()

            task_group.start_soon(wait_for_cancel)

            async def do_cancel() -> None:
                await anyio.sleep(0.1)
                external.cancel()

            task_group.start_soon(do_cancel)

            await anyio.to_thread.run_sync(blocking_wait_for_cancel, blocking_event, abandon_on_cancel=False)


if __name__ == "__main__":
    anyio.run(main)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants