Skip to content

greenlet contextvars change introduces regressions #196

Closed
@oremanj

Description

@oremanj

In gevent/gevent#1656 (discussion of what eventually became #172), it was foreshadowed:

I'm a wee bit concerned, though, that this makes every greenlet its own "context" (if I understand correctly). It's sort of like an implicit monkey-patch. It would make certain patterns impossible (such as using two greenlets from a single asyncio task). I don't know how upstream will feel about that.

There's a pattern with recently growing popularity of using greenlets to execute async code from a syntactically synchronous context. sqlalchemy relies on this heavily in their async ORM support, and I also maintain an (evolutionarily unrelated) library https://github.com/oremanj/greenback that supports the same idiom. These assume that greenlet won't mess with contextvars, and they break when used with 0.4.17 which does so.

Unfortunately, the fix is not as easy as interposing a call to Context.run() when starting up the greenlet. greenback provides the illusion of a single call stack across sync & async domains, just as if there were no greenlet involved. This requires that the code inside the greenlet execute in the same context (not just a copy) that prevailed upon entry to the greenlet, so that code inside the greenlet can change contextvars in a way that's visible to code outside. (This is analogous to the fact that if you simply call a function, and that function changes contextvars, you can see the change after the function returns.) But contextvars has protections against entering the same context when it's "already running", and greenlet does not change the "context has been entered" flag when it switches contexts in the thread state, so it winds up being impossible post-0.4.17 to use the same context both inside and outside a greenlet:

>>> import greenlet
>>> import contextvars
>>> cv = contextvars.ContextVar("cv")
>>> ctx = contextvars.copy_context()
>>> ctx.run(cv.set, 42)
<Token var=<ContextVar name='cv' at 0x7feeccf86d60> at 0x7feece421680>
>>> ctx.run(greenlet.greenlet(ctx.run).switch, cv.get)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: cannot enter context: <Context object at 0x7feece421800> is already entered
>>>

So, what to do about all this?

Minimum proposal: Contrary to #172 and #183, I don't actually think it makes sense for each greenlet to automatically get its own context. I do think switching contexts when you switch greenlets is a reasonable choice; but it should be up to the creator of each greenlet whether it wants to enter a different context or not, and the default should be "keep whatever context existed before the initial call to switch()". Rationale: Giving a greenlet a new context is easy: wrap the function your greenlet runs in Context().run() for an empty context (current behavior of 0.4.17), copy_context().run() for a copy of the current context, etc. Conversely not changing the context, if greenlet has decided to change it, is challenging-to-impossible as explained above.

  • Yes, I think the behavior in 0.4.17 should be changed even at this point, because consistency with greenlet's past behavior of several years seems more important than consistency with the behavior of a release that's been out a few weeks, especially when the latter is causing downstream problems.
  • I think the only concrete change needed to implement this minimum proposal is: instead of setting tstate->context to NULL when switching to a new greenlet, just don't change it. Continue to save/restore the context when switching to an already-running greenlet.

Stretch goal: greenlet should properly exit and enter the context when it switches greenlets, rather than just assigning the thread state member. This would allow multiple greenlets to be simultaneously suspended inside independent calls to the run() method of the same Context object, which is a reasonable thing to want (I've done the analogous thing in an async event loop in some rare cases) but it's a rather niche concern. greenback/sqlalchemy don't need this as long as greenlet supports "let the old context shine through" as described in the "minimum proposal" above. Implementing this "proper switching" is tricky both because of the delicate environment in which greenlet switching occurs (can't call PyContext_Enter / PyContext_Exit), and because the existing context state isn't designed to support being active on multiple stacks at once (it tracks a single ctx_prev which is supposed to represent "the context outside this one, which I'll return to when I'm exited"). If you're willing to rely on the CPython-private definition of the PyContext object layout (in Include/internal/pycore_context.h), you can do this by saving and restoring context->ctx_prev alongside context. If that's (reasonably) not a maintenance burden you want to take on, then I don't think "proper" save/restore is possible. (The current save/restore behavior is fine as long as there's only one Context.run() call per context active across all greenlet stacks.)

I am happy to implement this once there's agreement that the design seems reasonable.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions