Skip to content

Handle generator (and coroutine) state in the bytecode. #87849

Closed
@markshannon

Description

@markshannon
BPO 43683
Nosy @markshannon, @pablogsal, @sweeneyde, @dpgeorge
PRs
  • bpo-43683: Handle check for sending None to starting generator and coroutine in the bytecode. #25137
  • bpo-43683: Handle generator entry in bytecode #25138
  • bpo-43683: Minor corrections. #25224
  • bpo-43683: bump the bytecode magic number #25225
  • bpo-43683: Streamline YIELD_VALUE and SEND #30723
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/markshannon'
    closed_at = None
    created_at = <Date 2021-03-31.16:45:03.337>
    labels = ['3.11', '3.10', 'performance']
    title = 'Handle generator (and coroutine) state in the bytecode.'
    updated_at = <Date 2022-01-24.11:08:57.165>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2022-01-24.11:08:57.165>
    actor = 'Mark.Shannon'
    assignee = 'Mark.Shannon'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2021-03-31.16:45:03.337>
    creator = 'Mark.Shannon'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43683
    keywords = ['patch']
    message_count = 15.0
    messages = ['389919', '390305', '390354', '404564', '405081', '405083', '405096', '407983', '407987', '407988', '408006', '408009', '409725', '409751', '411465']
    nosy_count = 4.0
    nosy_names = ['Mark.Shannon', 'pablogsal', 'Dennis Sweeney', 'dpgeorge']
    pr_nums = ['25137', '25138', '25224', '25225', '30723']
    priority = None
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue43683'
    versions = ['Python 3.10', 'Python 3.11']

    Linked PRs

    Activity

    markshannon

    markshannon commented on Mar 31, 2021

    @markshannon
    MemberAuthor

    Every time we send, or throw, to a generator, the C code in genobject.c needs to check what state the generator is in.
    This is inefficient and couples the generator code, which should just be a thin wrapper around the interpreter, to the internals of the interpreter.

    The state of the generator is known to the compiler. It should emit appropriate bytecodes to handle the different behavior for the different states.

    While the main reason this is robustness and maintainability, removing the complex C code between Python caller and Python callee also opens up the possibility of some worthwhile optimizations.

    There are three changes I want to make:

    1. Add a new bytecode to handle starting a generator. This GEN_START bytecode would pop TOS, raising an exception if it is not None.
      This adds some overhead for the first call to iter()/send() but speeds up all the others.

    2. Handle the case of exhausted generators. This is a bit more fiddly, and involves putting an infinite loop at the end of the generator. Something like:

      CLEAR_FRAME
      label:
      GEN_RETURN (Like RETURN_VALUE None, but does not discard the frame)
      JUMP label

    This removes a lot of special case code for corner cases of exhausted generators and coroutines.

    1. Handle throw() on YIELD_FROM. The problem here is that we need to differentiate between exceptions triggered by throw, which must call throw() on sub-generators, and exceptions propagating out of sub-generators which should be passed up the stack. By splitting the opcode into two (or more), it is clear which case is being handled in the interpreter without complicated logic in genobject.c
    self-assigned this
    on Mar 31, 2021
    markshannon

    markshannon commented on Apr 6, 2021

    @markshannon
    MemberAuthor

    New changeset b37181e by Mark Shannon in branch 'master':
    bpo-43683: Handle generator entry in bytecode (GH-25138)
    b37181e

    sweeneyde

    sweeneyde commented on Apr 6, 2021

    @sweeneyde
    Member

    Looks like we both opened PRs in the same minute.

    The MAGIC constant didn't get updated, but perhaps that can just be included in the Minor Corrections PR.

    I'd bet a CI check could be added to check that if the opcodes change then Python/importlib_external.h changes.

    dpgeorge

    dpgeorge commented on Oct 21, 2021

    @dpgeorge
    Mannequin

    It looks like this change introduced a subtle, and maybe intended (?), behavioural change.

    Consider (from MicroPython's test suite):

    def f():
        n = 0 
        while True:
            n = yield n + 1 
            print(n)
    
    g = f()
    try:
        g.send(1)
    except TypeError:
        print("caught")
    
    print(g.send(None))
    print(g.send(100))
    print(g.send(200))

    This used to work prior to commit b37181e. But after that commit it fails on the print(g.send(None)) because the generator is now stopped.

    markshannon

    markshannon commented on Oct 27, 2021

    @markshannon
    MemberAuthor

    Damien, thanks for catching this.

    The change was not intended.

    There are two kind of exceptions raised by send.

    1. Where a pre-condition is not met, e.g. a generator is already ruuning (caller errors)
    2. When the generator/coroutine raises an exception (callee exceptions).

    (1) does not change the state of the generator, (2) does.

    Sending non-None to a generator that has not started falls into kind 1, IMO. So we should revert the change.

    markshannon

    markshannon commented on Oct 27, 2021

    @markshannon
    MemberAuthor

    Just to be clear, it is the behavior change that should be reverted, not necessarily the new bytecode.

    In fact we should probably push on with (2) and add an exception handler wrapping the whole generator except the GEN_START. That way a GEN_START exception will not clear the generator.
    GEN_START might need to do a bit of trickery like setting lasti back to -1.

    dpgeorge

    dpgeorge commented on Oct 27, 2021

    @dpgeorge
    Mannequin

    Thanks for confirming the bug.

    Sending non-None to a not-started generator could arguably be case (2), because that's exactly the semantics introduced by the commit that broke the test case :)

    Honestly I don't have a strong opinion on which way this goes. But I think it would be interesting to know if there was code out there that relied on the original behaviour.

    pablogsal

    pablogsal commented on Dec 7, 2021

    @pablogsal
    Member

    Mark, is something left in this issue?

    sweeneyde

    sweeneyde commented on Dec 7, 2021

    @sweeneyde
    Member

    bpo-46009 about the same behavior change

    pablogsal

    pablogsal commented on Dec 7, 2021

    @pablogsal
    Member

    Unfortunately, this has not been fixed into 3.10.1 and 3.11.0a3 as this hasn't version information and therefore has missed our automatic checks for blockers.

    I have marked https://bugs.python.org/issue46009? as release blocker, as well as this issue and I have marked the versions.

    Please, ensure this is fixed ASAP so it doesn't go into the next bugfix release or alpha release.

    added
    3.10only security fixes
    3.11only security fixes
    on Dec 7, 2021

    23 remaining items

    Loading
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Metadata

    Metadata

    Assignees

    Labels

    3.10only security fixes3.11only security fixes3.12bugs and security fixesperformancePerformance or resource usage

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Handle generator (and coroutine) state in the bytecode. · Issue #87849 · python/cpython