[Bug] GeneratorExit inadvertently caught, and also core timer mismatch  #325

Closed
@cretz


Describe the bug

There are two problems here:

GeneratorExit is caught somewhere

When a coroutine is closed, Python throws a GeneratorExit into it and expects that exception to propagate back out. Somewhere we are not letting it propagate, so Python raises a RuntimeError. We're getting:

RuntimeError: coroutine ignored GeneratorExit
Exception ignored in: <coroutine object _WorkflowInstanceImpl._apply_start_workflow.<locals>.run_workflow at 0x000001F7B495E340>

With a stack trace.
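
For reference, the same RuntimeError can be reproduced in plain Python without the SDK. The sketch below is only an illustration of the interpreter behavior, not the SDK's code, and it is an assumption that the workflow hits this same path. `Suspend`, `cleanup_awaits`, `cleanup_is_sync`, and `close_after_first_step` are hypothetical names made up for the example. It shows that a coroutine which awaits again while GeneratorExit is propagating (for example in a `finally` block) is reported as having ignored it.

```python
class Suspend:
    """Hypothetical awaitable that suspends the coroutine once,
    the way awaiting a pending future would."""

    def __await__(self):
        yield


async def cleanup_awaits():
    try:
        await Suspend()
    finally:
        # Awaiting here suspends the coroutine again while
        # GeneratorExit is propagating, so close() sees a yield
        # instead of the coroutine exiting.
        await Suspend()


async def cleanup_is_sync():
    try:
        await Suspend()
    finally:
        pass  # no await, so GeneratorExit propagates out normally


def close_after_first_step(coro):
    """Run coro to its first suspension point, then close it.

    Returns the RuntimeError message, or None if close() succeeded.
    """
    coro.send(None)
    try:
        coro.close()  # throws GeneratorExit in at the suspension point
    except RuntimeError as err:
        return str(err)
    return None


bad = close_after_first_step(cleanup_awaits())
good = close_after_first_step(cleanup_is_sync())
print(bad)   # coroutine ignored GeneratorExit
print(good)  # None
```

Here `close()` is called explicitly, so the error is raised to the caller. When the interpreter closes a suspended coroutine during garbage collection instead, the same error cannot be raised anywhere and is reported as "Exception ignored in: ...", which matches the output above.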

Core timer error

When a workflow is evicted, it sometimes fails on rerunning from scratch with something like:

2023-06-05T22:04:56.500788Z WARN temporal_sdk_core::worker::workflow: Failing workflow task run_id=b4c01744-339b-4d39-98f8-10e0c7d3846c failure=Failure { failure: Some(Failure { message: "Fatal("Timer fired event did not have expected timer id 8, it was 2!")", source: "", stack_trace: "", encoded_attributes: None, cause: None, failure_info: Some(ApplicationFailureInfo(ApplicationFailureInfo { r#type: "", non_retryable: false, details: None })) }), force_cause: Unspecified }

I'm unsure whether this is caused by the GeneratorExit problem above or whether that failure merely surfaced an existing problem.

Replication

import asyncio
import logging
from uuid import uuid4

from temporalio import workflow
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker


@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self) -> None:
        try:
            for _ in range(20):
                await asyncio.sleep(0.5)
        finally:
            await asyncio.sleep(1)

    @workflow.signal
    async def my_signal(self) -> None:
        pass


async def main():
    logging.basicConfig(level=logging.INFO)

    async with await WorkflowEnvironment.start_local() as env:
        logging.info("Starting worker")
        task_queue = f"tq-{uuid4()}"
        async with Worker(
            env.client,
            task_queue=task_queue,
            workflows=[MyWorkflow],
            # Disable the workflow cache so the workflow is evicted constantly
            max_cached_workflows=0,
        ):
            logging.info("Starting workflow")
            handle = await env.client.start_workflow(
                MyWorkflow.run,
                id=f"wf-{uuid4()}",
                task_queue=task_queue,
            )

            logging.info("Signalling every 300ms")
            for _ in range(20):
                await asyncio.sleep(0.3)
                await handle.signal(MyWorkflow.my_signal)

            logging.info("Waiting for workflow completion")
            await handle.result()
            logging.info("Workflow done")


if __name__ == "__main__":
    asyncio.run(main())

This is based on a user repo. It disables the workflow cache and creates a series of timers, so the workflow is evicted on every timer and every signal sent, and those constant evictions trigger the GeneratorExit issue. The GeneratorExit issue also seems to trigger the core error. I have not yet determined whether the GeneratorExit issue is surfacing a core bug or is itself the bug, that is, an eviction problem that core merely happens to report.


Labels: bug (Something isn't working)
