Description
Describe the bug
There are two problems here:
GeneratorExit is caught somewhere
When a coroutine is closed, Python raises GeneratorExit inside the coroutine and expects it to propagate back out. Somewhere we are not letting it propagate, so Python raises a RuntimeError instead. So we're getting:
RuntimeError: coroutine ignored GeneratorExit
Exception ignored in: <coroutine object _WorkflowInstanceImpl._apply_start_workflow..run_workflow at 0x000001F7B495E340>
With a stack trace.
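For reference, here is a minimal sketch in plain Python (no Temporal involved) of one way this RuntimeError can arise: a coroutine that awaits again inside a finally block while it is being closed. Whether this is the exact path hit inside the SDK is an assumption; the names suspend, run_with_cleanup, run_clean, and close_and_report are hypothetical and exist only for this illustration.

```python
import types


@types.coroutine
def suspend():
    # Bare suspension point, standing in for any awaitable that yields
    # control back to whoever is driving the coroutine.
    yield


async def run_with_cleanup():
    try:
        await suspend()
    finally:
        # Awaiting here suspends again while GeneratorExit is being
        # handled, so close() sees the coroutine "ignore" the exit.
        await suspend()


async def run_clean():
    # No await during cleanup, so GeneratorExit propagates out normally.
    await suspend()


def close_and_report(coro):
    """Start coro, close it, and return the RuntimeError message (or None)."""
    coro.send(None)  # advance to the first suspension point
    try:
        coro.close()  # raises GeneratorExit inside the coroutine
    except RuntimeError as err:
        coro.close()  # second close lets the exit propagate and finishes cleanup
        return str(err)
    return None


if __name__ == "__main__":
    print(close_and_report(run_with_cleanup()))
    print(close_and_report(run_clean()))
```

The workflow in the replication has the same shape (an await asyncio.sleep(1) inside finally), which may be why evicting it triggers the error.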
Core timer error
When a workflow is evicted and then reruns from scratch, it sometimes fails with something like:
2023-06-05T22:04:56.500788Z WARN temporal_sdk_core::worker::workflow: Failing workflow task run_id=b4c01744-339b-4d39-98f8-10e0c7d3846c failure=Failure { failure: Some(Failure { message: "Fatal("Timer fired event did not have expected timer id 8, it was 2!")", source: "", stack_trace: "", encoded_attributes: None, cause: None, failure_info: Some(ApplicationFailureInfo(ApplicationFailureInfo { r#type: "", non_retryable: false, details: None })) }), force_cause: Unspecified }
Unsure if this is due to the GeneratorExit issue above or if that failure just surfaced the problem.
Replication
import asyncio
import logging
from uuid import uuid4

from temporalio import workflow
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker


@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self) -> None:
        try:
            # Each sleep creates a timer
            for _ in range(20):
                await asyncio.sleep(0.5)
        finally:
            # Awaiting inside finally also runs when the workflow is evicted
            await asyncio.sleep(1)

    @workflow.signal
    async def my_signal(self) -> None:
        pass


async def main():
    logging.basicConfig(level=logging.INFO)
    async with await WorkflowEnvironment.start_local() as env:
        logging.info("Starting worker")
        task_queue = f"tq-{uuid4()}"
        async with Worker(
            env.client,
            task_queue=task_queue,
            workflows=[MyWorkflow],
            # Disable the workflow cache so every task causes an eviction
            max_cached_workflows=0,
        ):
            logging.info("Starting workflow")
            handle = await env.client.start_workflow(
                MyWorkflow.run,
                id=f"wf-{uuid4()}",
                task_queue=task_queue,
            )
            logging.info("Signalling every 300ms")
            for _ in range(20):
                await asyncio.sleep(0.3)
                await handle.signal(MyWorkflow.my_signal)
            logging.info("Waiting for workflow completion")
            await handle.result()
            logging.info("Workflow done")


if __name__ == "__main__":
    asyncio.run(main())
This is based on a user repo. It disables the workflow cache and creates a bunch of timers, so every timer fired and every signal sent causes an eviction, and those constant evictions are what trigger the GeneratorExit issue. The GeneratorExit issue is also causing this core error. I have not yet determined whether the GeneratorExit issue is surfacing a core bug, or whether it is the bug itself (an eviction problem) and core is just where it shows up.