Classify runtime errors across workflow boundaries #2709

Open: daryllimyt wants to merge 21 commits into main from feat/error-clarity-1

Conversation

@daryllimyt (Contributor) commented May 16, 2026

Checklist

  • Read CONTRIBUTING.md.
  • PR title is short and non-generic (see previously merged PRs for examples).
  • PR only implements a single feature or fixes a single bug.
  • Tests passing (uv run pytest tests)?
  • Lint / pre-commits passing (pre-commit run --all-files)?

Description

This PR adds first-class runtime error classification for workflow execution and carries that classification across Temporal boundaries.

It introduces runtime error envelopes for user, platform, and infra failures, adds Temporal adapters for activity and workflow-originated failures, and wires those envelopes into DSL action execution, scheduler failures, trigger-input normalization, registry lock resolution, agent setup, tier/workspace activities, and related activity boundaries.

The PR also adds DSLWorkflowV2 behind TRACECAT__FEATURE_FLAGS=DSL_WORKFLOW_V2 so new executions can target the v2 workflow type while existing histories stay on the current workflow. Worker registration, workflow start paths, schedule update/start handling, generated frontend types, and event/history parsing are updated for both workflow types.
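A minimal sketch of the routing idea described above, not the PR's actual code. The function names, the comma-separated flag format, and the `"DSLWorkflow"` name for the current workflow type are assumptions made for illustration; only the environment variable and the `DSL_WORKFLOW_V2` flag value come from the description.

```python
from __future__ import annotations

# Hypothetical helper names; only the env var and flag value appear in the PR text.
FLAG_ENV_VAR = "TRACECAT__FEATURE_FLAGS"
V2_FLAG = "DSL_WORKFLOW_V2"


def enabled_flags(raw: str | None) -> set[str]:
    """Parse a flag string into a set of names (comma-separated format is assumed)."""
    if not raw:
        return set()
    return {part.strip() for part in raw.split(",") if part.strip()}


def workflow_type_for_new_execution(raw_flags: str | None) -> str:
    """Pick the workflow type for a NEW start only.

    Existing histories are not touched here: they keep whatever type
    they were started with, so old runs stay replayable.
    """
    if V2_FLAG in enabled_flags(raw_flags):
        return "DSLWorkflowV2"
    return "DSLWorkflow"  # assumed name of the current workflow type


def is_dsl_workflow_type(name: str) -> bool:
    """History/event parsing has to accept both types while they coexist."""
    return name in {"DSLWorkflow", "DSLWorkflowV2"}
```

In a real deployment the raw value would presumably be read with `os.environ.get(FLAG_ENV_VAR)`; passing it in as an argument keeps the sketch easy to test.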

The mental model implemented here is:

  • User errors: caused by user-authored config, permissions, input, or user code.
  • Platform errors: caused by Tracecat orchestration/runtime invariants.
  • Infra errors: caused by backing services, storage, networking, or OS/resource failures.
  • Activity boundaries classify newly raised failures with ActivityRuntimeError.
  • Workflow-originated failures classify with WorkflowRuntimeError.
  • Workflow wrappers translating an activity failure preserve the activity's runtime envelope instead of reclassifying it.
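The mental model above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `tracecat.runtime.errors`: every class and function name below is hypothetical, and the mapping from Python exception types to kinds is invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(str, Enum):
    USER = "user"
    PLATFORM = "platform"
    INFRA = "infra"


@dataclass(frozen=True)
class RuntimeErrorEnvelope:
    """Hypothetical envelope: who is responsible and where the failure arose."""
    kind: ErrorKind
    origin: str  # "activity" or "workflow"
    message: str


class ActivityFailure(Exception):
    """Stand-in for an activity failure that already carries an envelope."""

    def __init__(self, envelope: RuntimeErrorEnvelope) -> None:
        super().__init__(envelope.message)
        self.envelope = envelope


def classify_at_activity_boundary(exc: Exception) -> RuntimeErrorEnvelope:
    """Classify a newly raised failure once, at the activity boundary."""
    # The user check runs first because PermissionError is also an OSError.
    if isinstance(exc, (ValueError, PermissionError)):
        kind = ErrorKind.USER
    elif isinstance(exc, OSError):  # includes ConnectionError, TimeoutError
        kind = ErrorKind.INFRA
    else:
        kind = ErrorKind.PLATFORM
    return RuntimeErrorEnvelope(kind, "activity", str(exc))


def translate_in_workflow(exc: Exception) -> RuntimeErrorEnvelope:
    """Workflow wrapper: preserve an activity's envelope, never reclassify it."""
    if isinstance(exc, ActivityFailure):
        return exc.envelope
    # Only failures that originate in the workflow get a workflow envelope.
    return RuntimeErrorEnvelope(ErrorKind.PLATFORM, "workflow", str(exc))
```

The point of the last function is the asymmetry: classification happens where the failure is first observed, and outer layers pass the envelope through unchanged.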

Related Issues

N/A

Screenshots / Recordings

N/A

Steps to QA

Focused verification run locally:

uv run pytest tests/unit/test_materialize_context.py tests/unit/test_agent_preset_activities.py tests/unit/test_agent_activities.py tests/unit/test_workflow_definitions_activities.py tests/unit/test_tier_activities.py tests/unit/test_workspace_org_resolution.py tests/unit/test_registry_sync_workflow.py tests/unit/test_executor_activities.py
uv run pytest tests/unit/test_dsl_workflow_error_unwrap.py tests/unit/test_materialize_context.py
uv run basedpyright tracecat/temporal/errors.py tracecat/dsl/workflow.py tests/unit/test_dsl_workflow_error_unwrap.py
uv run ruff check tracecat/temporal/errors.py tracecat/dsl/workflow.py tests/unit/test_dsl_workflow_error_unwrap.py
uv run ruff format --check tracecat/temporal/errors.py tracecat/dsl/workflow.py tests/unit/test_dsl_workflow_error_unwrap.py

Commit-time hooks also passed for the committed changes.


Summary by cubic

Classifies runtime errors end-to-end and preserves their kind across activities, the scheduler, and workflows to improve error clarity and retries. Adds feature-flagged DSLWorkflowV2, consolidates error details into a single wrapper, and standardizes activity/workflow error boundaries.

  • New Features

    • Added runtime error envelopes (kind/origin/phase) in tracecat.runtime.errors and Temporal helpers in tracecat.temporal.errors (ActivityRuntimeError, WorkflowRuntimeError, TemporalErrorDetails, extract helpers).
    • Consolidated ApplicationError details into TemporalErrorDetails.v1 with payloads and a per-ref runtime_errors map; malformed details are ignored.
    • Standardized error boundaries via tracecat.temporal.activity_errors and tracecat.dsl.activity_errors to classify user/platform/infra errors and attach envelopes across storage, tiers/workspaces, workflow management/schedules, agents/sessions/presets, interactions.
    • Propagated envelopes across activity/workflow boundaries and into scheduler task exceptions; wrappers preserve activity classification; executor uses tracecat.executor.errors.ActionRuntimeError and retries infra errors by default.
    • Introduced DSLWorkflowV2 behind dsl-workflow-v2; workers register both; new starts/schedules route via dsl_workflow_run_method_for_new_execution; history/event parsing accepts both via is_dsl_workflow_type_name; frontend flag enums updated.
  • Bug Fixes

    • Preserved materialization retry semantics; honored workflow-scoped runtime errors for retries/failure; preserved scheduler error payloads and runtime envelopes in task exceptions.
    • Classified common failures as non-retryable user/platform errors (registry sync validation, workspace org missing, invalid concurrency caps, agent/preset/model/session lookups, interaction/session not found).
    • Replaced legacy temporal.exceptions.UserError with typed runtime errors; webhooks now use a generic workflow handle and return StoredObject.
    • CI: capped pytest xdist workers to 15 to avoid Redis DB collisions; simplified a one-off scheduler error message.
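Two behaviors in this summary, ignoring malformed details and retrying infra errors by default, might look like the sketch below. The payload shape, the function names, and the rule that only infra errors are retryable are assumptions for illustration; the summary confirms only that a per-ref `runtime_errors` map exists, that malformed details are ignored, and that infra errors are retried by default.

```python
from typing import Any

KNOWN_KINDS = {"user", "platform", "infra"}


def extract_runtime_errors(details: Any) -> dict[str, str]:
    """Return {ref: kind} from a details payload, skipping anything malformed.

    Parsing never raises: a bad payload should not mask the original failure.
    """
    if not isinstance(details, dict):
        return {}
    raw = details.get("runtime_errors")
    if not isinstance(raw, dict):
        return {}
    result: dict[str, str] = {}
    for ref, entry in raw.items():
        if not (isinstance(ref, str) and isinstance(entry, dict)):
            continue  # malformed entry: ignore it, keep the rest
        kind = entry.get("kind")
        if isinstance(kind, str) and kind in KNOWN_KINDS:
            result[ref] = kind
    return result


def should_retry(kind: str) -> bool:
    """Assumed default policy: only infra failures are worth retrying."""
    return kind == "infra"
```

For example, `extract_runtime_errors({"runtime_errors": {"a": {"kind": "infra"}, "b": "oops"}})` keeps the entry for `a` and silently drops `b`.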

Written for commit 0b33755. Summary will update on new commits.

@blacksmith-sh

This comment has been minimized.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4117a02525

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tracecat/temporal/errors.py Outdated
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

4 issues found across 40 files

Confidence score: 3/5

  • There is a concrete regression risk in tracecat/dsl/workflow.py: runtime-only ApplicationError details no longer match the error-handler parser’s expected ActionErrorInfo map shape, which can break handler dispatch and obscure the original workflow failure.
  • tracecat/runtime/errors.py may miss implicitly chained infra exceptions unless __context__ is traversed, creating medium risk of misclassification and harder diagnosis during failures.
  • packages/tracecat-ee/tracecat_ee/agent/workflows/durable.py and tracecat/dsl/action.py introduce behavior changes that can amplify failure impact (retrying deterministic validation via activity:fail_slow, and bypassing ActivityRuntimeError wrapping when storage init fails), so this is mergeable but with notable runtime-risk areas.
  • Pay close attention to tracecat/dsl/workflow.py, tracecat/runtime/errors.py, packages/tracecat-ee/tracecat_ee/agent/workflows/durable.py, and tracecat/dsl/action.py: error-path compatibility and classification need validation to avoid masked failures and incorrect retries.

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tracecat/dsl/workflow.py">

<violation number="1" location="tracecat/dsl/workflow.py:579">
P1: New runtime-only ApplicationError details are incompatible with the existing error-handler parsing logic, which expects ActionErrorInfo maps. This can break error-handler workflow dispatch and mask the original workflow failure.</violation>
</file>

<file name="tracecat/runtime/errors.py">

<violation number="1" location="tracecat/runtime/errors.py:175">
P2: Include `__context__` in exception-chain traversal; otherwise implicitly chained infra exceptions can be missed and misclassified.</violation>
</file>

<file name="packages/tracecat-ee/tracecat_ee/agent/workflows/durable.py">

<violation number="1" location="packages/tracecat-ee/tracecat_ee/agent/workflows/durable.py:484">
P2: Using `activity:fail_slow` here causes deterministic subagent validation failures to retry multiple times instead of failing immediately.</violation>
</file>

<file name="tracecat/dsl/action.py">

<violation number="1" location="tracecat/dsl/action.py:744">
P2: `get_object_storage()` is outside the error-classification try block, so backend initialization failures bypass `ActivityRuntimeError` wrapping and lose runtime error classification.</violation>
</file>
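The `__context__` issue flagged for tracecat/runtime/errors.py can be illustrated with a small, self-contained sketch. The helper names are hypothetical and this is not the PR's traversal code. It relies only on standard Python semantics: `raise X from Y` sets `__cause__`, while raising inside an `except` block sets `__context__` implicitly and leaves `__cause__` as `None`.

```python
from __future__ import annotations

from collections.abc import Iterator


def iter_exception_chain(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc and its chained exceptions, explicit or implicit."""
    seen: set[int] = set()  # guards against cyclic chains
    current: BaseException | None = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        yield current
        if current.__cause__ is not None:
            current = current.__cause__  # explicit: raise ... from err
        elif not current.__suppress_context__:
            current = current.__context__  # implicit: raised inside except
        else:
            current = None  # raise ... from None: chain deliberately cut


def has_infra_cause(exc: BaseException) -> bool:
    """True if any exception in the chain looks like an infra failure."""
    infra_types = (ConnectionError, TimeoutError)  # illustrative choice
    return any(isinstance(e, infra_types) for e in iter_exception_chain(exc))
```

A traversal that follows only `__cause__` would stop at the outer exception in the implicit case and miss the underlying `ConnectionError`. Honoring `__suppress_context__` is a design choice here that mirrors how tracebacks are printed; a classifier could reasonably choose to look through it instead.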

Comment thread tracecat/dsl/workflow.py
Comment thread tracecat/runtime/errors.py Outdated
Comment thread packages/tracecat-ee/tracecat_ee/agent/workflows/durable.py
Comment thread tracecat/dsl/action.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e41beddf65

Comment thread tracecat/dsl/scheduler.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf698c9cb2

Comment thread tracecat/dsl/workflow.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d26370ccd

Comment thread tracecat/executor/errors.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bea4ff8e8

Comment thread tracecat/dsl/scheduler.py Outdated

zeropath-ai Bot commented May 16, 2026

No security or compliance issues detected. Reviewed everything up to 0b33755.

Security Overview
Detected Code Changes

The diff is too large to display a summary of code changes.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5eb46529b

Comment thread tracecat/temporal/errors.py Outdated
Contributor

cubic-dev-ai Bot commented May 21, 2026

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.

