Skip to content

Conversation

@adityachoudhari26
Copy link
Contributor

@adityachoudhari26 adityachoudhari26 commented Dec 23, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added automatic rollback capability that reverts to the last successful release when jobs fail or verification checks fail.
    • Rollback behavior is configurable through release policies, allowing granular control over which job statuses or verification failures trigger rollbacks.
  • Tests

    • Added comprehensive end-to-end tests covering rollback scenarios, including job failures, verification failures, policy matching, and edge cases.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 23, 2025

Walkthrough

This pull request introduces a rollback feature for the workspace engine, enabling automatic rollback of deployments to previously successful releases when jobs fail or verifications fail. The feature spans schema definitions, policy evaluation, action orchestration, and comprehensive testing.

Changes

Cohort / File(s) Summary
Schema & Type Definitions
apps/workspace-engine/oapi/openapi.json, apps/workspace-engine/oapi/spec/schemas/policy.jsonnet, apps/workspace-engine/pkg/oapi/oapi.gen.go
Adds new RollbackRule schema with onVerificationFailure (boolean) and rollBackJobStatuses (array) properties. Extends PolicyRule with optional rollback field referencing the new rule. Generated Go types created accordingly.
Action Trigger System
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go, apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go
Introduces TriggerJobStatusChange constant as new action trigger. Updates determineTrigger logic to detect any status transition (current vs. previous) and emit the new trigger.
Rollback Orchestration
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go, apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
Implements RollbackAction to execute rollback operations, creating and dispatching new jobs referencing prior successful releases. Introduces RollbackHooks to listen for verification completion and trigger rollback jobs on failure when configured. Both include policy checking and telemetry.
Policy Evaluation
apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go, apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go
Adds RuleTypeRollback constant to policy evaluator. Implements RollbackEvaluator to evaluate rollback eligibility based on job status, verification failure state, and configured rollback rules.
Manager Integration
apps/workspace-engine/pkg/workspace/releasemanager/manager.go, apps/workspace-engine/pkg/workspace/workspace.go
Integrates rollback hooks into release manager via CompositeHooks. Registers RollbackAction in workspace action orchestration chain with a new dispatcher instance.
Testing
apps/workspace-engine/test/integration/opts.go, apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
Adds policy rule builder helpers (WithRuleRollback*) for test configuration. Introduces 11 comprehensive e2e test cases covering rollback on job failure, verification failure, edge cases (no prior release, unmatched status, disabled policy), and helper for constructing job update events.

Sequence Diagrams

sequenceDiagram
    autonumber
    participant JobAgent as Job Agent
    participant Orchestrator as Action Orchestrator
    participant RollbackAction as Rollback Action
    participant Store as Store
    participant Dispatcher as Job Dispatcher

    JobAgent->>Orchestrator: Job status changed (e.g., Failure)
    activate Orchestrator
    Orchestrator->>Orchestrator: determineTrigger: detect status change
    Orchestrator->>RollbackAction: Execute(TriggerJobStatusChange, context)
    deactivate Orchestrator
    
    activate RollbackAction
    RollbackAction->>Store: Load policies & job/release context
    RollbackAction->>RollbackAction: shouldRollback: check if job status<br/>in any enabled rule's rollBackJobStatuses
    alt Rollback applicable & prior release exists
        RollbackAction->>Store: Create new Job with prior release UUID
        RollbackAction->>Store: Upsert Job
        RollbackAction->>Dispatcher: Dispatch rollback job
        Dispatcher->>JobAgent: Execute deployment from prior release
    else No rollback or no prior release
        RollbackAction->>RollbackAction: Record span status & exit gracefully
    end
    deactivate RollbackAction
Loading
sequenceDiagram
    autonumber
    participant VerificationManager as Verification Manager
    participant RollbackHooks as Rollback Hooks
    participant PolicyEvaluator as Policy Evaluator
    participant Store as Store
    participant Dispatcher as Job Dispatcher

    VerificationManager->>RollbackHooks: OnVerificationComplete(result with failure)
    activate RollbackHooks
    RollbackHooks->>Store: Load associated job & release
    RollbackHooks->>PolicyEvaluator: Evaluate rollback via shouldRollbackOnVerificationFailure
    activate PolicyEvaluator
    PolicyEvaluator->>Store: Load policies with rollback rules
    PolicyEvaluator->>PolicyEvaluator: Check onVerificationFailure flag<br/>in RollbackRule
    alt onVerificationFailure=true & prior release exists
        PolicyEvaluator-->>RollbackHooks: Return true
    else Not configured or no prior release
        PolicyEvaluator-->>RollbackHooks: Return false
    end
    deactivate PolicyEvaluator
    
    alt Should rollback
        RollbackHooks->>Store: Create rollback Job
        RollbackHooks->>Dispatcher: Dispatch job
        Dispatcher->>VerificationManager: Execute prior release
    else No rollback needed
        RollbackHooks->>RollbackHooks: Record span event & exit
    end
    deactivate RollbackHooks
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes involve substantial new logic across multiple interconnected components: rollback action execution with state management, verification hooks with policy evaluation, new Go types and schema definitions, and intricate control flow for determining when and how to rollback. The feature spans policy evaluation, action orchestration, and integration points within the release manager, requiring careful review of policy-matching logic, state handling, and the interaction between job status triggers and verification-based rollbacks.

Poem

🐰 A rollback so graceful, when jobs go awry,
We leap back to safety, no need to cry,
Policies guide us to the prior release,
Verification whispers bring automated peace,
One hop, then another, until all is complete! 🔄

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'chore: rollback rule' is vague and generic, using non-descriptive phrasing that doesn't convey the scope or meaningful details of the substantial changes implemented. Provide a more descriptive title that clarifies the main implementation, such as 'feat: implement automatic rollback on job failure and verification failure' or 'feat: add rollback policy evaluation and automatic rollback orchestration'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 85.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch rollback-rule

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
apps/workspace-engine/oapi/spec/schemas/policy.jsonnet (1)

68-69: RollbackRule schema wiring looks good; consider field-name casing

The new rollback: openapi.schemaRef('RollbackRule') on PolicyRule and the RollbackRule definition itself are wired consistently with the rest of the schema surface.

One minor API-shape nit: the property name rollBackJobStatuses is slightly inconsistent with other fields like retryOnStatuses / successStatuses. If you want to standardize before this goes public, rollbackJobStatuses (lowercase “b”) would better match existing naming patterns in both JSON and Go.

Please double‑check any consumers/generated code for rollBackJobStatuses before considering a rename, since this is part of the external API.

Also applies to: 239-253

apps/workspace-engine/oapi/openapi.json (1)

1287-1335: OpenAPI rollback schema matches jsonnet; watch rollBackJobStatuses casing

The OpenAPI additions for:

  • PolicyRule.properties.rollback -> $ref: "#/components/schemas/RollbackRule", and
  • the new RollbackRule schema with onVerificationFailure and rollBackJobStatuses

are consistent with the jsonnet source and the rest of the components block.

Same minor naming concern as in the jsonnet: rollBackJobStatuses breaks the usual lower‑camel convention and will be the canonical JSON field name exposed to clients. If you intend rollbackJobStatuses, now is the time to adjust (and regenerate) before external consumers depend on it.

Please confirm the desired JSON field spelling and, if you change it, regenerate oapi.gen.go and update any callers relying on the old name.

Also applies to: 1857-1873

apps/workspace-engine/pkg/workspace/workspace.go (1)

3-16: Rollback action wiring into workspace orchestrator looks coherent

The new rollback action:

RegisterAction(
    rollback.NewRollbackAction(
        s,
        jobs.NewDispatcher(s, ws.releasemanager.VerificationManager()),
    ),
)

is wired consistently with the rest of the workspace setup: it shares the same store and VerificationManager, and is appended to the existing action chain without altering prior behavior.

Creating a dedicated Dispatcher instance here is fine; if you ever want to reuse it across actions, you could hoist it into the workspace struct, but that’s optional.

Also applies to: 36-54

apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (1)

487-585: Strong rollback e2e coverage; consider avoiding fixed time.Sleep for verification flows

These tests do a good job covering:

  • Job-status-based rollback (single and multiple statuses, disabled/mismatched policies, no previous release).
  • Verification-based rollback with WithRuleRollbackOnVerificationFailure.
  • Combined job + verification configuration and the “same release” edge case.
  • Realistic wiring via events and the createJobUpdateEvent helper.

One thing to watch: the verification-centric tests rely on time.Sleep(3 * time.Second) to wait for verification to complete. That works, but it lengthens the suite and can become flaky if verification processing ever takes longer than expected.

If you start seeing flakes or want to speed things up, consider replacing the fixed sleeps with a small polling loop (with a timeout) that checks either:

  • The verification status in the store, or
  • The presence/absence of pending jobs indicating rollback has (or has not) occurred.

For now, the structure and assertions look solid.

Also applies to: 591-685, 687-769, 771-850, 852-964, 966-983

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6ce45fb and a011534.

📒 Files selected for processing (13)
  • apps/workspace-engine/oapi/openapi.json
  • apps/workspace-engine/oapi/spec/schemas/policy.jsonnet
  • apps/workspace-engine/pkg/oapi/oapi.gen.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/action.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go
  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
  • apps/workspace-engine/test/integration/opts.go
🧰 Additional context used
📓 Path-based instructions (3)
apps/workspace-engine/**/*.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Files:

  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go
  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/action.go
  • apps/workspace-engine/test/integration/opts.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
  • apps/workspace-engine/pkg/oapi/oapi.gen.go
  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go
apps/workspace-engine/**/*_test.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

Follow the existing test structure used in *_test.go files

Files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}

📄 CodeRabbit inference engine (CLAUDE.md)

Formatting: Prettier is used with @ctrlplane/prettier-config

Files:

  • apps/workspace-engine/oapi/openapi.json
🧠 Learnings (11)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
📚 Learning: 2025-08-12T20:49:05.086Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 655
File: apps/workspace-engine/pkg/engine/workspace/fluent.go:166-171
Timestamp: 2025-08-12T20:49:05.086Z
Learning: The UpdateDeploymentVersions() method with OperationCreate case in apps/workspace-engine/pkg/engine/workspace/fluent.go is specifically designed only for creating new deployment versions, not for handling potential duplicates or existing versions.

Applied to files:

  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Applied to files:

  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/test/integration/opts.go
  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-10-24T00:02:29.723Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 696
File: apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go:63-67
Timestamp: 2025-10-24T00:02:29.723Z
Learning: In workspace-engine replay mode (apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go), jobs are created with Pending status even during replay because job updates are still accepted and processed. If a job remains Pending after replay completes, it genuinely is still pending because no updates occurred during replay.

Applied to files:

  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*_test.go : Follow the existing test structure used in *_test.go files

Applied to files:

  • apps/workspace-engine/test/integration/opts.go
  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/mapping/protobuf_mappings.go : Add mapping functions for the new condition in pkg/mapping/protobuf_mappings.go

Applied to files:

  • apps/workspace-engine/test/integration/opts.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Write comprehensive, data-driven tests for new condition types

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments that simply restate what the code does

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Use table-driven tests for all condition types

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add extraneous inline comments that state the obvious

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
🧬 Code graph analysis (8)
apps/workspace-engine/pkg/workspace/workspace.go (2)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)
  • NewRollbackAction (25-30)
apps/workspace-engine/pkg/workspace/releasemanager/deployment/jobs/dispatcher.go (1)
  • NewDispatcher (20-25)
apps/workspace-engine/test/integration/opts.go (1)
apps/workspace-engine/pkg/oapi/oapi.gen.go (3)
  • JobStatus (688-688)
  • PolicyRule (791-805)
  • RollbackRule (1001-1007)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (3)
packages/trpc/src/trpc.ts (1)
  • Context (26-26)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (2)
  • ActionTrigger (9-9)
  • ActionContext (20-24)
apps/workspace-engine/pkg/oapi/oapi.gen.go (6)
  • Release (902-908)
  • Job (657-671)
  • ReleaseTarget (911-915)
  • JobAgentConfig (683-685)
  • Policy (768-781)
  • JobStatus (688-688)
apps/workspace-engine/pkg/oapi/oapi.gen.go (1)
apps/web/app/routes/ws/deployments/_components/types.ts (1)
  • JobStatus (11-21)
apps/workspace-engine/pkg/workspace/releasemanager/manager.go (3)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)
  • NewRollbackHooks (23-28)
apps/workspace-engine/pkg/workspace/releasemanager/deployment/jobs/dispatcher.go (1)
  • NewDispatcher (20-25)
apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
  • NewCompositeHooks (62-64)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (3)
apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
  • VerificationHooks (8-23)
apps/workspace-engine/pkg/oapi/oapi.gen.go (7)
  • JobVerification (710-721)
  • VerificationMeasurement (1148-1160)
  • JobVerificationStatusFailed (158-158)
  • ReleaseTarget (911-915)
  • Job (657-671)
  • JobAgentConfig (683-685)
  • Policy (768-781)
apps/workspace-engine/pkg/workspace/releasemanager/manager.go (1)
  • New (43-64)
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (6)
apps/workspace-engine/pkg/oapi/oapi.gen.go (11)
  • JobStatusFailure (129-129)
  • JobStatusSuccessful (135-135)
  • JobStatusCancelled (127-127)
  • JobStatusInvalidJobAgent (132-132)
  • PolicyTargetSelector (841-846)
  • PolicyRule (791-805)
  • RollbackRule (1001-1007)
  • MetricProvider (752-754)
  • SleepMetricProvider (1050-1055)
  • Sleep (205-205)
  • VerificationMetricSpec (1166-1188)
apps/workspace-engine/test/integration/creators/deployment_version.go (1)
  • NewDeploymentVersion (15-36)
apps/workspace-engine/pkg/events/handler/handler.go (4)
  • DeploymentVersionCreate (44-44)
  • JobUpdate (68-68)
  • PolicyCreate (70-70)
  • ResourceUpdate (27-27)
apps/workspace-engine/test/integration/creators/policy.go (2)
  • NewPolicy (13-31)
  • NewPolicyTargetSelector (34-40)
apps/workspace-engine/pkg/memsql/memsql_test.go (1)
  • ResourceSelector (2874-2877)
packages/rule-engine/src/manager/types.ts (1)
  • ReleaseManager (10-12)
apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (1)
  • TriggerJobStatusChange (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: workspace-engine-tests
  • GitHub Check: Lint
  • GitHub Check: tests
  • GitHub Check: Typecheck
  • GitHub Check: build (linux/amd64)
🔇 Additional comments (7)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (1)

11-17: Trigger enum extension looks consistent

Adding TriggerJobStatusChange alongside the other job lifecycle triggers is consistent and matches its usage in determineTrigger. No issues here.

apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go (1)

62-72: Rollback rule type constant is aligned and safe

Adding RuleTypeRollback = "rollback" to the rule-type constants cleanly extends the existing set and matches the new rollback evaluator/policy wiring. No behavior changes in this file.

apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go (1)

110-139: New generic status-change trigger broadens action invocation surface

The fallback:

if currentStatus != previousStatus {
    return TriggerJobStatusChange
}

is a clean way to surface all non‑special status transitions while preserving precedence for the explicit created/started/success/failure cases.

This does mean actions will now be called for more transitions (e.g., cancelled/skipped) if they choose to handle TriggerJobStatusChange. As long as existing actions either switch on specific triggers or no‑op for unknown ones (per the Execute contract), this is safe.

It’s worth quickly scanning existing PolicyAction.Execute implementations to ensure they don’t implicitly treat any non‑empty trigger as applicable.

apps/workspace-engine/pkg/workspace/releasemanager/manager.go (1)

3-24: Rollback hooks are correctly integrated into the verification hook chain

In New, wiring:

releaseManagerHooks := newReleaseManagerVerificationHooks(store, stateCache)
rollbackHooks := rollback.NewRollbackHooks(store, jobs.NewDispatcher(store, verificationManager))
compositeHooks := verification.NewCompositeHooks(releaseManagerHooks, rollbackHooks)
verificationManager.SetHooks(compositeHooks)

cleanly composes rollback behavior into the existing verification lifecycle without altering the public API. Using the same store and verificationManager keeps the dispatcher and hooks aligned with the rest of the release manager.

Also applies to: 48-56

apps/workspace-engine/pkg/oapi/oapi.gen.go (1)

790-805: RollbackRule wiring into PolicyRule looks consistent

The added Rollback *RollbackRule field on PolicyRule and the RollbackRule type (with OnVerificationFailure and RollBackJobStatuses) follow the existing OpenAPI-generated patterns and align with how other rule types (e.g., RetryRule, VersionCooldownRule) are modeled. No issues from the workspace-engine perspective as long as the OpenAPI spec is the single source of truth for these fields.

Also applies to: 1000-1007

apps/workspace-engine/test/integration/opts.go (1)

901-939: Rollback PolicyRuleOption helpers are consistent with existing option patterns

WithRuleRollback, WithRuleRollbackOnJobStatuses, and WithRuleRollbackOnVerificationFailure mirror how retry and environment progression options are modeled (using pointer fields for slices/bools). This keeps tests expressive without introducing new patterns; the address-of-slice/bool usage is already established in this file.

apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go (1)

19-86: RollbackEvaluator correctly encodes status- and verification-based rollback semantics

The evaluator cleanly handles both configured rollback triggers: it denies when the latest job’s status is in RollBackJobStatuses, and, when OnVerificationFailure is set, it checks persisted job verifications and denies on any failure. The use of ScopeReleaseTarget, memoization, and sorting by CreatedAt for “latest job” is consistent with the surrounding evaluator patterns.

Comment on lines +3 to +14
import (
"context"
"time"
"workspace-engine/pkg/oapi"
"workspace-engine/pkg/workspace/releasemanager/deployment/jobs"
"workspace-engine/pkg/workspace/releasemanager/verification"
"workspace-engine/pkg/workspace/store"

"github.com/google/uuid"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Undefined tracer in rollback hooks (compile-time error) + minor job-shape nits

OnVerificationComplete calls tracer.Start(...), but this file neither imports go.opentelemetry.io/otel nor defines a tracer variable in the rollback package. That will fail to compile.

You can mirror the pattern from action/orchestrator.go / releasemanager/manager.go by importing otel and defining a package-level tracer:

Proposed fix for tracer + optional job metadata tweak
 import (
 	"context"
 	"time"
 	"workspace-engine/pkg/oapi"
 	"workspace-engine/pkg/workspace/releasemanager/deployment/jobs"
 	"workspace-engine/pkg/workspace/releasemanager/verification"
 	"workspace-engine/pkg/workspace/store"

 	"github.com/google/uuid"
+	"go.opentelemetry.io/otel"
 	"go.opentelemetry.io/otel/attribute"
 	"go.opentelemetry.io/otel/codes"
 )
 
+var tracer = otel.Tracer("workspace/releasemanager/rollback")
+
@@
 	now := time.Now()
 	newJob := oapi.Job{
 		Id:             uuid.New().String(),
 		ReleaseId:      lastSuccessfulJob.ReleaseId,
 		JobAgentId:     lastSuccessfulJob.JobAgentId,
 		JobAgentConfig: lastSuccessfulJob.JobAgentConfig,
 		Status:         oapi.JobStatusPending,
 		CreatedAt:      now,
 		UpdatedAt:      now,
+		// Optionally preserve metadata/trace context from the original successful job:
+		// Metadata:   maps.Clone(lastSuccessfulJob.Metadata),
+		// TraceToken: lastSuccessfulJob.TraceToken,
 	}

If you care about downstream API shape and tracing, also consider:

  • Initializing Metadata (either cloned from lastSuccessfulJob or as an empty map) to avoid null in serialized responses.
  • Optionally propagating TraceToken so rollback jobs remain linked in traces.

Also applies to: 42-45, 111-120, 138-156

🤖 Prompt for AI Agents
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
around lines 3-14 (and also affecting 42-45, 111-120, 138-156): the file calls
tracer.Start but neither imports otel nor defines a package-level tracer,
causing a compile error; import go.opentelemetry.io/otel and create a
package-level tracer (e.g., otel.Tracer("workspace/releasemanager/rollback"))
and use that tracer in the hook functions (ensure spans are ended, set
attributes/status as done elsewhere), and while here, initialize Job.Metadata to
a non-nil map (clone from lastSuccessfulJob.Metadata if present or
make(map[string]string)) and propagate lastSuccessfulJob.TraceToken into the new
rollback job so traces remain linked.

Comment on lines +51 to +56
if !r.shouldRollback(actx.Policies, actx.Job.Status) {
span.SetAttributes(attribute.Bool("rollback_applicable", false))
span.SetStatus(codes.Ok, "no applicable rollback policy")
return nil
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

OnVerificationFailure rollback rules are never honored in RollbackAction

Right now Execute bails out unless:

  • At least one enabled policy has a Rollback rule, and
  • That rule’s RollBackJobStatuses slice is non-nil and contains the current job status.

OnVerificationFailure is not considered at all in shouldRollback, so any policy that configures rollback only via OnVerificationFailure (with RollBackJobStatuses == nil) will never trigger a rollback from this action, even if verifications have clearly failed and the evaluator has decided the rollback rule should deny.

This contradicts the intended behavior implied by the RollbackRule schema and the new e2e tests that exercise WithRuleRollbackOnVerificationFailure() and combined job/verification configurations.

To fix this, you probably want shouldRollback (or the Execute gating) to also take OnVerificationFailure into account, for example by:

  • Either removing the policy re-check here and letting the action be invoked only when the rollback evaluator has already decided rollback is required, or
  • Extending shouldRollback to:
    • Still check RollBackJobStatuses against jobStatus, and
    • When OnVerificationFailure is true, inspect store.JobVerifications.GetByJobId(job.Id) and return true if any verification has JobVerificationStatusFailed.

A possible direction (simplified, assuming you pass the job into shouldRollback) would look like:

Example adjustment to consider
-func (r *RollbackAction) Execute(
+func (r *RollbackAction) Execute(
     ctx context.Context,
     trigger action.ActionTrigger,
     actx action.ActionContext,
 ) error {
@@
-    if !r.shouldRollback(actx.Policies, actx.Job.Status) {
+    if !r.shouldRollback(actx.Policies, actx.Job) {
         span.SetAttributes(attribute.Bool("rollback_applicable", false))
         span.SetStatus(codes.Ok, "no applicable rollback policy")
         return nil
     }
@@
-func (r *RollbackAction) shouldRollback(policies []*oapi.Policy, jobStatus oapi.JobStatus) bool {
+func (r *RollbackAction) shouldRollback(policies []*oapi.Policy, job *oapi.Job) bool {
+    jobStatus := job.Status
     for _, policy := range policies {
         if !policy.Enabled {
             continue
         }
 
         for _, rule := range policy.Rules {
             if rule.Rollback == nil {
                 continue
             }
 
-            if rule.Rollback.RollBackJobStatuses != nil &&
-                slices.Contains(*rule.Rollback.RollBackJobStatuses, jobStatus) {
-                return true
-            }
+            rb := rule.Rollback
+            if rb.RollBackJobStatuses != nil &&
+                slices.Contains(*rb.RollBackJobStatuses, jobStatus) {
+                return true
+            }
+
+            if rb.OnVerificationFailure != nil && *rb.OnVerificationFailure {
+                verifications := r.store.JobVerifications.GetByJobId(job.Id)
+                for _, v := range verifications {
+                    if v.Status() == oapi.JobVerificationStatusFailed {
+                        return true
+                    }
+                }
+            }
         }
     }
 
     return false
 }

That keeps the local decision logic aligned with how the rollback evaluator interprets RollbackRule, and makes the OnVerificationFailure flag effective for action execution.

Also applies to: 101-117

🤖 Prompt for AI Agents
In
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
around lines 51-56 (also apply same change to 101-117), the Execute path bails
out unless a policy has Rollback.RollBackJobStatuses matching the current job
status and therefore ignores RollbackRule.OnVerificationFailure; update the
gating so OnVerificationFailure is honored: either remove this local policy
re-check and rely on the evaluator decision, or extend shouldRollback (or
Execute) to accept the job and when a policy's OnVerificationFailure is true
call store.JobVerifications.GetByJobId(job.ID) and return true if any
verification has JobVerificationStatusFailed while still keeping existing
RollBackJobStatuses checks. Ensure span attributes/status handling remains
consistent when deciding to return nil vs proceed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants