-
Notifications
You must be signed in to change notification settings - Fork 12
chore: rollback rule #733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
chore: rollback rule #733
Conversation
WalkthroughThis pull request introduces a rollback feature for the workspace engine, enabling automatic rollback of deployments to previously successful releases when jobs fail or verifications fail. The feature spans schema definitions, policy evaluation, action orchestration, and comprehensive testing. Changes
Sequence DiagramssequenceDiagram
autonumber
participant JobAgent as Job Agent
participant Orchestrator as Action Orchestrator
participant RollbackAction as Rollback Action
participant Store as Store
participant Dispatcher as Job Dispatcher
JobAgent->>Orchestrator: Job status changed (e.g., Failure)
activate Orchestrator
Orchestrator->>Orchestrator: determineTrigger: detect status change
Orchestrator->>RollbackAction: Execute(TriggerJobStatusChange, context)
deactivate Orchestrator
activate RollbackAction
RollbackAction->>Store: Load policies & job/release context
RollbackAction->>RollbackAction: shouldRollback: check if job status<br/>in any enabled rule's rollBackJobStatuses
alt Rollback applicable & prior release exists
RollbackAction->>Store: Create new Job with prior release UUID
RollbackAction->>Store: Upsert Job
RollbackAction->>Dispatcher: Dispatch rollback job
Dispatcher->>JobAgent: Execute deployment from prior release
else No rollback or no prior release
RollbackAction->>RollbackAction: Record span status & exit gracefully
end
deactivate RollbackAction
sequenceDiagram
autonumber
participant VerificationManager as Verification Manager
participant RollbackHooks as Rollback Hooks
participant PolicyEvaluator as Policy Evaluator
participant Store as Store
participant Dispatcher as Job Dispatcher
VerificationManager->>RollbackHooks: OnVerificationComplete(result with failure)
activate RollbackHooks
RollbackHooks->>Store: Load associated job & release
RollbackHooks->>PolicyEvaluator: Evaluate rollback via shouldRollbackOnVerificationFailure
activate PolicyEvaluator
PolicyEvaluator->>Store: Load policies with rollback rules
PolicyEvaluator->>PolicyEvaluator: Check onVerificationFailure flag<br/>in RollbackRule
alt onVerificationFailure=true & prior release exists
PolicyEvaluator-->>RollbackHooks: Return true
else Not configured or no prior release
PolicyEvaluator-->>RollbackHooks: Return false
end
deactivate PolicyEvaluator
alt Should rollback
RollbackHooks->>Store: Create rollback Job
RollbackHooks->>Dispatcher: Dispatch job
Dispatcher->>VerificationManager: Execute prior release
else No rollback needed
RollbackHooks->>RollbackHooks: Record span event & exit
end
deactivate RollbackHooks
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes The changes involve substantial new logic across multiple interconnected components: rollback action execution with state management, verification hooks with policy evaluation, new Go types and schema definitions, and intricate control flow for determining when and how to rollback. The feature spans policy evaluation, action orchestration, and integration points within the release manager, requiring careful review of policy-matching logic, state handling, and the interaction between job status triggers and verification-based rollbacks. Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
apps/workspace-engine/oapi/spec/schemas/policy.jsonnet (1)
68-69: RollbackRule schema wiring looks good; consider field-name casingThe new
rollback: openapi.schemaRef('RollbackRule')onPolicyRuleand theRollbackRuledefinition itself are wired consistently with the rest of the schema surface.One minor API-shape nit: the property name
rollBackJobStatusesis slightly inconsistent with other fields likeretryOnStatuses/successStatuses. If you want to standardize before this goes public,rollbackJobStatuses(lowercase “b”) would better match existing naming patterns in both JSON and Go.Please double‑check any consumers/generated code for
rollBackJobStatusesbefore considering a rename, since this is part of the external API.Also applies to: 239-253
apps/workspace-engine/oapi/openapi.json (1)
1287-1335: OpenAPI rollback schema matches jsonnet; watchrollBackJobStatusescasingThe OpenAPI additions for:
PolicyRule.properties.rollback -> $ref: "#/components/schemas/RollbackRule", and- the new
RollbackRuleschema withonVerificationFailureandrollBackJobStatusesare consistent with the jsonnet source and the rest of the components block.
Same minor naming concern as in the jsonnet:
rollBackJobStatusesbreaks the usual lower‑camel convention and will be the canonical JSON field name exposed to clients. If you intendrollbackJobStatuses, now is the time to adjust (and regenerate) before external consumers depend on it.Please confirm the desired JSON field spelling and, if you change it, regenerate
oapi.gen.goand update any callers relying on the old name.Also applies to: 1857-1873
apps/workspace-engine/pkg/workspace/workspace.go (1)
3-16: Rollback action wiring into workspace orchestrator looks coherentThe new rollback action:
RegisterAction( rollback.NewRollbackAction( s, jobs.NewDispatcher(s, ws.releasemanager.VerificationManager()), ), )is wired consistently with the rest of the workspace setup: it shares the same
storeandVerificationManager, and is appended to the existing action chain without altering prior behavior.Creating a dedicated
Dispatcherinstance here is fine; if you ever want to reuse it across actions, you could hoist it into the workspace struct, but that’s optional.Also applies to: 36-54
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (1)
487-585: Strong rollback e2e coverage; consider avoiding fixedtime.Sleepfor verification flowsThese tests do a good job covering:
- Job-status-based rollback (single and multiple statuses, disabled/mismatched policies, no previous release).
- Verification-based rollback with
WithRuleRollbackOnVerificationFailure.- Combined job + verification configuration and the “same release” edge case.
- Realistic wiring via events and the
createJobUpdateEventhelper.One thing to watch: the verification-centric tests rely on
time.Sleep(3 * time.Second)to wait for verification to complete. That works, but it lengthens the suite and can become flaky if verification processing ever takes longer than expected.If you start seeing flakes or want to speed things up, consider replacing the fixed sleeps with a small polling loop (with a timeout) that checks either:
- The verification status in the store, or
- The presence/absence of pending jobs indicating rollback has (or has not) occurred.
For now, the structure and assertions look solid.
Also applies to: 591-685, 687-769, 771-850, 852-964, 966-983
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (13)
apps/workspace-engine/oapi/openapi.jsonapps/workspace-engine/oapi/spec/schemas/policy.jsonnetapps/workspace-engine/pkg/oapi/oapi.gen.goapps/workspace-engine/pkg/workspace/releasemanager/action/action.goapps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.goapps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.goapps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.goapps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.goapps/workspace-engine/test/integration/opts.go
🧰 Additional context used
📓 Path-based instructions (3)
apps/workspace-engine/**/*.go
📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)
apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods
Files:
apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.goapps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/pkg/workspace/releasemanager/action/action.goapps/workspace-engine/test/integration/opts.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.goapps/workspace-engine/pkg/oapi/oapi.gen.goapps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.goapps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.goapps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go
apps/workspace-engine/**/*_test.go
📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)
Follow the existing test structure used in *_test.go files
Files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}
📄 CodeRabbit inference engine (CLAUDE.md)
Formatting: Prettier is used with
@ctrlplane/prettier-config
Files:
apps/workspace-engine/oapi/openapi.json
🧠 Learnings (11)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
📚 Learning: 2025-08-12T20:49:05.086Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 655
File: apps/workspace-engine/pkg/engine/workspace/fluent.go:166-171
Timestamp: 2025-08-12T20:49:05.086Z
Learning: The UpdateDeploymentVersions() method with OperationCreate case in apps/workspace-engine/pkg/engine/workspace/fluent.go is specifically designed only for creating new deployment versions, not for handling potential duplicates or existing versions.
Applied to files:
apps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods
Applied to files:
apps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/test/integration/opts.goapps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-10-24T00:02:29.723Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 696
File: apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go:63-67
Timestamp: 2025-10-24T00:02:29.723Z
Learning: In workspace-engine replay mode (apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go), jobs are created with Pending status even during replay because job updates are still accepted and processed. If a job remains Pending after replay completes, it genuinely is still pending because no updates occurred during replay.
Applied to files:
apps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*_test.go : Follow the existing test structure used in *_test.go files
Applied to files:
apps/workspace-engine/test/integration/opts.goapps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/mapping/protobuf_mappings.go : Add mapping functions for the new condition in pkg/mapping/protobuf_mappings.go
Applied to files:
apps/workspace-engine/test/integration/opts.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Applied to files:
apps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Write comprehensive, data-driven tests for new condition types
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments that simply restate what the code does
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Use table-driven tests for all condition types
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add extraneous inline comments that state the obvious
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
🧬 Code graph analysis (8)
apps/workspace-engine/pkg/workspace/workspace.go (2)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)
NewRollbackAction(25-30)apps/workspace-engine/pkg/workspace/releasemanager/deployment/jobs/dispatcher.go (1)
NewDispatcher(20-25)
apps/workspace-engine/test/integration/opts.go (1)
apps/workspace-engine/pkg/oapi/oapi.gen.go (3)
JobStatus(688-688)PolicyRule(791-805)RollbackRule(1001-1007)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (3)
packages/trpc/src/trpc.ts (1)
Context(26-26)apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (2)
ActionTrigger(9-9)ActionContext(20-24)apps/workspace-engine/pkg/oapi/oapi.gen.go (6)
Release(902-908)Job(657-671)ReleaseTarget(911-915)JobAgentConfig(683-685)Policy(768-781)JobStatus(688-688)
apps/workspace-engine/pkg/oapi/oapi.gen.go (1)
apps/web/app/routes/ws/deployments/_components/types.ts (1)
JobStatus(11-21)
apps/workspace-engine/pkg/workspace/releasemanager/manager.go (3)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)
NewRollbackHooks(23-28)apps/workspace-engine/pkg/workspace/releasemanager/deployment/jobs/dispatcher.go (1)
NewDispatcher(20-25)apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
NewCompositeHooks(62-64)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (3)
apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
VerificationHooks(8-23)apps/workspace-engine/pkg/oapi/oapi.gen.go (7)
JobVerification(710-721)VerificationMeasurement(1148-1160)JobVerificationStatusFailed(158-158)ReleaseTarget(911-915)Job(657-671)JobAgentConfig(683-685)Policy(768-781)apps/workspace-engine/pkg/workspace/releasemanager/manager.go (1)
New(43-64)
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (6)
apps/workspace-engine/pkg/oapi/oapi.gen.go (11)
JobStatusFailure(129-129)JobStatusSuccessful(135-135)JobStatusCancelled(127-127)JobStatusInvalidJobAgent(132-132)PolicyTargetSelector(841-846)PolicyRule(791-805)RollbackRule(1001-1007)MetricProvider(752-754)SleepMetricProvider(1050-1055)Sleep(205-205)VerificationMetricSpec(1166-1188)apps/workspace-engine/test/integration/creators/deployment_version.go (1)
NewDeploymentVersion(15-36)apps/workspace-engine/pkg/events/handler/handler.go (4)
DeploymentVersionCreate(44-44)JobUpdate(68-68)PolicyCreate(70-70)ResourceUpdate(27-27)apps/workspace-engine/test/integration/creators/policy.go (2)
NewPolicy(13-31)NewPolicyTargetSelector(34-40)apps/workspace-engine/pkg/memsql/memsql_test.go (1)
ResourceSelector(2874-2877)packages/rule-engine/src/manager/types.ts (1)
ReleaseManager(10-12)
apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (1)
TriggerJobStatusChange(16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: workspace-engine-tests
- GitHub Check: Lint
- GitHub Check: tests
- GitHub Check: Typecheck
- GitHub Check: build (linux/amd64)
🔇 Additional comments (7)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (1)
11-17: Trigger enum extension looks consistentAdding
TriggerJobStatusChangealongside the other job lifecycle triggers is consistent and matches its usage indetermineTrigger. No issues here.apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go (1)
62-72: Rollback rule type constant is aligned and safeAdding
RuleTypeRollback = "rollback"to the rule-type constants cleanly extends the existing set and matches the new rollback evaluator/policy wiring. No behavior changes in this file.apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go (1)
110-139: New generic status-change trigger broadens action invocation surfaceThe fallback:
if currentStatus != previousStatus { return TriggerJobStatusChange }is a clean way to surface all non‑special status transitions while preserving precedence for the explicit created/started/success/failure cases.
This does mean actions will now be called for more transitions (e.g., cancelled/skipped) if they choose to handle
TriggerJobStatusChange. As long as existing actions either switch on specific triggers or no‑op for unknown ones (per theExecutecontract), this is safe.It’s worth quickly scanning existing
PolicyAction.Executeimplementations to ensure they don’t implicitly treat any non‑empty trigger as applicable.apps/workspace-engine/pkg/workspace/releasemanager/manager.go (1)
3-24: Rollback hooks are correctly integrated into the verification hook chainIn
New, wiring:releaseManagerHooks := newReleaseManagerVerificationHooks(store, stateCache) rollbackHooks := rollback.NewRollbackHooks(store, jobs.NewDispatcher(store, verificationManager)) compositeHooks := verification.NewCompositeHooks(releaseManagerHooks, rollbackHooks) verificationManager.SetHooks(compositeHooks)cleanly composes rollback behavior into the existing verification lifecycle without altering the public API. Using the same
storeandverificationManagerkeeps the dispatcher and hooks aligned with the rest of the release manager.Also applies to: 48-56
apps/workspace-engine/pkg/oapi/oapi.gen.go (1)
790-805: RollbackRule wiring into PolicyRule looks consistentThe added
Rollback *RollbackRulefield onPolicyRuleand theRollbackRuletype (withOnVerificationFailureandRollBackJobStatuses) follow the existing OpenAPI-generated patterns and align with how other rule types (e.g.,RetryRule,VersionCooldownRule) are modeled. No issues from the workspace-engine perspective as long as the OpenAPI spec is the single source of truth for these fields.Also applies to: 1000-1007
apps/workspace-engine/test/integration/opts.go (1)
901-939: Rollback PolicyRuleOption helpers are consistent with existing option patterns
WithRuleRollback,WithRuleRollbackOnJobStatuses, andWithRuleRollbackOnVerificationFailuremirror how retry and environment progression options are modeled (using pointer fields for slices/bools). This keeps tests expressive without introducing new patterns; the address-of-slice/bool usage is already established in this file.apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go (1)
19-86: RollbackEvaluator correctly encodes status- and verification-based rollback semanticsThe evaluator cleanly handles both configured rollback triggers: it denies when the latest job’s status is in
RollBackJobStatuses, and, whenOnVerificationFailureis set, it checks persisted job verifications and denies on any failure. The use ofScopeReleaseTarget, memoization, and sorting byCreatedAtfor “latest job” is consistent with the surrounding evaluator patterns.
| import ( | ||
| "context" | ||
| "time" | ||
| "workspace-engine/pkg/oapi" | ||
| "workspace-engine/pkg/workspace/releasemanager/deployment/jobs" | ||
| "workspace-engine/pkg/workspace/releasemanager/verification" | ||
| "workspace-engine/pkg/workspace/store" | ||
|
|
||
| "github.com/google/uuid" | ||
| "go.opentelemetry.io/otel/attribute" | ||
| "go.opentelemetry.io/otel/codes" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undefined tracer in rollback hooks (compile-time error) + minor job-shape nits
OnVerificationComplete calls tracer.Start(...), but this file neither imports go.opentelemetry.io/otel nor defines a tracer variable in the rollback package. That will fail to compile.
You can mirror the pattern from action/orchestrator.go / releasemanager/manager.go by importing otel and defining a package-level tracer:
Proposed fix for tracer + optional job metadata tweak
import (
"context"
"time"
"workspace-engine/pkg/oapi"
"workspace-engine/pkg/workspace/releasemanager/deployment/jobs"
"workspace-engine/pkg/workspace/releasemanager/verification"
"workspace-engine/pkg/workspace/store"
"github.com/google/uuid"
+ "go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
)
+var tracer = otel.Tracer("workspace/releasemanager/rollback")
+
@@
now := time.Now()
newJob := oapi.Job{
Id: uuid.New().String(),
ReleaseId: lastSuccessfulJob.ReleaseId,
JobAgentId: lastSuccessfulJob.JobAgentId,
JobAgentConfig: lastSuccessfulJob.JobAgentConfig,
Status: oapi.JobStatusPending,
CreatedAt: now,
UpdatedAt: now,
+ // Optionally preserve metadata/trace context from the original successful job:
+ // Metadata: maps.Clone(lastSuccessfulJob.Metadata),
+ // TraceToken: lastSuccessfulJob.TraceToken,
}If you care about downstream API shape and tracing, also consider:
- Initializing
Metadata(either cloned fromlastSuccessfulJobor as an empty map) to avoidnullin serialized responses. - Optionally propagating
TraceTokenso rollback jobs remain linked in traces.
Also applies to: 42-45, 111-120, 138-156
🤖 Prompt for AI Agents
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
around lines 3-14 (and also affecting 42-45, 111-120, 138-156): the file calls
tracer.Start but neither imports otel nor defines a package-level tracer,
causing a compile error; import go.opentelemetry.io/otel and create a
package-level tracer (e.g., otel.Tracer("workspace/releasemanager/rollback"))
and use that tracer in the hook functions (ensure spans are ended, set
attributes/status as done elsewhere), and while here, initialize Job.Metadata to
a non-nil map (clone from lastSuccessfulJob.Metadata if present or
make(map[string]string)) and propagate lastSuccessfulJob.TraceToken into the new
rollback job so traces remain linked.
| if !r.shouldRollback(actx.Policies, actx.Job.Status) { | ||
| span.SetAttributes(attribute.Bool("rollback_applicable", false)) | ||
| span.SetStatus(codes.Ok, "no applicable rollback policy") | ||
| return nil | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OnVerificationFailure rollback rules are never honored in RollbackAction
Right now Execute bails out unless:
- At least one enabled policy has a
Rollbackrule, and - That rule’s
RollBackJobStatusesslice is non-nil and contains the current job status.
OnVerificationFailure is not considered at all in shouldRollback, so any policy that configures rollback only via OnVerificationFailure (with RollBackJobStatuses == nil) will never trigger a rollback from this action, even if verifications have clearly failed and the evaluator has decided the rollback rule should deny.
This contradicts the intended behavior implied by the RollbackRule schema and the new e2e tests that exercise WithRuleRollbackOnVerificationFailure() and combined job/verification configurations.
To fix this, you probably want shouldRollback (or the Execute gating) to also take OnVerificationFailure into account, for example by:
- Either removing the policy re-check here and letting the action be invoked only when the rollback evaluator has already decided rollback is required, or
- Extending
shouldRollbackto:- Still check
RollBackJobStatusesagainstjobStatus, and - When
OnVerificationFailureis true, inspectstore.JobVerifications.GetByJobId(job.Id)and return true if any verification hasJobVerificationStatusFailed.
- Still check
A possible direction (simplified, assuming you pass the job into shouldRollback) would look like:
Example adjustment to consider
-func (r *RollbackAction) Execute(
+func (r *RollbackAction) Execute(
ctx context.Context,
trigger action.ActionTrigger,
actx action.ActionContext,
) error {
@@
- if !r.shouldRollback(actx.Policies, actx.Job.Status) {
+ if !r.shouldRollback(actx.Policies, actx.Job) {
span.SetAttributes(attribute.Bool("rollback_applicable", false))
span.SetStatus(codes.Ok, "no applicable rollback policy")
return nil
}
@@
-func (r *RollbackAction) shouldRollback(policies []*oapi.Policy, jobStatus oapi.JobStatus) bool {
+func (r *RollbackAction) shouldRollback(policies []*oapi.Policy, job *oapi.Job) bool {
+ jobStatus := job.Status
for _, policy := range policies {
if !policy.Enabled {
continue
}
for _, rule := range policy.Rules {
if rule.Rollback == nil {
continue
}
- if rule.Rollback.RollBackJobStatuses != nil &&
- slices.Contains(*rule.Rollback.RollBackJobStatuses, jobStatus) {
- return true
- }
+ rb := rule.Rollback
+ if rb.RollBackJobStatuses != nil &&
+ slices.Contains(*rb.RollBackJobStatuses, jobStatus) {
+ return true
+ }
+
+ if rb.OnVerificationFailure != nil && *rb.OnVerificationFailure {
+ verifications := r.store.JobVerifications.GetByJobId(job.Id)
+ for _, v := range verifications {
+ if v.Status() == oapi.JobVerificationStatusFailed {
+ return true
+ }
+ }
+ }
}
}
return false
}That keeps the local decision logic aligned with how the rollback evaluator interprets RollbackRule, and makes the OnVerificationFailure flag effective for action execution.
Also applies to: 101-117
🤖 Prompt for AI Agents
In
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
around lines 51-56 (also apply same change to 101-117), the Execute path bails
out unless a policy has Rollback.RollBackJobStatuses matching the current job
status and therefore ignores RollbackRule.OnVerificationFailure; update the
gating so OnVerificationFailure is honored: either remove this local policy
re-check and rely on the evaluator decision, or extend shouldRollback (or
Execute) to accept the job and when a policy's OnVerificationFailure is true
call store.JobVerifications.GetByJobId(job.ID) and return true if any
verification has JobVerificationStatusFailed while still keeping existing
RollBackJobStatuses checks. Ensure span attributes/status handling remains
consistent when deciding to return nil vs proceed.
Summary by CodeRabbit
Release Notes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.