Revenant is a native, database-backed durable execution engine for Salesforce Apex, inspired by Temporal and DBOS. By orchestrating native platform features—Queueable Apex, Platform Events, Transaction Finalizers, and Apex Cursors—Revenant allows developers to build complex, reliable, and resumable state machines that survive transaction failures, governor limit exhaustion, and platform limits.
- Resumable Execution (Yielding): Long-running processing loops or query pagination steps can call `shouldYield()` to monitor governor limits. If a limit is about to be exceeded, the step checkpoints its state to custom objects and resumes execution transparently in a fresh asynchronous transaction.
- Scatter-Gather (Parallel Processing): Split execution flow across multiple parallel branches and rejoin their output payloads before moving to subsequent steps.
- Continue-As-New (Perpetual Loops): Execute perpetual poller tasks or long-lived daemons. A step can request a transition to a new successor run linked via `Previous_Instance__c` to prevent storage footprint explosion and clear heap and debug log limits. The successor's `StepContext.previousRunAt` carries the engine-set timestamp of when the predecessor completed, so incremental polling workflows can query only records modified since the last run without any manual timestamp bookkeeping. See docs/incremental-polling.md and IncrementalSyncWorkflowExample.
- Distributed Transaction Rollbacks (Sagas): Steps implementing `CompensatableStep` register on a LIFO rollback stack upon successful forward completion. If a forward step fails permanently, the engine automatically executes their `compensate` methods in reverse order.
- Recoverable Rollbacks (Rollback Incomplete): If a `compensate()` step itself exhausts its `RetryPolicy` or throws mid-rollback, the engine preserves the remaining LIFO stack intact and parks the saga in a distinguishable, operator-visible `CompensationFailed` ("Rollback Incomplete") state instead of silently abandoning the deeper compensations. The dashboard surfaces exactly how many forward effects are still un-reversed and offers a Resume Rollback action that replays the remaining stack from the stalled point: idempotently, append-only, and without ever re-running a successful compensation or a forward step. See SagaStalledRollbackExample.
- Watchdog Step Timeouts: Steps can declare custom execution timeouts. A single global watchdog poller (`WorkflowWatchdog`) sweeps the database for any timed-out steps or suspended instances, failing or resuming them cleanly without hitting Salesforce's 100 concurrent scheduled jobs limit.
- Large Payload Offloading: When input, output, or state serialization strings exceed 100,000 characters (approaching the 131,072-character long text area limit), the engine transparently offloads the payload to `ContentVersion` files and links them to the parent instance.
- Effective-Once Side Effects (Idempotency Keys): The engine guarantees only at-least-once execution, so `execute()` (and `compensate()`) can re-run on retries, operator re-drives, and at-least-once event resumes. `StepContext` exposes a read-only `idempotencyKey` that is stable across every re-execution of the same logical step yet distinct per `(instance, step)`. Forward it to an external system (e.g. as a Stripe `Idempotency-Key` header) for effective-once side effects without inventing your own dedup token. See IdempotentChargeWorkflowExample.
- Capture-Once Local Values (`once()`): Stabilizes values generated locally inside a step (generated reference numbers, Crypto-random tokens, random branch decisions, timestamps) so they do not drift across retries, yield/resume cycles, at-least-once event resumes, or operator re-drives. `StepContext.once(key, producer)` invokes the producer at most once per `(instance, step, key)` and returns the same durably recorded value on every subsequent re-execution. Captures are JSON-normalized for durability, so producers should return JSON-native values (capture a timestamp as epoch millis or an ISO string rather than a native `Datetime`), and a capture becomes durable once the step reaches its next checkpoint (complete/yield/sleep/suspend/graceful retry). Contrast with `idempotencyKey`: use `idempotencyKey` to make an external API call effective-once; use `once()` to keep a locally generated value stable. See CaptureOnceWorkflowExample.
- Platform Event Signaling: External integrations, webhook listeners, or human-in-the-loop approvals wake up suspended workflows by publishing `Workflow_Event__e` platform events. The resuming step reads the inbound signal name and payload directly from `StepContext` (e.g. `ctx.getSignal('Approve:Order')`), and the engine marks observed signals consumed at the step's `COMPLETE` transition so at-least-once redelivered duplicates are never double-processed.
- Effectively-Once Outbound Emit (`ctx.emit()`): A step can hand the engine one or more author-owned domain Platform Events to publish mid-workflow, via `ctx.emit(new Order_Shipped__e(...))` or `StepResult.complete(next, output, events)`, instead of calling `EventBus.publish()` itself. The engine buffers them and publishes them in the same transaction that writes the step's append-only `COMPLETE`/`SPLIT` record, before the next step begins. Retries, yields, suspends, operator re-drives, and at-least-once resumes that re-run `execute()` before that commit therefore publish nothing, and a step already durably `COMPLETE` never re-executes and so never re-emits: producer-side effectively-once with no hand-rolled dedup token. Authors keep their own `__e` (use `publishBehavior=PublishAfterCommit`); the engine imposes no envelope. Delivery to subscribers stays at-least-once, so subscribers remain idempotent. This is the mid-flight outbound complement to inbound signals and the terminal lifecycle event. See OrderChoreographyWorkflowExample (workflow A emits an event that starts workflow B).
- Outbound Lifecycle Events: The engine publishes a `Workflow_Lifecycle__e` platform event (outcome metadata only) each time an instance reaches a terminal state (`Completed`/`Failed`/`Compensated`/`Cancelled`), so a Flow Pause element or an external subscriber can react event-driven instead of polling, with exactly one event per logical workflow (one per `ContinuedAsNew` chain). Fire-and-forget and operator-toggleable via `Revenant_Config__mdt.Publish_Lifecycle_Events__c`. See docs/workflow-lifecycle-event.md.
- Salesforce Flow Interoperability: Launches or signals workflows using Invocable Actions from Salesforce Flow, or executes standard Autolaunched Flows as steps within a workflow using the generic `WorkflowFlowStep` wrapper.
- Custom Metadata Alerts: Supports operator-configurable failure notification thresholds (consecutive failures, sliding rate counts) using `Workflow_Alert_Config__mdt` custom metadata records.
- Concurrency Limits (In-Flight Ceiling): Declare a maximum number of simultaneously in-flight instances per workflow definition via a `Concurrency_Config__mdt` record (same DeveloperName + `Default`-fallback convention as alerts). A bursty start is throttled to a safe ceiling: instances beyond the limit park in a throttled state and re-attempt admission automatically through the existing sleep/watchdog plumbing, with the slot released on every terminal transition and leaked slots reclaimed by the watchdog. Distinct from `RateLimiter` (rate/throughput) and get-or-start dedup. See docs/concurrency-limits.md.
- Declarative Recurring Schedules (0-slot): Create a `Workflow_Schedule__c` record to run any workflow on a cron cadence, with no Apex and zero additional scheduled-job slots beyond the existing watchdog. See docs/recurring-schedules.md.
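As an illustration of the idempotency-key feature above, the following is a minimal sketch of a step that forwards `ctx.idempotencyKey` as an `Idempotency-Key` header. The class name `ChargeCardStep`, the Named Credential `Payments_API`, and the request path are hypothetical; the sketch also assumes `idempotencyKey` and `workflowInputJson` are readable as properties on `StepContext`. The bundled IdempotentChargeWorkflowExample remains the authoritative reference.

```apex
public class ChargeCardStep implements WorkflowStep {
    // Hypothetical Named Credential and path; replace with your provider's endpoint.
    private static final String CHARGE_ENDPOINT = 'callout:Payments_API/v1/charges';

    // Built separately so the header logic can be unit tested without a callout.
    public static HttpRequest buildChargeRequest(String idempotencyKey, String bodyJson) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint(CHARGE_ENDPOINT);
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        // Same key on every re-execution of this logical step, so the
        // provider can deduplicate a repeated charge request.
        req.setHeader('Idempotency-Key', idempotencyKey);
        req.setBody(bodyJson == null ? '{}' : bodyJson);
        return req;
    }

    public StepResult execute(StepContext ctx) {
        // Assumption: the workflow input JSON is already the charge payload.
        HttpRequest req = buildChargeRequest(ctx.idempotencyKey, ctx.workflowInputJson);
        HttpResponse res = new Http().send(req);
        if (res.getStatusCode() >= 300) {
            // Throwing hands the failure to the engine; any re-run reuses the same key.
            throw new WorkflowEngine.WorkflowException(
                'Charge failed with HTTP ' + res.getStatusCode()
            );
        }
        return StepResult.complete(
            null,
            new Map<String, Object>{ 'httpStatus' => res.getStatusCode() }
        );
    }
}
```

Keeping the request builder static means the header behavior can be asserted in a plain unit test, while the callout itself is covered with an `HttpCalloutMock`.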
```mermaid
graph TD
    Start([Start Workflow]) --> Enqueue[Enqueue WorkflowOrchestrator]
    Enqueue --> Run[WorkflowOrchestrator runs runStep]
    Run --> Hydrate[Hydrate Instance & Step State]
    Hydrate --> Exec{Execute Step}
    Exec -- COMPLETE --> Next{Has Next Step?}
    Next -- Yes --> SaveStack[Append Compensatable Steps]
    SaveStack --> Run
    Next -- No --> Finish[Status: Completed]
    Exec -- YIELD / SLEEP --> Suspend[Save State & Suspend]
    Suspend --> Resume[Signal Event / Timer Fired]
    Resume --> Enqueue
    Exec -- ERROR / CRASH --> Fail{Has Compensation Stack?}
    Fail -- Yes --> TransitionComp[Status: Compensating]
    TransitionComp --> Rollback[Run Compensation Steps LIFO]
    Rollback --> CompSuccess{All Rollbacks Done?}
    CompSuccess -- Yes --> TerminalComp[Status: Compensated]
    CompSuccess -- No --> RollbackFail{Compensation Step Fails?}
    RollbackFail -- Yes --> TerminalFail[Status: Failed]
    RollbackFail -- No --> Rollback
    Fail -- No --> TerminalFail[Status: Failed]
    TerminalFail --> Alert[WorkflowAlertManager Evaluates CMDT & Sends Email]
```
To create a step, implement the WorkflowStep interface (or CompensatableStep if rollback logic is required):
```apex
public class ProvisionSandboxStep implements CompensatableStep {
    /**
     * Executes forward step logic.
     */
    public StepResult execute(StepContext ctx) {
        // Business logic execution
        String sandboxId = 'sb_98765';
        // Return COMPLETE action and output payload
        return StepResult.complete(null, new Map<String, Object>{'sandboxId' => sandboxId});
    }

    /**
     * Executes rollback logic if a subsequent step in the DAG fails.
     */
    public StepResult compensate(StepContext ctx) {
        // Hydrate forward step state from the context
        Map<String, Object> state = (Map<String, Object>) JSON.deserializeUntyped(ctx.stepStateJson);
        String sandboxId = (String) state.get('sandboxId');
        // De-provision resources
        System.debug('Deprovisioning sandbox: ' + sandboxId);
        return StepResult.complete(null, 'Deprovisioned');
    }
}
```

Create a class implementing `WorkflowDefinition` to model step transitions. The contract has three methods: `getSteps()` declares the complete step inventory (required: it is what `WorkflowValidator` checks the DAG against, see §9), `getInitialStep()` names the entry point, and `getNextStep()` routes transitions:
```apex
public class OnboardingWorkflow implements WorkflowDefinition {
    /**
     * Declares the complete list of steps involved in the workflow.
     */
    public List<String> getSteps() {
        return new List<String>{
            'VerifyOrderStep',
            'ProvisionSandboxStep',
            'SendWelcomeEmailStep'
        };
    }

    /**
     * Designates the starting step.
     */
    public String getInitialStep() {
        return 'VerifyOrderStep';
    }

    /**
     * Determines the next transition step based on the outcome of the active step.
     */
    public String getNextStep(String currentStepName, StepResult result) {
        if (currentStepName == 'VerifyOrderStep') {
            return 'ProvisionSandboxStep';
        }
        if (currentStepName == 'ProvisionSandboxStep') {
            return 'SendWelcomeEmailStep';
        }
        return null; // Null indicates terminal completion
    }
}
```

Execute the workflow asynchronously from any Apex context (Triggers, Controllers, or Queueables):
```apex
// Parameters: WorkflowDefinition ClassName, Unique Correlation Key, JSON Input
Id instanceId = WorkflowEngine.start(
    'OnboardingWorkflow',
    'Opp_Onboarding_006As00000abcde',
    '{"accountId": "001As0000012345", "vipOnboarding": true}'
);
```

A step that suspends for an approval (`StepResult.waitForApproval(...)`) or an external event (`StepResult.suspend()`) is woken by `WorkflowEngine.signal(keyOrId, name, payload)` (or a `SIGNAL:`-typed `Workflow_Event__e`). On resume, the step reads the signal that woke it directly from `StepContext`, with no SOQL against engine-internal objects and no hand-rolled "consumed" marker:
```apex
public StepResult execute(StepContext ctx) {
    // Most recent signal of this name; never null.
    StepContext.Signal decision = ctx.getSignal('Approve:Order');
    if (!decision.isPresent()) {
        // First run (or resumed by a timer): nothing has signaled us yet.
        return StepResult.waitForApproval('Order', 'Manager');
    }
    Map<String, Object> payload =
        (Map<String, Object>) JSON.deserializeUntyped(decision.payload);
    Boolean approved = (Boolean) payload.get('approved');
    return approved
        ? StepResult.complete('ApproveStep', payload)
        : StepResult.complete('RejectStep', payload);
}
```

Accessors available inside `execute()` and `compensate()`:
| Accessor | Returns |
|---|---|
| `ctx.getSignals()` | All pending signals, in arrival order (never null). |
| `ctx.getSignals(name)` | Pending signals matching `name`, in arrival order. |
| `ctx.getSignal(name)` | The most recent pending signal of `name`, or a clean empty result (`isPresent() == false`, `payload == null`) when none is pending; never null. |
| `ctx.hasSignal(name)` | `true` if any pending signal matches `name`. |
Consumption is engine-managed and tied to the step's successful COMPLETE (or SPLIT) transition: signals the step observes are marked consumed only once the step completes, so a step that yields or retries before completing re-observes the same pending signal, and an at-least-once redelivered duplicate cannot be reprocessed by a later step. Transitions that suspend and re-run the same step — SUSPEND, WAIT_FOR_APPROVAL, SLEEP, and START_CHILD — intentionally do not consume, so the signal that resumes the step survives to be read. A START_CHILD step that also reads a kickoff signal should therefore check its child-completion signal before re-acting on the kickoff (or stash what it needs in step state). Cancel / CancelWorkflow control signals remain engine-handled and are never surfaced as readable payloads. See ApprovalSignalWorkflowExample for a complete approve/reject example with a redelivery test.
When a step reads a signal while running as one branch of a parallel (scatter-gather) fan-out, the engine atomically claims each signal it reads (moving it to an internal Processing state) so two concurrent branches can never both process the same payload; the claim is promoted to consumed when the branch completes, and rolled back if the branch yields/suspends/retries. Each branch consumes (and rolls back) only the signals it itself claimed, tracked per row, so it never disturbs a signal a sibling has claimed or merely observed. A parallel instance suspended on a signal is resumed branch-by-branch when the signal arrives, and a bulk signal to many parallel instances wakes their branches with a single platform-event publish.
Claims are bounded for governor safety. A branch that needs to inspect many distinct signal names should call ctx.getSignals() once and filter in memory rather than issuing a separate ctx.getSignal(name) / ctx.hasSignal(name) lookup per name: each named claim is its own DML statement and lock query, so beyond an internal claim budget the overflow reads degrade to at-least-once (read without claiming) instead of failing. getSignals() claims a whole page in one statement and stays exactly-once. ParallelSignalFanInWorkflowExample is the copyable reference for this fan-in pattern.
Because claiming writes uncommitted DML, a parallel-branch step that makes an HTTP callout whose endpoint or body comes from the signal payload must implement the CalloutStep marker. Such a step reads signals without claiming (at-least-once delivery instead of exactly-once), so the callout is legal; use distinct signal names per branch or idempotent callouts when relying on this mode. A CalloutStep may also be TimeoutConfigurable: the engine defers its pre-execution timeout write past execute() (and past compensate() for a CalloutStep that is also a CompensatableStep) so the synchronous callout is not blocked by uncommitted work (the step's timeout was already armed when it was queued, so the watchdog stays in effect).
Recommended pattern — keep exactly-once and the callout by splitting them into two steps. The at-least-once CalloutStep mode above exists because a single step cannot atomically own two un-rollback-able concerns: a signal claim is transactional DML, but an HTTP callout is not, and the two cannot be made to commit or roll back together. Rather than dropping the signal claim to at-least-once, give each concern its own step:
1. A signal-wait step reads the signal the normal (claiming) way (pure transactional DML, so consumption stays exactly-once) and on receipt transitions (`StepResult.complete('DoCallout', payload)`) to the callout step.
2. A callout step issues the HTTP callout from the payload it was handed. With no signal claim in its transaction, this is an ordinary callout the engine already makes safe (continuation + deferred timeout), so its only commit concern is the callout itself.
This decomposition removes the claim-vs-callout conflict entirely: the signal is consumed exactly once in step 1, and the callout in step 2 carries no uncommitted signal DML. Reach for the inline CalloutStep+signal mode only when the callout must react to the signal in place — e.g. a compensating callout fired in response to a signal during unwind, where the wait and the callout cannot be pre-split — and make that callout idempotent.
The suspend→signal→resume pattern combined with saga rollback on rejection is the most common enterprise durable-workflow shape — and the one that native Approval Processes cannot express, because they have no durable multi-step orchestration or compensating rollback. ApprovalWorkflowExample is the copyable reference. Three elements work together:
1. A compensatable forward step — any step that implements CompensatableStep and does real work before the approval gate. The engine pushes its name onto Compensation_Stack__c when it completes, so the work can be automatically rolled back if the workflow is later rejected.
2. The approval gate — a WorkflowStep that reads the inbound decision signal from StepContext and surfaces the payload as its output, leaving routing to the DAG:
```apex
public StepResult execute(StepContext ctx) {
    StepContext.Signal decision = ctx.getSignal('Approve:PurchaseApproval');
    if (!decision.isPresent()) {
        // First run: nothing decided yet -- suspend until an approver signals the workflow.
        // The second argument is an optional Custom Permission API name the dashboard
        // requires an approver to hold; null leaves the gate unrestricted (pass e.g.
        // 'Workflow_Admin' to restrict who may decide).
        return StepResult.waitForApproval('PurchaseApproval', null);
    }
    // Resume: forward the decision payload; getNextStep() will route to approve or reject.
    Map<String, Object> payload = (Map<String, Object>) JSON.deserializeUntyped(decision.payload);
    return StepResult.complete(null, payload);
}
```

3. DAG-level approve/reject routing in `getNextStep()`:
```apex
public String getNextStep(String currentStepName, StepResult result) {
    if (currentStepName == 'ApprovalWorkflowExample.RequestApprovalStep') {
        Map<String, Object> decision =
            (Map<String, Object>) JSON.deserializeUntyped(result.outputJson);
        Boolean approved = (Boolean) decision.get('approved');
        return approved
            ? 'ApprovalWorkflowExample.PlaceOrderStep'
            : 'ApprovalWorkflowExample.RejectStep';
    }
    ...
}
```

Delivering the decision. An Admin or Flow Builder uses the Signal Workflow invocable action (`WorkflowSignalInvocableAction`) with:
- Correlation Key / Instance ID: the workflow correlation key (or instance Id)
- Signal Name: `Approve:PurchaseApproval` (i.e. `Approve:` + the approval key passed to `waitForApproval`)
- Payload JSON: `{"approved":true}` or `{"approved":false,"reason":"..."}`
From Apex the equivalent call is:
```apex
WorkflowEngine.signal(correlationKey, 'Approve:PurchaseApproval', '{"approved":true}');
```

Saga rollback on rejection. The reject step throws a `WorkflowEngine.WorkflowException`. Because the prior `CompensatableStep` is on `Compensation_Stack__c`, the engine automatically calls its `compensate()` method in LIFO order (reading the original step output from `ctx.stepStateJson` to retrieve any resource identifiers), and the instance reaches `Compensated` when the rollback is done. No additional wiring is required: the stack is maintained by the engine whenever a `CompensatableStep` completes on the forward path.
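A reject step of this kind can be very small. The sketch below is an assumption-labeled illustration rather than the code shipped in ApprovalWorkflowExample: the top-level class name `RejectPurchaseStep` and the message text are hypothetical, and it relies only on the fact stated above that throwing `WorkflowEngine.WorkflowException` triggers the rollback.

```apex
public class RejectPurchaseStep implements WorkflowStep {
    public StepResult execute(StepContext ctx) {
        // No forward work happens here. Throwing marks the step as failed,
        // which lets the engine unwind Compensation_Stack__c in LIFO order.
        throw new WorkflowEngine.WorkflowException(
            'Purchase request rejected by approver'
        );
    }
}
```

Because the step performs no side effects of its own, it does not need to implement `CompensatableStep`; only the steps that completed before it are unwound.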
Deliberate, terminal failures — StepResult.fail(reason). For a planned business-rule failure ("order already refunded", "account closed", "KYC rejected"), return StepResult.fail(reason) instead of throwing. Unlike a thrown exception — which surfaces an Apex stack trace in Error_Message__c / on the dashboard — fail() records your operator-readable reason verbatim and stops the workflow in zero retry hops (it is Revenant's non-retryable / permanent-failure primitive, the terminal counterpart to opt-in StepResult.retry()). It honors the same compensation contract: with a non-empty Compensation_Stack__c the instance transitions to Compensating and runs compensate() LIFO; with no stack it goes straight to Failed. Pass structured data with StepResult.fail(reason, failureData) — it is persisted on the failing step's Error_Details__c for downstream compensation/alerting to inspect.
```apex
public StepResult execute(StepContext ctx) {
    if (orderAlreadyRefunded(ctx)) {
        return StepResult.fail(
            'Order already refunded',
            new Map<String, Object>{ 'orderId' => ctx.getSignal('Order').payload }
        );
    }
    ...
}
```

Revenant hooks directly into Salesforce's native Approval Process (and the standard Approval Requests tab, Approvals Home page, and standard related lists) without requiring custom LWC pages or new objects. Standard SObjects (like Case or Opportunity) that already undergo approval processes can drive and resume Revenant workflows.
NativeApprovalWorkflowExample is the reference for wiring the two together using the standard Case object and a clean Apex-trigger bridge:
- Start & Submission: When a `Case` is created, the after-insert trigger `CaseApprovalBridgeTrigger` starts the workflow (using the `Case.Id` as the correlation key). When the workflow reaches `RequestApprovalStep`, the step automatically queries the Case and submits it into the native Salesforce Approval Process (`Case.Revenant_Trigger_Approval_Example.approvalProcess-meta.xml`).
- Approval Request Inbox: The record immediately appears in the standard Salesforce Approval Requests tab (listing the native `ProcessInstanceWorkitem`). The designated approver reviews and approves/rejects it there.
- Resume Hook: On approval/rejection, the native process performs a field update to `Case.Approval_Status__c` (e.g., setting it to `'Approved'` or `'Rejected'`).
- Trigger Signal: The after-update trigger (`CaseApprovalBridgeTrigger`) detects this status change, queries the approver's comments from `ProcessInstanceStep`, and calls `WorkflowEngine.signal()` to resume the suspended workflow.
This approach keeps your implementation completely standard, bulk-safe by construction, and fully testable in Apex unit tests (NativeApprovalTriggerBridgeExampleTest).
Revenant still contributes what a native Approval Process cannot express on its own: durable multi-step orchestration and saga rollback on rejection (ReserveBudgetStep unwinds via Compensation_Stack__c automatically if rejected).
This is example-only wiring on the standard Case object: the trigger fires on every Case in the org, not just demo records, which is fine for a reference example but would need scoping (e.g. a dedicated Record Type) before adapting this for a real deployment where Case already carries other business processes.
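One way to apply the Record Type scoping mentioned above is to filter the trigger's records before starting or signaling any workflow. The helper below is a sketch under stated assumptions: the class name `CaseApprovalScope` and the Record Type developer name `Revenant_Approval_Demo` are hypothetical and are not part of the shipped example.

```apex
public class CaseApprovalScope {
    // Hypothetical Record Type developer name; replace with your own.
    private static final String SCOPED_RECORD_TYPE = 'Revenant_Approval_Demo';

    // Returns only the Cases that belong to the scoped Record Type.
    public static List<Case> inScope(List<Case> candidates) {
        List<Case> scoped = new List<Case>();
        Schema.RecordTypeInfo info = Schema.SObjectType.Case
            .getRecordTypeInfosByDeveloperName()
            .get(SCOPED_RECORD_TYPE);
        if (info == null) {
            // Record Type not deployed in this org: bridge nothing.
            return scoped;
        }
        Id scopedTypeId = info.getRecordTypeId();
        for (Case c : candidates) {
            if (c.RecordTypeId == scopedTypeId) {
                scoped.add(c);
            }
        }
        return scoped;
    }
}
```

A bridge trigger adapted for production could then iterate over `CaseApprovalScope.inScope(Trigger.new)` instead of `Trigger.new`, leaving Cases that belong to other business processes untouched.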
The engine ships full parent→child orchestration: StepResult.startChild() suspends the parent, runs the child in its own durable Queueable chain, and resumes the parent via a ChildCompleted:<childKey> Platform Event when the child successfully completes. ChildWorkflowCompositionExample is the copyable reference — a loan-application parent that delegates credit scoring to a child and then branches on the child's score.
Contract that authors must get exactly right
| Concern | Rule |
|---|---|
| Suspend | Return StepResult.startChild(childWorkflowName, childKey, input) from the step that launches the child. The engine suspends the parent automatically — do not also return StepResult.suspend(). |
| Child output | The child's final outcome (status, error message, and output) can be read with ctx.getChildOutcome(childKey). This is the preferred way to distinguish a successful child from a failed, compensated, or cancelled one without hand-rolled SOQL. For backward compatibility, the successful child's final output still arrives as the payload of the ChildCompleted:<childKey> signal (read with ctx.getSignal("ChildCompleted:" + childKey).payload). |
| Idempotent resume | The step that launched the child also handles the resume: check for the child outcome first, then act on it. Return StepResult.complete() (not suspend()) on the resume path — returning COMPLETE triggers engine-managed signal consumption, so an at-least-once redelivered duplicate completion or failure event cannot double-advance the parent. |
| Idempotent launch | The engine automatically dedupes child launches against the deterministic (Parent_Instance__c, Correlation_Key__c) pair. If startChild() is called again during a re-entrant hop or watchdog re-check, and the active child already exists, the engine resolves to an idempotent re-suspend without starting a duplicate or failing the parent. |
| Cancellation | WorkflowEngine.cancel(parentId, false) cancels the parent and all of its active descendants (root-first traversal over Parent_Instance__c), so explicitly cancelling a parent reaps its in-flight children. |
Caveats (failure paths are not auto-handled). This contract covers parent→child composition. One failure mode still needs explicit handling in a production composite:
- A parent that fails does not cascade to its children. `failWorkflowInstance()` does not traverse `Parent_Instance__c`; only explicit `WorkflowEngine.cancel()` reaps descendants. A failing parent leaves its in-flight children running unless you cancel them.
Minimal launcher + resume step
```apex
public class RequestCreditCheckStep implements WorkflowStep {
    public StepResult execute(StepContext ctx) {
        String childKey = 'CreditCheck_' + ctx.workflowInstanceId;
        StepContext.ChildOutcome outcome = ctx.getChildOutcome(childKey);
        if (outcome.isPresent()) {
            if (outcome.isSuccess()) {
                // Resume: read the child's output.
                Map<String, Object> childResult = (Map<String, Object>) JSON.deserializeUntyped(
                    WorkflowEngine.resolvePayload(outcome.output)
                );
                // ... inspect childResult and return StepResult.complete(nextStep, output)
            } else {
                // Fallback: child failed, compensated, or was cancelled.
                // ... handle failure (e.g. route to a fallback step)
            }
        }
        // First run or re-entrant hop: start the child and suspend.
        // The engine handles re-entrancy idempotently (no duplicate is started).
        Map<String, Object> input = (Map<String, Object>) JSON.deserializeUntyped(ctx.workflowInputJson);
        return StepResult.startChild('MyChildWorkflow', childKey, input);
    }
}
```

The correlation key format `'<prefix>_' + ctx.workflowInstanceId` guarantees uniqueness across concurrent parent instances while remaining stable across retries of the same step.
Flow Builders interact with the engine through supported Invocable Actions (category Revenant Workflows) — no internal field API names required:
| Action | Apex Class | Purpose |
|---|---|---|
| Start Workflow | `WorkflowStartInvocableAction` | Launch a durable workflow, returning its Instance Id. |
| Signal Workflow | `WorkflowSignalInvocableAction` | Send a signal (approve, cancel, resume) to a running instance. |
| Get Workflow Status | `WorkflowStatusInvocableAction` | Read an instance's outcome back into Flow (read-only). |
Reading a workflow's outcome. Get Workflow Status accepts either a Workflow_Instance__c Id or a Correlation Key and returns typed outputs a Decision element can branch on:
- `found`: `false` (instead of a fault) when nothing matches the key/Id.
- `status`: the raw `Status__c` value (e.g. `Running`, `Completed`, `Failed`).
- `isTerminal`: `true` once the workflow has finished (`Completed`/`Failed`/`Compensated`/`Cancelled`).
- `isSuccess`: `true` only for `Completed`.
- `outputJson`: the workflow output, fully rehydrated even when it was offloaded to ContentVersion (>100k); never the raw storage pointer.
- `errorMessage`: failure detail from `Error_Message__c`.
The action is strictly read-only (no transition, enqueue, signal, schedule, or DML) and bulk-safe across Flow batch sizes. Lookup behavior differs by identifier, which matters for workflows that use ContinueAsNew:
- By Correlation Key: automatically follows the `ContinuedAsNew` chain and reports the live/terminal successor rather than a stale predecessor. Use the correlation key for outcome polling whenever a workflow may continue-as-new.
- By Instance Id: reads that exact instance and deliberately does not follow the chain (an Id is a precise handle). The Id returned by Start Workflow points at the original generation, so polling that saved Id on a continue-as-new workflow would keep reading the predecessor and miss the successor's outcome.

Note: a single read returns the full rehydrated `outputJson` even for offloaded (>100k) payloads. A Flow batch that polls many instances whose outputs are all large/offloaded materializes them all at once and can approach the Apex heap limit; use smaller batch sizes for that case.
Reference recipe — start a workflow, then later branch on its outcome:
1. A record-triggered Flow calls Start Workflow (`workflowName`, a stable `correlationKey`, optional `inputJson`).
2. Later, a scheduled Flow, a screen action, or a subsequent automation calls Get Workflow Status, passing that same `correlationKey` (preferred: it resolves continue-as-new chains, whereas the saved Id only reads the original generation).
3. A Decision element branches:
   - `found = false` → not found
   - `isSuccess = true` → success path
   - `status = Failed` → surface `errorMessage`
   - `isTerminal = false` → keep waiting
   - default → ended without success (e.g. Cancelled/Compensated)

The autolaunched Flow `Revenant_Read_Workflow_Status_Example` (examples/main/default/flows/) implements steps 2 and 3 verbatim.
Executing Flows as Steps. The engine provides a bundled step class WorkflowFlowStep to execute standard Autolaunched Flows as steps within a workflow:
- The step's input JSON is expected to be a JSON object with:
  - `flowName` (String, required): The developer name of the Autolaunched Flow to invoke.
  - `variables` (Map, optional): Key-value pairs representing the input variables passed to the Flow.
- The step returns a `StepResult.complete` with the Flow's output parameters as its JSON output.
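The input shape described above can be built with a plain map and `JSON.serialize`. This is only a sketch of the JSON object itself: the Flow name `Send_Welcome_Email` and its variables are hypothetical, and how the JSON reaches the step (for example as the preceding step's output) depends on your workflow definition.

```apex
// Hypothetical Flow developer name and input variables.
Map<String, Object> flowStepInput = new Map<String, Object>{
    'flowName' => 'Send_Welcome_Email',
    'variables' => new Map<String, Object>{
        'accountId' => '001As0000012345',
        'vipOnboarding' => true
    }
};

// Serialize to the JSON object that WorkflowFlowStep expects as its input.
String flowStepInputJson = JSON.serialize(flowStepInput);
```

Building the payload from a map rather than a hand-written string avoids quoting mistakes in nested variable values.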
At-Most-Once Flow Execution Guarantee. Because step bodies are executed under an at-least-once contract, the engine guards WorkflowFlowStep using its capture-once (once()) idempotency machinery. The first time a Flow is executed for a given step visit, its output parameters are durably recorded in the step's audit log (Captured_Values__c). On any subsequent sequential re-execution (due to a watchdog sweep or stray orchestrator hop), the engine retrieves the original Flow outputs directly from the audit log without re-invoking the Flow, preventing duplicate Flow-driven DML, emails, or callouts.
Honest Boundary: The guard is best-effort under concurrent re-hops (mirroring the engine's general terminal-window deduplication boundary). Flow Builders whose Flows interact with external systems should still design their Flow logic to tolerate at-least-once invocation where possible.
For Apex callers — ISVs, trigger handlers, batch jobs, invocable wrappers — WorkflowEngine.getStatus is the supported read contract. Do not query Workflow_Instance__c fields directly: the field names are internal, and Output__c silently holds a storage pointer (not the real value) when the output exceeds 100k characters.
```apex
// By instance Id (precise handle; does not follow ContinuedAsNew chains)
WorkflowEngine.WorkflowStatus statusById = WorkflowEngine.getStatus(instanceId);

// By correlation key (preferred for polling; picks the live/latest run)
WorkflowEngine.WorkflowStatus statusByKey = WorkflowEngine.getStatus(correlationKey);

// Bulk variants (constant SOQL, regardless of list size)
List<WorkflowEngine.WorkflowStatus> statusesById = WorkflowEngine.getStatus(idList);
List<WorkflowEngine.WorkflowStatus> statusesByKey = WorkflowEngine.getStatus(keyList);
```

`WorkflowStatus` fields:
| Field | Type | Description |
|---|---|---|
| `instanceId` | Id | The `Workflow_Instance__c` record Id. |
| `definitionName` | String | The workflow class name (`Workflow_Name__c`). |
| `correlationKey` | String | The correlation key the instance was started with. |
| `status` | String | Raw `Status__c` value (e.g. `Running`, `Completed`, `Failed`). |
| `isTerminal` | Boolean | `true` when the workflow has reached a final state (`Completed`, `Failed`, `Compensated`, or `Cancelled`). |
| `errorMessage` | String | Failure detail from `Error_Message__c`; `null` unless the instance failed. |
| `output` | String | The fully rehydrated output string for terminal instances, transparently resolved from ContentVersion when the output was offloaded (>100k chars). `null` for non-terminal instances; blank string (`""`) for a terminal instance that produced no output. |
Key semantics:

- `output` is `null` for in-flight instances. `isTerminal = true` with `output = ""` means the workflow completed successfully with no output, which is distinct from a still-running instance.
- The Id overloads cost at most 2 SOQL queries (one for the instance(s), at most one more for `ContentVersion` rehydration) and zero DML. The correlation-key overloads resolve each key's winning instance from metadata first and fetch outputs only for the winners, costing a small constant number of SOQL queries regardless of list size (and zero DML). The work is bounded per key, so a single hot key reused across many terminal runs can neither crowd out other requested keys nor exhaust the heap. All overloads are safe to call from any Apex context.
- By correlation key, the active/live run is preferred over a recently-terminal one when a key has been reused. Lookup is case-insensitive (matching the correlation-key fields) and follows `ContinuedAsNew` chains via the shared root key, so polling the original key (or any intermediate successor key) returns the live/final successor, not a stale predecessor. `null` is returned (not an exception) when nothing matches.
- Known limitation: don't reuse a chain's correlation key for an unrelated run while that chain is live. Chain following matches on the shared `Root_Correlation_Key__c`, which a continue-as-new chain shares with any independent run that reuses the same key. If you start a separate workflow with a correlation key that is also the root of a still-live continue-as-new chain, polling an intermediate successor key of that chain may resolve to the unrelated reused run. Use distinct correlation keys per logical workflow (the normal case) to avoid this; precise disambiguation would require walking the `Previous_Instance__c` lineage, which is intentionally out of scope to keep `getStatus` bounded and constant-SOQL.
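The semantics above can be sketched as a small polling snippet. This is an illustrative sketch rather than a prescribed pattern: the correlation key `order-12345` is hypothetical, and only the `getStatus` overload and `WorkflowStatus` fields already described in this section are used.

```apex
// Hypothetical correlation key, used for illustration only.
String correlationKey = 'order-12345';
WorkflowEngine.WorkflowStatus ws = WorkflowEngine.getStatus(correlationKey);

if (ws == null) {
    // Nothing matched the key: getStatus returns null instead of throwing.
    System.debug('No workflow found for key ' + correlationKey);
} else if (!ws.isTerminal) {
    // In-flight instance: output is null, so only the raw status is meaningful.
    System.debug('Still in flight, status = ' + ws.status);
} else if (ws.status == 'Failed') {
    System.debug('Failed: ' + ws.errorMessage);
} else if (String.isEmpty(ws.output)) {
    // Terminal with an empty string means the run finished without output.
    System.debug('Finished with no output, status = ' + ws.status);
} else {
    System.debug('Output: ' + ws.output);
}
```

Checking `isTerminal` before reading `output` keeps the "no output yet" and "finished with no output" cases apart.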
A WorkflowDefinition is otherwise only exercised at runtime: the engine resolves each step class lazily via Type.forName as it reaches it, so a typo'd successor, an entry point missing from getSteps(), or a step name that doesn't resolve to a real WorkflowStep only surfaces mid-execution — after earlier forward steps (and their side effects) have already committed, triggering a real compensation rollback in production. WorkflowValidator moves that detection to authoring/deploy time.
Call WorkflowValidator.validate(workflowClassName) — typically from an Apex test that runs in CI before deploy. It is strictly read-only: it creates no Workflow_Instance__c, writes no Workflow_Step_Execution__c, touches no compensation state, publishes no platform events, and enqueues no jobs. A single call returns every structural defect found (not just the first), each naming the offending step:
```apex
@isTest
static void onboardingWorkflowIsWellFormed() {
    WorkflowValidator.ValidationResult result =
        WorkflowValidator.validate('OnboardingWorkflow');
    System.assert(
        result.isValid(),
        'OnboardingWorkflow DAG is malformed: ' + result.getMessages()
    );
}
```

`validate` flags:

- a definition class that doesn't resolve or doesn't implement `WorkflowDefinition`;
- a `getInitialStep()` that is blank or not contained in `getSteps()`;
- any `getSteps()` entry that doesn't resolve to an instantiable `WorkflowStep`/`CompensatableStep`;
- duplicate `getSteps()` entries.

It additionally runs a best-effort transition probe that drives `getNextStep(...)` for each declared step and flags any returned successor not in `getSteps()`. The probe is best-effort because `getNextStep` is data-dependent and cannot be fully enumerated. Inspect `result.defects` (each has a `code`, `stepName`, and `message`) or `result.getMessages()` for the full list. This is why `getSteps()` is part of the `WorkflowDefinition` contract: it is the authoritative step inventory the validator checks the DAG against.
Revenant supports Custom Metadata-driven failure alerting. Operators can configure notifications directly in Salesforce Setup without code modifications by creating Workflow Alert Config (Workflow_Alert_Config__mdt) records:
The engine maps a workflow's class name to a custom metadata record's DeveloperName (Workflow Alert Config Name) by replacing all non-alphanumeric characters (such as dots) with underscores:
- Standard Class: `OnboardingWorkflow` maps to `OnboardingWorkflow`
- Inner Class / Nested Class: `CalloutTimeoutWorkflowExample.CalloutWorkflow` maps to `CalloutTimeoutWorkflowExample_CalloutWorkflow`
- Global Fallback: If no specific configuration record matches a failing workflow, the engine automatically falls back to a record named `Default`.
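The name mapping above amounts to a single regular-expression replacement. The helper below is a minimal sketch of that rule: `AlertConfigNameSketch` and `toDeveloperName` are hypothetical names, not part of Revenant's API, and the engine's actual implementation may differ.

```apex
// Hypothetical helper illustrating the documented mapping rule.
public class AlertConfigNameSketch {
    public static String toDeveloperName(String workflowClassName) {
        // Replace every character that is not a letter or digit with an underscore.
        return workflowClassName.replaceAll('[^a-zA-Z0-9]', '_');
    }
}
```

For example, `AlertConfigNameSketch.toDeveloperName('CalloutTimeoutWorkflowExample.CalloutWorkflow')` returns `CalloutTimeoutWorkflowExample_CalloutWorkflow`, while a top-level class name such as `OnboardingWorkflow` passes through unchanged.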
- Email Recipients (`Email_Recipients__c`): A comma- or semicolon-separated list of target email addresses (e.g., `ops@example.com, alerts@example.com`).
- Enable Alerts (`Enable_Alerts__c`): Checkbox to toggle notifications for this configuration.
- Threshold Customization (optional; if left blank, alerts fire immediately on any failure):
  - `Consecutive_Failures_Limit__c`: Trigger email alerts only after `N` consecutive executions fail.
  - `Failure_Count_Limit__c` and `Time_Window_Minutes__c`: Trigger email alerts if `N` failures occur within a sliding window of `M` minutes.
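To make the two threshold modes concrete, here is a minimal sketch of how such checks could be evaluated. `AlertThresholdSketch` and its methods are hypothetical and are not Revenant's implementation; in particular, treating the window as inclusive at both ends is an assumption.

```apex
// Hypothetical helper; illustrates the threshold semantics only.
public class AlertThresholdSketch {
    // True when at least failureLimit failures fall inside the
    // windowMinutes-long window ending at asOf (bounds inclusive, by assumption).
    public static Boolean windowLimitReached(
        List<Datetime> failureTimes, Integer failureLimit,
        Integer windowMinutes, Datetime asOf
    ) {
        Datetime windowStart = asOf.addMinutes(-windowMinutes);
        Integer inWindow = 0;
        for (Datetime t : failureTimes) {
            if (t >= windowStart && t <= asOf) {
                inWindow++;
            }
        }
        return inWindow >= failureLimit;
    }

    // True when the most recent consecutiveLimit outcomes are all failures.
    // outcomes is ordered oldest to newest; true means the execution failed.
    public static Boolean consecutiveLimitReached(
        List<Boolean> outcomes, Integer consecutiveLimit
    ) {
        if (outcomes.size() < consecutiveLimit) {
            return false;
        }
        for (Integer i = outcomes.size() - consecutiveLimit; i < outcomes.size(); i++) {
            if (!outcomes[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With `Failure_Count_Limit__c = 3` and `Time_Window_Minutes__c = 10`, two failures in the last ten minutes would not alert under this sketch, but a third would.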
- `force-app/main/default/` - Core Engine & UI Components
  - `classes/` - Framework classes, queueables, finalizers, and scheduling utilities.
  - `objects/` - Core database schemas (`Workflow_Instance__c`, `Workflow_Step_Execution__c`), Platform Events, and Custom Metadata Types.
  - `lwc/` - Responsive visual monitoring timeline dashboard.
- `examples/main/default/` - Reference Architectures
  - `classes/` - Onboarding, Saga rollback, version upgrades, Apex Cursor parallel processing, HTTP Callout/Timeout Watchdog, and parent→child workflow composition implementations.
  - `triggers/` - Opportunity stage triggers demonstrating automated workflow instantiation.
  - `flows/` - Reference Flow demonstrating reading a workflow's outcome via the Get Workflow Status invocable action.
```shell
sf project deploy start   # deploy to default scratch org
sf apex run test -w 10    # run the full test suite
```

For testing patterns (`WorkflowTestHarness`, step-level unit tests, governor limit guidance, and when to use each) see docs/testing.md.
For AI and Agentforce integration — aiplatform.ModelsAPI, multi-turn agent conversations, sessionId threading, a ReAct tool-calling loop, testing mocks, and org setup — see docs/agentforce.md.
Revenant implements a Hybrid Watchdog Model for high-precision, fail-safe step timeouts, sleeps, and retries:
- Dynamic High-Precision Scheduling: When a step sleeps, retries, or registers a timeout, the engine dynamically schedules a seconds-level Apex job (`WorkflowSleepJob`, `WorkflowRetryJob`, or `WorkflowTimeoutJob`) to run immediately.
- Delayed Queueable Optimization: For delays that fall between 1 and 10 minutes (multiples of 60 seconds), the engine automatically routes scheduling through Delayed Queueables (`System.enqueueJob(job, delayMinutes)`) instead of standard scheduled Apex. This consumes 0 scheduled job slots while still executing precisely.
- Graceful Degradation (Overflow Protection): If the system has reached Salesforce's limit of 100 concurrent scheduled jobs, any failed `System.schedule` call falls back gracefully to tracking the timeout/sleep deadlines in the database (`Sleep_Until__c` and `Timeout_At__c`).
- Durable Safety Poller (Watchdog Workflow): A native, durable workflow (`WatchdogWorkflow`) runs perpetually (using continue-as-new) to sweep the database and resume/fail any instances that were not dynamically scheduled. Because it uses the Delayed Queueable optimization, the watchdog itself consumes 0 scheduled job slots in production.
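The routing rules above can be summarized as a small decision function. This is a sketch under stated assumptions, not the engine's code: `SchedulingRouteSketch`, its `Route` enum, and the `scheduledSlotAvailable` flag are hypothetical. The real engine reacts to a failed `System.schedule` call rather than checking a flag up front, and the order of the checks here is an assumption.

```apex
// Hypothetical model of the Hybrid Watchdog routing decision.
public class SchedulingRouteSketch {
    public enum Route { DELAYED_QUEUEABLE, SCHEDULED_JOB, DATABASE_WATCHDOG }

    public static Route choose(
        Integer delaySeconds, Boolean useDynamicScheduling, Boolean scheduledSlotAvailable
    ) {
        // With dynamic scheduling disabled, deadlines are only tracked in the database.
        if (!useDynamicScheduling) {
            return Route.DATABASE_WATCHDOG;
        }
        // Whole-minute delays between 1 and 10 minutes fit a Delayed Queueable,
        // which uses no scheduled job slot.
        Boolean wholeMinutes = Math.mod(delaySeconds, 60) == 0;
        if (wholeMinutes && delaySeconds >= 60 && delaySeconds <= 600) {
            return Route.DELAYED_QUEUEABLE;
        }
        // Otherwise use a seconds-level scheduled job, degrading to the
        // database watchdog when the scheduled job limit has been reached.
        return scheduledSlotAvailable ? Route.SCHEDULED_JOB : Route.DATABASE_WATCHDOG;
    }
}
```

Under this sketch a 120-second sleep goes to a Delayed Queueable, a 90-second sleep takes a scheduled job slot, and the same 90-second sleep falls back to the database when no slot is free.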
Revenant settings can be configured without code modifications by editing the Default record of the Revenant Config (Revenant_Config__mdt) Custom Metadata Type:
- Use Dynamic Scheduling (`Use_Dynamic_Scheduling__c` - Checkbox, default `true`):
  - `true`: The engine attempts precise, second-level scheduling via `System.schedule` first, falling back to the database watchdog only if limits are reached.
  - `false`: The engine completely bypasses `System.schedule` and writes all sleep/retry/timeout states directly to the database. All operations are then processed sequentially by the watchdog poller.
- Watchdog Delay Minutes (`Watchdog_Delay_Minutes__c` - Number, default `10`):
  - The delay interval (between 1 and 10 minutes) before the watchdog enqueues its next self-chaining execution.
- Dedup Window Minutes (`Dedup_Window_Minutes__c` - Number, default `1440`):
  - Controls idempotent get-or-start dedup for at-least-once event sources. `WorkflowEngine.start(...)` returns the existing instance's Id when a start arrives with a correlation key that matches an instance that is still active, or that became terminal (`Completed`/`Failed`/`Compensated`/`Cancelled`/`ContinuedAsNew`) within this many minutes, rather than throwing or spawning a duplicate that re-runs side effects.
  - `0` (minimum) preserves active-only behavior: only in-flight instances are deduped, and a redelivery after completion starts a fresh instance.
  - Active instances are always deduped regardless of this value. A blank correlation key is never deduped (a key is required to start).
  - Use `WorkflowEngine.startOrGet(...)` (or the Start Workflow Invocable's `Is New` output) to observe whether a call started a new instance or returned an existing one, without a re-query.
  - Concurrency note: the unique `Active_Correlation_Key__c` index is the hard backstop for two simultaneous active starts (the loser receives the winner's Id). Terminal-window dedup is best-effort under concurrent redelivery: if a sibling run both starts and reaches a terminal state in the narrow window between a redelivery's lookup and its insert, a duplicate of the just-finished (in-window) run can still be created, because the unique index no longer applies once the original is terminal. Active redelivery and sequential post-completion redelivery are fully covered; closing the concurrent terminal race would require an extra per-start query and is intentionally not done to keep the start path to a single indexed SOQL.
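The dedup window rule can be read as a simple predicate over the existing instance. The sketch below models only the sequential case and ignores the concurrency caveat; `DedupWindowSketch`, `returnsExisting`, and the `terminalAt` parameter are hypothetical names, and how the engine actually records the terminal time is not specified here.

```apex
// Hypothetical predicate illustrating Dedup_Window_Minutes__c semantics.
public class DedupWindowSketch {
    // True when a start with a matching correlation key should return the
    // existing instance's Id instead of creating a new instance.
    public static Boolean returnsExisting(
        Boolean isActive, Datetime terminalAt, Integer windowMinutes, Datetime asOf
    ) {
        // Active instances are always deduped, regardless of the window.
        if (isActive) {
            return true;
        }
        // A window of 0 means active-only dedup.
        if (windowMinutes <= 0 || terminalAt == null) {
            return false;
        }
        // Terminal instances are deduped only while still inside the window.
        return terminalAt >= asOf.addMinutes(-windowMinutes);
    }
}
```

With the default window of `1440` minutes, a redelivery arriving two hours after completion returns the finished instance under this sketch, while one arriving two days later starts a fresh run.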
| Metric / Aspect | Dynamic Precise Scheduling (Default) | Watchdog-Only (Use_Dynamic_Scheduling__c = false) |
|---|---|---|
| Precision | High-precision (down to the second). | Delayed by up to the watchdog delay (e.g., 10 minutes). |
| Latency | Low-latency (runs immediately at target time). | Coarse-grained polling latency. |
| Scheduled Job Slots | Consumes 1 slot per active sleep/retry/timeout job (max 100 concurrent). | Consumes 0 scheduled job slots. |
| Limit Vulnerability | Vulnerable to hitting the 100 scheduled jobs limit in high-volume orgs. | Completely immune to the 100 scheduled jobs limit. |
| Use Case | Low-volume, time-sensitive or interactive workflows. | High-volume, non-interactive batch or transactional processing workflows. |
The Workflow Dashboard includes a System Doctor tab to monitor limits, check configuration settings, and audit watchdog health:
- Watchdog Health: Indicates whether the self-chaining watchdog Queueable chain is active (`Running`) or has stalled (`Stopped`).
- Bootstrap Action: Includes an Enqueue Watchdog button to manually trigger and restart the Queueable chain if it ever halts (e.g., during major platform maintenance windows).
- Limits Auditing: Displays active `CronTrigger` utilization (against the 100-job limit) and pending database sweeps (sleeping instances and step timeouts).
By default, Salesforce Platform Event triggers (like WorkflowEventTrigger) execute sequentially under the context of the Automated Process system user with a batch size of 2,000. To scale throughput and ensure governor limit safety in high-volume environments, configure a PlatformEventSubscriberConfig record for WorkflowEventTrigger via the Tooling or Metadata API.
- Parallel Subscriptions (Partitioning)
  - `NumPartitions`: Scale throughput by setting this between `1` and `10` to process events concurrently in parallel execution streams.
  - `PartitionKey`: Set this to `Workflow_Instance_Id__c`. The platform hashes this key to distribute events across partitions, ensuring events for the same workflow instance are processed sequentially (in-order) to prevent race conditions, while different instances run concurrently.
- Batch Size Tuning
  - `batchSize`: Set a smaller chunk size (e.g., `50` or `100`) instead of the default `2,000`. This reduces the risk of hitting CPU time, heap, or SOQL limits within a single trigger execution block.
- Running User Context
  - `userId`: Triggers delegate step execution to Queueable Apex (`WorkflowOrchestrator`). By default, these run under the `Automated Process` user. While `autoproc` can be assigned Named Credential/External Credential access using Anonymous Apex (`insert new PermissionSetAssignment(...)`), you can optionally specify a dedicated integration `userId` here to run the subscriber trigger and Queueables under a standard user context.
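For the Anonymous Apex route mentioned under Running User Context, a minimal sketch might look like the following. It assumes the Automated Process user can be found by the alias `autoproc` and that a permission set named `Revenant_Callout_Access` (a hypothetical name) already grants the required External Credential access; adjust both to match your org.

```apex
// Run as Anonymous Apex. Assumes the Automated Process user has alias 'autoproc'.
User autoprocUser = [SELECT Id FROM User WHERE Alias = 'autoproc' LIMIT 1];

// 'Revenant_Callout_Access' is a hypothetical permission set name.
PermissionSet calloutAccess = [
    SELECT Id FROM PermissionSet WHERE Name = 'Revenant_Callout_Access' LIMIT 1
];

// Assign the permission set to the Automated Process user.
insert new PermissionSetAssignment(
    AssigneeId = autoprocUser.Id,
    PermissionSetId = calloutAccess.Id
);
```

Both queries throw if no matching row exists, which makes a missing user or permission set fail loudly instead of inserting a broken assignment.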
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.