@@ -3,28 +3,29 @@ title: Retries
 description: Automatic retry behavior for failed steps and workflows
 ---

-In any application, things fail sometimes - a third-party API returns a
-500 error, a database connection times out, or a network blip drops a request.
-These are transient failures: they go away on their own if you try again.
+In any app, things fail sometimes - a third-party API returns a 500, a database
+connection times out, or a network blip drops a request. These are transient
+failures that go away if you try again.

-OpenWorkflow handles this automatically. When a step throws an error, the
-workflow is rescheduled with an exponential backoff (increasing delays between
-retries). Previously completed steps aren't re-run - only the failed step is
-retried.
+OpenWorkflow handles this automatically by retrying failed steps. When a step
+throws, the workflow is rescheduled with exponential backoff. Previously
+completed steps aren't re-run; only the failed step re-executes.

 ## How Retries Work

 When a step throws an error:

-1. The step attempt is marked as `failed`
-2. The error is recorded in the database
-3. The workflow is rescheduled with exponential backoff
-4. When the workflow resumes, it replays to the failed step
-5. The step function executes again (not the cached result)
+1. The step attempt is marked as `failed` and the error is recorded
+2. The workflow run is rescheduled with exponential backoff
+3. When the workflow resumes, it replays to the failed step
+4. The step function executes again

-## Automatic Retries in Steps
+Steps are attempted up to 10 times by default (the initial attempt plus 9
+retries). If the step still fails after all attempts, the workflow is
+permanently marked as `failed`.

-Steps that throw are automatically retried:
+## Step Retries
+
+Steps that throw are retried automatically:

 ```ts
 await step.run({ name: "call-api" }, async () => {
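The replay behavior described in this hunk (completed steps return cached results on each retry, and only the failed step re-executes) can be illustrated with a small in-memory model. This is a hypothetical sketch, not OpenWorkflow's implementation: `StepCache` and `runWithReplay` are illustrative names, the `Map` stands in for the database, and backoff delays between attempts are left out.

```typescript
// Hypothetical in-memory model of replay-based step retries.
// Not OpenWorkflow's implementation: the Map stands in for the database,
// and backoff delays between attempts are omitted.

class StepCache {
  // Results of steps that completed successfully, keyed by step name.
  private readonly completed = new Map<string, unknown>();
  // How many times each step function actually executed (for illustration).
  readonly executions = new Map<string, number>();

  async run<T>(opts: { name: string }, fn: () => Promise<T>): Promise<T> {
    if (this.completed.has(opts.name)) {
      // Replay: a completed step returns its cached result without re-running.
      return this.completed.get(opts.name) as T;
    }
    this.executions.set(opts.name, (this.executions.get(opts.name) ?? 0) + 1);
    const result = await fn(); // if this throws, nothing is cached for the step
    this.completed.set(opts.name, result);
    return result;
  }
}

// Re-runs the workflow function from the top after each failure,
// reusing the same cache, for at most `maxAttempts` attempts.
async function runWithReplay<T>(
  workflow: (step: StepCache) => Promise<T>,
  maxAttempts: number,
): Promise<{ result: T; step: StepCache }> {
  const step = new StepCache();
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { result: await workflow(step), step };
    } catch (error) {
      lastError = error; // the real system reschedules the run with backoff instead
    }
  }
  throw lastError;
}

// Usage: the second step fails twice before succeeding.
let chargeCalls = 0;
runWithReplay(async (step) => {
  const user = await step.run({ name: "load-user" }, async () => ({ id: "user-1" }));
  return step.run({ name: "charge" }, async () => {
    chargeCalls++;
    if (chargeCalls < 3) throw new Error("transient failure");
    return `charged ${user.id}`;
  });
}, 5).then(({ result, step }) => {
  console.log(result); // charged user-1
  console.log(step.executions.get("load-user"), step.executions.get("charge")); // 1 3
});
```

The `load-user` step runs once even though the workflow function is invoked three times, which is the property that makes re-running the workflow from the top safe.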
@@ -39,19 +40,18 @@ await step.run({ name: "call-api" }, async () => {
 });
 ```

-Each retry:
-
-- Replays the workflow from the beginning
-- Returns cached results for completed steps
-- Re-executes the failed step
+### Step Retry Policy

-## Retry Policy
+Each step can define its own retry policy. If omitted, steps use these defaults:

-Both steps and workflows use the same retry policy shape. A retry policy
-controls exponential backoff: how long to wait between retries, how fast delays
-grow, and when to stop retrying.
+| Field                | Default  | Description                                                |
+| -------------------- | -------- | ---------------------------------------------------------- |
+| `initialInterval`    | `"1s"`   | Delay before the first retry                               |
+| `backoffCoefficient` | `2`      | Multiplier applied to each subsequent retry delay          |
+| `maximumInterval`    | `"100s"` | Upper bound for retry delay                                |
+| `maximumAttempts`    | `10`     | Total attempts including the initial one (`0` = unlimited) |

-With the defaults, retry delays look like this:
+With these defaults, retry delays look like this:

 | Attempt | Delay     |
 | ------- | --------- |
@@ -62,23 +62,7 @@ With the defaults, retry delays look like this:
 | 5       | ~8s       |
 | ...     | ...       |

-This prevents overwhelming external services during outages. Retries continue
-until canceled, until `deadlineAt` is reached (or the next retry would pass it),
-or until `maximumAttempts` is exhausted.
-
-Retry policies have the following fields:
-
-| Field                | Default    | Description                                         |
-| -------------------- | ---------- | --------------------------------------------------- |
-| `initialInterval`    | `"1s"`     | Delay before the first retry after a failed attempt |
-| `backoffCoefficient` | `2`        | Multiplier applied to each subsequent retry delay   |
-| `maximumInterval`    | `"100s"`   | Upper bound for retry delay                         |
-| `maximumAttempts`    | `Infinity` | Maximum attempts, including the initial one         |
-
-### Step Retry Policy
-
-Each `step.run(...)` can define its own retry policy. If you omit `retryPolicy`,
-OpenWorkflow uses the defaults shown above.
+Override the defaults per step:

 ```ts
 await step.run(
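The defaults table and the delay table can be cross-checked with a short calculation. The sketch below assumes the delay before attempt n (for n >= 2) is `initialInterval * backoffCoefficient^(n - 2)`, capped at `maximumInterval`, which agrees with the `| 5 | ~8s |` row visible in this diff. It ignores any jitter the real scheduler may apply (the `~` suggests delays are approximate), and `RetryPolicy`, `delayBeforeAttempt`, and `retrySchedule` are hypothetical names that use milliseconds rather than OpenWorkflow's duration strings such as `"1s"`.

```typescript
// Sketch of exponential backoff using the documented step defaults.
// Assumption: delay before attempt n (n >= 2) = initial * coefficient^(n - 2),
// capped at the maximum; any jitter the real scheduler may apply is ignored.

interface RetryPolicy {
  initialIntervalMs: number; // "1s" in the docs, in milliseconds here
  backoffCoefficient: number;
  maximumIntervalMs: number; // "100s"
  maximumAttempts: number; // total attempts including the first; 0 = unlimited
}

const DEFAULT_STEP_POLICY: RetryPolicy = {
  initialIntervalMs: 1_000,
  backoffCoefficient: 2,
  maximumIntervalMs: 100_000,
  maximumAttempts: 10,
};

// Delay before `attempt` (1-based); the first attempt is not delayed.
function delayBeforeAttempt(attempt: number, policy: RetryPolicy): number {
  if (attempt <= 1) return 0;
  const raw = policy.initialIntervalMs * policy.backoffCoefficient ** (attempt - 2);
  return Math.min(raw, policy.maximumIntervalMs);
}

// Every delay for a finite policy (maximumAttempts > 0).
function retrySchedule(policy: RetryPolicy): number[] {
  if (policy.maximumAttempts <= 0) {
    throw new Error("an unlimited policy has no finite schedule");
  }
  const delays: number[] = [];
  for (let attempt = 1; attempt <= policy.maximumAttempts; attempt++) {
    delays.push(delayBeforeAttempt(attempt, policy));
  }
  return delays;
}

console.log(retrySchedule(DEFAULT_STEP_POLICY).map((ms) => `${ms / 1000}s`).join(", "));
// 0s, 1s, 2s, 4s, 8s, 16s, 32s, 64s, 100s, 100s
```

Under this assumption the delay reaches the 100s cap on the ninth attempt, so a step that fails all 10 default attempts spends about 327s waiting in total.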
@@ -97,11 +81,16 @@ await step.run(
 );
 ```

-### Workflow Retry Policy
+Retries also stop early if the workflow has a `deadlineAt` and the next retry
+would exceed it.
+
+## Workflow Retries

-Workflow-level `retryPolicy` applies to non-step failures, for example, missing
-workflow definitions or errors thrown outside `step.run`. If you omit
-`retryPolicy` (or individual fields), OpenWorkflow uses the same defaults.
+Errors thrown outside of `step.run(...)` are workflow-level failures.
+**Workflow-level failures are not retried by default**; the workflow is
+marked as `failed`.
+
+To enable workflow-level retries, set a `retryPolicy` on the workflow spec:

 ```ts
 import { defineWorkflow } from "openworkflow";
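The stopping rules in this diff (`maximumAttempts`, where `0` means unlimited, and the early stop when the next retry would exceed `deadlineAt`) can be expressed as a single decision function. This is a hypothetical sketch: `decideAfterFailure`, `FailureContext`, and `formatDecision` are illustrative names, the backoff delay is passed in rather than computed from a policy, and allowing a retry that lands exactly on the deadline is an assumption.

```typescript
// Hypothetical decision made after a failed attempt: retry later, or fail terminally.

type RetryDecision =
  | { kind: "retry"; runAt: Date }
  | { kind: "fail"; reason: "max-attempts" | "deadline" };

interface FailureContext {
  failedAttempts: number; // attempts made so far (all of them failed)
  maximumAttempts: number; // 0 = unlimited
  nextDelayMs: number; // backoff delay from the retry policy
  now: Date;
  deadlineAt?: Date; // optional workflow deadline
}

function decideAfterFailure(ctx: FailureContext): RetryDecision {
  if (ctx.maximumAttempts !== 0 && ctx.failedAttempts >= ctx.maximumAttempts) {
    return { kind: "fail", reason: "max-attempts" };
  }
  const runAt = new Date(ctx.now.getTime() + ctx.nextDelayMs);
  // Stop early if the next retry would land after the deadline
  // (assumption: landing exactly on the deadline is still allowed).
  if (ctx.deadlineAt !== undefined && runAt.getTime() > ctx.deadlineAt.getTime()) {
    return { kind: "fail", reason: "deadline" };
  }
  return { kind: "retry", runAt };
}

function formatDecision(d: RetryDecision): string {
  return d.kind === "retry" ? `retry at ${d.runAt.toISOString()}` : `fail (${d.reason})`;
}

const now = new Date("2024-01-01T00:00:00Z");
console.log(formatDecision(decideAfterFailure({ failedAttempts: 10, maximumAttempts: 10, nextDelayMs: 100_000, now })));
// fail (max-attempts)
console.log(formatDecision(decideAfterFailure({
  failedAttempts: 3,
  maximumAttempts: 10,
  nextDelayMs: 4_000,
  now,
  deadlineAt: new Date("2024-01-01T00:00:02Z"),
})));
// fail (deadline)
console.log(formatDecision(decideAfterFailure({ failedAttempts: 3, maximumAttempts: 0, nextDelayMs: 4_000, now })));
// retry at 2024-01-01T00:00:04.000Z
```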
@@ -122,23 +111,38 @@ defineWorkflow(
 );
 ```

+<Note>
+  Step retries and workflow retries are independent. Step failures use the
+  step's own retry policy. The workflow retry policy only applies to errors
+  thrown outside steps.
+</Note>
+
+## Missing Workflow Definitions
+
+If a worker claims a run but doesn't have the matching workflow registered, it
+reschedules the run with exponential backoff (starting at 5s, capped at 5min).
+This keeps the run alive during rolling deploys or multi-worker setups where the
+right worker hasn't started yet.
+
+Once a worker with the correct definition comes online, it claims the run and
+executes normally.
+
 ## What Triggers a Retry

 Retries happen when:

-- A step function throws an exception
-- A step function returns a rejected promise
-- The worker crashes during step execution
+- A step function throws an error or returns a rejected promise
+- A worker crashes during step execution (the step is re-executed on recovery)

 Retries do **not** happen for:

-- Completed steps (they return cached results)
+- Completed steps (cached results are returned)
 - Explicitly canceled workflows
-- Workflows that complete successfully
+- Workflow-level errors (unless a workflow `retryPolicy` is configured)

 ## Error Handling

-You can catch and handle errors within your workflow:
+You can catch step errors inside a workflow to run fallback logic:

 ```ts
 defineWorkflow({ name: "with-error-handling" }, async ({ input, step }) => {
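The missing-definition behavior in this diff (reschedule with exponential backoff starting at 5s and capped at 5 minutes) can be sketched as follows. The doubling factor is an assumption, since the text gives only the starting delay and the cap, and `ClaimedRun`, `handleClaimedRun`, `missingDefinitionDelay`, and the `registry` set are hypothetical stand-ins for a worker's internals.

```typescript
// Sketch: how a worker might treat a claimed run whose workflow isn't registered.
// From the docs: backoff starts at 5s and is capped at 5 minutes.
// Assumption: the delay doubles each time (the factor is not documented here).

const MISSING_DEFINITION_INITIAL_MS = 5_000;
const MISSING_DEFINITION_MAX_MS = 5 * 60_000;

interface ClaimedRun {
  id: string;
  workflowName: string;
  missingDefinitionCount: number; // times this run was already rescheduled for this reason
}

type ClaimOutcome = { action: "execute" } | { action: "reschedule"; delayMs: number };

function missingDefinitionDelay(previousReschedules: number): number {
  const raw = MISSING_DEFINITION_INITIAL_MS * 2 ** previousReschedules;
  return Math.min(raw, MISSING_DEFINITION_MAX_MS);
}

function handleClaimedRun(run: ClaimedRun, registry: ReadonlySet<string>): ClaimOutcome {
  if (registry.has(run.workflowName)) {
    return { action: "execute" }; // this worker has the definition, so run it normally
  }
  // Keep the run alive until a worker with the matching definition claims it.
  return { action: "reschedule", delayMs: missingDefinitionDelay(run.missingDefinitionCount) };
}

const delays = Array.from({ length: 8 }, (_, i) => missingDefinitionDelay(i) / 1000);
console.log(delays.join("s, ") + "s");
// 5s, 10s, 20s, 40s, 80s, 160s, 300s, 300s
```

Under the doubling assumption the cap is reached on the seventh reschedule, after which the run is retried every 5 minutes until a worker with the matching definition picks it up.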
@@ -147,82 +151,29 @@ defineWorkflow({ name: "with-error-handling" }, async ({ input, step }) => {
       await externalApi.call();
     });
   } catch (error) {
-    // Log the error and continue with fallback
-    console.error("API call failed:", error);
-
-    await step.run({ name: "fallback-operation" }, async () => {
+    await step.run({ name: "fallback" }, async () => {
       await fallbackApi.call();
     });
   }
 });
 ```

 <Note>
-  When you catch an error, the workflow continues normally. The step is still
-  marked as failed in the database, but the workflow doesn't retry from that
-  point.
+  When you catch an error, the workflow continues normally. The step is still
+  recorded as failed, but no retry is triggered.
 </Note>

-## Permanent Failures
-
-A workflow is marked as `failed` permanently when it can no longer be retried
-(for example, because `deadlineAt` is reached, the next retry would exceed that
-deadline, or `maximumAttempts` has been reached):
-
-- The error is stored in the workflow run record
-- No more automatic retries occur
-- You can view failed workflows in the dashboard
-- Failed workflows can be manually retried or investigated
-
-## Transient vs. Permanent Errors
-
-Design your steps to distinguish between transient and permanent errors:
+## Terminal Failures

-```ts
-await step.run({ name: "call-api" }, async () => {
-  const response = await fetch("https://api.example.com/data");
-
-  if (response.status === 503) {
-    // Transient - throw to trigger retry
-    throw new Error("Service temporarily unavailable");
-  }
-
-  if (response.status === 400) {
-    // Permanent - bad request won't succeed on retry
-    // Handle differently (return error result, cancel workflow, etc.)
-    return { success: false, error: "Invalid request" };
-  }
-
-  return await response.json();
-});
-```
-
-## Best Practices
-
-### Use Meaningful Error Messages
-
-Include context in errors for debugging:
-
-```ts
-await step.run({ name: "fetch-user" }, async () => {
-  const user = await db.users.findOne({ id: input.userId });
-
-  if (!user) {
-    throw new Error(`User not found: ${input.userId}`);
-  }
-
-  return user;
-});
-```
+A workflow is permanently marked `failed` when step retries are exhausted
+(`maximumAttempts` reached), when `deadlineAt` expires, or when a
+workflow-level error occurs and no workflow `retryPolicy` is configured.

-## Monitoring Retries
+Once terminal, no more automatic retries occur. You can inspect and manually
+retry failed workflows from the [dashboard](/docs/dashboard).

-Use the dashboard to monitor workflow health:
+## Monitoring

-- View failed workflow runs
-- Inspect step attempt errors
-- See retry history for a workflow
-- Identify patterns in failures
+Use the [dashboard](/docs/dashboard) to monitor retry health:

 <CodeGroup>
 ```bash npm