
Add optional retry to workflow steps #1015

Open
@osterman

Description

Describe the Feature

Steps in a workflow may need to be retried. We should support an optional, configurable retry mechanism with backoff.

```yaml
retry:
  max_attempts: 5       # Maximum number of retry attempts
  backoff_strategy: exponential # Options: "exponential", "constant", "linear"
  initial_delay: 2s     # Initial delay between retries (supports "ms", "s", "m", etc.)
  max_delay: 30s        # Maximum delay between retries
  random_jitter: true   # Whether to add jitter to backoff times
  multiplier: 2.0       # Multiplier for exponential backoff
  max_elapsed_time: 5m  # Maximum time to spend retrying before giving up
```

Expected Behavior

When a step exits with a non-zero status and retry is configured, the step is retried until it succeeds or max_attempts is reached. If it still fails, the workflow exits with a non-zero status.

Use Case

  • Run raw terraform commands (e.g. terraform import) inside a component directory.
  • Run commands in /tmp (e.g. to download and unpack files).

Describe Ideal Solution

Use https://github.com/cenkalti/backoff to implement the backoff strategies.

```yaml
workflows:
  test-1:
    description: "Test workflow"
    steps:
      - command: echo Command 1
        name: step1
        type: shell

        # All parameters are optional
        retry:
          max_attempts: 5       # Maximum number of retry attempts
          backoff_strategy: exponential # Options: "exponential", "constant", "linear"
          initial_delay: 2s     # Initial delay between retries (supports "ms", "s", "m", etc.)
          max_delay: 30s        # Maximum delay between retries
          random_jitter: true   # Whether to add jitter to backoff times
          multiplier: 2.0       # Multiplier for exponential backoff
          max_elapsed_time: 5m  # Maximum time to spend retrying before giving up
```

Alternatively, allow retry to be configured at the workflow level:

```yaml
workflows:
  test-1:
    description: "Test workflow"
    # All parameters are optional
    retry:
      max_attempts: 5       # Maximum number of retry attempts
      backoff_strategy: exponential # Options: "exponential", "constant", "linear"
      initial_delay: 2s     # Initial delay between retries (supports "ms", "s", "m", etc.)
      max_delay: 30s        # Maximum delay between retries
      random_jitter: true   # Whether to add jitter to backoff times
      multiplier: 2.0       # Multiplier for exponential backoff
      max_elapsed_time: 5m  # Maximum time to spend retrying before giving up
    steps:
      - command: echo Command 1
        name: step1
        type: shell
```
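If both levels are supported, the issue leaves open how they combine. One plausible rule, sketched here purely as an assumption: each field set in a step's `retry:` block overrides the workflow's value for that field, and unset fields inherit from the workflow. Pointer fields distinguish "not set" from an explicit zero such as `random_jitter: false`. `Retry`, `effectiveRetry` and `ptr` are hypothetical names, and the generic helper needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"time"
)

// Retry uses pointers so an omitted field (nil) differs from an explicit zero value.
type Retry struct {
	MaxAttempts     *int
	BackoffStrategy *string
	InitialDelay    *time.Duration
	MaxDelay        *time.Duration
	RandomJitter    *bool
	Multiplier      *float64
	MaxElapsedTime  *time.Duration
}

// effectiveRetry layers a step's retry block over its workflow's retry block.
// Assumption: any field the step sets wins; unset step fields inherit the workflow's value.
func effectiveRetry(workflow, step *Retry) Retry {
	var out Retry
	if workflow != nil {
		out = *workflow
	}
	if step == nil {
		return out
	}
	if step.MaxAttempts != nil {
		out.MaxAttempts = step.MaxAttempts
	}
	if step.BackoffStrategy != nil {
		out.BackoffStrategy = step.BackoffStrategy
	}
	if step.InitialDelay != nil {
		out.InitialDelay = step.InitialDelay
	}
	if step.MaxDelay != nil {
		out.MaxDelay = step.MaxDelay
	}
	if step.RandomJitter != nil {
		out.RandomJitter = step.RandomJitter
	}
	if step.Multiplier != nil {
		out.Multiplier = step.Multiplier
	}
	if step.MaxElapsedTime != nil {
		out.MaxElapsedTime = step.MaxElapsedTime
	}
	return out
}

// ptr is a small helper for building optional fields in examples.
func ptr[T any](v T) *T { return &v }

func main() {
	workflow := &Retry{MaxAttempts: ptr(5), BackoffStrategy: ptr("exponential"), InitialDelay: ptr(2 * time.Second)}
	step := &Retry{MaxAttempts: ptr(2)}
	eff := effectiveRetry(workflow, step)
	fmt.Println(*eff.MaxAttempts, *eff.BackoffStrategy, *eff.InitialDelay) // 2 exponential 2s
}
```

A simpler alternative would be for a step-level `retry:` block to replace the workflow-level block wholesale; field-by-field merging lets a step tweak one setting without repeating the rest.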

Alternatives Considered

We could call an external retry command from within the shell step. However, that would require the command to be installed wherever the workflow runs, and the step would need its own error handling.
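With retry built in, all the runner needs from a shell step is its exit status, so no external retry tool is involved. A minimal sketch, assuming a Unix-like system where the step is run through `sh -c` (`runShellStep` is a hypothetical name; the real step runner may invoke the shell differently). It turns a non-zero exit into an error carrying the exit code, which a retry loop can act on, and takes a working directory to reflect the use cases above (a component directory, `/tmp`).

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runShellStep runs command with `sh -c` in dir ("" means the current directory)
// and returns the exit code; err is non-nil for a non-zero exit or if sh could not start.
// Output is discarded here; a real runner would wire cmd.Stdout and cmd.Stderr to the terminal.
func runShellStep(command, dir string) (int, error) {
	cmd := exec.Command("sh", "-c", command)
	cmd.Dir = dir
	err := cmd.Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		code := exitErr.ExitCode()
		return code, fmt.Errorf("command %q exited with status %d", command, code)
	}
	if err != nil {
		return -1, err // e.g. sh not found or dir missing; no exit status available
	}
	return 0, nil
}

func main() {
	code, err := runShellStep("exit 3", "")
	fmt.Println(code, err) // 3 command "exit 3" exited with status 3
	code, err = runShellStep("true", "/tmp")
	fmt.Println(code, err) // 0 <nil>
}
```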

Additional Context

No response
