Workflow status update delayed by 60 seconds due to dual locks when task is failed with lock enabled #295

@pugazhenthi-elangovan-E3338

Description

Describe the bug
When a task fails, the workflow status should be updated to failed. When workflowExecutionLockEnabled is set to true and workflowOffsetTimeout is greater than 60 seconds, this flow tries to acquire a new lock on top of the existing lock without releasing it first. As a result, the workflow's transition to the failed status takes lockLeaseTime (default 60 seconds), which is not intended.
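The delay described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: `LeaseLock`, `tryAcquire`, `release`, and `delayUntilSecondAcquire` are hypothetical names, not Conductor's actual lock API, and the lock is assumed to be lease-based and non-reentrant. Time is simulated with a counter, so nothing sleeps.

```java
import java.util.HashMap;
import java.util.Map;

public class DualLockSketch {

    /** Hypothetical lease-based, non-reentrant lock keyed by workflow id (not Conductor's Lock interface). */
    static final class LeaseLock {
        private final Map<String, Long> leaseExpiryMillis = new HashMap<>();

        /** Succeeds only if no unexpired lease exists for lockId at nowMillis. */
        boolean tryAcquire(String lockId, long nowMillis, long leaseMillis) {
            Long expiry = leaseExpiryMillis.get(lockId);
            if (expiry != null && nowMillis < expiry) {
                return false; // lease still held; a second acquire on the same id is refused
            }
            leaseExpiryMillis.put(lockId, nowMillis + leaseMillis);
            return true;
        }

        /** Drops the lease immediately so the next acquire can succeed. */
        void release(String lockId) {
            leaseExpiryMillis.remove(lockId);
        }
    }

    /** Returns the simulated delay (ms) before a second acquire on the same workflow id succeeds. */
    static long delayUntilSecondAcquire(boolean releaseFirst, long leaseMillis) {
        LeaseLock lock = new LeaseLock();
        String workflowId = "wf-1"; // hypothetical workflow id
        long now = 0;

        // First lock: assumed to be taken while the task failure is being handled.
        lock.tryAcquire(workflowId, now, leaseMillis);
        if (releaseFirst) {
            lock.release(workflowId);
        }

        // Second lock: assumed to be requested by the flow that marks the workflow as failed.
        // Retry once per simulated second until the lock is granted.
        while (!lock.tryAcquire(workflowId, now, leaseMillis)) {
            now += 1_000;
        }
        return now;
    }

    public static void main(String[] args) {
        System.out.println("without release: " + delayUntilSecondAcquire(false, 60_000) + " ms");
        System.out.println("with release:    " + delayUntilSecondAcquire(true, 60_000) + " ms");
    }
}
```

In this model, the second acquire succeeds only once the first lease has expired (60000 ms with the default lockLeaseTime), while releasing the first lock beforehand lets it succeed immediately. Whether Conductor's Postgres lock follows exactly this model is an assumption of the sketch.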

Details
Conductor version: v3.19.0
Persistence implementation: Postgres
Queue implementation: Postgres
Lock: Postgres
Workflow definition:

{
  "createTime": 1729257249830,
  "accessPolicy": {},
  "name": "google_error_hit",
  "description": "Edit or extend this sample workflow. Set the workflow name to get started",
  "version": 1,
  "tasks": [
    {
      "name": "call_remote_api",
      "taskReferenceName": "call_remote_api",
      "inputParameters": {
        "http_request": {
          "uri": "https://google-wrong.com",
          "method": "GET"
        }
      },
      "type": "HTTP",
      "startDelay": 0,
      "optional": false,
      "asyncComplete": false,
      "permissive": false
    }
  ],
  "inputParameters": [],
  "outputParameters": {
    "data": "${call_remote_api.output.response.body.data}"
  },
  "schemaVersion": 1,
  "restartable": true,
  "workflowStatusListenerEnabled": false,
  "ownerEmail": "example@email.com",
  "timeoutPolicy": "ALERT_ONLY",
  "timeoutSeconds": 0,
  "variables": {},
  "inputTemplate": {}
}

To Reproduce
Steps to reproduce the behavior:

  1. Start Conductor with the app property conductor.app.workflowExecutionLockEnabled set to true and workflowOffsetTimeout greater than 60 seconds.
  2. Save the provided workflow under 'Workflow Definitions'.
  3. Click on 'Workbench' and execute the google_error_hit workflow.
  4. Click on the new execution shown at the top right.
  5. Observe the task error and the task's failed status.
  6. The workflow is marked as failed only 60 seconds after the task failure.

Expected behavior
The workflow should be marked as failed immediately after the task fails, as per the workflow execution flow handled in the code.
