Possible race condition when completing jobs #2143

Closed
@josenavas

Description

If a job has more than one validator, the validators are sometimes all set to "waiting" but the job never finishes (i.e., a deadlock). My guess is that the race condition happens in this call. If the validators execute that call at the same time, neither sees that the other job is already in the "waiting" status, because the entire operation is executed inside a transaction, so both jobs end up waiting for the other job to complete.
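To make the suspected interleaving concrete, here is a minimal in-memory sketch. It is not the actual code: the validator names, the `count_not_waiting` helper, and the dictionary standing in for the database are all hypothetical, and it assumes each transaction cannot see the other's uncommitted status change.

```python
def count_not_waiting(snapshot, me):
    """Count the other validators that this transaction does not see as waiting."""
    return sum(
        1 for name, status in snapshot.items()
        if name != me and status != "waiting"
    )


def interleaved_run():
    """Simulate two validators running the completion check at the same time."""
    # Committed state visible to any transaction that starts now.
    committed = {"validator_a": "running", "validator_b": "running"}

    # Both transactions start before either one commits, so each works
    # on its own view of the committed state.
    view_a = dict(committed)
    view_b = dict(committed)

    # Each validator marks itself as waiting inside its own transaction.
    view_a["validator_a"] = "waiting"
    view_b["validator_b"] = "waiting"

    # Each validator checks whether it is the last one; neither can see
    # the other's uncommitted change, so both conclude "not yet".
    a_completes = count_not_waiting(view_a, "validator_a") == 0
    b_completes = count_not_waiting(view_b, "validator_b") == 0

    # Both transactions commit: everyone is waiting, nobody completed the job.
    committed["validator_a"] = view_a["validator_a"]
    committed["validator_b"] = view_b["validator_b"]
    return committed, a_completes, b_completes
```

In this model both validators end up in "waiting" and neither one triggers completion of the parent job, which matches the observed symptom.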

Possible solutions that I can think of:

  1. Instead of performing the counting by checking how many jobs are not in "waiting", use a dedicated counter protected by a lock, with serialization of operations ensured by a "ROW EXCLUSIVE" lock.
  2. Explicitly commit the change of status before performing that check. The amount of code changes may be higher than it seems, as there are multiple nested contexts that create transactions, and enforcing a commit may break some other functionality unexpectedly.

I think (1) is the more correct and easier solution, although it requires DB changes.
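As a rough illustration of option (1), the sketch below uses an in-process `threading.Lock` as a stand-in for the database lock. The `ValidatorCounter` class and `run_validators` function are hypothetical names, not part of the codebase; the point is only that decrementing and reading the counter in one serialized step lets exactly one validator see itself as the last one.

```python
import threading


class ValidatorCounter:
    """Hypothetical stand-in for a per-job counter guarded by a lock."""

    def __init__(self, n_validators):
        self._remaining = n_validators
        self._lock = threading.Lock()

    def mark_done(self):
        """Atomically decrement the counter; True only for the last validator."""
        with self._lock:
            self._remaining -= 1
            return self._remaining == 0


def run_validators(n_validators):
    """Run the validators concurrently; return how many completed the job."""
    counter = ValidatorCounter(n_validators)
    completions = []
    completions_lock = threading.Lock()

    def validator():
        # The decrement and the check happen in one serialized step, so
        # two validators can never both miss being the last one.
        if counter.mark_done():
            with completions_lock:
                completions.append(1)

    threads = [threading.Thread(target=validator) for _ in range(n_validators)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(completions)
```

However the threads are scheduled, `run_validators(2)` should report exactly one completion. In the database, the equivalent would presumably be a counter column updated under a lock that forces concurrent decrements to run one after another.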
