Closed
Description
If a job has more than one validator, the validators are sometimes set to "waiting" but the job never finishes (i.e. a deadlock). My guess is that a race condition is happening in this call. If the validators execute that call at the same time, they will not see that the other job is already in the "waiting" status, because the entire operation is executed inside a transaction and the other job's status change is not yet committed. Both jobs therefore end up waiting for the other to complete.
Possible solutions that I can think of:
- Instead of counting how many jobs are not in "waiting", keep a dedicated counter protected by a lock, so that operations on it are serialized (for example with a "ROW EXCLUSIVE" lock).
- Explicitly commit the change of status before performing that check. The amount of code changes may be higher than it seems, as there are multiple nested contexts that create transactions, and enforcing a commit may break other functionality in unexpected ways.
I think option (1), the locked counter, is the more correct and easier solution, although it requires DB changes.