Fix unique scheduling issues #619

bgentry · 2024-09-26T16:04:16Z

PR #590 introduced a more flexible unique job mechanism that allowed for the unique states to be customized, all while still benefiting from the performance improvements of using a unique index rather than an advisory lock. The retryable state is included in the default list of states, but can be removed if the user doesn't want to prevent an erroring job from blocking duplicate inserts.

However this created an issue: when trying to schedule a retryable job (move it to available) it could potentially have a conflict with a duplicate unique job. To handle this, special logic was added to try to deal with this scenario for unique jobs, moving the conflicting row to discarded rather than available. Unfortunately this logic had issues and was insufficiently tested. There were a couple specific scenarios that caused issues:

A unique job that was being scheduled because it was either inserted as scheduled or had errored and become retryable would actually be considered a conflict with itself because the query didn't properly exclude the row being scheduled.
Attempting to schedule two duplicate retryable unique jobs at the same time would lead to a unique conflict because there was no mechanism to prevent this.

The query changes in this PR address both of the above cases along with test coverage.

The increased complexity is unfortunate, and we're probably nearing the limit of what should be dealt with in a single SQL query. If this still isn't complete I'm more inclined to fix the issues by catching these conflicts at the application level, explicitly moving the conflicting row(s) to discarded, and trying again. This can be looped with a backoff or recursed to ensure that progress keeps being made as individual conflicts get resolved. But hopefully that won't be necessary.

Fixes #618.

When deciding whether or not to discard a retryable job due to a unique conflict, don't consider the job that we're trying to schedule. It obviously cannot conflict with itself, only other rows.

If two unique jobs happen to be getting scheduled in the same batch, only one of them can successfully be moved to `available`—the other must be discarded. Pick the one that's first in line.

PR riverqueue#590 introduced a more flexible unique job mechanism that allowed for the unique states to be customized, all while still benefiting from the performance improvements of using a unique index rather than an advisory lock. The retryable state is included in the default list of states, but can be removed if the user doesn't want to prevent an erroring job from blocking duplicate inserts. However this created an issue: when trying to schedule a retryable job (move it to available) it could potentially have a conflict with a duplicate unique job. To handle this, special logic was added to try to deal with this scenario for unique jobs, moving the conflicting row to discarded rather than available. Unfortunately this logic had issues and was insufficiently tested. There were a couple specific scenarios that caused issues: A unique job that was being scheduled because it was either inserted as scheduled or had errored and become retryable would actually be considered a conflict with itself because the query didn't properly exclude the row being scheduled. Attempting to schedule two duplicate retryable unique jobs at the same time would lead to a unique conflict because there was no mechanism to prevent this. The query changes in this PR address both of the above cases along with test coverage. The increased complexity is unfortunate, and we're probably nearing the limit of what should be dealt with in a single SQL query. If this still isn't complete I'm more inclined to fix the issues by catching these conflicts at the application level, explicitly moving the conflicting row(s) to discarded, and trying again. This can be looped with a backoff or recursed to ensure that progress keeps being made as individual conflicts get resolved. But hopefully that won't be necessary. Fixes riverqueue#618.

bgentry added 2 commits September 26, 2024 10:54

JobSchedule: ensure that a job's unique key doesn't conflict w/ itself

675b0d7

When deciding whether or not to discard a retryable job due to a unique conflict, don't consider the job that we're trying to schedule. It obviously cannot conflict with itself, only other rows.

JobSchedule: handle duplicate uniques being scheduled at once

f9f3b94

If two unique jobs happen to be getting scheduled in the same batch, only one of them can successfully be moved to `available`—the other must be discarded. Pick the one that's first in line.

bgentry requested a review from brandur September 26, 2024 16:04

changelog

09e56b8

bgentry mentioned this pull request Sep 26, 2024

v0.12.0 all tasks go to the status discarded with InsertTx #618

Closed

bgentry merged commit 2b9dd83 into master Sep 26, 2024
14 checks passed

bgentry deleted the bg-fix-unique-scheduling-issues branch September 26, 2024 16:19

bgentry mentioned this pull request Sep 26, 2024

prepare v0.12.1 #620

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix unique scheduling issues #619

Fix unique scheduling issues #619

bgentry commented Sep 26, 2024

Fix unique scheduling issues #619

Fix unique scheduling issues #619

Conversation

bgentry commented Sep 26, 2024