Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for intermittency in unique job tests that use ByPeriod #395

Merged
merged 1 commit into from
Jun 19, 2024

Conversation

brandur
Copy link
Contributor

@brandur brandur commented Jun 14, 2024

Fix for #392 in which under rare cases, a test can start right up
against the boundary of a period in use for unique opts, cross over that
period, and produce the wrong result as a new row is inserted.

For example, using a period of 15 minutes like in db_unique_test.go,
if the tests run at 14:59:59.998, there's a good chance that they'll
finish after 15:00:00, thereby encapsulating two separate 15 minutes and
allowing separate jobs to be inserted. This is more likely to happen in
CI where tests take long to run.

The fix is to use a more stable time for this time of test. We use the
current hour, but assign a fixed minute and second that's right in the
middle and nowhere near any boundaries.

I gave client_test.go's unique tests the same treatment since it's
possible for this problem to happen there too, although they're using a
24 hour period so it was already very unlikely. (The test case would
have to start at exactly the wrong moment before a UTC day boundary.)

TimeGenerator

When I applied the fix above, I ran into another problem, which is that
the timestamps assigned to newly inserted jobs through a client don't
respect a stubbed time in the archetype. The fix is allow for an optional
created_at to be inserted on the fast path, but the previous behavior
where the canonical source of a timestamps was always the database was
kind of nice in that would be less prone to clock drifts between
machines doing insertion, so timestamps would be more consistent.

I put in a check that only sets created_at if the time is currently
stubbed, but to accomplish that I had to change the way time stubbing
works because previously, TimeNotUTC would always have a value
regardless of whether it's stubbed or not. I settled on putting together
a TimeGenerator interface that's implemented by a TimeStub in tests,
and an UnstubbableTimeGenerator otherwise. The interface provides a
function TimeUTCOrNil that can be used for paths like this one on
insertion to return a stubbed time if it is stubbed, but otherwise get
back a nil that can be passed to the database to have it choose a time
for us.

Fixes #392.

@brandur brandur marked this pull request as draft June 14, 2024 02:34
@brandur brandur force-pushed the brandur-unique-intermittency-fix branch from 7f3f235 to 8430755 Compare June 19, 2024 18:02
@brandur
Copy link
Contributor Author

brandur commented Jun 19, 2024

@bgentry Can you take a look at this one?

It got a little more complicated than I hoped. I believe the intermittency problem is what I described above in the pull request description, but I found when I went to push a fix that I was getting a massively failing test matrix.

Looking into it, that turned out to be because although our time stubs work to a certain extent, they're ignored completely when jobs are being inserted.

I was able to get this fixed up by adding created_at to inserts with a fall back to now(), but doing so makes the time stub system a little more elaborate. I tried a number of approaches, and this is the best one I could come up with, but it does change the time stub from a simple function to an interface that can be (1) stubbed, and (2) checked to see if it is stubbed.

It does add a little more code, but all the alternatives I can think of are worse. It's fairly important in specific cases that time be stubbable, and if job insertion always ignores a stub, then it's not really stubbable. I could possibly fix the test by inserting the job and the updating its created_at with a different query, but that seems pretty bad.

The good news is that we take three different time stub systems (one in limiter_test.go, the func approach in riverinternaltest, and the StubTime helper in riverinternaltest), and unify them into a single system.

@brandur brandur force-pushed the brandur-unique-intermittency-fix branch 2 times, most recently from 4b3db8a to 51687d6 Compare June 19, 2024 18:24
Copy link
Contributor

@bgentry bgentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice debugging! This seems like a reasonable enough way to solve the issue IMO.

@@ -1112,7 +1119,7 @@ func (c *Client[TTx]) ID() string {
return c.config.ID
}

func insertParamsFromConfigArgsAndOptions(config *Config, args JobArgs, insertOpts *InsertOpts) (*riverdriver.JobInsertFastParams, *dbunique.UniqueOpts, error) {
func insertParamsFromConfigArgsAndOptions(archetype *baseservice.Archetype, config *Config, args JobArgs, insertOpts *InsertOpts) (*riverdriver.JobInsertFastParams, *dbunique.UniqueOpts, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we're passing in a couple of fields from Client to this (neither of which was needed 1mo ago), it might be cleaner and less confusing to make it a method on Client instead. Borderline 🤷‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah true.

I took a shot at it, but in the end, decided to leave as is.

The trouble is that insertParamsFromConfigArgsAndOptions gets called from a few other non-client components that make some annoying changes required:

  • periodic_job.go needs it in PeriodicJobBundle so it can insert periodic jobs. We could embed a client on PeriodicJobBundle instead of a client config, but then PeriodicJobBundle needs to take a generic parameter so it can put a type on Client[TTx], which is super annoying.
  • producer_test.go uses it. It's workable just to create a client there and call the function on that, but sort of annoying to add the extra client that's not really used for anything else.

Let's keep an eye on this one. It's definitely a refactor that's possible, and we should do it if things get anymore gnarly, but this might be the last addition it needs in a while, so maybe okay for now.

@brandur brandur force-pushed the brandur-unique-intermittency-fix branch 2 times, most recently from 36390dc to 98b8ad1 Compare June 19, 2024 18:55
Fix for #392 in which under rare cases, a test can start right up
against the boundary of a period in use for unique opts, cross over that
period, and produce the wrong result as a new row is inserted.

For example, using a period of 15 minutes like in `db_unique_test.go`,
if the tests run at 14:59:59.998, there's a good chance that they'll
finish after 15:00:00, thereby encapsulating two separate 15 minutes and
allowing separate jobs to be inserted. This is more likely to happen in
CI where tests take long to run.

The fix is to use a more stable time for this time of test. We use the
current hour, but assign a fixed minute and second that's right in the
middle and nowhere near any boundaries.

I gave `client_test.go`'s unique tests the same treatment since it's
possible for this problem to happen there too, although they're using a
24 hour period so it was already very unlikely. (The test case would
have to start at exactly the wrong moment before a UTC day boundary.)

Fixes #392.
@brandur brandur force-pushed the brandur-unique-intermittency-fix branch from 98b8ad1 to 4cd66eb Compare June 19, 2024 19:02
@brandur brandur marked this pull request as ready for review June 19, 2024 21:32
@brandur
Copy link
Contributor Author

brandur commented Jun 19, 2024

Thanks!

@brandur
Copy link
Contributor Author

brandur commented Jun 19, 2024

Oh, and I updated the commit message to describe some of the TimeGenerator additions.

@brandur brandur merged commit c9dd291 into master Jun 19, 2024
10 checks passed
@brandur brandur deleted the brandur-unique-intermittency-fix branch June 19, 2024 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky unique inserter test
2 participants