BUGFIX: Remove from InProgress if we don't addToDead #92

Merged
merged 3 commits into gocraft:master on May 3, 2018

Conversation

@davars commented Apr 2, 2018

Tracked down why locks were being retained by jobs that weren't running; this branch is the result. The first commit is a simple fix. The second commit is a bit more involved and pulls removeJobFromInProgress out of any branches to ensure it always runs (sketched below, after the commit messages).

Commit Messages:

  • Remove from InProgress if we don't addToDead

  • removeJobFromInProgress critically needs to run when a job is dispatched regardless of the outcome; otherwise locks are retained for the lifetime of the worker. The easiest way to ensure it's called in every branch is to remove it from branches entirely. Its behavior can be modified selectively by passing in a closure, which can perform redis operations inside the same transaction as the common housekeeping ops.

  • Consistent error messages

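For readers who don't want to open the diff, here is a minimal, self-contained sketch of the closure pattern the second commit describes, using redigo directly. The names echo terminateOp, terminateOnly, and terminateAndRetry from the diff, but the redis key names and function bodies are illustrative assumptions, not the actual gocraft/work implementation.

package main

import (
	"fmt"

	"github.com/gomodule/redigo/redis"
)

// terminateOp contributes outcome-specific commands to the open MULTI.
type terminateOp func(conn redis.Conn)

// terminateOnly adds nothing extra; the shared cleanup is enough
// (e.g. a successful job, or a dead job with SkipDead set).
func terminateOnly(_ redis.Conn) {}

// terminateAndRetry schedules the raw job JSON onto a retry sorted set.
func terminateAndRetry(rawJSON []byte, retryAt int64) terminateOp {
	return func(conn redis.Conn) {
		conn.Send("ZADD", "jobs:retry", retryAt, rawJSON)
	}
}

// removeJobFromInProgress always runs, whatever the job's outcome, so the
// in-progress entry and the lock can never be leaked. The terminateOp runs
// inside the same MULTI/EXEC transaction as the common housekeeping.
func removeJobFromInProgress(conn redis.Conn, jobName string, rawJSON []byte, term terminateOp) error {
	conn.Send("MULTI")
	conn.Send("LREM", "jobs:"+jobName+":inprogress", 1, rawJSON)
	conn.Send("DECR", "jobs:"+jobName+":lock")
	term(conn)
	_, err := conn.Do("EXEC")
	return err
}

func main() {
	conn, err := redis.Dial("tcp", "localhost:6379")
	if err != nil {
		fmt.Println("dial:", err)
		return
	}
	defer conn.Close()

	raw := []byte(`{"name":"example","args":{}}`)
	// Success (or SkipDead) path: cleanup only.
	_ = removeJobFromInProgress(conn, "example", raw, terminateOnly)
	// Failure path with retries remaining: cleanup plus a retry enqueue.
	_ = removeJobFromInProgress(conn, "example", raw, terminateAndRetry(raw, 1525305600))
}
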
@davars commented Apr 3, 2018

I just realized I never ran before / after benchmarks, so caveat emptor. I plan to do that this morning.

@davars commented Apr 3, 2018

Not exactly statistically rigorous, but there doesn't seem to be much difference (mine -> built with this PR, theirs -> built with 85f9368): https://gist.github.com/davars/f70ed4f9f2b2c90429d0b94054c55afa

@shdunning commented:

I plan to spend some time reviewing this today

@shdunning left a comment

Other than my minor nitpick about changing the text of the error keys, this looks great.

worker.go Outdated

func (w *worker) addToRetry(job *Job, runErr error) {
func terminateOnly(_ redis.Conn) { return }
func terminateAndRetry(w *worker, jt *jobType, job *Job) terminateOp {
rawJSON, err := job.serialize()
if err != nil {
logError("worker.add_to_retry", err)

Plz change the error key to worker.terminate_and_retry.serialize.

worker.go Outdated
rawJSON, err := job.serialize()

if err != nil {
logError("worker.add_to_dead.serialize", err)

Plz change the error key to worker.terminate_and_dead.serialize.

@@ -39,6 +40,13 @@ type jobType struct {
DynamicHandler reflect.Value
}

func (jt *jobType) calcBackoff(j *Job) int64 {

Nice! Thanks for adding this.

@shdunning commented:

So @davars, was the issue that when we exhausted retries for a job and the job was marked SkipDead = true, then it was never removed from the in-progress queue?

It looks like the redis commands used by removeJobFromInProgress were also in addToDead and addToRetry, and the path above is the only one through the code where I can see the job might not be removed from the in-progress queue.

Aside from this being a nice refactor, I'm guessing you tracked down a specific path that led to dead jobs not being cleaned up properly.
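
To make that path concrete, the pre-fix shape is roughly the following. This is a reconstruction for illustration only, with made-up names (jobOptions, finishFailedJob); the actual worker code differs.

package main

import "fmt"

// Made-up options struct for illustration; mirrors MaxFails/SkipDead.
type jobOptions struct {
	MaxFails uint
	SkipDead bool
}

// Pre-fix shape: the in-progress/lock cleanup is duplicated inside each
// terminal branch, so a branch that returns early (SkipDead) leaks it.
func finishFailedJob(fails uint, opts jobOptions) {
	if fails >= opts.MaxFails {
		if opts.SkipDead {
			return // BUG: never reaches any cleanup
		}
		fmt.Println("add to dead set; remove from in-progress; release lock")
		return
	}
	fmt.Println("add to retry set; remove from in-progress; release lock")
}

// Post-fix shape: branches only decide the extra work; the cleanup is
// unconditional (compare the terminateOp sketch earlier in this thread).
func finishFailedJobFixed(fails uint, opts jobOptions) {
	if fails >= opts.MaxFails {
		if !opts.SkipDead {
			fmt.Println("add to dead set")
		}
	} else {
		fmt.Println("add to retry set")
	}
	fmt.Println("remove from in-progress; release lock") // always runs
}

func main() {
	opts := jobOptions{MaxFails: 1, SkipDead: true}
	finishFailedJob(1, opts)      // prints nothing: the leak
	finishFailedJobFixed(1, opts) // still cleans up
}
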

@davars commented May 3, 2018

You're correct. In my setup I have my cron jobs enqueued with MaxFails: 1 and SkipDead: true. I was observing those queues getting stuck after an error until that pool was shut down (when the reaper would clean up the lock, presumably).

Working on the error messages now.

Side note, there's a rare case where locks get stuck and aren't cleaned up by the reaper. I haven't tracked that one down yet. I suspect it may be due to concurrent reapers, since it looks like each pool runs reapers without coordination. Maybe the reaper task should be an internal job that's run by the scheduler. Then it would get a job id, lock, be run by one pool per period, etc.
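
For readers who haven't used these options, a worker-pool configuration along the lines davars describes looks roughly like this. The namespace, job name, schedule, and job body are placeholders, not his actual code; older setups import redigo from github.com/garyburd/redigo/redis instead. MaxFails: 1 and SkipDead: true match the setup described above.

package main

import (
	"os"
	"os/signal"

	"github.com/gocraft/work"
	"github.com/gomodule/redigo/redis"
)

// Context is the per-job context type gocraft/work handlers hang off of.
type Context struct{}

// NightlyCleanup is a placeholder cron job body.
func (c *Context) NightlyCleanup(job *work.Job) error {
	// ... real work here ...
	return nil
}

func main() {
	redisPool := &redis.Pool{
		MaxActive: 5,
		MaxIdle:   5,
		Wait:      true,
		Dial: func() (redis.Conn, error) {
			return redis.Dial("tcp", "localhost:6379")
		},
	}

	pool := work.NewWorkerPool(Context{}, 10, "example_namespace", redisPool)

	// MaxFails: 1 means a single failure exhausts the job's retries;
	// SkipDead: true means it is discarded rather than kept in the dead
	// set -- the combination that exposed the lock leak in this PR.
	pool.JobWithOptions("nightly_cleanup",
		work.JobOptions{MaxFails: 1, SkipDead: true},
		(*Context).NightlyCleanup)

	// Enqueue it on a cron schedule (6-field spec, here 02:00 daily).
	pool.PeriodicallyEnqueue("0 0 2 * * *", "nightly_cleanup")

	pool.Start()

	// Block until interrupted, then stop the pool cleanly.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt)
	<-sig
	pool.Stop()
}
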

@shdunning left a comment

👍

@shdunning merged commit 72c8f57 into gocraft:master on May 3, 2018