Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom job ids to be specified in the job options #335

Merged
merged 10 commits into from
Aug 22, 2016

Conversation

campriceaustin
Copy link

No description provided.

@@ -56,6 +56,13 @@ describe('Job', function(){
expect(storedJob.opts.testOpt).to.be('enabled');
});
});

it('should use the custom jobId if one is provided', function() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test case suggestion:

Create a job with a customId and test if it can be processed, in the process callback, verify that the custom jobId matches the one provided

@campriceaustin campriceaustin mentioned this pull request Aug 19, 2016
@bradvogel
Copy link
Contributor

Add a bit to the readme too

@manast
Copy link
Member

manast commented Aug 19, 2016

I have mixed feelings about this functionality. From one side it seems like something that should be nice to give to certain pro-users, but on the other hand, it may also create a new flora of issues if the user is not able to generate unique ids for every job. I haven't understood yet which consequences it would have to add several jobs using the same jobId... Isn't there a different way to achieve what you want without custom ids?

@campriceaustin
Copy link
Author

@manast The general use case is having a concurrency-safe way to add a job if it doesn't already exist, based on some unique value (in our case the user id).

In our case, we're dealing with hundreds of thousands of jobs per hour, and the load on Redis has been pretty intense. We needed a way to avoid adding a job to the queue if one already exists for the user.

I've deployed our branch running this code to production and the results have been great - we've gone from ~90% CPU utilisation to < 10%.

In terms of other possible ways to achieve the same thing: we could do a transaction script which loops through jobs looking for a match, and inserts it it doesn't find one, but that's slow.

We could also keep a secondary key as a kind of lock to indicate whether a job with that unique value already exists, but that introduces other issues: what happens if some failure occurs and the lock is left orphaned? We'd need some kind of TTL, and it all gets rather complicated.

Being able to override the key seemed the best solution to me. It lets us determine whether a job with that unique value already exists in constant-time, and doesn't have any of the issues around locking.

@manast
Copy link
Member

manast commented Aug 19, 2016

I am not sure I understand your use case completely, but what about this: when you call the addJob method you get the unique ID for that given job. If you stored that ID in a redis database using the userId as key, wouldn't you then be able to test in O(1) if the users' job exists already? You will of course need some code to update the database when jobs are completed/failed, removed and so on.

@campriceaustin
Copy link
Author

@manast Unfortunately, that wouldn't be concurrency safe.

We'd need to:

  1. Lookup to see if the user's job exists already
  2. If not, add it. If so, ignore it.

Another thread could create or remove the job in between 1 and 2.

We'd have the same problem when removing the lock at the end of the job too, plus the possibility that the job processes but some failure interferes with removing the lock, hence preventing all future jobs from processing for that user, requiring us to build a more complex system of TTLs etc.

@manast
Copy link
Member

manast commented Aug 19, 2016

ok. But how do you avoid points 1 and 2 using custom ids in bull?

@campriceaustin
Copy link
Author

@manast We wrap 1 and 2 in a transaction. Because we have a known key, we can perform step 1 using an EXISTS command (L#84 in scripts.js), then conditionally insert depending on the result.

(Note: I've just realised we'll need to execute these within a multi call to guarantee atomicity - I'll commit that soon).

@manast
Copy link
Member

manast commented Aug 19, 2016

why do you need multi? If the jobId exists already it just returns false or -1 for example, and since it is run in a script it will be atomic.

@campriceaustin
Copy link
Author

@manast Oh, are scripts atomic by default? I thought they needed to be wrapped in a multi call. If not, then great! No need to make that change then.

})

var script = [
'local jobId = redis.call("INCR", KEYS[5])',
'local jobId = ARGV[2] == "" and redis.call("INCR", KEYS[5]) or ARGV[2]',
'redis.call("HMSET", ARGV[1] .. jobId' + argvs.join('') + ')',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use if/else here for readability ?

@campriceaustin
Copy link
Author

@manast Cool, good suggestions. I have implemented them.

@manast manast merged commit 333d007 into OptimalBits:master Aug 22, 2016
@bradvogel bradvogel mentioned this pull request Aug 22, 2016
@TomKaltz
Copy link

TomKaltz commented Sep 18, 2016

@pricj004 @manast I will be using this feature to make sure duplicate jobs don't get added to my queue. When submitting a new job with the same ID as a previously FAILED job, the new job is not added to the queue. Is this desired behavior for most? I would really like it to retry/rerun the failed job if I submit it again. Can someone explain why the current behavior was chosen?

@manast
Copy link
Member

manast commented Sep 19, 2016

@TomKaltz couldn't you just use the retry functionality? or listen to the fail event and delete the job in that case?

@TomKaltz
Copy link

@manast your suggestion would work I guess. I just think it would be a good feature for duplicate submitted jobId to automatically retry if it had previously failed. Currently job is ignored if ID exists and has previously failed. I see more use-case for my feature request than current functionality of ignoring the job in that state. Is there any reason why bull can't adopt this functionality?

@manast
Copy link
Member

manast commented Sep 20, 2016

@TomKaltz mostly that it requires some extra work and I guess nobody has time to do it right now...

@bradvogel bradvogel deleted the cameron/allow-custom-job-ids branch October 3, 2016 22:51
@spiritinlife spiritinlife mentioned this pull request Oct 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants