
Conversation

@mark-rushakoff

We found a race condition in which two workers occasionally both run the same job to completion. The first worker to finish completes the job successfully, but the second worker then tries to delete the job (believing it has just finished it successfully), and that worker process crashes as a result.

This pull request addresses this race condition.
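A minimal sketch of the failure mode and one defensive option. It is written in Python against an in-memory SQLite database (not Ruby/Sequel) so it runs standalone, and it is not necessarily what this pull request changes. The table, its columns, and the `make_db`/`complete_job` helpers are hypothetical. The completion step checks how many rows its `DELETE` removed and treats zero as "another worker already finished this job" instead of raising.

```python
import sqlite3


def make_db():
    """Hypothetical jobs table; columns are loosely modeled on delayed_job's locked_by/locked_at."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE delayed_jobs (id INTEGER PRIMARY KEY, locked_by TEXT, locked_at REAL)"
    )
    return db


def complete_job(db, job_id):
    """Delete a finished job; return False instead of raising if it is already gone."""
    cur = db.execute("DELETE FROM delayed_jobs WHERE id = ?", (job_id,))
    db.commit()
    # rowcount == 0 means another worker already completed and deleted this job.
    return cur.rowcount == 1


db = make_db()
db.execute("INSERT INTO delayed_jobs (id) VALUES (1)")
db.commit()

# Simulate both workers having run job 1; the first to finish deletes it.
print(complete_job(db, 1))  # worker A: True
print(complete_job(db, 1))  # worker B: False, the row is already gone
```

Returning a boolean lets the second worker log the duplicate run and move on rather than crash; whether silently tolerating a duplicate run is acceptable depends on whether the job itself is idempotent.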

@JonathanTron
Member

Hi @mark-rushakoff, thanks a lot for this pull-request!

I've looked into the cause of the spec failure: although your changes look great, they don't cover all of delayed_job's expectations for Delayed::Job.reserve. For instance, a job that is already reserved by another worker can be reserved again if that worker has held it for longer than Delayed::Worker.max_run_time.
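The reservation rule described above can be sketched as a single conditional `UPDATE` whose affected-row count decides which worker wins, a common compare-and-set pattern. This is an illustration in Python with SQLite, not delayed_job's actual implementation; the schema, the `try_reserve` helper, and `MAX_RUN_TIME` (standing in for `Delayed::Worker.max_run_time`, in seconds) are assumptions.

```python
import sqlite3

MAX_RUN_TIME = 10.0  # hypothetical stand-in for Delayed::Worker.max_run_time (seconds)


def make_db():
    """Hypothetical jobs table; columns are loosely modeled on delayed_job's locked_by/locked_at."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE delayed_jobs (id INTEGER PRIMARY KEY, locked_by TEXT, locked_at REAL)"
    )
    return db


def try_reserve(db, job_id, worker, now):
    """Claim a job if it is unlocked or its lock is older than MAX_RUN_TIME."""
    cur = db.execute(
        "UPDATE delayed_jobs SET locked_by = ?, locked_at = ? "
        "WHERE id = ? AND (locked_at IS NULL OR locked_at < ?)",
        (worker, now, job_id, now - MAX_RUN_TIME),
    )
    db.commit()
    # Exactly one updated row means this worker won the claim.
    return cur.rowcount == 1


db = make_db()
db.execute("INSERT INTO delayed_jobs (id) VALUES (1)")
db.commit()

print(try_reserve(db, 1, "worker-a", now=0.0))   # True: job was unlocked
print(try_reserve(db, 1, "worker-b", now=5.0))   # False: worker-a's lock is still fresh
print(try_reserve(db, 1, "worker-b", now=15.0))  # True: lock is older than MAX_RUN_TIME
```

The third call shows the window discussed in this thread: once a lock is older than the maximum run time, a second worker can legitimately claim a job the first worker may still be running, so both workers can reach the completion step.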

I think the problem lies elsewhere. I will push a change I made based on your PR; it would be great if you could test it and tell me whether it fixes your problem.

JonathanTron added a commit that referenced this pull request Oct 29, 2013
@cf-frameworks

We are still seeing errors with your newest code. Please see PR #3 for tests that reproduce the error.

@JonathanTron
Member

Thanks a lot for the test case, I will check it out.

Out of curiosity, what database are you using?

@mark-rushakoff
Author

We've been testing with (and seeing issues in production with) MySQL.

