Fix RQ wrongly moving jobs to FailedJobRegistry #7186
Something changed in python-rq, and with the old code any job that ran for longer than 2 minutes was automatically marked as failed even though it kept running. This causes a problem in the UI, which shows the job as stopped when it actually hasn't. getredash#7186
@thiagogds can you rebase this change? We fixed the restyled app error by replacing it with a GitHub Action.
@eradman done :)
I have been manually testing this change, and it appears to be good.
It seems that multiple queries must be run simultaneously to reproduce this problem.
@eradman Cool! Just to add: I've been running a fork of Redash with this fix for a few weeks, and so far it's good and I don't have the problem anymore :)
Merged. Thanks @thiagogds
What type of PR is this?
Description
Something changed in the newer version of python-rq, and because Redash overrides parts of it, I started to see that any job running for longer than 2 minutes was automatically marked as failed, yet kept running until it completed successfully.
It fails with the error:
Moved to FailedJobRegistry, due to AbandonedJobError
This causes a problem in the UI because the job appears to have failed, so the query results won't load.
I updated the code to match the new version of rq and kept the changes from Redash. Judging by the changes in the code, it looks like rq changed how it sends a heartbeat for running queries.
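To illustrate why a missing heartbeat would produce this symptom, here is a toy model in plain Python. It is not rq's or Redash's code: `ToyRegistry`, `simulate`, and the 30-second tick are hypothetical names and values chosen for illustration, and the 120-second TTL is only assumed from the 2-minute threshold described above. The idea is that a running job carries an expiry time, each heartbeat pushes that expiry forward, and a periodic cleanup marks any expired job as failed without stopping it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Toy stand-in for a started-job registry: job id -> expiry timestamp."""
    ttl: float
    started: dict = field(default_factory=dict)
    failed: list = field(default_factory=list)

    def add(self, job_id, now):
        # A job that starts running is given an initial expiry.
        self.started[job_id] = now + self.ttl

    def heartbeat(self, job_id, now):
        # A heartbeat extends the expiry of a job that is still registered.
        if job_id in self.started:
            self.started[job_id] = now + self.ttl

    def cleanup(self, now):
        # Expired jobs are treated as abandoned and moved to the failed list.
        # Nothing here stops the job itself, which is why it keeps running.
        for job_id, expires_at in list(self.started.items()):
            if expires_at < now:
                del self.started[job_id]
                self.failed.append(job_id)


def simulate(send_heartbeats, run_seconds=300, ttl=120, tick=30):
    """Return True if a job running for run_seconds ends up marked as failed."""
    registry = ToyRegistry(ttl=ttl)
    registry.add("query-1", now=0)
    for now in range(tick, run_seconds + 1, tick):
        if send_heartbeats:
            registry.heartbeat("query-1", now)
        registry.cleanup(now)
    return "query-1" in registry.failed


if __name__ == "__main__":
    print("failed without heartbeats:", simulate(send_heartbeats=False))
    print("failed with heartbeats:", simulate(send_heartbeats=True))
```

In this model a 300-second job is marked as failed only when heartbeats stop reaching the registry, which matches the behavior described: the job is reported as abandoned while the work itself continues.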
How is this tested?
To be able to reproduce locally I needed to:
After the code changes it's now working.
Related Tickets & Documents