Skip to content

Fix RQ wrongly moving jobs to FailedJobRegistry #7186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

thiagogds
Copy link
Contributor

What type of PR is this?

  • Bug Fix

Description

Something changed in the newer version of python-rq, and because Redash was overwriting parts of it I started to see that if a job ran for longer than 2 minutes, it would be automatically set as failed but continue running until it completed as success.

It fails with the error: Moved to FailedJobRegistry, due to AbandonedJobError

This causes a problem in the UI because it is as if the job failed, so the query results won't load.

I updated the code with the new version of rq and kept the changes from Redash. Given the changes in the code, it looks like rq changed how it sends a heartbeat for running queries.

How is this tested?

  • Manually

To be able to reproduce locally I needed to:

  • Create a second worker
  • Create 2 queries, one that would take more than 2 minutes and a faster one - I used PostgreSQL data source for the tests
  • While the long query was running I ran the faster query at around 1:46min and then the problem showed.

After the code changes it's now working.

Related Tickets & Documents

@thiagogds thiagogds force-pushed the feature/fix-redash-abandoned-error branch from df94793 to 2a7789a Compare October 9, 2024 08:35
@thiagogds thiagogds changed the title Update overwriten code with newest changes Fix RQ wrongly moving jobs to FailedJobRegistry Oct 9, 2024
@eradman eradman mentioned this pull request Oct 15, 2024
2 tasks
eradman added a commit to StarfishStorage/redash that referenced this pull request Oct 15, 2024
Something changed in python-rq and the old code was behaving in a way
that if a job ran for longer than 2 min it would be automatically set as
failed, but it would continue running.

This causes a problem in the UI because it is as if the job stopped, but
it actually didn't.

getredash#7186
@eradman
Copy link
Collaborator

eradman commented Oct 16, 2024

@thiagogds can you rebase this change? We fixed the restyled app error by replacing it with a github action.

Something changed in python-rq and the old code was behaving in a way that if a job ran for longer than 2 min it would be automatically set as failed, but it would continue running.

This causes a problem in the UI because it is as if the job stopped, but it actually didn't
@thiagogds thiagogds force-pushed the feature/fix-redash-abandoned-error branch from 2a7789a to 3003649 Compare October 17, 2024 13:51
@thiagogds
Copy link
Contributor Author

@eradman done :)

Copy link
Collaborator

@eradman eradman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been manually testing this change, and it appears to be good.
It seems that to reproduce this problem multiple queries must be run simultaneously.

@thiagogds
Copy link
Contributor Author

@eradman Cool! Just to add there, I've been running a fork of Redash with this fix for a few weeks and so far it's good and I don't have the problem anymore :)

@eradman eradman merged commit 04a25f4 into getredash:master Oct 17, 2024
11 checks passed
@eradman
Copy link
Collaborator

eradman commented Oct 17, 2024

Merged. Thanks @thiagogds

harveyrendell pushed a commit to pushpay/redash that referenced this pull request Jan 8, 2025
Something changed in python-rq and the old code was behaving in a way that if a job ran for longer than 2 min it would be automatically set as failed, but it would continue running.

This causes a problem in the UI because it is as if the job stopped, but it actually didn't
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants