Fix RQ wrongly moving jobs to FailedJobRegistry #7186

thiagogds · 2024-10-09T08:21:35Z

What type of PR is this?

Bug Fix

Description

Something changed in a newer version of python-rq. Because Redash overrides parts of it, I started to see that any job running for longer than 2 minutes was automatically marked as failed, even though it kept running until it completed successfully.

It fails with the error: Moved to FailedJobRegistry, due to AbandonedJobError

This causes a problem in the UI: the job is treated as failed, so the query results won't load.

I updated the overridden code to match the new version of rq and kept the changes from Redash. Judging by the changes in the code, it looks like rq changed how it sends a heartbeat for running jobs.
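A minimal sketch of the suspected failure mode, assuming that a running job holds a time-limited lease which heartbeats are supposed to extend, and that a cleanup pass (possibly run by another worker) marks expired jobs as abandoned. This is a toy model only: `ToyRegistries`, its methods, and the 120-second TTL are hypothetical names and values chosen to mirror the 2-minute symptom described above, not rq's actual API or Redash's code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistries:
    """Toy stand-in for a started/failed registry pair (NOT rq's API)."""

    ttl: float  # seconds a job may go without a heartbeat (assumed value)
    started: dict[str, float] = field(default_factory=dict)  # job id -> expiry time
    failed: dict[str, str] = field(default_factory=dict)  # job id -> reason

    def start(self, job_id: str, now: float) -> None:
        self.started[job_id] = now + self.ttl

    def heartbeat(self, job_id: str, now: float) -> None:
        # Only jobs still considered running can extend their lease.
        if job_id in self.started:
            self.started[job_id] = now + self.ttl

    def cleanup(self, now: float) -> None:
        # Expired jobs are declared abandoned even if the code
        # executing them is still alive and will finish successfully.
        for job_id, expires_at in list(self.started.items()):
            if expires_at < now:
                del self.started[job_id]
                self.failed[job_id] = "abandoned"


def simulate(send_heartbeats: bool) -> ToyRegistries:
    registries = ToyRegistries(ttl=120)
    registries.start("long-query", now=0)
    for now in range(30, 181, 30):  # the job runs for 3 minutes
        if send_heartbeats:
            registries.heartbeat("long-query", now)
        registries.cleanup(now)  # e.g. triggered by a second worker
    return registries


broken = simulate(send_heartbeats=False)
fixed = simulate(send_heartbeats=True)
print("without heartbeats:", broken.failed)
print("with heartbeats:", fixed.failed)
```

In this model the job without heartbeats is moved to the failed set once its lease expires, while the job that keeps sending heartbeats stays in the started set for its whole run. If the overridden Redash code stopped refreshing the lease the way the newer rq expects, the same pattern would explain the reported behavior.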

How is this tested?

Manually

To be able to reproduce locally I needed to:

Create a second worker
Create 2 queries: one that takes more than 2 minutes and a faster one (I used a PostgreSQL data source for the tests)
While the long query was running, I ran the faster query at around the 1:46 mark, and then the problem showed up.

After the code changes it's now working.

Related Tickets & Documents

Something changed in python-rq and the old code was behaving in a way that if a job ran for longer than 2 min it would be automatically set as failed, but it would continue running. This causes a problem in the UI because it is as if the job stopped, but it actually didn't. getredash#7186

eradman · 2024-10-16T13:45:57Z

@thiagogds can you rebase this change? We fixed the restyled app error by replacing it with a GitHub Action.

thiagogds · 2024-10-17T13:52:01Z

@eradman done :)

eradman

I have been manually testing this change, and it appears to be good.
It seems that to reproduce this problem multiple queries must be run simultaneously.

thiagogds · 2024-10-17T14:31:17Z

@eradman Cool! Just to add there, I've been running a fork of Redash with this fix for a few weeks and so far it's good and I don't have the problem anymore :)

eradman · 2024-10-17T17:30:20Z

Merged. Thanks @thiagogds

thiagogds force-pushed the feature/fix-redash-abandoned-error branch from df94793 to 2a7789a Compare October 9, 2024 08:35

thiagogds changed the title ~~Update overwriten code with newest changes~~ Fix RQ wrongly moving jobs to FailedJobRegistry Oct 9, 2024

eradman mentioned this pull request Oct 15, 2024

Move restyled to a github action #7191

Merged

thiagogds force-pushed the feature/fix-redash-abandoned-error branch from 2a7789a to 3003649 Compare October 17, 2024 13:51

eradman approved these changes Oct 17, 2024

eradman merged commit 04a25f4 into getredash:master Oct 17, 2024
11 checks passed
