[Bug]: Job data not passed to worker, all queue jobs removed and excessive amount of requests sent to redis server #2763

Open
1 task done
samundrak opened this issue Sep 7, 2024 · 5 comments

Comments

@samundrak

Version

v3.10.3

Platform

NodeJS

What happened?

I have been facing this issue for a week now. A few changes I made included increasing the delay and adding a limiter. After these changes, I started encountering issues (Not sure if this is the reason), such as the job data not being passed to the worker—it was basically empty. To fix this temporarily, I removed a few queues, but another problem arose: all the queue data was suddenly being removed, which is causing serious issues in production.

I couldn't pinpoint the problem, so I switched from AWS ElastiCache to a self-hosted Redis to ensure the settings were configured correctly according to the documentation. It worked well for a few days, but then the issue of the queue being automatically removed started again. I did some debugging, checked the logs using RedisInsight, and discovered that an excessive number of requests were being sent to the Redis server.

Framework: NestJS ^9.0.0

Screen.Recording.2024-09-07.at.19.03.10.mov

How to reproduce.

I am not able to reproduce it in the dev environment or locally.

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@samundrak samundrak added the bug Something isn't working label Sep 7, 2024
@manast
Contributor

manast commented Sep 7, 2024

The number of requests is probably just normal.
It seems you are having several issues and conflating them, which makes them more difficult to solve. I suggest you treat each issue separately. For example, jobs with empty data would be one issue: try to isolate the problem. Most likely it is in your own code, so enable debug logs and check whether you are really setting the data before adding the jobs.
All queue data being removed also sounds like you have some code that is removing the queues, maybe leftover test/debug code, or maybe you are not configuring the maxmemory policy of your Redis instance appropriately, although this normally would not result in all data being removed.
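
One way to rule out the maxmemory policy mentioned above is to read it from the server and compare it with `noeviction`, which is the value the BullMQ documentation asks for. The helpers below are a hypothetical sketch, not part of BullMQ; they assume the client returns the `CONFIG GET` reply as a flat `[name, value]` array, as ioredis does.

```typescript
// Hypothetical helper: extract the policy from a CONFIG GET reply.
// Assumption: the reply is a flat array such as ["maxmemory-policy", "noeviction"].
function parseEvictionPolicy(reply: unknown): string | undefined {
  if (!Array.isArray(reply)) return undefined;
  const index = reply.indexOf("maxmemory-policy");
  const value = index >= 0 ? reply[index + 1] : undefined;
  return typeof value === "string" ? value : undefined;
}

// Any policy other than "noeviction" lets Redis evict keys under memory
// pressure, and queue keys are not exempt from that.
function isSafeForQueues(policy: string | undefined): boolean {
  return policy === "noeviction";
}

// Example with a reply captured from a server using an evicting policy.
const policy = parseEvictionPolicy(["maxmemory-policy", "volatile-lru"]);
console.log(policy, isSafeForQueues(policy));
```

With ioredis the reply could come from `await redis.config("GET", "maxmemory-policy")`. On managed services where `CONFIG` is restricted, the value would have to be read from the service's own parameter settings instead.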

@manast manast removed the bug Something isn't working label Sep 7, 2024
@samundrak
Author

samundrak commented Sep 7, 2024

@manast
Thank you for the response.

So far, I don't have any explicit code to remove items from the queue, and the only removal setting is the default one, which Bull applies after completion or failure. The empty job data problem might have been resolved after I refactored my code from the NestJS process decorator to an explicit worker class, but the issue of queue data getting removed is still frequent. We can't add any items to the queue, as they are removed immediately. I thought the excessive number of requests could also be the cause, as it was sending many DEL commands to the Redis server when I used the throttle/limiter.
Additionally, the number of requests seems normal when I check my development environment. However, even when there's no load, the number of requests sent to Redis in production is still quite high.
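
To compare development and production on numbers rather than impressions, the command stream can be captured with `redis-cli MONITOR` for a fixed interval in each environment and tallied per command. The function below is a minimal sketch under that assumption; the sample lines and key names are invented for illustration and are not taken from the logs in this issue.

```typescript
// Tally Redis commands from captured MONITOR output, one line per command.
// Assumed line shape: <timestamp> [<db> <client address>] "<command>" "<arg>" ...
function countCommands(monitorLines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of monitorLines) {
    // The command name is the first double-quoted token after the [db addr] part.
    const name = /\]\s+"([^"]+)"/.exec(line)?.[1];
    if (!name) continue; // skip the initial "OK" and any malformed lines
    const key = name.toUpperCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample lines; real captures will contain different keys and commands.
const sampleLines = [
  '1725712990.100000 [0 10.0.0.5:51234] "evalsha" "abc123" "2" "bull:example:wait"',
  '1725712990.200000 [0 10.0.0.5:51234] "del" "bull:example:limiter"',
  '1725712990.300000 [0 10.0.0.5:51234] "DEL" "bull:example:limiter"',
  "OK",
];
const commandCounts = countCommands(sampleLines);
console.log(commandCounts);
```

Running the same tally over equally long captures from both environments shows which commands account for the difference. Note that `MONITOR` itself adds load, so a short capture window is advisable in production.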

// Request sent to Redis when throttled
image

// Worker Implementation
image

// Queue settings

    queue: {
      removeOnComplete: {
        age: 3600 * 12, // keep up to 12 hours
        count: 1000, // keep up to 1000 jobs
      },
      removeOnFail: {
        age: 48 * 3600, // keep up to 48 hours
      },
      delay: 5000,
      attempts: 3,
      backoff: {
        type: 'exponential',
        delay: 60000,
      },
    },
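
As a quick check of what these settings amount to, the sketch below works through the arithmetic. It assumes that `age` is expressed in seconds and that the built-in exponential strategy waits `2 ^ (attemptsMade - 1) * delay` milliseconds, which is how the BullMQ documentation describes them; the function and variable names are illustrative only.

```typescript
// Values copied from the queue settings in this comment.
const removeOnCompleteAgeSeconds = 3600 * 12;
const removeOnFailAgeSeconds = 48 * 3600;
const attempts = 3;
const backoffDelayMs = 60000;

// Assumption: exponential backoff waits 2^(attemptsMade - 1) * delay milliseconds.
function exponentialBackoffMs(attemptsMade: number, delayMs: number): number {
  return Math.round(Math.pow(2, attemptsMade - 1) * delayMs);
}

// With attempts = 3 there are at most 2 retries: after the 1st and 2nd failures.
const retryDelaysMs: number[] = [];
for (let made = 1; made < attempts; made++) {
  retryDelaysMs.push(exponentialBackoffMs(made, backoffDelayMs));
}

console.log(removeOnCompleteAgeSeconds / 3600); // 12 (hours completed jobs are kept)
console.log(removeOnFailAgeSeconds / 3600); // 48 (hours failed jobs are kept)
console.log(retryDelaysMs); // [ 60000, 120000 ]
```

Under these assumptions a completed job is kept for up to 12 hours or until more than 1000 newer completed jobs exist, so none of these values alone would explain jobs disappearing immediately after being added.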

@manast
Contributor

manast commented Sep 8, 2024

I am quite confident the issue with missing data is not a bug in BullMQ.
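
A cheap way to confirm or rule this out on the application side is to validate and log the payload at the point where jobs are enqueued. The guard below is a hypothetical sketch: `assertJobData`, `addChecked` and `JobSink` are names invented here, and a BullMQ `Queue` is only assumed to satisfy the `add(name, data)` shape.

```typescript
// Hypothetical guard: fail loudly if a job would be enqueued without data.
function assertJobData<T extends object>(jobName: string, data: T | null | undefined): T {
  if (data == null || Object.keys(data).length === 0) {
    throw new Error(`Refusing to enqueue "${jobName}": job data is empty`);
  }
  return data;
}

// Minimal structural type for anything with an add(name, data) method.
// A BullMQ Queue is assumed to fit, but this sketch does not import it.
interface JobSink {
  add(name: string, data: object): Promise<unknown>;
}

// Validate, log, then enqueue. The log line records what the producer sent,
// so an empty payload seen in the worker can be traced back to its origin.
async function addChecked(
  sink: JobSink,
  jobName: string,
  data: object | null | undefined,
): Promise<unknown> {
  const checked = assertJobData(jobName, data);
  console.debug(`enqueue ${jobName}`, JSON.stringify(checked));
  return sink.add(jobName, checked);
}
```

If the log shows a populated payload at enqueue time while the worker still receives an empty object, the search can move to the consumer side; if the guard throws, the problem is in the producer.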

@roggervalf
Collaborator

Hi @samundrak, you are using quite an old version. Could you please try the latest one and let us know?

@samundrak
Author

Thank you for the response

@manast I was thinking the same most of the time, but I couldn't find anything in the implementation that could cause all queue data to be removed. One thing I did recently was remove the throttle from the worker settings, and it hasn't occurred since, but I am still giving it some time to confirm.

@roggervalf
I am not sure the old version is the issue, but I think I should work on updating it now.

I will update you on the status once I update the versions.
Thank you for the help


3 participants