
Job Queue #585

Open
qianl15 opened this issue Aug 12, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

qianl15 (Member) commented Aug 12, 2024

A feature request from Discord: https://discord.com/channels/1156433345631232100/1272247538556080281

Ideally I wouldn't need a job queue at all and could do everything in parallel, but rate limits on APIs and limits like 256 instances per service on Unikraft make this problematic down the road. Lambda has a 15-minute timeout, and I'd rather not deal with AWS Batch and ECS or EC2. One might also not be able to deploy ML inference services in a serverless way, requiring a job queue to make sure they don't get overloaded.

Some of the managed products for this are hatchet.run and riverqueue.com. Both support workflows (albeit in a limited fashion), but they're also job queues that can limit concurrency. They're quite expensive compared to the resources you get: Hatchet is $150/month for 4 concurrent workers, and I could probably get 200 for that price on Unikraft.

Take https://github.com/uwiger/jobs, for example, which offers the following features:

  • Job scheduling: A job is scheduled according to certain constraints. For instance, you may want to define that no more than 9 jobs of a certain type can execute simultaneously, and that the maximal rate at which you can start such jobs is 300 per second.
  • Job queueing: When load is higher than the scheduling limits, additional jobs are queued by the system to be run later when load clears. Certain rules govern queues: are they dequeued in FIFO or LIFO order? How many jobs can the queue hold before it is full? Is there a deadline after which jobs should be rejected? When we hit the queue limits, we reject the job. This gives the queue's client a feedback mechanism so it can take action (see the sketch after this list).
  • Sampling and dampening: Periodic samples of the Erlang VM can provide information about the health of the system in general. If we have high CPU load or high memory usage, we apply dampening to the scheduling rules: we may lower the concurrency count or the rate at which we execute jobs. When the health problem clears, we remove the dampener and run at full speed again.
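
To make the scheduling and queueing rules above concrete, here is a minimal, self-contained TypeScript sketch of those semantics: a bounded FIFO queue with a concurrency cap, a start-rate limit, and immediate rejection when the backlog is full. It is a hypothetical illustration (all names here are made up for the sketch), not code from jobs or DBOS.

```typescript
type Job<T> = () => Promise<T>;

class JobQueue {
  private pending: Array<() => void> = [];
  private running = 0;
  private recentStarts: number[] = []; // timestamps of starts in the last second

  constructor(
    private maxConcurrency: number,     // e.g. at most 9 jobs at once
    private maxStartsPerSecond: number, // e.g. at most 300 starts per second
    private maxQueueLength: number      // backlog size before rejecting
  ) {}

  // Enqueue a job; reject immediately if the backlog is full, so the
  // caller gets back-pressure feedback instead of unbounded queueing.
  submit<T>(job: Job<T>): Promise<T> {
    if (this.pending.length >= this.maxQueueLength) {
      return Promise.reject(new Error("queue full: job rejected"));
    }
    return new Promise<T>((resolve, reject) => {
      this.pending.push(() => {
        this.running++;
        this.recentStarts.push(Date.now());
        job()
          .then(resolve, reject)
          .finally(() => {
            this.running--;
            this.pump();
          });
      });
      this.pump();
    });
  }

  // Start queued jobs while the concurrency cap and rate limit both allow.
  private pump(): void {
    const now = Date.now();
    this.recentStarts = this.recentStarts.filter((t) => now - t < 1000);
    while (
      this.pending.length > 0 &&
      this.running < this.maxConcurrency &&
      this.recentStarts.length < this.maxStartsPerSecond
    ) {
      this.pending.shift()!();
    }
    // Rate-limited with work left: try again once the window slides.
    // (A real implementation would coalesce these timers.)
    if (this.pending.length > 0 && this.running < this.maxConcurrency) {
      setTimeout(() => this.pump(), 100);
    }
  }
}

// Usage: at most 9 concurrent jobs, 300 starts/second, backlog of 1000.
const queue = new JobQueue(9, 300, 1000);
queue
  .submit(async () => "inference result")
  .then(console.log)
  .catch((err) => console.error(err.message)); // "queue full: job rejected" under overload
```

Under overload, submit() rejects right away, which is exactly the client-side feedback mechanism described in the job queueing bullet above.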
qianl15 added the enhancement label Aug 12, 2024
chuck-dbos (Collaborator) commented

Some of these use cases are covered now (in 1.24), but not all of them.
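
For reference on what "covered now" looks like, DBOS Transact for TypeScript gained a WorkflowQueue with per-queue concurrency caps and rate limits around that release. The snippet below is a hedged sketch based on the DBOS documentation; the option names (concurrency, rateLimit.limitPerPeriod, rateLimit.periodSec) and the queue name are assumptions to verify against the version you have installed.

```typescript
// Sketch of declaring a concurrency- and rate-limited DBOS queue.
// Not verified against 1.24 specifically; check the DBOS docs for
// the enqueue call that matches your installed SDK version.
import { WorkflowQueue } from "@dbos-inc/dbos-sdk";

// At most 10 workflows from this queue run concurrently, and no more
// than 50 may start within any 30-second window.
const inferenceQueue = new WorkflowQueue("inference_queue", {
  concurrency: 10,
  rateLimit: { limitPerPeriod: 50, periodSec: 30 },
});
```

This maps to the first two bullets of the jobs feature list above (concurrency and start-rate limits); the sketch deliberately leaves enqueueing out, since that API differs across DBOS versions.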
