Optimize job dequeue logic and indexing for massive throughput and CP… #136
Conversation
Please use consistent formatting with the surrounding code.
Define consistent formatting. I see at least 3 different SQL formatting styles in the existing codebase: q{sql} Do you want me to remove the leading and trailing spaces and/or newlines?
Those are chosen based on characters in the SQL string.
Yes. We want it to look like the same person wrote the whole file.
lib/Minion/Backend/Pg.pm
Outdated
RETURNING id, args, retries, task}, $id, $options->{id}, $options->{min_priority},
$options->{queues} || ['default'], [keys %{$self->minion->tasks}]
q{
-- Set lock timeout to prevent long waits (50ms)
The comment doesn't really add any information the code doesn't already contain.
WHERE state = 'inactive';

CREATE INDEX ON minion_jobs USING GIN (parents)
WHERE state = 'inactive';
Pretty sure this can just be one line. And move the newline above back below this line.
Looks promising. Once tests pass, I'll do a full review.
I modified the formatting. I can't tell where the perltidy tests are failing. As an update: in my environment, the modifications continue to deliver excellent results. The queue is healthy, throughput is high, and the system is stable and efficient under sustained load. Performance Update (Apr 23, 2025):
I'd also like to note that the response times for other utility queries used in the admin UI for stats and history have dramatically improved. This is likely due to the reduced resource usage on the database.
Just click on the failing test, the output shows all the perltidy errors. The perltidyrc we use is included in the repo.
And please squash your commits.
The test output is very clear about what needs to be fixed:
…U efficiency
- Refactored the job dequeue SQL to use a CTE with FOR UPDATE SKIP LOCKED, reducing lock contention and improving concurrency.
- Introduced a composite partial index on (state, queue, priority DESC, delayed, id) for state='inactive', dramatically increasing index scan efficiency for dequeue queries.
- Added targeted GIN index on parents[] for state='inactive', accelerating parent dependency checks.
- Set a local lock timeout to prevent worker stalls on lock contention.
Result: CPU usage dropped from ~90% to ~35% immediately after deployment, while job throughput increased by 81%. Queue backlog and job latency both fell sharply, with worker utilization now near optimal. These changes enable the system to process >140K jobs/hour on existing hardware, with headroom for further scaling.
The first test failure was very clear, and I fixed that issue. The second one I don't find very clear.
Just run perltidy with the perltidyrc from this repo.
The commands to do it are right in the workflow.
I don't think these perltidy issues are related to my PR. I checked out a completely untouched version from this repo and those tests still fail:
These tests all appear to be related to the UI.
There is really no point arguing about who should fix what. Just make the tests pass. This will take a whole lot more time if you wait for someone else to do it, that PR getting merged, and you having to rebase afterwards...
Optimize job dequeue logic and indexing for massive throughput and CPU efficiency
- Refactored the job dequeue SQL to use a CTE with FOR UPDATE SKIP LOCKED, reducing lock contention and improving concurrency.
- Introduced a composite partial index on (state, queue, priority DESC, delayed, id) for state='inactive', dramatically increasing index scan efficiency for dequeue queries.
- Added targeted GIN index on parents[] for state='inactive', accelerating parent dependency checks.
Summary
This PR restructures the Minion job queue’s PostgreSQL backend to eliminate critical bottlenecks in high-throughput environments. By migrating from a subquery-based dequeue pattern to a CTE-driven approach with strategic indexing, we achieved over 3x greater resource efficiency on a 20-core Intel Xeon Gold 6338 server running PostgreSQL 14.15. The optimizations reduced CPU utilization from 88.7% to 37.7% while simultaneously increasing job processing rates from 78K to 141K jobs/hour. Queue backlogs (inactive/delayed jobs) decreased by 60%, and worker utilization improved from 22% to 87.5%, all without hardware changes.
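For readers who want the shape of the change without opening the diff, the following is a minimal sketch of a CTE-driven dequeue with the two partial indexes. It is not the exact SQL from this PR: the CTE name candidate, the $1/$2 placeholders, and the simplified WHERE clause (no parent, task, expiry, or min_priority checks) are assumptions made for illustration. The column names follow the minion_jobs table as referenced in the diff, and the 50ms value is taken from the review thread above.

```sql
-- Illustrative sketch only; simplified from what a real dequeue needs.

-- Partial indexes restricted to inactive jobs, so they stay small
-- even when the table holds a long history of finished jobs.
CREATE INDEX ON minion_jobs (state, queue, priority DESC, delayed, id)
  WHERE state = 'inactive';
CREATE INDEX ON minion_jobs USING GIN (parents) WHERE state = 'inactive';

BEGIN;

-- SET LOCAL only lasts until the end of the current transaction.
SET LOCAL lock_timeout = '50ms';

-- $1 = array of queue names, $2 = worker id (hypothetical placeholders).
WITH candidate AS (
  SELECT id
  FROM minion_jobs
  WHERE state = 'inactive'
    AND queue = ANY ($1)
    AND delayed <= NOW()
  ORDER BY priority DESC, id
  LIMIT 1
  FOR UPDATE SKIP LOCKED  -- rows locked by other workers are skipped, not waited on
)
UPDATE minion_jobs AS j
SET started = NOW(), state = 'active', worker = $2
FROM candidate
WHERE j.id = candidate.id
RETURNING j.id, j.args, j.retries, j.task;

COMMIT;
```

The CTE picks and locks a single candidate row first, and the outer UPDATE then touches only that row, so concurrent workers running the same statement should each claim a different job instead of queuing behind one another.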
Motivation
Prior to these changes, the system exhibited severe inefficiencies:
These optimizations were critical to avoid costly horizontal scaling and ensure reliable job processing for high-priority workloads. The CTE-based locking and composite indexes directly address the root cause of contention, enabling the system to leverage existing hardware fully.
Results and Evidence
Database Server Architecture
Performance Metrics
Hourly Throughput History (Selected 24h Window)
References
Conclusion:
This PR delivers a transformative improvement in queue throughput, latency, and efficiency, enabling the system to process over 140K jobs/hour at less than 40% CPU utilization on current hardware, while leaving substantial headroom for further scaling.