Description
Is your feature request related to a problem? Please describe.
My application is a batch-oriented ETL pipeline. Several async activities each interact with a different third-party API throughout the workflow. For example, the geocoding
activity uses the Google Places API, another activity might use the OpenAI API, etc.
Each activity is subject to external rate limits.
While I can control rate limiting within my code, I can't easily control the concurrency of activities executed by the worker. When multiple activities of the same type are running, they contend for the same rate limit, so they simply take longer to finish or, worse, the activity can end up timing out due to the global throttling.
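To illustrate the distinction, a minimal sketch of in-code rate limiting (all names hypothetical): a semaphore caps concurrent calls to one third-party API within a process, but it cannot stop the worker from starting the activity tasks in the first place, which is the gap this request is about.

```python
import asyncio

peak = 0  # highest number of API calls observed in flight at once

async def main() -> list[str]:
    # In-process cap (e.g. a Google Places budget). This limits calls
    # once an activity is already running, not how many activity tasks
    # the worker dispatches.
    sem = asyncio.Semaphore(2)
    active = 0

    async def geocode(address: str) -> str:
        nonlocal active
        global peak
        async with sem:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the real API call
            active -= 1
            return f"geocoded:{address}"

    # ten concurrent callers, but at most two inside the API call
    return await asyncio.gather(*(geocode(f"addr{i}") for i in range(10)))

results = asyncio.run(main())
```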
My workaround so far has been to deploy a separate worker for each activity and set the max_concurrent_activities
worker option to control the concurrency, allowing one or two activities to complete as quickly as possible. However, this has created a lot of DevOps overhead, and it can be hard for people on the team to set the worker up correctly when they add a new activity that has a rate limit.
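For reference, the current workaround looks roughly like this (task queue name and activity are illustrative, not my real setup): one worker process per rate-limited activity, each with its own max_concurrent_activities.

```python
from temporalio import activity
from temporalio.client import Client
from temporalio.worker import Worker

@activity.defn
async def geocode(address: str) -> str:
    ...  # calls the Google Places API, rate limited in code

async def run_geocode_worker() -> None:
    client = await Client.connect("localhost:7233")
    # A dedicated worker (and queue) just for this one activity, so its
    # concurrency can be tuned independently of every other activity.
    worker = Worker(
        client,
        task_queue="geocode-queue",       # one queue per rate-limited activity
        activities=[geocode],
        max_concurrent_activities=2,      # matches this API's budget
    )
    await worker.run()
```

Multiplying this deployment per activity is exactly the overhead I'd like to avoid.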
Describe the solution you'd like
I saw the experimental WorkerTuner
option added in this PR: #559.
It would be great if this supported tuning for specific activities, e.g. defining the number of slots for a given activity name. A single async worker could then handle all of my activities while only letting a controllable number of each type run at any given time.
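The semantics I'm after could be modeled like the pure-asyncio sketch below (all names hypothetical; I'm not proposing this exact API): a single dispatcher holds one slot pool per activity name, so one worker runs everything while each activity type has its own concurrency cap.

```python
import asyncio

# Hypothetical per-activity-name slot counts, as a tuner option might express them.
SLOTS = {"geocode": 2, "openai": 1}

peaks = {"geocode": 0, "openai": 0}   # instrumentation only
active = {"geocode": 0, "openai": 0}

class PerActivityLimiter:
    """One semaphore per activity name; a stand-in for per-activity slots."""

    def __init__(self, slots: dict) -> None:
        self._sems = {name: asyncio.Semaphore(n) for name, n in slots.items()}

    async def run(self, name, fn, *args):
        async with self._sems[name]:
            return await fn(*args)

async def fake_activity(name: str, i: int) -> str:
    active[name] += 1
    peaks[name] = max(peaks[name], active[name])
    await asyncio.sleep(0.01)   # stand-in for the rate-limited API call
    active[name] -= 1
    return f"{name}:{i}"

async def main() -> list[str]:
    limiter = PerActivityLimiter(SLOTS)
    # One "worker" dispatches both activity types at once, but each
    # type only ever occupies its own slot count.
    tasks = [limiter.run("geocode", fake_activity, "geocode", i) for i in range(5)]
    tasks += [limiter.run("openai", fake_activity, "openai", i) for i in range(5)]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
```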
Additional context
I saw there is a Go-based (and I think Java) example of a Mutex Workflow that can control concurrency globally. That might be a better overall solution, but I struggled to apply the pattern in Python. I wouldn't expect worker tuning to prevent contention if the number of workers were scaled out; however, in my setup I have specifically disabled autoscaling on this worker for that reason, so controlling this at the worker level would be much simpler.