You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When the scheduler send task to celery, if there is only 1 task in the current cycle, the task will be sent to the main thread; if there are multiple tasks, a thread pool will be created based on the number of CPU cores, and then all tasks will be consumed by the thread pool.
There are some problems with the current implementation. The scheduler creates a thread pool every time it schedules, which will bring a very large performance overhead. In fact, the thread pool can be reused.
When I tested, sometimes it would take almost 4 seconds to consume 32 tasks.
Apache Airflow version
2.9.1
If "Other Airflow 2 version" selected, which one?
No response
What happened?
When the scheduler send task to celery, if there is only 1 task in the current cycle, the task will be sent to the main thread; if there are multiple tasks, a thread pool will be created based on the number of CPU cores, and then all tasks will be consumed by the thread pool.
There are some problems with the current implementation. The scheduler creates a thread pool every time it schedules, which will bring a very large performance overhead. In fact, the thread pool can be reused.
When I tested, sometimes it would take almost 4 seconds to consume 32 tasks.
What you think should happen instead?
Reuse
ProcessPoolExecutor
inCeleryExecutor
How to reproduce
always
Operating System
macOS
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: