Skip to content

DagProcessor Performance Regression #30884

@george-zubrienko

Description

@george-zubrienko

Apache Airflow version

2.5.3

What happened

Upgrading from 2.4.3 to 2.5.3 caused a significant increase in dag processing time on standalone dag processor (~1-2s to 60s):

/opt/airflow/dags/ecco_airflow/dags/image_processing/product_image_load.py                                                0          -1  56.68s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/known_consumers/known_consumers.py                                                    0          -1  56.64s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/monitoring/row_counts.py                                                              0          -1  56.67s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/omnichannel/base.py                                                                   0          -1  56.66s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/omnichannel/oc_data.py                                                                0          -1  56.67s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/omnichannel/oc_stream.py                                                              0          -1  56.52s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/reporting/reporting_data_foundation.py                                                0          -1  56.63s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/retail_analysis/retail_analysis_dbt.py                                                0          -1  56.66s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/rfm_segments/rfm_segments.py                                                          0          -1  56.02s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/utils/airflow.py                                                                           0          -1  56.65s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/bronze/aad_users_listing.py                                                           1           0  55.51s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/bronze/funnel_io.py                                                                   1           0  56.13s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/bronze/iar_param.py                                                                   1           0  56.50s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/bronze/sfmc_copy.py                                                                   1           0  56.59s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/bronze/us_legacy_datawarehouse.py                                                     1           0  55.15s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/cdp/ecco_cdp_auditing.py                                                              1           0  56.54s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/cdp/ecco_cdp_budget_daily_phasing.py                                                  1           0  56.63s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/cdp/ecco_cdp_gold_rm_tests.py                                                         1           0  55.00s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/consumer_entity_matching/graph_entity_matching.py                                     1           0  56.67s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/data_backup/data_backup.py                                                            1           0  56.69s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/hive/adhoc_entity_publish.py                                                          1           0  55.33s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/image_regression/train.py                                                             1           0  56.63s          2023-04-26T12:56:15
/opt/airflow/dags/ecco_airflow/dags/maintenance/db_maintenance.py                                                         1           0  56.58s          2023-04-26T12:56:15

Also seeing messages like these

[2023-04-26T12:56:15.322+0000] {manager.py:979} DEBUG - Processor for /opt/airflow/dags/ecco_airflow/dags/bronze/us_legacy_datawarehouse.py finished
[2023-04-26T12:56:15.323+0000] {processor.py:296} DEBUG - Waiting for <ForkProcess name='DagFileProcessor68-Process' pid=116 parent=7 stopped exitcode=0>
[2023-04-26T12:56:15.323+0000] {manager.py:979} DEBUG - Processor for /opt/airflow/dags/ecco_airflow/dags/cdp/ecco_cdp_gold_rm_tests.py finished
[2023-04-26T12:56:15.323+0000] {processor.py:296} DEBUG - Waiting for <ForkProcess name='DagFileProcessor69-Process' pid=122 parent=7 stopped exitcode=0>
[2023-04-26T12:56:15.324+0000] {manager.py:979} DEBUG - Processor for /opt/airflow/dags/ecco_airflow/dags/bronze/streaming/sap_inventory_feed.py finished
[2023-04-26T12:56:15.324+0000] {processor.py:314} DEBUG - Waiting for <ForkProcess name='DagFileProcessor70-Process' pid=128 parent=7 stopped exitcode=-SIGKILL>
[2023-04-26T12:56:15.324+0000] {manager.py:986} ERROR - Processor for /opt/airflow/dags/ecco_airflow/dags/bronze/streaming/sap_inventory_feed.py exited with return code -9.

In 2.4.3:

/opt/airflow/dags/ecco_airflow/dags/image_regression/train.py                                                             1           0  1.34s           2023-04-26T14:19:08
/opt/airflow/dags/ecco_airflow/dags/known_consumers/known_consumers.py                                                    1           0  1.12s           2023-04-26T14:19:00
/opt/airflow/dags/ecco_airflow/dags/maintenance/db_maintenance.py                                                         1           0  0.63s           2023-04-26T14:18:27
/opt/airflow/dags/ecco_airflow/dags/monitoring/row_counts.py                                                              1           0  3.74s           2023-04-26T14:18:45
/opt/airflow/dags/ecco_airflow/dags/omnichannel/oc_data.py                                                                1           0  1.21s           2023-04-26T14:18:47
/opt/airflow/dags/ecco_airflow/dags/omnichannel/oc_stream.py                                                              1           0  1.22s           2023-04-26T14:18:30
/opt/airflow/dags/ecco_airflow/dags/reporting/reporting_data_foundation.py                                                1           0  1.39s           2023-04-26T14:19:08
/opt/airflow/dags/ecco_airflow/dags/retail_analysis/retail_analysis_dbt.py                                                1           0  1.32s           2023-04-26T14:18:51
/opt/airflow/dags/ecco_airflow/dags/rfm_segments/rfm_segments.py                                                          1           0  1.20s           2023-04-26T14:18:34

What you think should happen instead

Dag processing time remains unchanged

How to reproduce

Provision Airflow with the following settings:

Airflow 2.5.3

  • K8s 1.25.6
  • Kubernetes executor
  • Postgres backend (Postgres 11.0)
  • Deploy using Airflow Helm v1.9.0 with image 2.5.3-python3.9
    • pgbouncer enabled
    • standalone dag processort with 3500m cpu / 4000Mi memory, single replica
    • dags and logs mounted from RWM volume (Azure files)

Airflow 2.4.3

  • K8s 1.25.6
  • Kubernetes executor
  • Postgres backend (Postgres 11.0)
  • Deploy using Airflow Helm v1.7.0 with image 2.4.3-python3.9
    • pgbouncer enabled
    • standalone dag processort with 2500m cpu / 2000Mi memory, single replica
    • dags and logs mounted from RWM volume (Azure files)

Image modifications

We use image built from apache/airflow:2.4.3-python3.9, with some dependencies added/reinstalled with different versions.

Poetry dependency spec:

For 2.5.3:

[tool.poetry.dependencies]
python = ">=3.9,<3.11"
authlib = "~1.0.1"
adapta = { version = "==2.2.3", extras = ["azure", "storage"] }
numpy = "==1.23.3"
db-dtypes = "~1.0.4"
gevent = "^21.12.0"
sqlalchemy = ">=1.4,<2.0"
snowflake-sqlalchemy = ">=1.4,<2.0"
esd-services-api-client = "~0.6.0"
apache-airflow-providers-common-sql = "~1.3.1"
apache-airflow-providers-databricks = "~3.1.0"
apache-airflow-providers-google = "==8.4.0"
apache-airflow-providers-microsoft-azure = "~5.2.1"
apache-airflow-providers-datadog = "~3.0.0"
apache-airflow-providers-snowflake = "~3.3.0"
apache-airflow = "==2.5.3"
dataclasses-json = ">=0.5.7,<0.6"

For 2.4.3:

[tool.poetry.dependencies]
python = ">=3.9,<3.11"
authlib = "~1.0.1"
adapta = { version = "==2.2.3", extras = ["azure", "storage"] }
numpy = "==1.23.3"
db-dtypes = "~1.0.4"
gevent = "^21.12.0"
sqlalchemy = ">=1.4,<2.0"
snowflake-sqlalchemy = ">=1.4,<2.0"
esd-services-api-client = "~0.6.0"
apache-airflow-providers-common-sql = "~1.3.1"
apache-airflow-providers-databricks = "~3.1.0"
apache-airflow-providers-google = "==8.4.0"
apache-airflow-providers-microsoft-azure = "~5.2.1"
apache-airflow-providers-datadog = "~3.0.0"
apache-airflow-providers-snowflake = "~3.3.0"
apache-airflow = "==2.4.3"
dataclasses-json = ">=0.5.7,<0.6"

Operating System

Container OS: Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==6.0.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.4.0
apache-airflow-providers-common-sql==1.3.4
apache-airflow-providers-databricks==3.1.0
apache-airflow-providers-datadog==3.0.0
apache-airflow-providers-docker==3.2.0
apache-airflow-providers-elasticsearch==4.2.1
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-google==8.4.0
apache-airflow-providers-grpc==3.0.0
apache-airflow-providers-hashicorp==3.1.0
apache-airflow-providers-http==4.3.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-microsoft-azure==5.2.1
apache-airflow-providers-mysql==3.2.1
apache-airflow-providers-odbc==3.1.2
apache-airflow-providers-postgres==5.2.2
apache-airflow-providers-redis==3.0.0
apache-airflow-providers-sendgrid==3.0.0
apache-airflow-providers-sftp==4.1.0
apache-airflow-providers-slack==6.0.0
apache-airflow-providers-snowflake==3.3.0
apache-airflow-providers-sqlite==3.3.2
apache-airflow-providers-ssh==3.2.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

See How-to-reproduce section

Anything else

Occurs by upgrading the helm chart from 1.7.0/2.4.3 to 1.9.0/2.5.3 installation.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions