
Commit 2c26b15

Make pandas an optional core dependency (#17575)
We only use `pandas` in `DbApiHook.get_pandas_df`, and not all users call it. Although `pandas` now ships pre-compiled packages for many platforms, installation can still take a very long time wherever it has to be compiled, which can be a turn-off for first-time users. If `pandas` is already installed, everything keeps working; if not, users can run `pip install 'apache-airflow[pandas]'`. Closes #12500.
1 parent e7eeaa6 commit 2c26b15


11 files changed: 49 additions and 22 deletions


BREEZE.rst

Lines changed: 6 additions & 6 deletions
@@ -1315,8 +1315,8 @@ This is the current syntax for `./breeze <./breeze>`_:
 
 Production image:
 async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,
-http,ldap,google,google_auth,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,
-slack,ssh,statsd,virtualenv
+http,ldap,google,google_auth,microsoft.azure,mysql,pandas,postgres,redis,sendgrid,
+sftp,slack,ssh,statsd,virtualenv
 
 --image-tag TAG
 Additional tag in the image.
@@ -1914,8 +1914,8 @@ This is the current syntax for `./breeze <./breeze>`_:
 
 Production image:
 async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,
-http,ldap,google,google_auth,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,
-slack,ssh,statsd,virtualenv
+http,ldap,google,google_auth,microsoft.azure,mysql,pandas,postgres,redis,sendgrid,
+sftp,slack,ssh,statsd,virtualenv
 
 --image-tag TAG
 Additional tag in the image.
@@ -2501,8 +2501,8 @@ This is the current syntax for `./breeze <./breeze>`_:
 
 Production image:
 async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,
-http,ldap,google,google_auth,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,
-slack,ssh,statsd,virtualenv
+http,ldap,google,google_auth,microsoft.azure,mysql,pandas,postgres,redis,sendgrid,
+sftp,slack,ssh,statsd,virtualenv
 
 --image-tag TAG
 Additional tag in the image.

CONTRIBUTING.rst

Lines changed: 2 additions & 2 deletions
@@ -593,8 +593,8 @@ devel_all, devel_ci, devel_hadoop, dingding, discord, doc, docker, druid, elasti
 facebook, ftp, gcp, gcp_api, github_enterprise, google, google_auth, grpc, hashicorp, hdfs, hive,
 http, imap, jdbc, jenkins, jira, kerberos, kubernetes, ldap, leveldb, microsoft.azure,
 microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mssql, mysql, neo4j, odbc, openfaas,
-opsgenie, oracle, pagerduty, papermill, password, pinot, plexus, postgres, presto, qds, qubole,
-rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry, sftp, singularity, slack,
+opsgenie, oracle, pagerduty, pandas, papermill, password, pinot, plexus, postgres, presto, qds,
+qubole, rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry, sftp, singularity, slack,
 snowflake, spark, sqlite, ssh, statsd, tableau, telegram, trino, vertica, virtualenv, webhdfs,
 winrm, yandex, zendesk
 

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
 # much smaller.
 #
 ARG AIRFLOW_VERSION="2.2.0.dev0"
-ARG AIRFLOW_EXTRAS="async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,google_auth,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv"
+ARG AIRFLOW_EXTRAS="async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,google_auth,microsoft.azure,mysql,pandas,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv"
 ARG ADDITIONAL_AIRFLOW_EXTRAS=""
 ARG ADDITIONAL_PYTHON_DEPS=""
 

INSTALL

Lines changed: 2 additions & 2 deletions
@@ -97,8 +97,8 @@ devel_all, devel_ci, devel_hadoop, dingding, discord, doc, docker, druid, elasti
 facebook, ftp, gcp, gcp_api, github_enterprise, google, google_auth, grpc, hashicorp, hdfs, hive,
 http, imap, jdbc, jenkins, jira, kerberos, kubernetes, ldap, leveldb, microsoft.azure,
 microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mssql, mysql, neo4j, odbc, openfaas,
-opsgenie, oracle, pagerduty, papermill, password, pinot, plexus, postgres, presto, qds, qubole,
-rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry, sftp, singularity, slack,
+opsgenie, oracle, pagerduty, pandas, papermill, password, pinot, plexus, postgres, presto, qds,
+qubole, rabbitmq, redis, s3, salesforce, samba, segment, sendgrid, sentry, sftp, singularity, slack,
 snowflake, spark, sqlite, ssh, statsd, tableau, telegram, trino, vertica, virtualenv, webhdfs,
 winrm, yandex, zendesk
 

UPDATING.md

Lines changed: 13 additions & 0 deletions
@@ -73,6 +73,19 @@ https://developers.google.com/style/inclusive-documentation
 
 -->
 
+### `pandas` is now an optional dependency
+
+Previously `pandas` was a core requirement, so running `pip install apache-airflow` looked for the `pandas`
+library and installed it if it was not already present.
+
+If you want to install a `pandas` version compatible with Airflow, you can use the `[pandas]` extra while
+installing Airflow, for example for Python 3.8 and Airflow 2.1.2:
+
+```shell
+pip install -U "apache-airflow[pandas]==2.1.2" \
+  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.2/constraints-3.8.txt"
+```
+
 ### Dummy trigger rule has been deprecated
 
 `TriggerRule.DUMMY` is replaced by `TriggerRule.ALWAYS`.

airflow/executors/celery_executor.py

Lines changed: 5 additions & 1 deletion
@@ -183,14 +183,18 @@ def on_celery_import_modules(*args, **kwargs):
     doesn't matter, but for short tasks this starts to be a noticeable impact.
     """
     import jinja2.ext  # noqa: F401
-    import numpy  # noqa: F401
 
     import airflow.jobs.local_task_job
     import airflow.macros
     import airflow.operators.bash
     import airflow.operators.python
     import airflow.operators.subdag  # noqa: F401
 
+    try:
+        import numpy  # noqa: F401
+    except ImportError:
+        pass
+
     try:
         import kubernetes.client  # noqa: F401
     except ImportError:
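
The preloading pattern in this hunk (import eagerly, but tolerate a missing optional package) can be sketched generically as below. This is only an illustration under stated assumptions: `preload_modules` is a hypothetical helper that does not exist in Airflow, and standard-library modules stand in for the Airflow imports.

```python
import importlib


def preload_modules(required, optional):
    """Hypothetical helper: import modules up front and report which ones loaded.

    Required modules must import successfully; optional ones are skipped
    when they are not installed.
    """
    loaded = []
    for name in required:
        # A missing required module should fail loudly, so no try/except here.
        importlib.import_module(name)
        loaded.append(name)
    for name in optional:
        try:
            importlib.import_module(name)
        except ImportError:
            # Optional dependency is absent; there is nothing to preload.
            continue
        loaded.append(name)
    return loaded


# Stdlib modules stand in for the required imports; the second optional
# name is deliberately bogus to exercise the skip path.
loaded = preload_modules(
    required=["json", "decimal"],
    optional=["numpy", "module_that_does_not_exist_xyz"],
)
```

Whether `numpy` appears in `loaded` depends on the environment, which is exactly the behaviour the commit relies on.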

airflow/hooks/dbapi.py

Lines changed: 4 additions & 1 deletion
@@ -129,7 +129,10 @@ def get_pandas_df(self, sql, parameters=None, **kwargs):
         :param kwargs: (optional) passed into pandas.io.sql.read_sql method
         :type kwargs: dict
         """
-        from pandas.io import sql as psql
+        try:
+            from pandas.io import sql as psql
+        except ImportError:
+            raise Exception("pandas library not installed, run: pip install 'apache-airflow[pandas]'.")
 
         with closing(self.get_conn()) as conn:
             return psql.read_sql(sql, con=conn, params=parameters, **kwargs)
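
As a possible variation on the guard above, the lazy import could raise a more specific error and chain the original `ImportError`, so callers can catch it without catching every `Exception`. A minimal sketch, assuming a hypothetical `require_module` helper and `MissingOptionalDependency` class, neither of which is part of Airflow or of this commit:

```python
import importlib


class MissingOptionalDependency(ImportError):
    """Hypothetical error type for this sketch; not an Airflow class."""


def require_module(module_name, extra):
    """Import ``module_name`` lazily, pointing at the pip extra if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # "from err" keeps the original traceback attached for debugging.
        raise MissingOptionalDependency(
            f"{module_name} is not installed, run: pip install 'apache-airflow[{extra}]'"
        ) from err


# A stdlib module stands in for pandas so the sketch runs anywhere.
json_mod = require_module("json", extra="not-needed")

try:
    require_module("no_such_module_for_demo", extra="pandas")
except MissingOptionalDependency as err:
    message = str(err)
```

Because the hypothetical error subclasses `ImportError`, existing `except ImportError` handlers would still catch it.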

airflow/utils/json.py

Lines changed: 8 additions & 4 deletions
@@ -19,9 +19,13 @@
 from datetime import date, datetime
 from decimal import Decimal
 
-import numpy as np
 from flask.json import JSONEncoder
 
+try:
+    import numpy as np
+except ImportError:
+    np = None
+
 try:
     from kubernetes.client import models as k8s
 except ImportError:
@@ -51,7 +55,7 @@ def _default(obj):
             # Technically lossy due to floating point errors, but the best we
             # can do without implementing a custom encode function.
             return float(obj)
-        elif isinstance(
+        elif np is not None and isinstance(
             obj,
             (
                 np.int_,
@@ -68,9 +72,9 @@ def _default(obj):
             ),
         ):
             return int(obj)
-        elif isinstance(obj, np.bool_):
+        elif np is not None and isinstance(obj, np.bool_):
             return bool(obj)
-        elif isinstance(
+        elif np is not None and isinstance(
             obj, (np.float_, np.float16, np.float32, np.float64, np.complex_, np.complex64, np.complex128)
         ):
             return float(obj)
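
The `np = None` sentinel used in this file can be shown in a self-contained form. The sketch below is not the Airflow encoder: `default` is a hypothetical function, and it swaps the explicit type tuples from the diff for NumPy's abstract base classes `np.integer` and `np.floating`, partly because aliases such as `np.float_` and `np.complex_` were removed in NumPy 2.0. It runs with or without NumPy installed.

```python
import json
from decimal import Decimal

try:
    import numpy as np
except ImportError:
    np = None  # sentinel checked before any numpy type test


def default(obj):
    """Hypothetical ``default`` hook for json.dumps; mirrors the guard in the diff."""
    if isinstance(obj, Decimal):
        # Lossy for some values, as the original comment also notes.
        return float(obj)
    if np is not None:
        if isinstance(obj, np.bool_):
            return bool(obj)
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


encoded = json.dumps({"price": Decimal("1.5")}, default=default)
```

Grouping the NumPy checks under a single `if np is not None:` avoids repeating the guard on every branch, at the cost of diverging from the `elif` chain in the original file.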

docs/apache-airflow/extra-packages-ref.rst

Lines changed: 2 additions & 0 deletions
@@ -62,6 +62,8 @@ python dependencies for the provided package.
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
 | leveldb             | ``pip install 'apache-airflow[leveldb]'``           | Required for use leveldb extra in google provider                          |
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
+| pandas              | ``pip install 'apache-airflow[pandas]'``            | Install Pandas library compatible with Airflow                             |
++---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
 | password            | ``pip install 'apache-airflow[password]'``          | Password authentication for users                                          |
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
 | rabbitmq            | ``pip install 'apache-airflow[rabbitmq]'``          | RabbitMQ support as a Celery backend                                       |

setup.cfg

Lines changed: 0 additions & 3 deletions
@@ -126,9 +126,6 @@ install_requires =
     numpy;python_version>="3.7"
     # Required by vendored-in connexion
     openapi-spec-validator>=0.2.4
-    # Pandas stopped releasing 3.6 binaries for 1.2.* series.
-    pandas>=0.17.1, <1.2;python_version<"3.7"
-    pandas>=0.17.1, <2.0;python_version>="3.7"
     pendulum~=2.0
     pep562~=1.0;python_version<"3.7"
     psutil>=4.2.0, <6.0.0
