Skip to content

Conversation

@kaxil
Copy link
Member

@kaxil kaxil commented Aug 12, 2021

We only use pandas in DbApiHook.get_pandas_df. Not all users use it, plus
while pandas now supports many pre-compiled packages it still can take forever where
it needs to be compiled.

So for first-time users this can be a turn off. If pandas is already installed this
will work fine, but if not users have an option to run pip install apache-airflow[pandas]

Related #12500


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge yes.

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Aug 12, 2021
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized -we also need to add it to set of extras in production Dockerfile (to keep it behave the way it did).

ARG AIRFLOW_EXTRAS="async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,google_auth,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv"

@potiuk
Copy link
Member

potiuk commented Aug 12, 2021

And the Dockerfile will continue to work for old airflow versions as well. AFAIK when you specifiy extra that is missing, it will be silently ignored, so it will work for pre 2.2 Airflow versions as well.

@ashb
Copy link
Member

ashb commented Aug 12, 2021

I think this closes #12500 - are there any other uses of numpy in the code base?

@potiuk
Copy link
Member

potiuk commented Aug 12, 2021

I think this closes #12500 - are there any other uses of numpy in the code base?

Not yet. It is used at least in AirflowJsonEncoder 😱

setup.py Outdated
Comment on lines 399 to 401
Copy link
Member

@uranusjr uranusjr Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Pandas stopped releasing 3.6 binaries for 1.2.* series.
'pandas>=0.17.1, <1.2;python_version<"3.7"',
'pandas>=0.17.1, <2.0;python_version>="3.7"',
'pandas>=0.17.1, <2.0',

Pandas 1.2+ provides the Requires-Python metadata, so 1.2+ versions will be automatically excluded when pip install on Python 3.6:

$ python3.6 -m venv x
$ . x/bin/activate
(x) $ pip install pandas
Collecting pandas
  Using cached https://.../pandas-1.1.5-cp36-cp36m-manylinux1_x86_64.whl
Collecting python-dateutil>=2.7.3 (from pandas)
...

So the python_version<"3.7" line is not needed. (Although it should not have a negative impact either.)

Note: Python 3.6.0 ships with pip 9.1 by default, which supports Requires-Python, so it’s highly unlikely someone on 3.6 running on pip version not supporting this metadata.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, updated

We only use `pandas` in `DbApiHook.get_pandas_df`. Not all users use it, plus
while `pandas` now supports many pre-compiled packages it still can take forever where
it needs to be compiled.

So for first-time users this can be a turn off. If pandas is already installed this
will work fine, but if not users have an option to run `pip install apache-airflow[pandas]`
@kaxil kaxil requested review from potiuk and uranusjr August 12, 2021 20:15
@kaxil kaxil requested a review from XD-DENG as a code owner August 12, 2021 20:26
@kaxil
Copy link
Member Author

kaxil commented Aug 12, 2021

398ebfe should make numpy optional as well

@kaxil kaxil merged commit 2c26b15 into apache:main Aug 12, 2021
@kaxil kaxil deleted the pandas-optional branch August 12, 2021 23:07
kaxil added a commit to astronomer/airflow that referenced this pull request Aug 13, 2021
Missed removing ``numpy`` from `setup.cfg` in apache#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`
kaxil added a commit that referenced this pull request Aug 13, 2021
Missed removing ``numpy`` from `setup.cfg` in #17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`
@edwardwang888
Copy link
Contributor

edwardwang888 commented Sep 19, 2021

@kaxil Should this import also be put inside a try block, if numpy is not a dependency required in the devel extra?

import numpy as np

@kaxil
Copy link
Member Author

kaxil commented Sep 19, 2021

@kaxil Should this import also be put inside a try block, if numpy is not a dependency required in the devel extra?

import numpy as np

We install that in Dockerfile so we have it while running tests

@potiuk
Copy link
Member

potiuk commented Sep 19, 2021

We install that in Dockerfile so we have it while running tests

Agree. The "test" environment has all the deps in - including pandas and numpy. The "test" dependency ([devel_all] extra) defines all things needed to run tests.

I think while numpy is not needed as "airflow" dependency, it's perfectly fine to have it as "test airflow" dependency.

@edwardwang888
Copy link
Contributor

edwardwang888 commented Sep 19, 2021

We install that in Dockerfile so we have it while running tests

Agree. The "test" environment has all the deps in - including pandas and numpy. The "test" dependency ([devel_all] extra) defines all things needed to run tests.

I think while numpy is not needed as "airflow" dependency, it's perfectly fine to have it as "test airflow" dependency.

@kaxil @potiuk Thanks for the feedback. The reason I ask is because of the discussion here: #18343 (comment). For the test environment there is no issue as you've mentioned, so I guess the concern there is about the local development environment? Of course, if you run a test locally and numpy is missing, you can just install it.

@potiuk
Copy link
Member

potiuk commented Sep 19, 2021

I guess the concern there is about the local development environment? Of course, if you run a test locally and numpy is missing, you can just install it.

We have PLENTY of tests that will fail when some extra is not installed. That's why we have https://github.com/apache/airflow/blob/main/BREEZE.rst with all deps installed in the docker image (which is 1-1 replica of CI test environment), and https://github.com/apache/airflow/blob/main/LOCAL_VIRTUALENV.rst#creating-a-local-virtualenv where we explain that you need to add "devel" extra if you want to install local virtualenv for development.

So I think all is good here

@edwardwang888
Copy link
Contributor

edwardwang888 commented Sep 19, 2021

We have PLENTY of tests that will fail when some extra is not installed. That's why we have https://github.com/apache/airflow/blob/main/BREEZE.rst with all deps installed in the docker image (which is 1-1 replica of CI test environment), and https://github.com/apache/airflow/blob/main/LOCAL_VIRTUALENV.rst#creating-a-local-virtualenv where we explain that you need to add "devel" extra if you want to install local virtualenv for development.

So I think all is good here

@potiuk Thanks. One clarification - from #18343 (comment), I got the impression that devel extra contains the libraries needed for core testing (in local virtualenv). Does test_json.py count as core testing? I think that's where I'm confused about why numpy only needs to be installed in devel_all, but not devel.

@potiuk
Copy link
Member

potiuk commented Sep 19, 2021

Does test_json.py count as core testing?

Depends what you mean by "core". There is no mentioning of "core" in the extras doc:

From https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html#bundle-extras

extra install command enables
all pip install 'apache-airflow[all]' All Airflow user facing features (no devel and doc requirements)
all_dbs pip install 'apache-airflow[all_dbs]' All databases integrations
devel pip install 'apache-airflow[devel]' Minimum dev tools requirements (without providers)
devel_hadoop pip install 'apache-airflow[devel_hadoop]' Same as devel + dependencies for developing the Hadoop stack
devel_all pip install 'apache-airflow[devel_all]' Everything needed for development (devel_hadoop + providers)
devel_ci pip install 'apache-airflow[devel_ci]' All dependencies required for CI build.

It's clear that devel is for all tests except providers. the test.json is not in providers. Do you think it's ambiguous? maybe you would like to improve the docs if things are not clear enough ? We have more than 1700 contibutors and clarifications from users like you who have doubts, are most welcome. Do you think this can be improved in the docs? How?

@edwardwang888
Copy link
Contributor

Depends what you mean by "core". There is no mentioning of "core" in the extras doc:

Ah okay, that is a good point.

It's clear that devel is for all tests except providers. the test.json is not in providers. Do you think it's ambiguous? maybe you would like to improve the docs if things are not clear enough ? We have more than 1700 contibutors and clarifications from users like you who have doubts, are most welcome. Do you think this can be improved in the docs? How?

Sounds good. If I come up with something concrete, I'll submit a PR.

Thanks for clarifying my questions!

@eladkal eladkal mentioned this pull request Sep 22, 2021
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Mar 10, 2022
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 4, 2022
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jul 10, 2022
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Aug 27, 2022
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 4, 2022
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
aglipska pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 7, 2022
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Dec 7, 2022
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 27, 2023
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
kosteev pushed a commit to kosteev/composer-airflow-test-copybara that referenced this pull request Sep 12, 2024
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 17, 2024
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 7, 2024
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 2, 2025
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 22, 2025
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 17, 2025
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 16, 2025
Missed removing ``numpy`` from `setup.cfg` in apache/airflow#17575. It was only added in setup.cfg in https://github.com/apache/airflow/pull/15209/files#diff-380c6a8ebbbce17d55d50ef17d3cf906

numpy already has `python_requires` metadata: https://github.com/numpy/numpy/blob/v1.20.3/setup.py#L473

so we don't need to set `numpy<1.20;python_version<"3.7"`

GitOrigin-RevId: 6b327c14f773e265d9c54551d9ca273ac7350fdb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools full tests needed We need to run full set of tests for this PR to merge kind:documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants