PIP 20.3 might break Airflow installation #12838

Closed
potiuk opened this issue Dec 5, 2020 · 40 comments
Labels
kind:bug This is a clearly a bug priority:medium Bug that should be fixed before next release but would not block a release

Comments

@potiuk
Member

potiuk commented Dec 5, 2020

UPDATE 15.12.2020 6pm CET:

With PIP 20.3.3 released today, we were able to make 2.0 compatible with the new PIP, and 1.10.14 almost works (the papermill extra is problematic when installing airflow using the new PIP). We will try to address it if we release 1.10.15, but if you want to install the papermill extra, please downgrade pip or use the legacy resolver.

With 2.0 it seems that airflow can be installed with the new PIP following our recommended practice. If you see any installation problems, please report them as issues and downgrade to pip 20.2.4 as a workaround.
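
The two workarounds mentioned above (pinning pip back to 20.2.4, or staying on the new pip but opting into the legacy resolver) could be scripted roughly as follows. This is only a sketch: `pip_install_args` is a hypothetical helper, and the `apache-airflow[papermill]==1.10.14` requirement is just an illustration based on the versions discussed here. The `--use-deprecated=legacy-resolver` flag is the opt-out that pip 20.3 provides.

```python
import sys


def pip_install_args(*requirements, legacy_resolver=False):
    """Build an argv list for `python -m pip install ...` (hypothetical helper)."""
    args = [sys.executable, "-m", "pip", "install"]
    if legacy_resolver:
        # Opt out of the new dependency resolver introduced in pip 20.3.
        args.append("--use-deprecated=legacy-resolver")
    args.extend(requirements)
    return args


# Workaround 1: pin pip itself back to the last release with the old resolver.
downgrade = pip_install_args("--upgrade", "pip==20.2.4")

# Workaround 2: keep the new pip but use the legacy resolver.
# The requirement string is an illustrative assumption.
legacy = pip_install_args("apache-airflow[papermill]==1.10.14", legacy_resolver=True)

print(" ".join(downgrade[1:]))
print(" ".join(legacy[1:]))
# To actually execute one of them: subprocess.run(legacy, check=True)
```

Invoking pip as `python -m pip` rather than as a bare `pip` makes sure the pip being changed belongs to the interpreter that will run Airflow.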

Thanks again to the PyPI team for the fast resolution (just in time for the 2.0 release).

We are leaving the issue open for a while, but we have updated the description and lowered the priority. We will close it once we have observed installations by our users after 2.0 is released and confirmed that the problem is solved for them.


UPDATE 15.12.2020 11am CET:

It seems that with the latest 20.3.3 release and after fixing the pyarrow dependency we are back in business with 2.0.0rc3.

Once we confirm it and verify 1.10.14 we will be able to close that one!

Thanks to the PyPI team for solving it quickly.


I am adding this issue to keep track of the ongoing problems with the new PIP 20.3, released on the 30th of November.

There are multiple issues with the new PIP that make it break with Airflow's dependency set.

The first blocking issues are pypa/pip#9203 and pypa/pip#9232.

The latest version of PIP @master is still not usable with Airflow:

Even when those are solved, we already know we are affected by a few other problems:

We've raised the issue with the PIP team, and they are struggling to fix a number of other teething problems.

We are keeping our fingers crossed that they will manage to fix the issues promptly and that they will not be overwhelmed with putting out the fire.

There is no resolution yet, so for the time being downgrading PIP to version 20.2.4 is the best thing you can do.

pip install --upgrade pip==20.2.4

We raised pypa/pip#9231 with a proposal to add an exclusion list to PyPI, and we are waiting for their response.

UPDATE! Tested the current master version of PIP (which was announced yesterday as the candidate for 20.3.2), but it still does not solve the installation problems with airflow:

Three new issues created:

@potiuk potiuk added the kind:bug This is a clearly a bug label Dec 5, 2020
@potiuk potiuk pinned this issue Dec 5, 2020
@potiuk potiuk changed the title PIP 20.3 breaks Airflow installation! ❗ PIP 20.3 breaks Airflow installation❗ Dec 5, 2020
potiuk added a commit to PolideaInternal/airflow that referenced this issue Dec 5, 2020
potiuk added a commit to PolideaInternal/airflow that referenced this issue Dec 5, 2020
@vikramkoka
Contributor

Thanks @potiuk I am really glad you added this to the installation instructions too!

@potiuk
Member Author

potiuk commented Dec 6, 2020

FYI @vikramkoka (and @eladkal @paolaperaza) the "upgrade to newer dependencies" and "full tests needed" are special labels that can be added to PRs to change the scope of PR builds:

See: https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#step-4-prepare-pr

  • "upgrade to newer dependencies" causes an automated upgrade to the latest dependencies using the "eager" upgrade strategy.
  • "full tests needed" causes the full "matrix" of tests to be run rather than just one combination.
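
For context, pip itself exposes the "eager" strategy through `--upgrade-strategy eager`, which upgrades the named package and all of its dependencies even when the installed versions already satisfy the requirements (the default, `only-if-needed`, leaves those alone). A rough sketch of building such a command follows. The `upgrade_args` helper and the `.[devel_ci]` target are assumptions for illustration, not necessarily what the CI scripts run.

```python
import sys


def upgrade_args(target, eager=True):
    """Build an argv list for upgrading `target` with pip (hypothetical helper)."""
    # "eager" upgrades every dependency to its newest allowed version;
    # "only-if-needed" upgrades a dependency only when the installed
    # version no longer satisfies the requirements.
    strategy = "eager" if eager else "only-if-needed"
    return [
        sys.executable, "-m", "pip", "install",
        "--upgrade", "--upgrade-strategy", strategy,
        target,
    ]


# Illustrative target: an editable checkout with one of the devel extras.
eager_cmd = upgrade_args(".[devel_ci]")
print(" ".join(eager_cmd[1:]))
```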

Maybe we need some special prefixes for those to distinguish from "regular" labels. If we decide to do that, we will have to update our workflows to handle the new names.

@vikramkoka
Contributor

Sorry about that @potiuk . I did not know that. Will avoid using this "upgrade to new dependencies" label in the future.

@potiuk
Member Author

potiuk commented Dec 6, 2020

> Sorry about that @potiuk . I did not know that. Will avoid using this "upgrade to new dependencies" label in the future.

No problem :). That was mainly to explain what they are and probably add them as exclusions in the description of the triage process.

@potiuk potiuk added this to the Airflow 2.0 milestone Dec 7, 2020
@potiuk potiuk added the priority:critical Showstopper bug that should be patched immediately label Dec 7, 2020
@mik-laj
Member

mik-laj commented Dec 8, 2020

I confirmed that one problem was solved.
Now it is possible to install .[google], but .[google, devel] still doesn't work.
pypa/pip#9241

@mik-laj
Member

mik-laj commented Dec 8, 2020

I tried to install almost all extra packages with the above patch and it worked. I have the impression that when a new version of pip is released the problem will not occur or it will be marginal.

Extra Status
amazon SUCCESS
apache.atlas SUCCESS
apache.beam SUCCESS
apache.cassandra SUCCESS
apache.druid SUCCESS
apache.hdfs SUCCESS
apache.kylin SUCCESS
apache.livy SUCCESS
apache.pig SUCCESS
apache.pinot SUCCESS
apache.spark SUCCESS
apache.sqoop SUCCESS
async SUCCESS
atlas SUCCESS
azure SUCCESS
cassandra SUCCESS
celery SUCCESS
cgroups SUCCESS
cloudant SUCCESS
cncf.kubernetes SUCCESS
crypto SUCCESS
dask SUCCESS
databricks SUCCESS
datadog SUCCESS
dingding SUCCESS
discord SUCCESS
doc SUCCESS
docker SUCCESS
druid SUCCESS
elasticsearch SUCCESS
exasol SUCCESS
facebook SUCCESS
ftp SUCCESS
gcp SUCCESS
github_enterprise SUCCESS
google SUCCESS
grpc SUCCESS
hashicorp SUCCESS
hdfs SUCCESS
http SUCCESS
imap SUCCESS
jdbc SUCCESS
jenkins SUCCESS
jira SUCCESS
kubernetes SUCCESS
ldap SUCCESS
microsoft.azure SUCCESS
microsoft.mssql SUCCESS
microsoft.winrm SUCCESS
mongo SUCCESS
mssql SUCCESS
openfaas SUCCESS
opsgenie SUCCESS
oracle SUCCESS
pagerduty SUCCESS
papermill SUCCESS
password SUCCESS
pinot SUCCESS
plexus SUCCESS
postgres SUCCESS
presto SUCCESS
qds SUCCESS
qubole SUCCESS
rabbitmq SUCCESS
redis SUCCESS
s3 SUCCESS
salesforce SUCCESS
samba SUCCESS
segment SUCCESS
sendgrid SUCCESS
sentry SUCCESS
sftp SUCCESS
singularity SUCCESS
slack SUCCESS
snowflake SUCCESS
spark SUCCESS
sqlite SUCCESS
ssh SUCCESS
statsd SUCCESS
tableau SUCCESS
vertica SUCCESS
virtualenv SUCCESS
winrm SUCCESS
yandex SUCCESS
zendesk SUCCESS

@mik-laj
Member

mik-laj commented Dec 8, 2020

Pypi may have problems installing the master version because we have references to an unreleased package - apache-airflow-providers-telegram.

I haven't tested the extras below. They may or may not work.

	# all
	# all_dbs
	# aws
	# devel
	# devel_all
	# devel_ci
	# devel_hadoop
	# gcp_api
	# google_auth
	apache.hive
	apache.webhdfs
	gcp
	hive
	kerberos
	mysql
	odbc
	s3
	telegram
	webhdfs

@potiuk
Member Author

potiuk commented Dec 8, 2020

Cool. Good job @mik-laj! I will take a look tomorrow as well and try to run all the automation we run on CI. Until this gets released in 20.3.2 we will keep the warning in our docs, but this looks very promising!

@mik-laj
Member

mik-laj commented Dec 8, 2020

I updated the pip version to the newest master and triggered the build on my CI. Fingers crossed. 🤞🏻
mik-laj@4bb280b

@mik-laj
Member

mik-laj commented Dec 8, 2020

I found the source of the problem. We have a conflicting constraints entry.

pyarrow==0.17.1

google-cloud-bigquery[bqstorage,pandas] 2.4.0 depends on pyarrow<3.0dev and >=1.0.0

Without this entry, the Airflow installation works.
https://github.com/mik-laj/airflow/runs/1519148881?check_suite_focus=true
mik-laj@b573f2a
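
To make the conflict above concrete: the constraints file pins pyarrow to 0.17.1, while google-cloud-bigquery 2.4.0 asks for a version in the range >=1.0.0 and <3.0dev, so no single version can satisfy both. Below is a minimal, standard-library-only sketch of that check. It is a simplification and rests on two assumptions: the helper names are hypothetical, and the `3.0dev` upper bound is approximated by `3.0`, since the tuple parsing here only handles plain numeric releases. Real tooling would more likely use `packaging.specifiers.SpecifierSet`.

```python
def release_tuple(version):
    """Parse a plain 'X.Y.Z' release string into a tuple of ints.

    Simplification: pre-release and dev suffixes are not supported.
    """
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower_inclusive, upper_exclusive):
    """True if lower_inclusive <= version < upper_exclusive."""
    parsed = release_tuple(version)
    return release_tuple(lower_inclusive) <= parsed < release_tuple(upper_exclusive)


pinned_pyarrow = "0.17.1"  # from the constraints entry
# Range requested by google-cloud-bigquery[bqstorage,pandas] 2.4.0,
# with "3.0dev" approximated as "3.0".
pin_ok = satisfies(pinned_pyarrow, "1.0.0", "3.0")

print(f"pyarrow=={pinned_pyarrow} within >=1.0.0,<3.0: {pin_ok}")
```

The old resolver would install the pin anyway and leave an inconsistent environment, whereas the new resolver refuses, which matches the failure seen here.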

@mik-laj
Member

mik-laj commented Dec 8, 2020

According to this PR, this entry is not needed.
#12683

@mik-laj
Member

mik-laj commented Dec 8, 2020

This piece of code looks interesting to me. Should we also add a similar check to our project?
https://github.com/apache/beam/blob/545db7386b69eb3c61690172c575dc025d91cca7/sdks/python/setup.py#L99-L107

@potiuk
Member Author

potiuk commented Dec 8, 2020

Which version and which constraints are you comparing?

The constraints are automatically generated after installing the requirements using pip 20.2.4

Do you have any error reported by PIP check?

And which installation combination are you testing (which extras, etc.)?
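
For anyone wanting to answer the `pip check` question on their own environment, here is a small sketch of running it programmatically for the current interpreter. The `run_pip_check` helper is a hypothetical name. `pip check` itself is a real pip command that verifies whether the installed packages have compatible dependencies.

```python
import subprocess
import sys


def run_pip_check():
    """Run `pip check` for the current interpreter; return (exit_code, output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    # pip check exits non-zero when installed packages have missing or
    # incompatible requirements, and lists the offenders in its output.
    return result.returncode, (result.stdout or result.stderr).strip()


exit_code, report = run_pip_check()
if exit_code == 0:
    print("No broken requirements found.")
else:
    print(report)
```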

@pradyunsg

Well, pip's master branch is now 20.3.2, so... test against that! :)

@notatallshaw
Contributor

@pradyunsg I'm not on the Airflow team and I don't have as deep an understanding as @potiuk, but I tried installing Airflow 1.10.14 with all dependencies using the new resolver with pip 20.3.2.

I'm not sure how much is due to Airflow fixes and how much to 20.3.2's improvements, but I am able to successfully run pip install apache-airflow[all] with no errors 😄. Thanks to both teams!

@potiuk
Member Author

potiuk commented Dec 15, 2020

I also had a "dependency solving" session: I just discussed it with the PIP team and experimented a bit, and it seems we managed to pinpoint the PIP 20.2.4 bug that generated the bad pyarrow dependency. I updated it manually and it seems that we are able to make it work for 2.0 as well. 🤞 for a quick 20.3.3 release (20.3.2 was considered bad and removed in the meantime).

@pradyunsg

20.3.3 is out. I think it solves all the issues that broke Airflow's installation mechanisms. If @potiuk can confirm that, I'm guessing we can go ahead and close this. ;)

@potiuk
Member Author

potiuk commented Dec 15, 2020

Yep. Confirmed it works for 2.0.

I need to do a few more checks and verify 1.10.14 as well, and then I will close this one.

Thanks A LOT @pradyunsg -> just in time for 2.0.0 of Airflow ;).

@potiuk potiuk changed the title ❗ PIP 20.3 breaks Airflow installation❗ PIP 20.3 might breaks Airflow installation Dec 15, 2020
@potiuk potiuk changed the title PIP 20.3 might breaks Airflow installation PIP 20.3 might break Airflow installation Dec 15, 2020
@potiuk potiuk added priority:medium Bug that should be fixed before next release but would not block a release and removed priority:critical Showstopper bug that should be patched immediately labels Dec 15, 2020
@kaustubhharapanahalli

Hello, does the installation fail with the latest version of pip 21.0.1?

@potiuk
Member Author

potiuk commented Feb 3, 2021

> Hello, does the installation fail with the latest version of pip 21.0.1?

Can you please try and tell us?

We did not have time to run any checks with it. It is likely that the main problems have already been solved and improved upon.

@kaustubhharapanahalli

> Can you please try and tell us?
>
> We did not have time to run any checks with it. It is likely that the main problems have already been solved and improved upon.

Hi @potiuk, I ran the command pip install apache-airflow, and it worked. My setup:

  • OS: Windows10
  • Python version = 3.9.1
  • pip version = 21.0.1

Is there any test that I can run to confirm proper installation?

@pradyunsg

Closing based on earlier comments:

> I'm guessing we can go ahead and close this. ;)

@pradyunsg

Oh wait, this isn’t pypa/pip. Nvm me. I blame the lack of a breakfast. :P

@ashb
Member

ashb commented Mar 25, 2021

Closing as a new version of pip has been released.

@ashb ashb closed this as completed Mar 25, 2021
@uranusjr
Member

uranusjr commented Mar 25, 2021

FWIW I tried pip install apache-airflow[all] and it works correctly. Or rather, it correctly did not work due to #14994.

potiuk added a commit to potiuk/airflow that referenced this issue Mar 26, 2021
The initial problems with the new PIP resolver in version 20.3 have
been successfully solved. It seems that version 21.* is much more
stable and actually works in all cases, so we are switching back
to it.

Also changed pip and wheel dependencies to ~= (compatible) version,
hoping that the experience of a backwards incompatible release in
a major version update has been taken on board by the PIP team with
the 21.* series release.

Closes apache#12838
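
As a reading aid for the `~=` (compatible release) operator mentioned in the commit message: under PEP 440, `~=X.Y` is equivalent to `>=X.Y, ==X.*`, so it accepts later releases in the same series but not the next major version. The sketch below illustrates this for plain numeric versions only. The `21.0` base version is an assumed example, since the commit message does not show the exact pin, and `is_compatible` is a hypothetical helper rather than pip's own implementation.

```python
def parse_release(version):
    """Parse a plain numeric version such as '21.0.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate, spec):
    """True if `candidate` satisfies `~=spec` (plain numeric versions only)."""
    base = parse_release(spec)
    if len(base) < 2:
        raise ValueError("~= requires at least two release segments, e.g. 21.0")
    cand = parse_release(candidate)
    # Pad with zeros so that, for example, 21.0 and 21.0.0 compare as equal.
    width = max(len(cand), len(base))
    cand_padded = cand + (0,) * (width - len(cand))
    base_padded = base + (0,) * (width - len(base))
    # All but the last segment of the spec must match exactly.
    prefix = base[:-1]
    return cand_padded >= base_padded and cand_padded[: len(prefix)] == prefix


# "21.0" is an assumed example of a compatible-release pin.
for candidate in ("20.3.3", "21.0", "21.0.1", "21.3", "22.0"):
    print(candidate, is_compatible(candidate, "21.0"))
```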
@potiuk potiuk unpinned this issue Mar 27, 2021
8 participants