[AIRFLOW-7067] Pinned version of Apache Airflow (#7730)
potiuk authored Mar 22, 2020
1 parent d9ea57b commit 8c56388
Showing 24 changed files with 962 additions and 161 deletions.
1 change: 1 addition & 0 deletions .dockerignore
@@ -48,6 +48,7 @@
!MANIFEST.in
!NOTICE
!.github
!requirements.txt

# Avoid triggering context change on README change (new companies using Airflow)
# So please do not uncomment this line ;)
14 changes: 14 additions & 0 deletions .pre-commit-config.yaml
@@ -223,6 +223,13 @@ repos:
files: ^airflow/providers/.*\.py$|^tests/providers/.*\.py$
pass_filenames: false
require_serial: true
- id: update-extras
name: Update extras in documentation
entry: "./scripts/ci/pre_commit_update_extras.sh"
language: system
files: ^setup.py$|^INSTALL$|^CONTRIBUTING.rst$
pass_filenames: false
require_serial: true
- id: pydevd
language: pygrep
name: Check for pydevd debug statements accidentally left
@@ -286,6 +293,13 @@ repos:
language: system
always_run: true
pass_filenames: false
- id: generate-requirements
name: Generate requirements
entry: "./scripts/ci/pre_commit_generate_requirements.sh"
language: system
files: ^setup.py$
pass_filenames: false
require_serial: true
- id: check-apache-license
name: Check if licenses are OK for Apache
entry: "./scripts/ci/pre_commit_check_license.sh"
6 changes: 3 additions & 3 deletions .travis.yml
@@ -43,7 +43,7 @@ jobs:
env: >-
PYTHON_VERSION=3.6
AIRFLOW_MOUNT_SOURCE_DIR_FOR_STATIC_CHECKS="true"
SKIP=pylint-tests
SKIP=pylint-tests,generate-requirements
- name: "Static checks - pylint tests only"
stage: pre-test
script: ./scripts/ci/ci_run_static_checks_pylint_tests.sh
@@ -133,10 +133,10 @@ jobs:
ENABLED_INTEGRATIONS="kerberos"
script: "./scripts/ci/ci_run_airflow_testing.sh --ignore=tests/providers"
stage: test
- name: "Prepare backport packages"
- name: "Prepare packages and generate requirements"
before_install: pip install bowler
stage: test
script: ./scripts/ci/ci_prepare_backport_packages.sh
script: ./scripts/ci/ci_prepare_packages.sh && ./scripts/ci/ci_generate_requirements.sh
before_install:
- ./scripts/ci/ci_before_install.sh
script: ./scripts/ci/ci_run_airflow_testing.sh
3 changes: 2 additions & 1 deletion BREEZE.rst
@@ -711,7 +711,8 @@ This is the current syntax for `./breeze <./breeze>`_:
****************************************************************************************************
breeze [FLAGS] initialize-local-virtualenv -- <EXTRA_ARGS>
Initializes locally created virtualenv installing all dependencies of Airflow.
Initializes locally created virtualenv installing all dependencies of Airflow
taking into account the frozen requirements from requirements.txt.
This local virtualenv can be used to aid autocompletion and IDE support as
well as run unit tests directly from the IDE. You need to have virtualenv
activated before running this command.
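
A minimal sketch of the assumed workflow (the virtualenv path below is illustrative, not mandated by breeze):

    # create and activate a virtualenv first - initialize-local-virtualenv expects one to be active
    python3 -m venv ~/.virtualenvs/airflow
    source ~/.virtualenvs/airflow/bin/activate
    # then let breeze install the Airflow dependencies, constrained by requirements.txt, into it
    ./breeze initialize-local-virtualenv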
66 changes: 66 additions & 0 deletions CONTRIBUTING.rst
@@ -302,6 +302,71 @@ Limitations:
They are optimized for repeatability of tests, maintainability and speed of building rather
than production performance. The production images are not yet officially published.

Extras
------

There are a number of extras that can be specified when installing Airflow. Those
extras can be specified after the usual pip install - for example
``pip install -e .[gcp]``. For development purposes there is a ``devel`` extra that
installs all development dependencies. There is also a ``devel_ci`` extra that installs
all dependencies needed in the CI environment.

This is the full list of those extras:

.. START EXTRAS HERE
all, all_dbs, async, atlas, aws, azure, cassandra, celery, cgroups, cloudant, dask, databricks,
datadog, devel, devel_ci, devel_hadoop, doc, docker, druid, elasticsearch, gcp, gcp_api,
github_enterprise, google_auth, grpc, hdfs, hive, hvac, jdbc, jira, kerberos, kubernetes, ldap,
mongo, mssql, mysql, odbc, oracle, pagerduty, papermill, password, pinot, postgres, presto, qds,
rabbitmq, redis, salesforce, samba, segment, sendgrid, sentry, singularity, slack, snowflake, ssh,
statsd, tableau, vertica, webhdfs, winrm, yandexcloud

.. END EXTRAS HERE
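
Several extras can also be combined in a single install command; the combination below is just an illustration and any names from the list above can be used:

    # editable install from local sources with the gcp and mysql extras plus development dependencies
    pip install -e ".[gcp,mysql,devel]"
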
Pinned Airflow requirements.txt file
------------------------------------

Airflow is not a standard python project. Most python projects fall into one of two types -
applications or libraries. As described in
`this StackOverflow question <https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions>`_,
the decision whether to pin (freeze) requirements for a python project depends on that type. For
applications, dependencies should be pinned, but for libraries, they should be open.

For an application, pinning the dependencies makes installation more stable in the future - new releases
of (even transitive) dependencies cannot break the installation. For a library, the dependencies should
be left open so that several different libraries with overlapping requirements can be installed at the same time.
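
To make the difference concrete (Werkzeug is used here purely as an illustration):

    # application style - fully pinned, reproducible installation
    pip install "Werkzeug==0.16.1"
    # library style - open bounds, leaving room for other packages' requirements
    pip install "Werkzeug>=0.15,<1.0"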

The problem is that Apache Airflow is a bit of both - an application to install and a library to use when
you are developing your own operators and DAGs.

This - seemingly unsolvable - puzzle is solved as follows:

* by default, when you install the ``apache-airflow`` package, the dependencies are as open as possible while
still allowing apache-airflow to install. This means that the ``apache-airflow`` package might fail to
install when a new release of a direct or transitive dependency breaks the installation. In that case,
when installing ``apache-airflow``, you might need to provide additional constraints (for
example ``pip install apache-airflow==1.10.2 'Werkzeug<1.0.0'``)

* we have a ``requirements.txt`` file generated automatically based on the set of the latest working
and tested requirement versions. You can also use that file as a constraints file when installing
Apache Airflow - either from the sources, ``pip install -e . --constraint requirements.txt``, or
from the PyPI package, ``pip install apache-airflow --constraint requirements.txt``. Note that
this also works with extras, for example ``pip install .[gcp] --constraint requirements.txt`` or
``pip install apache-airflow[gcp] --constraint requirements.txt``

The ``requirements.txt`` file should be updated automatically via pre-commit whenever you update dependencies.
It reflects the current set of dependencies installed in the CI image of Apache Airflow.
The same set of requirements will be used to produce the production image.

If you do not use pre-commit hooks and the CI build fails, or you need to regenerate the file, you can do it manually:
``pre-commit run generate-requirements --all-files`` or via the script
``./scripts/ci/ci_generate_requirements.sh``.
This will try to regenerate the ``requirements.txt`` file with the latest requirements matching
the setup.py constraints.
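
Putting it together, a typical local sequence might look like this (a sketch, assuming you work from a source checkout):

    # regenerate requirements.txt after changing dependencies in setup.py
    ./scripts/ci/ci_generate_requirements.sh
    # reinstall your development environment against the refreshed constraints
    pip install -e ".[devel]" --constraint requirements.txt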


Backport providers packages
---------------------------

@@ -713,6 +778,7 @@ useful for "bisecting" when looking for a commit that introduced some bugs.

First of all - you can read about rebase workflow here:
`Merging vs. rebasing <https://www.atlassian.com/git/tutorials/merging-vs-rebasing>`_ - this is an
excellent article that describes all ins/outs of rebase. I recommend reading it and keeping it as reference.

The goal of rebasing your PR on top of ``apache/master`` is to "transplant" your change on top of
12 changes: 11 additions & 1 deletion Dockerfile
@@ -368,10 +368,20 @@ COPY airflow/version.py ${AIRFLOW_SOURCES}/airflow/version.py
COPY airflow/__init__.py ${AIRFLOW_SOURCES}/airflow/__init__.py
COPY airflow/bin/airflow ${AIRFLOW_SOURCES}/airflow/bin/airflow

COPY requirements.txt ${AIRFLOW_SOURCES}/requirements.txt

ENV UPGRADE_TO_LATEST_REQUIREMENTS_IN_DOCKER_BUILD=${UPGRADE_TO_LATEST_REQUIREMENTS_IN_DOCKER_BUILD}
# The goal of this line is to install the dependencies from the most current setup.py from sources
# This will usually be a small, incremental set of packages in the CI-optimized build, so it will be very fast
# In a non-CI-optimized build this will install all dependencies before installing the sources.
RUN pip install -e ".[${AIRFLOW_EXTRAS}]"
# Usually we will install versions constrained to the current requirements.txt
# But in the cron job we will install the latest versions matching setup.py to check that there are no breaking changes
RUN \
if [[ "${UPGRADE_TO_LATEST_REQUIREMENTS_IN_DOCKER_BUILD}" == "true" ]]; then \
pip install -e ".[${AIRFLOW_EXTRAS}]" --upgrade; \
else \
pip install -e ".[${AIRFLOW_EXTRAS}]" --constraint ${AIRFLOW_SOURCES}/requirements.txt ; \
fi

# Copy all the www/ files we need to compile assets. Done as two separate COPY
# commands because otherwise it copies the _contents_ of static/ into www/
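
The switch between the two install modes is driven by a build argument of the same name as the ENV line above; the corresponding ARG declaration is not visible in this hunk, so its presence (and the illustrative image tag) is assumed here:

    # default build - dependencies constrained by requirements.txt
    docker build . --tag airflow-ci
    # cron-style build - upgrade to the latest versions matching setup.py
    docker build . --tag airflow-ci --build-arg UPGRADE_TO_LATEST_REQUIREMENTS_IN_DOCKER_BUILD="true"
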
45 changes: 39 additions & 6 deletions INSTALL
@@ -1,7 +1,20 @@
# INSTALL / BUILD instructions for Apache Airflow

# [required] fetch the tarball and untar the source
# change into the directory that was untarred.
This is a generic installation method that requires a number of dependencies to be installed.

Depending on your system you might need different prerequisites, but the following
systems/prerequisites are known to work:

Linux (Debian Buster and Linux Mint Tricia):

sudo apt install build-essential python3.6-dev python3.7-dev python-dev openssl \
                 sqlite3 libsqlite3-dev default-libmysqlclient-dev libmysqld-dev postgresql

MacOS (Mojave/Catalina):

brew install sqlite mysql postgresql

# [required] fetch the tarball, untar the source and move into the directory that was untarred.

# [optional] run Apache RAT (release audit tool) to validate license headers
# RAT docs here: https://creadur.apache.org/rat/. Requires Java and Apache Rat
@@ -11,12 +24,32 @@ java -jar apache-rat.jar -E ./.rat-excludes -d .
# to connect to other services. You might want to test or run Airflow
# from a virtual env to make sure those dependencies are separated
# from your system wide versions
python -m my_env
source my_env/bin/activate

# [required] building and installing
# by pip (preferred)
python3 -m venv PATH_TO_YOUR_VENV
source PATH_TO_YOUR_VENV/bin/activate

# [required] building and installing by pip (preferred)
pip install .

# or directly
python setup.py install

# You can also install the recommended versions of the dependencies by using requirements.txt
# as a constraint file. This is needed in case you have problems with installing the latest
# requirements.

pip install . --constraint requirements.txt

# You can also install Airflow with extras specified. The list of available extras:
# START EXTRAS HERE

all, all_dbs, async, atlas, aws, azure, cassandra, celery, cgroups, cloudant, dask, databricks,
datadog, devel, devel_ci, devel_hadoop, doc, docker, druid, elasticsearch, gcp, gcp_api,
github_enterprise, google_auth, grpc, hdfs, hive, hvac, jdbc, jira, kerberos, kubernetes, ldap,
mongo, mssql, mysql, odbc, oracle, pagerduty, papermill, password, pinot, postgres, presto, qds,
rabbitmq, redis, salesforce, samba, segment, sendgrid, sentry, singularity, slack, snowflake, ssh,
statsd, tableau, vertica, webhdfs, winrm, yandexcloud

# END EXTRAS HERE

# For installing Airflow in development environments - see CONTRIBUTING.rst
5 changes: 3 additions & 2 deletions breeze
@@ -177,7 +177,7 @@ function initialize_virtualenv() {
echo
pushd "${MY_DIR}"
set +e
pip install -e ".[devel]"
pip install -e ".[devel]" --constraint requirements.txt
RES=$?
set -e
popd
@@ -954,7 +954,8 @@ prepare_usage() {
Explains in detail all the flags that can be used with breeze.
"
export DETAILED_USAGE_INITIALIZE_LOCAL_VIRTUALENV="
Initializes locally created virtualenv installing all dependencies of Airflow.
Initializes locally created virtualenv installing all dependencies of Airflow
taking into account the frozen requirements from requirements.txt.
This local virtualenv can be used to aid autocompletion and IDE support as
well as run unit tests directly from the IDE. You need to have virtualenv
activated before running this command.
1 change: 1 addition & 0 deletions common/_files_for_rebuild_check.sh
@@ -20,6 +20,7 @@
FILES_FOR_REBUILD_CHECK=(
"setup.py"
"setup.cfg"
"requirements.txt"
"Dockerfile"
".dockerignore"
"airflow/version.py"
(The diffs for the remaining 15 changed files are not shown.)
