docker-airflow

This repository contains the Dockerfile of apache-airflow for Docker's automated build, published to the public Docker Hub Registry.

Information

/!\ If you want to run Airflow with Python 2, use the image tag 1.8.1.

Installation

Pull the image from the Docker repository.

    docker pull puckel/docker-airflow

Build

If you need to install extra packages, for example, edit the Dockerfile and then build the image.

    docker build --rm -t puckel/docker-airflow .

Usage

By default, docker-airflow runs Airflow with SequentialExecutor:

    docker run -d -p 8080:8080 puckel/docker-airflow

If you want to run another executor, use the other docker-compose.yml files provided in this repository.

For LocalExecutor:

    docker-compose -f docker-compose-LocalExecutor.yml up -d

For CeleryExecutor:

    docker-compose -f docker-compose-CeleryExecutor.yml up -d

NB: If you don't want the example DAGs loaded (default=True), set the following environment variable:

LOAD_EX=n

    docker run -d -p 8080:8080 -e LOAD_EX=n puckel/docker-airflow

If you want to use Ad hoc query, make sure the connection is configured: go to Admin -> Connections, edit "postgres_default", and set these values (equivalent to the values in airflow.cfg/docker-compose*.yml):

  • Host: postgres
  • Schema: airflow
  • Login: airflow
  • Password: airflow

For encrypted connection passwords (with the Local or Celery executor), every container must use the same fernet_key. By default, docker-airflow generates the fernet_key at startup, so you have to set an environment variable in the docker-compose file (e.g. docker-compose-LocalExecutor.yml) to share the same key across containers. To generate a fernet_key:

    python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)"
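
If the cryptography package is not installed on the host, a key of the same shape can be produced with the standard library alone. This is a minimal sketch, assuming a Fernet key is 32 random bytes encoded as URL-safe base64; `generate_fernet_key` is a hypothetical helper name, not part of docker-airflow or Airflow.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return 32 random bytes encoded as URL-safe base64 text.

    Hypothetical helper: assumes the Fernet key format is
    base64url(32 random bytes).
    """
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


if __name__ == "__main__":
    # Generate the key once and reuse the same value in every container.
    print(generate_fernet_key())
```

Generate the key once and reuse that single value everywhere; a key generated separately per container would not match.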

Configuring Airflow

Any Airflow configuration value can be set from an environment variable, which takes precedence over the value in airflow.cfg. As a general rule, the environment variable should be named AIRFLOW__<section>__<key>; for example, AIRFLOW__CORE__SQL_ALCHEMY_CONN sets the sql_alchemy_conn config option in the [core] section.
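
The naming rule can be sketched in a few lines of Python. `airflow_env_var` and `docker_env_flags` are hypothetical helpers written for illustration, and the connection string is only an example value built from the default credentials listed earlier.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the AIRFLOW__<section>__<key> name (hypothetical helper)."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def docker_env_flags(settings: dict) -> list:
    """Turn {(section, key): value} into a list of `-e NAME=value` arguments."""
    flags = []
    for (section, key), value in settings.items():
        flags += ["-e", f"{airflow_env_var(section, key)}={value}"]
    return flags


if __name__ == "__main__":
    # Example value only; adjust to your own database settings.
    settings = {
        ("core", "sql_alchemy_conn"):
            "postgresql+psycopg2://airflow:airflow@postgres/airflow",
    }
    print(airflow_env_var("core", "sql_alchemy_conn"))
    # AIRFLOW__CORE__SQL_ALCHEMY_CONN
    print(" ".join(docker_env_flags(settings)))
```

The printed flags can be placed between `docker run` and the image name, in the same way as `-e LOAD_EX=n`.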

Check out the Airflow documentation for more details.

You can also define connections via environment variables by prefixing them with AIRFLOW_CONN_, for example AIRFLOW_CONN_POSTGRES_MASTER=postgres://user:password@localhost:5432/master for a connection called "postgres_master". The value is parsed as a URI. This works for hooks and similar components, but the connection won't show up in the "Ad-hoc Query" section unless an (empty) connection is also created in the DB.
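
To see which fields such a URI carries, it can be split with Python's standard library. This is only an approximation of the parsing, not Airflow's own code; `describe_connection` and the field names in the returned dictionary are hypothetical.

```python
from urllib.parse import urlsplit

PREFIX = "AIRFLOW_CONN_"


def describe_connection(env_name: str, uri: str) -> dict:
    """Split an AIRFLOW_CONN_* variable into its parts (illustrative only)."""
    if not env_name.startswith(PREFIX):
        raise ValueError(f"{env_name} does not start with {PREFIX}")
    parts = urlsplit(uri)
    return {
        # The connection name is the lowercased suffix of the variable name.
        "conn_id": env_name[len(PREFIX):].lower(),
        "conn_type": parts.scheme,
        "login": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # The URI path (without the leading slash) is the schema/database.
        "schema": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    info = describe_connection(
        "AIRFLOW_CONN_POSTGRES_MASTER",
        "postgres://user:password@localhost:5432/master",
    )
    print(info["conn_id"], info["host"], info["port"], info["schema"])
    # postgres_master localhost 5432 master
```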

Install custom Python packages

  • Create a file "requirements.txt" listing the desired Python modules
  • Mount this file as a volume: -v $(pwd)/requirements.txt:/requirements.txt
  • The entrypoint.sh script executes the pip install command (with the --user option)

UI Links

  • Airflow: localhost:8080

Scale the number of workers

With the CeleryExecutor setup, workers scale easily using docker-compose:

    docker-compose -f docker-compose-CeleryExecutor.yml scale worker=5

This can be used to scale to a multi-node setup using Docker Swarm.

Wanna help?

Fork, improve and PR. ;-)