This repo is a template for setting up a modular, testable DAG infrastructure.
Directory for local development
This directory is meant for reusable code that can be used by any DAG.
Includes operators and task groups.
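For illustration, a reusable task group kept here might look like the following sketch (the `notify_team` group and its tasks are hypothetical; assumes Airflow 2.x TaskFlow decorators):

```python
from airflow.decorators import task, task_group


@task_group(group_id="notify_team")
def notify_team(message: str):
    """Hypothetical reusable task group that any DAG can import and call."""

    @task
    def format_message(msg: str) -> str:
        return f"[airflow] {msg}"

    @task
    def send(msg: str) -> None:
        # A real implementation might use an email or chat operator instead.
        print(msg)

    send(format_message(message))
```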
Contains any reusable constants for DAGs, e.g.:
- datasets
- distribution lists / emails
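A minimal sketch of such a constants module, assuming Airflow 2.4+ for `Dataset` and using purely illustrative names:

```python
from airflow.datasets import Dataset

# Hypothetical dataset constants shared by producer and consumer DAGs.
ORDERS_DATASET = Dataset("s3://example-bucket/orders/")

# Hypothetical distribution list used for alert emails.
DATA_TEAM_EMAILS = ["data-team@example.com"]
```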
Environment Configs
- `defaults.py`: variables and configs that are consistent across all environments
- `non`: a catch-all for any variables that are the same across all lower environments
- `prod`: variables for Airflow's prod environment
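As a sketch of how these files might relate (the variable names are hypothetical), each environment file only overrides what differs from the defaults:

```python
# defaults.py -- values shared by every environment (hypothetical variables)
RETRIES = 2
ALERT_EMAILS = ["data-team@example.com"]

# non.py -- overrides shared by all lower environments
TARGET_SCHEMA = "analytics_dev"

# prod.py -- overrides for Airflow's prod environment
TARGET_SCHEMA = "analytics"
```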
All DAGs go in this directory.
Custom pre-commit hook code. This helps us catch import errors, configuration errors, etc.
NOTE: this code is also in airflow-plugins, and we will leverage that plugin instead of our version in the near future.
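The gist of the import check, sketched here rather than copied from the actual hook, is to load a `DagBag` and fail the commit if any DAG file cannot be imported (the `dags` folder path is an assumption):

```python
import sys

from airflow.models import DagBag


def main() -> int:
    # Parse every DAG file without loading Airflow's bundled example DAGs.
    dag_bag = DagBag(dag_folder="dags", include_examples=False)
    if dag_bag.import_errors:
        for path, error in dag_bag.import_errors.items():
            print(f"{path}: {error}")
        return 1  # non-zero exit fails the pre-commit hook
    return 0


if __name__ == "__main__":
    sys.exit(main())
```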
`env_dag()` is a wrapper for the `dag()` decorator that will:
- Automatically determine whether it is running in the Airflow non or prod environment and use the correct environment-based vars from `config/`.
- Create a DAG instance for every lower environment specified if it is running in non.
- Each DAG instance has access to an `env` parameter that is an instance of the `Environment` class. All environment-specific variables are accessed as attributes of this object.
```python
env = Environment('uat')  # this will pull variables from uat.py, or non.py if none are found
env.MY_VAR                # access vars as properties
```
For any DAG, the valid lower environments can be specified by passing a `lower_envs` parameter to `env_dag()` with a list of environments. It defaults to `['dev', 'sit', 'uat']`.
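A sketch of what a DAG definition might look like with the wrapper; the import path, decorator arguments, and the way `env` reaches the DAG function are assumptions based on the description above rather than the actual API:

```python
import pendulum

# Assumed import path for the wrapper; adjust to wherever env_dag lives in this repo.
from common.env_dag import env_dag


@env_dag(
    dag_id="example_dag",                 # hypothetical DAG id
    start_date=pendulum.datetime(2024, 1, 1),
    schedule=None,
    lower_envs=["dev", "uat"],            # restrict the lower envs for this DAG
)
def example_dag(env):
    # env is an Environment instance; attributes resolve from config/ for the current env.
    print(env.MY_VAR)


example_dag()
```

When running in non, this would create one DAG instance per entry in `lower_envs`, each with its own `env`.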
See the code for more details.
The DAG repo is meant to contain only the logic of job workflows. Business logic should ideally reside in Python packages installed on a Docker image.
However, any reusable code written to aid in the creation of job workflows should be tested.
Tests are run automatically by the pre-commit hooks.