Four basic exercises that demonstrate how to orchestrate Google Cloud services with Apache Airflow.
The repository contains 4 DAGs:
- `storage_dag.py`: exercise to perform operations between directories in the Google Cloud Storage service (see the sketch below).
- `functions_dag.py`: exercise to invoke Cloud Functions instances from Airflow.
- `bigquery_dag.py`: exercise on extracting and loading data and running jobs in BigQuery.
- `dataproc_dag.py`: exercise to run PySpark jobs in the Dataproc Batch service.
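As a rough illustration of what one of these exercises involves, below is a minimal sketch of a GCS-to-GCS copy task in the spirit of `storage_dag.py`. The DAG id, bucket name, and object prefixes are assumptions for illustration only, not values from this repository.

```python
# Minimal sketch of a GCS-to-GCS copy task, in the spirit of storage_dag.py.
# The dag_id, bucket name, and object prefixes are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

with DAG(
    dag_id="storage_dag_sketch",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # triggered manually for the exercise
    catchup=False,
) as dag:
    copy_objects = GCSToGCSOperator(
        task_id="copy_input_to_processed",
        source_bucket="example-bucket",      # assumed bucket name
        source_object="input/*",             # everything under the input/ prefix
        destination_bucket="example-bucket",
        destination_object="processed/",     # copied under the processed/ prefix
        move_object=False,                   # copy rather than move
    )
```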
- The Airflow version used is Airflow 2.8.1.
- The dependencies and variables used are located in the `./airflow/` folder, in `requirements.txt` and `airflow-variables.json` respectively (a sketch of how a DAG reads these variables follows this list).
- The `./resources/gcloud/` folder contains the commands to deploy all the resources that the DAGs use.
- In `vars.sh`, specify the variables as needed, referring to `airflow-variables.json`.
- Run `bash apply.sh` in the GCP Cloud Shell to create the cloud resources.
- Run `bash destroy.sh` in the GCP Cloud Shell to clean up the resources deployed in the previous step.
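Once the variables have been imported into Airflow (for example with `airflow variables import airflow-variables.json`), a DAG typically reads them as shown in the sketch below. The variable keys used here are assumptions, not the keys actually defined in `airflow-variables.json`.

```python
# Sketch of reading deployment-specific values as Airflow Variables inside a DAG file.
# The keys "gcp_project_id" and "gcs_bucket" are illustrative assumptions; the real
# keys are the ones defined in airflow-variables.json.
from airflow.models import Variable

PROJECT_ID = Variable.get("gcp_project_id")
BUCKET = Variable.get("gcs_bucket")
```

Variables can also be resolved lazily in operator arguments through Jinja templating (for example `{{ var.value.gcp_project_id }}`), which avoids querying the metadata database every time the DAG file is parsed.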
The execution of the DAGs was last validated on February 13th, 2024.