GCP with Apache Airflow

Description

Four basic exercises that demonstrate how to orchestrate Google Cloud services with Apache Airflow.

The repository contains 4 DAGs:

  1. storage_dag.py: performs operations between directories in the Google Cloud Storage service (a minimal sketch follows this list).
  2. functions_dag.py: invokes Cloud Functions instances from Airflow.
  3. bigquery_dag.py: extracts and loads data and runs jobs in BigQuery.
  4. dataproc_dag.py: runs PySpark jobs on the Dataproc Batch service.
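
For illustration, here is a minimal sketch of what a storage-style DAG might look like. The operator choice, bucket names, object paths, and task id are placeholders for this example, not the repository's actual code:

    # Hypothetical sketch only: bucket names and object paths are placeholders.
    import pendulum
    from airflow import DAG
    from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

    with DAG(
        dag_id="storage_dag_example",
        start_date=pendulum.datetime(2024, 2, 1, tz="UTC"),
        schedule=None,
        catchup=False,
    ) as dag:
        # Copy every object under source/ into destination/ within the same bucket.
        GCSToGCSOperator(
            task_id="copy_objects",
            source_bucket="example-bucket",
            source_object="source/*",
            destination_bucket="example-bucket",
            destination_object="destination/",
        )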

Airflow

  • The project uses Airflow 2.8.1.
  • Dependencies and Airflow variables live in the ./airflow/ folder, in requirements.txt and airflow-variables.json respectively.
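
Once airflow-variables.json has been imported into the Airflow metastore, the DAGs can read those values through the Variable API. A minimal sketch, with placeholder key names rather than the repository's actual keys:

    # Hypothetical keys; the real names come from airflow-variables.json.
    from airflow.models import Variable

    project_id = Variable.get("gcp_project_id")
    bucket_name = Variable.get("gcs_bucket", default_var="example-bucket")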

Google Cloud

  • The ./resources/gcloud/ folder contains the commands that deploy all the resources the DAGs use.
  • In vars.sh, set the variables as needed, keeping them consistent with airflow-variables.json.
  • From the GCP Cloud Shell, run bash apply.sh to create the cloud resources.
  • From the GCP Cloud Shell, run bash destroy.sh to clean up the resources deployed in the previous step.

DAG execution was last validated on February 13th, 2024.
