ETL built with Airflow. This project loads data from S3 buckets and writes it into staging, fact, and dimension tables. Using the PostgresOperator and PythonOperator, it creates the fact and dimension tables of a star schema.
This project also runs data quality checks on the dimension and fact tables, checking for null values and empty tables.
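As a rough illustration, such a check could be expressed as a PythonOperator callable. This is a minimal sketch assuming Airflow 1.x import paths; the connection id 'redshift' and the table name are hypothetical.

from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.python_operator import PythonOperator

def check_table_has_rows(table, postgres_conn_id='redshift', **context):
    # Fail the task if the given table returns no rows.
    hook = PostgresHook(postgres_conn_id=postgres_conn_id)
    records = hook.get_records(f"SELECT COUNT(*) FROM {table}")
    if not records or records[0][0] < 1:
        raise ValueError(f"Data quality check failed: {table} returned no rows")

# Hypothetical wiring inside the DAG:
# check_users = PythonOperator(
#     task_id='check_users_table',
#     python_callable=check_table_has_rows,
#     op_kwargs={'table': 'users'},
#     provide_context=True,
#     dag=dag,
# )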
Schedule:
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 11, 1),
    'end_date': datetime(2018, 11, 30),
    'email_on_retry': False,
    'email_on_failure': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    'depends_on_past': False
}
Besides the default parameters above, this DAG runs hourly with at most 1 active run at a time.
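A minimal sketch of how these settings could be wired together, assuming Airflow 1.x; the DAG id is hypothetical and default_args refers to the dictionary above.

from airflow import DAG

dag = DAG(
    'etl_dag',                      # hypothetical DAG id
    default_args=default_args,      # the dictionary shown above
    schedule_interval='@hourly',    # run once every hour
    max_active_runs=1               # at most one active DAG run at a time
)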
The start_date context is used to load data from S3 (CSV files). All hooks are created with flexibility in mind: you can define whether you want to delete existing rows or just append the data into the dimension and fact tables, and you can use a different connection id as well. Just make sure that the hook fits your purpose; an example is sketched below.
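For illustration, a flexible load callable might look like the following. This is a minimal sketch assuming a PostgresHook-compatible connection; the connection id, table, and SELECT statement are hypothetical, and the append flag switches between append-only and delete-load behaviour.

from airflow.hooks.postgres_hook import PostgresHook

def load_table(table, select_sql, postgres_conn_id='redshift', append=False, **context):
    # Insert rows into a fact or dimension table, optionally emptying it first.
    hook = PostgresHook(postgres_conn_id=postgres_conn_id)
    if not append:
        # Delete-load mode: clear the table before inserting fresh data.
        hook.run(f"DELETE FROM {table}")
    hook.run(f"INSERT INTO {table} {select_sql}")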
A SubDAG is created to write all the dimension tables.
The entire SubDAG uses LocalExecutor() to run all of its tasks at the same time.
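A minimal sketch of such a SubDAG factory, assuming Airflow 1.x (where SubDagOperator still accepts an executor argument); the connection id, table names, and SQL statements are hypothetical.

from airflow import DAG
from airflow.executors.local_executor import LocalExecutor
from airflow.operators.postgres_operator import PostgresOperator
from airflow.operators.subdag_operator import SubDagOperator

def load_dimensions_subdag(parent_dag_id, child_dag_id, default_args, tables):
    # Build a DAG with one independent INSERT task per dimension table.
    subdag = DAG(
        f"{parent_dag_id}.{child_dag_id}",   # SubDAG id must be <parent_dag_id>.<task_id>
        default_args=default_args,
        schedule_interval='@hourly',
    )
    for table in tables:
        PostgresOperator(
            task_id=f"load_{table}",
            postgres_conn_id='redshift',                                 # hypothetical connection id
            sql=f"INSERT INTO {table} SELECT * FROM staging_{table}",    # hypothetical SQL
            dag=subdag,
        )
    return subdag

# Hypothetical wiring in the parent DAG; LocalExecutor() lets the SubDAG tasks run concurrently:
# load_dims = SubDagOperator(
#     task_id='load_dimension_tables',
#     subdag=load_dimensions_subdag(dag.dag_id, 'load_dimension_tables',
#                                   default_args, ['users', 'songs', 'artists', 'time']),
#     executor=LocalExecutor(),
#     dag=dag,
# )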
Want to contribute? Great! Please feel free to open issues and push changes.