Hive Project: Understand the various types of SCDs and implement these slowly changing dimensions in Hadoop Hive and Spark.

melwinmpk/SCD_in_Warehouse

SCD_in_Warehouse

Understand the various types of SCDs (slowly changing dimensions) and implement them in Hadoop Hive and Spark.

Complete Overview

Execution Flow

Link

SCD 1

This method overwrites old data with new data and therefore does not track historical data.
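
The overwrite behaviour can be illustrated with a minimal Python sketch. It is not the project's Hive or Spark code: the function name `scd1_merge`, the `customer_id` key, and the row layout are assumptions made for illustration only.

```python
def scd1_merge(target, source, key="customer_id"):
    """Return the target after an SCD type 1 merge (hypothetical helper).

    target and source are lists of dicts; key names the natural-key column.
    """
    # Index the existing target rows by their natural key.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # A matching key is overwritten in place, a new key is inserted.
        # The previous values are discarded, so no history survives.
        merged[row[key]] = dict(row)
    return list(merged.values())
```

For example, merging a source row with `customer_id` 1 and a new city into a target that already holds `customer_id` 1 leaves a single row for that customer, carrying only the new city.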

SCD 2

This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert.
This project uses the flag method, in which a flag on each record marks whether it is the current version.

  1. Copy all records from the source into the temp table (customer_temp) with the flag set to true: new records that are not present in the target, updated records, and unchanged records.
  2. Copy from the target into the temp table the records that have been updated in the source, with the flag set to false, and the records that are not present in the source, with the flag set to true.
  3. Finally, after steps 1 and 2, overwrite the target, store.customer, with customer_temp.
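
The three steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the project's Hive or Spark implementation: the function name `scd2_flag_merge`, the `is_current` flag column, and the `customer_id` key are hypothetical. The sketch also assumes that a target row whose key is absent from the source keeps whatever flag it already has, so that rows closed in an earlier run stay closed.

```python
def scd2_flag_merge(target, source, key="customer_id"):
    """Return the new target after a flag-based SCD type 2 merge (hypothetical helper)."""

    def attrs(row):
        # Compare business attributes only, ignoring the flag column.
        return {k: v for k, v in row.items() if k != "is_current"}

    source_by_key = {row[key]: row for row in source}

    # Step 1: every source row (new, updated, or unchanged) is a current row.
    temp = [{**row, "is_current": True} for row in source]

    # Step 2: carry over the target rows that the source does not already cover.
    for row in target:
        src = source_by_key.get(row[key])
        if src is None:
            # Key absent from the source: keep the row with its existing flag
            # (assumption of this sketch).
            temp.append(dict(row))
        elif attrs(row) != attrs(src):
            # Superseded by a newer source version: keep it as history.
            temp.append({**row, "is_current": False})
        # Rows identical to their source row were already added in step 1.

    # Step 3: the temp table replaces the target.
    return temp
```

Calling `scd2_flag_merge(target, source)` returns the rows that would be written back over the target. A customer whose attributes changed then appears twice, once with the flag true (the new version) and once with the flag false (the old version).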

SCD 4

SCD type 4 provides a way to handle rapid changes in dimension tables. The idea is to create a junk dimension, or a small dimension table, that holds all the possible values of the rapidly changing attributes of the dimension.

The Type 4 method is usually referred to as using "history tables": one table keeps the current data, and an additional table keeps a record of some or all changes. Both surrogate keys are referenced in the fact table to enhance query performance.
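
The "history table" idea can be sketched in plain Python as below. This is an illustrative sketch rather than the project's Hive or Spark code: the function name `scd4_apply`, the `archived_on` column, and the `customer_id` key are assumptions, and surrogate keys and the fact table are left out for brevity.

```python
def scd4_apply(current, history, source, archived_on, key="customer_id"):
    """Apply source rows to a current table and a history table (hypothetical helper).

    current is a dict mapping natural key to row, history is a list of
    archived rows; both are modified in place.
    """
    for row in source:
        old = current.get(row[key])
        if old is not None and old != row:
            # The row changed: move the old version to the history table,
            # stamped with the date it was replaced.
            history.append({**old, "archived_on": archived_on})
        # The current table always holds only the latest version.
        current[row[key]] = dict(row)
```

With this layout, queries that need only the latest values read the small current table, while the history table grows with every change.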

Reference Link1
Reference Link2

Manual Triggering

Link

Airflow Output
