- The `DummyOperator` implementation of `start_operator` was replaced with a `PostgresOperator`. The latter runs `create_tables.sql`, which was moved to the same folder as `udac_example_dag.py` and is therefore executed on every run of the DAG.
- The `CREATE` statements in `create_tables.sql` were extended with `IF NOT EXISTS` so that they can be run repeatedly without conflicts.
- The `LoadDimensionOperator` was implemented with a flag `append=False` and a `primary_key=""` parameter:
  - if `append=False`, the contents of the original table are deleted and replaced entirely with the new data
  - if `append=True`, only the rows of the original table whose primary keys also occur in the new data are deleted before the new rows are inserted. This roughly corresponds to an `ON CONFLICT DO UPDATE` call (which is not available in the Postgres version that Redshift is based on).
- The data quality operator is used to check whether there are any rows with a null value of `artistid` in the `artists` table. This is only an example check; many other checks could be performed here as well.
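
The two `append` modes of the dimension load can be sketched as the SQL statements that would have to be issued in each case. This is only an illustration and not the operator's actual code: the function name `build_load_statements`, its parameters, the `staging` table, and the use of `DELETE` (rather than, say, `TRUNCATE`) are assumptions made for this example, and an in-memory SQLite database stands in for Redshift.

```python
import sqlite3


def build_load_statements(table, select_sql, append=False, primary_key=""):
    """Return the SQL statements for one dimension load, in execution order.

    Hypothetical helper: the names and the exact SQL are assumptions.
    """
    if append:
        if not primary_key:
            raise ValueError("append=True requires a primary_key")
        # Delete only the existing rows whose key reappears in the new data.
        clear_sql = (
            f"DELETE FROM {table} WHERE {primary_key} IN "
            f"(SELECT {primary_key} FROM ({select_sql}) AS new_rows)"
        )
    else:
        # Full refresh: remove every existing row.
        clear_sql = f"DELETE FROM {table}"
    return [clear_sql, f"INSERT INTO {table} {select_sql}"]


# Demo on an in-memory SQLite database (a stand-in for Redshift).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (artistid TEXT, name TEXT)")
conn.execute("CREATE TABLE staging (artistid TEXT, name TEXT)")
conn.executemany("INSERT INTO artists VALUES (?, ?)", [("a1", "Old"), ("a2", "Keep")])
conn.executemany("INSERT INTO staging VALUES (?, ?)", [("a1", "New"), ("a3", "Added")])

statements = build_load_statements(
    "artists",
    "SELECT artistid, name FROM staging",
    append=True,
    primary_key="artistid",
)
for statement in statements:
    conn.execute(statement)

rows = sorted(conn.execute("SELECT artistid, name FROM artists").fetchall())
print(rows)  # [('a1', 'New'), ('a2', 'Keep'), ('a3', 'Added')]
```

In the demo, the row with key `a1` is replaced, `a2` is left untouched, and `a3` is added, which is the upsert-like behaviour described for `append=True`.
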
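
The null check on `artistid` can be illustrated in the same spirit. The sketch below is not the data quality operator itself: the function name `check_no_nulls` is hypothetical, it takes a plain Python DB-API connection instead of an Airflow hook, and SQLite is again used in place of Redshift.

```python
import sqlite3


def check_no_nulls(conn, table, column):
    """Raise ValueError if `column` of `table` contains NULL values.

    Hypothetical helper; `conn` is any DB-API 2.0 connection.
    """
    cursor = conn.cursor()
    cursor.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL")
    null_count = cursor.fetchone()[0]
    if null_count > 0:
        raise ValueError(
            f"Data quality check failed: {null_count} row(s) in "
            f"{table} have NULL {column}"
        )


# Demo on an in-memory SQLite database (a stand-in for Redshift).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (artistid TEXT, name TEXT)")
conn.execute("INSERT INTO artists VALUES ('a1', 'Some Artist')")
check_no_nulls(conn, "artists", "artistid")  # passes silently

conn.execute("INSERT INTO artists VALUES (NULL, 'Unknown')")
try:
    check_no_nulls(conn, "artists", "artistid")
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

Raising an exception is one way to make the corresponding task fail, so that a bad load shows up in the DAG run; other checks (row counts, value ranges) could follow the same pattern.
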
The following solution was used as guidance: