# Data Transformation in DBT CLI/Core

DBT was used to perform SQL transformations on our data, including changing column data types and renaming the columns. Additionally, the large staging table was split into fact and dimension tables as part of the data modeling process.
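For illustration, a staging transformation of this kind might look like the sketch below; the source, table, and column names (`raw_crypto_prices`, `price_usd`, and so on) are placeholders, not the project's actual schema:

```sql
-- models/staging/stg_crypto_prices.sql (illustrative names, not the project's actual model)
with source as (
    select * from {{ source('staging', 'raw_crypto_prices') }}
)

select
    cast(coin_id as string)      as coin_id,       -- enforce a consistent key type
    cast(price as numeric)       as price_usd,     -- retype and rename
    cast(traded_at as timestamp) as traded_at_utc  -- standardize the timestamp
from source
```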

## Setup

- Navigate to the dbt folder inside the project repo: `cd crypto_analytics_engineering/dbt`
- Edit the `profiles.yml` file to specify your GCP project id and save.
- Navigate to the staging files directory (`cd crypto_analytics_engineering/dbt/models/staging`) and edit the `source.yml` file, setting your GCP project id as the BigQuery database name. Config sketches for both files follow this list.
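For reference, the relevant portions of the two files might look like the sketches below. The profile name, dataset, and table names are placeholders; only the `project:` and `database:` fields, which should hold your GCP project id, reflect the setup steps above.

```yaml
# profiles.yml -- profile and dataset names are placeholders; set project: to
# your GCP project id (your authentication method may differ from oauth)
crypto_analytics:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: your-gcp-project-id
      dataset: crypto_data
      threads: 4
```

```yaml
# models/staging/source.yml -- in dbt's BigQuery adapter, "database" maps to
# the GCP project; the schema and table names here are placeholders
version: 2

sources:
  - name: staging
    database: your-gcp-project-id
    schema: crypto_data
    tables:
      - name: raw_crypto_prices
```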

## Notes on running DBT inside Airflow

In this project, the DBT job was configured to run within Airflow, allowing Airflow to orchestrate the entire extraction, loading, and transformation process. However, the DBT transformation job could also be separated from the extraction and loading, as DBT Cloud can schedule models and tests, execute them in the correct order, and send notifications upon failure, all without Airflow.
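As a rough sketch of that setup, a DAG can invoke the dbt CLI with a `BashOperator` once extraction and loading are done; the DAG id, schedule, and paths below are illustrative, not the project's actual DAG:

```python
# Sketch only: DAG id, schedule, and paths are assumptions, not the project's DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="crypto_elt",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Downstream of the extraction/loading tasks, run the dbt models.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=(
            "cd /opt/airflow/crypto_analytics_engineering/dbt && "
            "dbt run --profiles-dir ."
        ),
    )
```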

In practice, a better approach might involve first dockerizing the DBT transformations and pushing the image to a container registry such as Google Artifact Registry. Then, the KubernetesPodOperator in Airflow can pull the image and run the dockerized transformations. This method resolves dependency conflicts, isolates the code, and modularizes the infrastructure components. For more details on this approach, you can read the full article here.
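A minimal sketch of that pattern is shown below, assuming a dbt image has already been built and pushed to Artifact Registry; the image path, namespace, and task names are placeholders:

```python
# Sketch only: image path, namespace, and names are placeholders.
# (The import path for this operator varies across cncf-kubernetes provider versions.)
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

dbt_transform = KubernetesPodOperator(
    task_id="dbt_transform",
    name="dbt-transform",
    namespace="airflow",
    image="us-central1-docker.pkg.dev/your-gcp-project-id/dbt/crypto-dbt:latest",
    cmds=["dbt"],
    arguments=["run", "--profiles-dir", "."],
    get_logs=True,  # stream the dbt logs back into the Airflow task log
)
```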


## Resources