- Create a Python virtual environment: `python -m venv venv`, then `source venv/bin/activate`
- Install dependencies: `pip install -r requirements.txt` (update the file after adding packages with `pip freeze > requirements.txt`)
- Use Docker:
  - Edit `docker-compose.yml` (see the sketch below), then start the services with `docker-compose up -d`
  - Verify with `docker ps`
- Connect to the DB:
  - Open a shell in the container: `docker exec -it postgres_db bash`, then `psql -U postgres -d weather`
  - Optionally add a pgAdmin service, restart the services, and access pgAdmin at http://localhost:5050
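A minimal `docker-compose.yml` sketch for the Postgres and pgAdmin services described above. The image tags, credentials, and pgAdmin login are assumptions; only the container name `postgres_db`, the `weather` database, and port 5050 come from these notes.

```yaml
services:
  postgres_db:
    image: postgres:15                           # image tag is an assumption
    container_name: postgres_db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres                # placeholder; prefer a .env value
      POSTGRES_DB: weather
    ports:
      - "5432:5432"
    volumes:
      - pg_data:/var/lib/postgresql/data

  pgadmin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@example.com   # placeholder login
      PGADMIN_DEFAULT_PASSWORD: admin            # placeholder login
    ports:
      - "5050:80"
    depends_on:
      - postgres_db

volumes:
  pg_data:
```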
- Install the dbt adapter: `pip install dbt-postgres`, then set up/modify the dbt folder structure
- Configure the dbt connection to PostgreSQL:
  - Create/edit `~/.dbt/profiles.yml` so it matches the Postgres settings in the Docker setup (sketch below)
  - Alternatively, reference .env variables to avoid hardcoding credentials
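A `~/.dbt/profiles.yml` sketch matching the Docker Postgres above. The profile name `weather` and the schema are assumptions; the `env_var()` calls show the .env approach mentioned above.

```yaml
weather:                      # profile name is an assumption; it must match dbt_project.yml
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost         # use the service name (postgres_db) when dbt runs inside Docker
      port: 5432
      user: "{{ env_var('POSTGRES_USER', 'postgres') }}"
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
      dbname: weather
      schema: public          # assumption
      threads: 4
```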
- Update Docker setup to mount the dbt folder - NOT DONE YET
- Test the setup:
  - Start Docker: `docker-compose up -d`, then activate the venv with `source venv/bin/activate`
  - Test the dbt connection: `dbt debug`
  - Generate the models: `dbt run`
- Create a staging model (typically a view) in `dbt/models/staging/stg_weather.sql` (sketch below)
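A minimal sketch of the staging model. The source and column names (`raw_weather`, `city`, `recorded_at`, `temperature`) are assumptions and must match `sources.yml` and the raw table actually loaded into Postgres.

```sql
-- dbt/models/staging/stg_weather.sql (sketch; columns are assumptions)
with raw_weather as (

    select * from {{ source('weather', 'raw_weather') }}

)

select
    city,
    cast(recorded_at as date) as recorded_date,
    cast(temperature as numeric) as temperature
from raw_weather
```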
- Define the source in `models/staging/sources.yml` (sketch below)
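A matching `sources.yml` sketch; the source, schema, and table names are assumptions.

```yaml
# models/staging/sources.yml (sketch)
version: 2

sources:
  - name: weather
    schema: public            # assumption: raw data lands in the public schema
    tables:
      - name: raw_weather
```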
- Update `dbt_project.yml` to include the models (sketch below)
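A `dbt_project.yml` sketch showing how the model folders might be registered. The project/profile names and the materializations are assumptions.

```yaml
# dbt_project.yml (sketch)
name: weather
version: "1.0.0"
profile: weather              # must match the profile name in ~/.dbt/profiles.yml

model-paths: ["models"]
seed-paths: ["seeds"]

models:
  weather:
    staging:
      +materialized: view
    intermediate:
      +materialized: table
```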
- Run and test the staging model:
  - cd into the dbt folder
  - Run: `dbt run --select stg_weather`
  - Test: `dbt test --select stg_weather` and confirm the model works correctly
- Intermediate layer goals:
  - Calculate seasonal averages of temperatures for each city.
  - Prepare the data for trend analysis (e.g., year-over-year changes).
  - Create clean, aggregated data ready for the final star schema.
- Create the intermediate model (sketch below)
- Update `dbt_project.yml` to include this model
- Run dbt to create the table
- Update `schema.yml` (testing and documentation; sketch below)
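A sketch of an intermediate model computing seasonal averages per city. The file name `int_weather_seasonal.sql`, the season buckets, and the column names are assumptions built on the staging sketch above.

```sql
-- dbt/models/intermediate/int_weather_seasonal.sql (sketch)
with staged as (

    select * from {{ ref('stg_weather') }}

),

labelled as (

    select
        city,
        extract(year from recorded_date) as year,
        case
            when extract(month from recorded_date) in (12, 1, 2) then 'winter'
            when extract(month from recorded_date) in (3, 4, 5)  then 'spring'
            when extract(month from recorded_date) in (6, 7, 8)  then 'summer'
            else 'autumn'
        end as season,
        temperature
    from staged

)

select
    city,
    year,
    season,
    avg(temperature) as avg_temperature
from labelled
group by city, year, season
```

And a `schema.yml` sketch adding tests and descriptions for both models; the tested columns are assumptions.

```yaml
# models/schema.yml (sketch)
version: 2

models:
  - name: stg_weather
    description: "Staging view over the raw weather data."
    columns:
      - name: city
        tests:
          - not_null
  - name: int_weather_seasonal
    description: "Seasonal average temperature per city and year."
    columns:
      - name: avg_temperature
        tests:
          - not_null
```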
- Use a seed to create a mock table/data for testing:
  - Create a CSV file in the seeds folder (path specified in `dbt_project.yml`); see the sample below
  - Run `dbt seed`
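A seed is just a CSV that dbt loads as a table. A hypothetical `seeds/weather_mock.csv` with made-up test rows might look like:

```csv
city,recorded_at,temperature
Berlin,2024-01-15,2.5
Berlin,2024-07-15,24.1
Madrid,2024-01-15,9.8
```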
- Run the whole dbt pipeline with `dbt build` (seed -> run -> test)
- Generate the documentation (inside the Airflow container; make sure the right port is mapped):
  - Generate: `dbt docs generate`
  - Serve: `dbt docs serve --port 8081 --host 0.0.0.0`
- Add Airflow to docker-compose (make sure the credentials and database match the Postgres service); a sketch follows below
- Set up the metadata database: `docker-compose up airflow-init` (only runs once, for setup)
- Start everything: `docker-compose up -d`
- Log on to http://localhost:8080
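A sketch of the Airflow services that could be added to `docker-compose.yml`. The image tag, credentials, and init commands are assumptions (loosely following the official Airflow compose file); the mounted `dags/`, `dbt/`, `scripts/`, and `data/` folders match the notes below.

```yaml
# docker-compose.yml additions (sketch)
services:
  airflow-init:
    image: apache/airflow:2.9.3            # version is an assumption
    environment: &airflow_env
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      # assumption: reuse the weather DB; a dedicated metadata DB is also common
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:postgres@postgres_db:5432/weather
      AIRFLOW__CORE__LOAD_EXAMPLES: "false"
    # 'db migrate' is for Airflow 2.7+; older versions use 'db init'
    entrypoint: >
      bash -c "airflow db migrate &&
               airflow users create --username admin --password admin
                 --firstname Admin --lastname User --role Admin --email admin@example.com"
    depends_on:
      - postgres_db

  airflow-webserver:
    image: apache/airflow:2.9.3
    command: webserver
    environment: *airflow_env
    ports:
      - "8080:8080"
    volumes: &airflow_mounts
      - ./dags:/opt/airflow/dags
      - ./dbt:/opt/airflow/dbt
      - ./scripts:/opt/airflow/scripts
      - ./data:/opt/airflow/data
    depends_on:
      - postgres_db

  airflow-scheduler:
    image: apache/airflow:2.9.3
    command: scheduler
    environment: *airflow_env
    volumes: *airflow_mounts
    depends_on:
      - postgres_db
```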
- Write the DAG file (sketch at the end of this section)
- Build a custom Airflow image so the containers have the extra dependencies (e.g., dbt) installed (Dockerfile sketch below)
- Airflow needs proper path setup (sys.path.append(...) for importing scripts).
- Database connections should be modular (db_connection.py for reuse).
- Docker mounts must be correct (dbt/, scripts/, and data/ inside containers).
- Airflow DAGs should be structured cleanly (task dependencies, retries, logging).
- dbt needs the correct project and profiles path (--project-dir and --profiles-dir).
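A possible Dockerfile for the custom Airflow image; the base tag and the packages pulled in via `requirements.txt` are assumptions.

```dockerfile
# Dockerfile (sketch): custom Airflow image with project dependencies
FROM apache/airflow:2.9.3
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt   # e.g. dbt-postgres, requests
```

And a minimal DAG sketch tying the points above together (path setup, retries, and the dbt `--project-dir`/`--profiles-dir` flags). The DAG id, the container paths, and the ingestion module `load_weather` (which would reuse `db_connection.py`) are hypothetical.

```python
# dags/weather_pipeline.py (sketch)
import sys
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

# Make the mounted scripts/ folder importable inside the container.
sys.path.append("/opt/airflow/scripts")

DBT_DIR = "/opt/airflow/dbt"        # mounted dbt project (assumed path)
PROFILES_DIR = "/opt/airflow/dbt"   # assumes profiles.yml sits next to the project

default_args = {
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="weather_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:

    def _load_weather():
        # Hypothetical ingestion step; load_weather.main() would use db_connection.py
        from load_weather import main
        main()

    load_weather = PythonOperator(
        task_id="load_weather",
        python_callable=_load_weather,
    )

    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command=f"dbt build --project-dir {DBT_DIR} --profiles-dir {PROFILES_DIR}",
    )

    load_weather >> dbt_build
```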