Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New mart NTD service models #3517

Merged
merged 6 commits into from
Oct 30, 2024
Merged

New mart NTD service models #3517

merged 6 commits into from
Oct 30, 2024

Conversation

erikamov
Copy link
Contributor

@erikamov erikamov commented Oct 28, 2024

Description

This PR replaces the existing model dim_annual_ntd_agency_service as requested on ticket #3396 to use the new 2022 external NTD Annual Data.

Following these details sent by @tiffanychu90:

Screenshot 2024-10-28 at 4 09 20 PM

I created these new models, with documentation and basic test:

  • dim_annual_service_agencies
  • dim_annual_service_mode_time_periods
  • fct_annual_service_modes

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

The new models and documentation were tested locally an created on staging.

❯ poetry run dbt run -s +"models/mart/ntd/dim_annual_service_agencies.sql"
23:23:04  Running with dbt=1.5.1
23:23:05  Found 422 models, 967 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
23:23:05
23:23:08  Concurrency: 8 threads (target='dev')
23:23:08
23:23:08  1 of 2 START sql view model erika_staging.stg_ntd_annual_data__2022__service_by_agency  [RUN]
23:23:09  1 of 2 OK created sql view model erika_staging.stg_ntd_annual_data__2022__service_by_agency  [CREATE VIEW (0 processed) in 1.04s]
23:23:09  2 of 2 START sql table model erika_mart_ntd.dim_annual_service_agencies ........ [RUN]
23:23:13  2 of 2 OK created sql table model erika_mart_ntd.dim_annual_service_agencies ... [CREATE TABLE (2.2k rows, 7.9 MiB processed) in 3.27s]
23:23:13
23:23:13  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 7.38 seconds (7.38s).
23:23:13
23:23:13  Completed successfully
23:23:13
23:23:13  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2


❯ poetry run dbt test -s models/mart/ntd/dim_annual_service_agencies.sql
23:32:57  1 of 3 START test accepted_values_dim_annual_service_agencies_report_year__2022  [RUN]
23:32:57  2 of 3 START test not_null_dim_annual_service_agencies_key ..................... [RUN]
23:32:57  3 of 3 START test unique_dim_annual_service_agencies_key ....................... [RUN]
23:32:58  2 of 3 PASS not_null_dim_annual_service_agencies_key ........................... [PASS in 0.74s]
23:32:58  1 of 3 PASS accepted_values_dim_annual_service_agencies_report_year__2022 ...... [PASS in 0.75s]
23:32:58  3 of 3 PASS unique_dim_annual_service_agencies_key ............................. [PASS in 0.76s]
23:32:58
23:32:58  Finished running 3 tests in 0 hours 0 minutes and 1.90 seconds (1.90s).
23:32:58
23:32:58  Completed successfully
23:32:58
23:32:58  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
❯ poetry run dbt run -s +models/mart/ntd/dim_annual_service_mode_time_periods.sql
23:24:20  Running with dbt=1.5.1
23:24:21  Found 422 models, 967 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
23:24:21
23:24:24  Concurrency: 8 threads (target='dev')
23:24:24
23:24:24  1 of 2 START sql view model erika_staging.stg_ntd_annual_data__2022__service_by_mode_and_time_period  [RUN]
23:24:25  1 of 2 OK created sql view model erika_staging.stg_ntd_annual_data__2022__service_by_mode_and_time_period  [CREATE VIEW (0 processed) in 0.94s]
23:24:25  2 of 2 START sql table model erika_mart_ntd.dim_annual_service_mode_time_periods  [RUN]
23:24:29  2 of 2 OK created sql table model erika_mart_ntd.dim_annual_service_mode_time_periods  [CREATE TABLE (9.2k rows, 50.6 MiB processed) in 4.33s]
23:24:29
23:24:29  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 8.01 seconds (8.01s).
23:24:29
23:24:29  Completed successfully
23:24:29
23:24:29  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2


❯ poetry run dbt test -s models/mart/ntd/dim_annual_service_mode_time_periods.sql
23:36:07  1 of 4 START test accepted_values_dim_annual_service_mode_time_periods_report_year__2022  [RUN]
23:36:07  2 of 4 START test accepted_values_dim_annual_service_mode_time_periods_time_period__Average_Weekday_AM_Peak__Average_Weekday_Midday__Average_Weekday_PM_Peak__Average_Weekday_Other__Average_Typical_Weekday__Average_Typical_Saturday__Average_Typical_Sunday  [RUN]
23:36:07  3 of 4 START test not_null_dim_annual_service_mode_time_periods_key ............ [RUN]
23:36:07  4 of 4 START test unique_dim_annual_service_mode_time_periods_key .............. [RUN]
23:36:08  3 of 4 PASS not_null_dim_annual_service_mode_time_periods_key .................. [PASS in 0.95s]
23:36:08  1 of 4 PASS accepted_values_dim_annual_service_mode_time_periods_report_year__2022  [PASS in 0.95s]
23:36:08  2 of 4 PASS accepted_values_dim_annual_service_mode_time_periods_time_period__Average_Weekday_AM_Peak__Average_Weekday_Midday__Average_Weekday_PM_Peak__Average_Weekday_Other__Average_Typical_Weekday__Average_Typical_Saturday__Average_Typical_Sunday  [PASS in 0.95s]
23:36:08  4 of 4 PASS unique_dim_annual_service_mode_time_periods_key .................... [PASS in 1.00s]
23:36:08
23:36:08  Finished running 4 tests in 0 hours 0 minutes and 2.12 seconds (2.12s).
23:36:08
23:36:08  Completed successfully
23:36:08
23:36:08  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4
❯ poetry run dbt run -s +models/mart/ntd/fct_annual_service_modes.sql
23:25:02  Running with dbt=1.5.1
23:25:04  Found 422 models, 967 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
23:25:04
23:25:07  Concurrency: 8 threads (target='dev')
23:25:07
23:25:07  1 of 2 START sql view model erika_staging.stg_ntd_annual_data__2022__service_by_mode  [RUN]
23:25:08  1 of 2 OK created sql view model erika_staging.stg_ntd_annual_data__2022__service_by_mode  [CREATE VIEW (0 processed) in 1.00s]
23:25:08  2 of 2 START sql table model erika_mart_ntd.fct_annual_service_modes ........... [RUN]
23:25:11  2 of 2 OK created sql table model erika_mart_ntd.fct_annual_service_modes ...... [CREATE TABLE (3.7k rows, 16.2 MiB processed) in 3.39s]
23:25:11
23:25:11  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 7.51 seconds (7.51s).
23:25:11
23:25:11  Completed successfully
23:25:11
23:25:11  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2


❯ poetry run dbt test -s models/mart/ntd/fct_annual_service_modes.sql
23:36:56  1 of 4 START test accepted_values_fct_annual_service_modes_max_time_period__Annual_Total  [RUN]
23:36:56  2 of 4 START test accepted_values_fct_annual_service_modes_report_year__2022 ... [RUN]
23:36:56  3 of 4 START test not_null_fct_annual_service_modes_key ........................ [RUN]
23:36:56  4 of 4 START test unique_fct_annual_service_modes_key .......................... [RUN]
23:36:57  1 of 4 PASS accepted_values_fct_annual_service_modes_max_time_period__Annual_Total  [PASS in 0.72s]
23:36:57  3 of 4 PASS not_null_fct_annual_service_modes_key .............................. [PASS in 0.72s]
23:36:57  4 of 4 PASS unique_fct_annual_service_modes_key ................................ [PASS in 0.73s]
23:36:57  2 of 4 PASS accepted_values_fct_annual_service_modes_report_year__2022 ......... [PASS in 0.76s]
23:36:57
23:36:57  Finished running 4 tests in 0 hours 0 minutes and 1.92 seconds (1.92s).
23:36:57
23:36:57  Completed successfully
23:36:57
23:36:57  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

Post-merge follow-ups

  • No action required
  • Actions required (specified below)

Confirm the correct creation of the models on cal-itp-data-infra.

Copy link

github-actions bot commented Oct 28, 2024

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For new models, do they all have a surrogate primary key that is tested to be not-null and unique?

New models 🌱

calitp_warehouse.mart.ntd.dim_annual_service_agencies

calitp_warehouse.mart.ntd.dim_annual_service_mode_time_periods

calitp_warehouse.mart.ntd.fct_annual_service_modes

DAG

Legend (in order of precedence)

Resource type Indicator Resolution
Large table-materialized model Orange Make the model incremental
Large model without partitioning or clustering Orange Add partitioning and/or clustering
View with more than one child Yellow Materialize as a table or incremental
Incremental Light green
Table Green
View White

- Reorder NTD external tables list Alphabetically
- Add Documentation and Tests for
- Remove deprecated dim_annual_ntd_agency_service

[#3396]
@tiffanychu90
Copy link
Member

tiffanychu90 commented Oct 29, 2024

  • Slack thread
  • I'll take a TODO to write up a GH issue to do some of the dbt docs + macros and start working on moving columns that are broadly present across many NTD tables into that
    • This PR should remain the mart NTD table creation, and a follow-up issue + PR can address docs
  • One remaining question: why are the columns read in from stg_ntd_annual_data__2022__service_by_agency have the max_ or sum_ prefix?

@tiffanychu90
Copy link
Member

jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (3396-ntd-agency) $ poetry run dbt run -s +"models/mart/ntd/dim_annual_service_agencies.sql"
22:03:36  Running with dbt=1.5.1
22:03:39  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
22:03:40  Found 422 models, 973 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
22:03:40  
22:03:44  Concurrency: 8 threads (target='dev')
22:03:44  
22:03:44  1 of 2 START sql view model tiffany_staging.stg_ntd_annual_data__2022__service_by_agency  [RUN]
22:03:45  1 of 2 OK created sql view model tiffany_staging.stg_ntd_annual_data__2022__service_by_agency  [CREATE VIEW (0 processed) in 1.07s]
22:03:45  2 of 2 START sql table model tiffany_mart_ntd.dim_annual_service_agencies ...... [RUN]
22:03:48  2 of 2 OK created sql table model tiffany_mart_ntd.dim_annual_service_agencies . [CREATE TABLE (2.2k rows, 7.9 MiB processed) in 3.61s]
22:03:48  
22:03:48  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 8.41 seconds (8.41s).
22:03:49  
22:03:49  Completed successfully
jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (3396-ntd-agency) $ poetry run dbt run -s +models/mart/ntd/fct_annual_service_modes.sql
22:05:34  Running with dbt=1.5.1
22:05:37  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
22:05:38  Found 422 models, 973 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
22:05:38  
22:05:42  Concurrency: 8 threads (target='dev')
22:05:42  
22:05:42  1 of 2 START sql view model tiffany_staging.stg_ntd_annual_data__2022__service_by_mode  [RUN]
22:05:43  1 of 2 OK created sql view model tiffany_staging.stg_ntd_annual_data__2022__service_by_mode  [CREATE VIEW (0 processed) in 1.01s]
22:05:43  2 of 2 START sql table model tiffany_mart_ntd.fct_annual_service_modes ......... [RUN]
22:05:46  2 of 2 OK created sql table model tiffany_mart_ntd.fct_annual_service_modes .... [CREATE TABLE (3.7k rows, 16.2 MiB processed) in 3.59s]
22:05:46  
22:05:46  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 8.36 seconds (8.36s).
22:05:47  
22:05:47  Completed successfully
22:05:47  

@erikamov
Copy link
Contributor Author

  • Slack thread

  • I'll take a TODO to write up a GH issue to do some of the dbt docs + macros and start working on moving columns that are broadly present across many NTD tables into that

    • This PR should remain the mart NTD table creation, and a follow-up issue + PR can address docs
  • One remaining question: why are the columns read in from stg_ntd_annual_data__2022__service_by_agency have the max_ or sum_ prefix?

The column names with SUM and MAX actually came from the original NTD API data. I just kept them like it came to us, but I agree with you that is better without it.
Thank you for all changes and information. 🤩

@erikamov erikamov merged commit 5c543ce into main Oct 30, 2024
4 checks passed
@erikamov erikamov deleted the 3396-ntd-agency branch October 30, 2024 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants