Schema-only dry-run feature - similar to a universal override of `materialize: view` or `LIMIT 0` #6989

aaronsteers · 2023-02-15T19:36:25Z


Feature idea

Especially in CI, and as a quick end-to-end "compile check" for column name spellings, refactored dependencies, etc., it would be helpful to have a built-in dbt feature that builds all table schemas - or at least tests that they are correctly generated from the SQL provided.

This allows users to quickly confirm that their SQL definitions are correct, free from spelling issues, and that there are no circular loops or missing source tables, for instance - without waiting for the entire DW to rebuild itself with data.

In CI and also in local development, we want to fail as quickly as possible if column_a is misspelled as columm_a, for instance. The priority is to avoid waiting 45 minutes (or longer) to find that compile-type SQL issue.

Why have as a first-class feature

A benefit of having this as a first-class feature is that specific adapters can implement their own platform-optimized implementation.

A generic implementation can simply apply materialize: view or materialize: ephemeral to all models, overriding the model-specific settings.
dbt-bigquery could use the BigQuery-specific logic within dbt-dry-run, which leverages BigQuery's native 'dry run' feature to do zero-cost tests that bypass the movement of data.
A dbt-postgres or dbt-duckdb implementation might be able to run the entire dry-run locally without touching the actual remote server.
In theory, advanced implementations (DuckDB, SQLite, ?) could compile table schemas in memory, without writing anything to disk.
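As a rough illustration of the in-memory idea, here is a minimal sketch using Python's built-in sqlite3 module. It is not how dbt or any adapter works; the function name `schema_only_build`, the model names, and the assumption that models arrive already sorted in dependency order are all hypothetical.

```python
import sqlite3


def schema_only_build(conn, models):
    """Create every model as an empty table and return its columns.

    `models` is a list of (name, select_sql) pairs, assumed to be sorted so
    that each model comes after the models it selects from.
    """
    schemas = {}
    for name, select_sql in models:
        # LIMIT 0 makes SQLite resolve every table and column name
        # while copying no rows.
        conn.execute(
            f'CREATE TABLE "{name}" AS SELECT * FROM ({select_sql}) LIMIT 0'
        )
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        schemas[name] = [(row[1], row[2]) for row in info]
    return schemas


# Everything lives in memory; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_users (id INTEGER, first_name TEXT, age INTEGER)")

models = [
    ("stg_users", "SELECT id, first_name, age FROM raw_users"),
    ("adults", "SELECT id, first_name FROM stg_users WHERE age >= 18"),
]
schemas = schema_only_build(conn, models)

# A misspelled column fails immediately, with no data involved.
try:
    schema_only_build(conn, [("broken", "SELECT columm_a FROM stg_users")])
except sqlite3.OperationalError as exc:
    print(f"dry-run failed: {exc}")
```

A real implementation would also have to translate warehouse-specific SQL into the local dialect, which this sketch ignores.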

Integration with other dbt features

Integration with profiles

It should be assumed that this feature would be used primarily in CI testing and localdev "compiling" use cases, where we just want a quick check to find compile errors (if any) in the generated SQL or in mappings between models. As such, this could optionally be handled as a special type of profile - or as a setting against an existing profile type.

To use the bigquery example from below, it could be functionally equivalent to specifying a dry-run profile name, which passes all of its logic to https://github.com/autotraderuk/dbt-dry-run instead of sending to dbt-bigquery.

An advantage of adding this as a feature at the profile level is that it is then very clear to the user that zero-data or view-only materializations are not going to affect other profiles like devtest - and you could then have cicd_nodata as well as cicd_withdata as separate profile names.

Integration with `defer`

Any of the above may declare themselves compatible (or not) with the defer feature, in order to skip over model definitions that are known not to have changed since the last full build or the last dry-run.

Integration with tests

Tests would either fail (if zero rows) or be extremely slow (if materialize: view) - so they would not be viable in this scenario.

However, similar to models, the SQL generated for each test could be validated in a schema-only way, so if a test references a non-existent column (for instance) we can fail the dry-run on the basis of that test query being invalid.
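One way to picture schema-only validation of test queries: ask the database to prepare each query without running it. The sketch below uses SQLite's `EXPLAIN` for that purpose. The function name `find_invalid_queries` and the test names are hypothetical, and other warehouses would need their own equivalent of a prepare-only call.

```python
import sqlite3


def find_invalid_queries(conn, queries):
    """Return {query_name: error_message} for queries that fail to compile."""
    errors = {}
    for name, sql in queries.items():
        try:
            # EXPLAIN makes SQLite parse the query and resolve its names,
            # returning the query plan instead of executing the query.
            conn.execute(f"EXPLAIN {sql}")
        except sqlite3.Error as exc:
            errors[name] = str(exc)
    return errors


conn = sqlite3.connect(":memory:")
# An empty relation is enough: only the schema matters here.
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")

test_queries = {
    "not_null_orders_id": "SELECT * FROM orders WHERE id IS NULL",
    # Typo on purpose: `stauts` instead of `status`.
    "not_null_orders_status": "SELECT * FROM orders WHERE stauts IS NULL",
}
errors = find_invalid_queries(conn, test_queries)
for name, message in errors.items():
    print(f"{name}: {message}")
```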

Compatibility with "pivot" logic

Any dynamic SQL code which pivots row data into column headers would require (1) that the models still compile without any data or (2) that a default behavior is provided in cases where 'dry-run' is detected, or no data exists.
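Option (2) could look something like the sketch below: the pivot columns are read from the data, and a caller-supplied default list is used when the table is empty, as it would be in a dry-run. The helper `pivot_sql`, its parameters, and the table are hypothetical, and the pivot values are assumed to be simple strings that are safe to embed in SQL.

```python
import sqlite3


def pivot_sql(conn, table, key_col, pivot_col, value_col, default_values=()):
    """Build a pivot query whose output columns come from `pivot_col`."""
    rows = conn.execute(
        f"SELECT DISTINCT {pivot_col} FROM {table} ORDER BY 1"
    ).fetchall()
    # With no data (e.g. a dry-run), fall back to the declared defaults.
    values = [row[0] for row in rows] or list(default_values)
    if not values:
        raise ValueError("no pivot values found and no defaults supplied")
    cases = ",\n  ".join(
        f"SUM(CASE WHEN {pivot_col} = '{v}' THEN {value_col} END) AS \"{v}\""
        for v in values
    )
    return f"SELECT {key_col},\n  {cases}\nFROM {table}\nGROUP BY {key_col}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, channel TEXT, amount REAL)")

# The table is empty, so the defaults decide the output columns.
sql = pivot_sql(
    conn, "orders", "customer_id", "channel", "amount",
    default_values=("web", "store"),
)
columns = [col[0] for col in conn.execute(sql).description]
```

The trade-off is that downstream models are only validated against the default columns, not against whatever values exist in production data.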

Workarounds / Alternative Implementations

A hacky solution is to inject special handling logic into the macro that determines the materialize setting, and to let certain vars or env vars toggle everything to materialize: view.

A similar workaround approach is to inject a LIMIT 0 feature into the macro that generates the SQL text - and have the limit only toggle on when a specific var or env var is detected.
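The LIMIT 0 toggle boils down to a conditional wrapper around the generated SQL. In a real project this would live in a Jinja macro; the Python sketch below only shows the logic. The environment variable name `DBT_SCHEMA_ONLY` and the function `apply_schema_only` are made up for illustration.

```python
import os


def apply_schema_only(select_sql, environ=os.environ):
    """Wrap a model's SELECT in LIMIT 0 when the toggle is switched on."""
    # DBT_SCHEMA_ONLY is a hypothetical variable name, not a dbt setting.
    flag = environ.get("DBT_SCHEMA_ONLY", "").lower()
    if flag in ("1", "true", "yes"):
        return (
            "select * from (\n"
            f"{select_sql}\n"
            ") as schema_only_subquery limit 0"
        )
    return select_sql


model_sql = "select id, first_name from raw_users"

# Toggle off: the SQL is passed through untouched.
normal = apply_schema_only(model_sql, environ={})

# Toggle on: the same SQL is wrapped so that it returns zero rows.
dry = apply_schema_only(model_sql, environ={"DBT_SCHEMA_ONLY": "true"})
print(dry)
```

Whether the outer `limit 0` actually prevents the warehouse from scanning data depends on its query optimizer, so the savings are not guaranteed.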

Related links

Note, this is different from:

dbt should have a Dry-Run mode #4456 - which only focuses on printing the SQL statements to the user

There's a BigQuery-specific solution here:

https://github.com/autotraderuk/dbt-dry-run

ghost · 2023-02-16T13:57:11Z


I have thought before about how to make the approach used in dbt-dry-run work for other adapters. I'm not very familiar with the internals of how adapters are implemented, but a good start could be to create another method in the adapter interface called predict_schema, which would take a SQL query and return a schema description of what that query will return without necessarily running it. The BigQuery implementation of this method would be more or less the same as what dbt-dry-run does here, but other adapters that don't support dry run queries could probably achieve similar (but less performant) results by adding a LIMIT 0 to the query.
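For adapters without a native dry run, the LIMIT 0 fallback might look roughly like this. It is a sketch against Python's DB-API using sqlite3, not an actual dbt adapter method; `predict_schema` is the name proposed above, and its signature here is an assumption.

```python
import sqlite3


def predict_schema(conn, sql):
    """Return the column names that `sql` would produce, fetching no rows."""
    cursor = conn.execute(f"SELECT * FROM ({sql}) LIMIT 0")
    # DB-API cursors describe the result columns even when the result is
    # empty. sqlite3 only fills in the name; other drivers also report types.
    return [column[0] for column in cursor.description]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, age INTEGER)")

predicted = predict_schema(
    conn, "SELECT first_name, age + 1 AS next_age FROM people"
)
print(predicted)
```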

I spiked this out for a Postgres adapter here, where we execute the query and then just fetch the first row. It seemed like it could work, but I ran out of time to take it any further.

Once you have the predicted schema of your sources in a dbt project, you don't really need the dry run functionality that BigQuery offers. You can just replace the ref macro with the SELECT literals that dbt-dry-run generates from the predicted schema:

-- my_test_view
select *
FROM {{ ref('my_staging_model') }}
-- Compiles to
select *
FROM `at-example-project.training.my_staging_model`
-- Replace with a select literal and dry run this
select *
-- built from query_job.schema of upstream dry run
FROM (select "any_string" as `first_name`, 42 as `age`)
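The substitution in that example can be sketched in a few lines of Python. This is a simplified illustration, not dbt-dry-run's actual code: the functions `select_literal` and `replace_relation`, the placeholder values, and the type names used as dictionary keys are assumptions.

```python
# Placeholder literal per column type (illustrative, not exhaustive).
PLACEHOLDERS = {
    "STRING": '"any_string"',
    "INT64": "42",
}


def select_literal(schema):
    """Render a one-row SELECT whose columns match `schema`.

    `schema` is a list of (column_name, type_name) pairs.
    """
    columns = []
    for name, type_name in schema:
        # Unknown types fall back to a typed NULL.
        literal = PLACEHOLDERS.get(type_name, f"cast(null as {type_name})")
        columns.append(f"{literal} as `{name}`")
    return "(select " + ", ".join(columns) + ")"


def replace_relation(compiled_sql, relation, schema):
    """Swap a backtick-quoted relation for its select literal."""
    return compiled_sql.replace(f"`{relation}`", select_literal(schema))


compiled = "select *\nFROM `at-example-project.training.my_staging_model`"
upstream_schema = [("first_name", "STRING"), ("age", "INT64")]

dry_run_sql = replace_relation(
    compiled, "at-example-project.training.my_staging_model", upstream_schema
)
print(dry_run_sql)
```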


jtcohen6 Feb 16, 2023
Maintainer

This is quite similar to what we're after in #6751! We need that for "model contracts," to validate that the user-provided modeling logic (in SQL) matches up with the user-defined columns spec (in yaml)

graciegoheen (Maintainer) · 2025-05-22T18:56:09Z

We added a new --empty flag in v1.8 to cover this use case -> https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-v1.8#the---empty-flag
