Schema-only dry-run feature - similar to a universal override of materialize: view or LIMIT 0
#6989
Replies: 2 comments 1 reply
-
|
I have thought before about how to make the approach used in I spiked this out for a Postgres adapter here where we execute the query and then just fetch the first row. It seemed like it could work but ran out of time to take it any further. Once you have the predicted schema of your |
Beta Was this translation helpful? Give feedback.
-
|
We added a new |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Feature idea
Especially in CI and as quick end-to-end "compile check" for column name spellings, refactored dependencies, etc., it would be helpful to have a built-in dbt feature to build all table schemas - or at least test that they are correctly generated from the SQL provided.
This allows users to quickly confirm that their SQL definitions are correct, free from spelling issues, and that there are no circular loops or missing source tables, for instance - without waiting for the entire DW to rebuild itself with data.
In CI and also in local development, we want to fail as quickly as possible if
column_ais mispelled ascolumm_a, for instance, and the priority here is to not wait 45 minutes (or longer) to find that compile-type SQL issue.Why have as a first-class feature
A benefit of having this as a first-class feature is that specific adapters can implement their own platform-optimized implementation.
materialize: viewormaterialize: ephemeralto all views, overriding the model-specific settings.dbt-bigquerycould use the BigQuery specific logic withindbt-dry-run, which leverages BigQueries native 'dry run' feature to do zero-cost tests that bypass the movement of data.dbt-postgresordbt-duckdbimplementation might be able to run the entire dry-run locally without touching the actual remote server.Integration with other dbt features
Integration with profiles
It should be assumed that this feature would be used primarily in CI testing and localdev "compiling" use cases, where we just want a quick check to find compile errors (if any) in the generated SQL or in mappings between models. As such, this could optionally be handled as a special type of profile - or as a setting against an existing profile type.
To use the bigquery example from below, it could be functionally equivalent to specifying a
dry-runprofile name, which passes all of its logic to https://github.com/autotraderuk/dbt-dry-run instead of sending todbt-bigquery.An advantage adding this as a feature at the profile level is that it then is very clear to the user that zero-data or view-only materializations are not going to affect other profiles like
devtest- and then you could havecicd_nodataas well ascicd_withdataas separate profile names.Integration with
deferAny of the above may be define themselves as compatible (or not) with
deferfeature, to skip over model definitions which are known to have not changed since last full build or since last dry-run.Integration with tests
Tests would either fail (if zero rows) or be extremely slow (if materialize: view) - so they would not be viable in this scenario.
However, similar to models, the SQL generated for each test could be validated in a schema-only way, so if a test references a non-existent column (for instance) we can fail the dry-run on the basis of that test query being invalid.
Compatibility with "pivot" logic
Any dynamic SQL code which pivots row data into column headers would require (1) that the models still compile without any data or (2) that a default behavior is provided in cases where 'dry-run' is detected, or no data exists.
Workarounds / Alternative Implementations
A hacky solution is to inject a special handling logic into the macro that determines the
materializesetting, and to let certainvarsorenv varstoggle everything tomaterialize: view.A similar workaround approach is to inject a
LIMIT 0feature into the macro that generates the SQL text - and have the limit only toggle on when a specific var or env var is detected.Related links
Note, this is different from:
There's a BigQuery-specific solution here:
Beta Was this translation helpful? Give feedback.
All reactions