Add pseudo selectors that select models based on artifact states #2465
Description
Describe the feature
This change would be in support of:
- Improved dev experiences
- Slimmer CI builds
If dbt is provided artifacts (manifest, run_results) produced from a previous run of dbt, then dbt will be able to determine:
- New nodes
- Changed nodes
- Nodes that failed to build in a previous invocation
Here are some high-level example usage scenarios:
# Run new and changed models (and their descendants) in a CI build
$ dbt --state prod-target/ run --models @state:modified
# Re-run failed models and their children in development (or, re-run a prod job that failed)
$ dbt --state target/ run --models build:error+
# Re-run failed models and their children in development
# Note: --state is implied to be target/ here
$ dbt run --models build:error+
Implementation details
dbt is going to need to point to the artifacts from a previous invocation to compare manifests or determine build statuses from a previous run. To accomplish this, we could add a flag like --state, which should point to a folder containing the manifest and run_results from a previous invocation of dbt. It will be the user's responsibility to make sure these artifacts are present in their environment.
--state flag:
- This flag probably makes the most sense as a flag to dbt, as it will apply to many subcommands (eg. compile, run, test, seed, snapshot, and ls). It can definitely be a flag to subcommands (or both) if that makes sense
- The default value should be target/
- If the expected state files are not present, dbt should run successfully, but selectors based on this state information should fail if used.
  - eg. dbt run --models build:error will fail with an appropriate error if the target/ dir does not exist
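The missing-state behavior described above could be sketched roughly as follows. This is only an illustration, not dbt's actual code: the helper names (load_artifact, require_artifact, StateNotFoundError) are hypothetical, and the artifact file names passed in (such as run_results.json) are assumed to match what dbt writes to target/.

```python
import json
from pathlib import Path
from typing import Optional


class StateNotFoundError(Exception):
    """Hypothetical error: a state-based selector was used without artifacts."""


def load_artifact(state_dir: Path, filename: str) -> Optional[dict]:
    """Return the parsed artifact, or None if it is missing.

    A missing artifact is not an error by itself, so commands that do not
    use state-based selectors keep working.
    """
    path = state_dir / filename
    if not path.is_file():
        return None
    with path.open() as fh:
        return json.load(fh)


def require_artifact(state_dir: Path, filename: str, selector: str) -> dict:
    """Fail only when a selector actually needs the artifact."""
    artifact = load_artifact(state_dir, filename)
    if artifact is None:
        raise StateNotFoundError(
            f"Selector '{selector}' needs {filename}, "
            f"but it was not found in {state_dir}"
        )
    return artifact
```

With this split, a plain dbt run would only ever call load_artifact, while resolving build:error would call require_artifact and surface the error.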
Selectors:
- state:modified: Will select any nodes whose hashes have changed compared to the value present in the manifest artifact
- state:new: Will select any nodes which are present in the project but are not present in the manifest artifact
  - We'll probably want to provide some shorthand that selects new & changed files for local dev
- build:error: Will select any nodes which errored or were skipped in the run_results state artifact
- build:success: I don't know that there's a concrete use-case for something like this, but it seems sensible to implement selectors for different states
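As a rough sketch of how these selectors might resolve, assume each manifest can be reduced to a mapping of node id to content hash, and run_results to a mapping of node id to status. Those shapes, the function names, and the status strings "error" and "skipped" are assumptions made for illustration, not the real artifact schema.

```python
from typing import Dict, Set


def select_new(current: Dict[str, str], previous: Dict[str, str]) -> Set[str]:
    """state:new -> nodes in the project that the old manifest lacks."""
    return {node for node in current if node not in previous}


def select_modified(current: Dict[str, str], previous: Dict[str, str]) -> Set[str]:
    """state:modified -> nodes present in both whose hash differs."""
    return {
        node
        for node, digest in current.items()
        if node in previous and previous[node] != digest
    }


def select_by_status(statuses: Dict[str, str], wanted: Set[str]) -> Set[str]:
    """build:<status> -> nodes whose previous run ended in a wanted status."""
    return {node for node, status in statuses.items() if status in wanted}


# Hypothetical reduced artifacts
previous = {"model.a": "h1", "model.b": "h2"}
current = {"model.a": "h1", "model.b": "h2-edited", "model.c": "h3"}
last_run = {"model.a": "success", "model.b": "error", "model.c": "skipped"}

new_nodes = select_new(current, previous)
modified_nodes = select_modified(current, previous)
# build:error covers both errored and skipped nodes, per the description above
failed_nodes = select_by_status(last_run, {"error", "skipped"})
```

The shorthand for "new & changed" would then just be the union of select_new and select_modified.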
Determining nodes that have changed
This is a tricky problem! A very simple version of this functionality can be implemented with a git diff --name-only. That will get you pretty far, but it will not account for:
- models that should be considered changed because they reference a macro that has changed
- schema.yml files (it's tough to correlate .yml file changes to dbt nodes, at least as far as git is concerned)
- the global impacts of changes to specific macros (eg. generate_schema_name) or the dbt_project.yml file
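To make the macro cases concrete, here is one possible sketch of a comparison that goes beyond file names. It assumes dbt can supply content hashes for models and macros, plus a mapping from each model to the macros it references directly. All of these inputs, and the global_macros parameter, are hypothetical. Macros that call other macros, schema.yml changes, and dbt_project.yml changes are not handled here.

```python
from typing import Dict, FrozenSet, Set


def changed_macros(now: Dict[str, str], before: Dict[str, str]) -> Set[str]:
    """Macros that are new or whose hash differs from the previous state."""
    return {name for name, digest in now.items() if before.get(name) != digest}


def models_considered_changed(
    models_now: Dict[str, str],
    models_before: Dict[str, str],
    macros_now: Dict[str, str],
    macros_before: Dict[str, str],
    macro_deps: Dict[str, Set[str]],
    global_macros: FrozenSet[str] = frozenset(),
) -> Set[str]:
    """Pre-existing models that changed directly or via a referenced macro."""
    touched = changed_macros(macros_now, macros_before)
    # Only models that existed before can be "changed"; new ones are separate.
    candidates = set(models_now) & set(models_before)

    # A change to a macro with global impact marks every model as changed.
    if touched & global_macros:
        return candidates

    changed = set()
    for model in candidates:
        if models_now[model] != models_before[model]:
            changed.add(model)  # the model's own contents changed
        elif macro_deps.get(model, set()) & touched:
            changed.add(model)  # unchanged file, but a referenced macro changed
    return changed
```

A model whose file is untouched still counts as changed when a macro it references has a new hash, which is exactly the case that git diff --name-only misses.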
Describe alternatives you've considered
- Git trickery: This is an incomplete solution and won't fare super well in CI envs, but might be hackable in local dev work
Who will this benefit?
- People who run dbt jobs in their CI envs
- People who are making iterative changes in development
- We could add a "Rerun from failed" button in dbt Cloud, and folks running dbt in their own prod envs could do something similar (eg. in an Airflow error handler) for intermittent build failures