Description
Is this your first time submitting a feature request?
- I have read the expectations for open source contributors
- I have searched the existing issues, and I could not find an existing issue for this feature
- I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Scenario:
- Have a recent run in production
- Pull the artifact from that run
- Change Model A → rerun only Model A without needing to build all upstream references
- Change Model B (without changing Model A) → you should be able to just rerun Model B, while still using the production artifact to know the locations of all other upstream models
A challenge today is that dbt-core expects to use the same "previous state" manifest for both:
- State comparison: What’s changed? What should be selected to run? (= --select state:modified+)
- "Defer": rewriting upstream unbuilt references, for anything that doesn’t exist in this schema, wherever it exists in prod (= --defer)
Those are related concepts, and it’s often reasonable to provide the same input to both, but the conflation does make it hard for us to pursue a more-advanced use case.
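To make the two roles concrete, here is a toy sketch in Python. The dict-based manifests, the checksum comparison, and the helper names (select_modified, resolve_ref) are all hypothetical simplifications for illustration, not dbt-core's real manifest schema or API. It follows the "Change Model B" scenario, with both roles fed by the same previous manifest, as happens today.

```python
# Toy manifests: node name -> code checksum and the relation it was built as.
# These structures are hypothetical, not dbt-core's real manifest schema.
prod_manifest = {
    "model_a": {"checksum": "a1", "relation": "analytics.prod.model_a"},
    "model_b": {"checksum": "b1", "relation": "analytics.prod.model_b"},
}
current_project = {
    "model_a": {"checksum": "a1"},
    "model_b": {"checksum": "b2"},  # edited locally
}


def select_modified(current, previous):
    """State comparison: nodes that are new or whose checksum changed."""
    return sorted(
        name
        for name, node in current.items()
        if name not in previous or previous[name]["checksum"] != node["checksum"]
    )


def resolve_ref(name, built_here, target_schema, previous):
    """Deferral: use the local relation if built here, else the previous state's."""
    if name in built_here:
        return f"{target_schema}.{name}"
    return previous[name]["relation"]


# Today, the same manifest is the input to both steps.
selected = select_modified(current_project, prod_manifest)
model_a_ref = resolve_ref("model_a", set(selected), "analytics.ci", prod_manifest)
```

In this sketch only model_b is selected, and its upstream reference to model_a resolves to the prod relation. The point of the proposal is that the `previous` argument of the two functions would not have to be the same object.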
Ideally, we might be able to do something more like:
- Given two project states, parse both projects, and get the "diff" between them (state:modified)
- Always use a stable production artifact for deferral (rewriting upstream references). This is how we know where the models actually live (as views/tables) in the prod environments.
I think the change could be as simple as dbt-core gaining the ability to use a different manifest for each of:
- (1) Stateful selection (state:, result:, source_status:)
- (2) Deferral / cloning ([CT-2348] [Feature] dbt clone command #7256)
If a custom --defer-diff-state path is not specified, deferral should keep using the same --state path by default. I expect this will continue to make sense for ~80% of cases, and it's a reasonable out-of-the-box behavior.
These are tricky concepts, and so the naming matters a lot! I'm not thrilled with this breakdown, and so I'm very open to thoughts/feedback:
- --defer (boolean)
- --state (path)
- --defer-diff-state (path)
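A rough sketch of how these three flags could interact, including the fallback to --state. Python's argparse is used here only as a stand-in, and build_parser / resolve_state_paths are hypothetical names; this is not how dbt-core's CLI is wired up, and --defer-diff-state is just the name proposed above.

```python
import argparse


def build_parser():
    # Hypothetical stand-in CLI, only meant to illustrate the flag semantics.
    parser = argparse.ArgumentParser(prog="dbt-sketch")
    parser.add_argument("--defer", action="store_true",
                        help="rewrite unbuilt upstream references")
    parser.add_argument("--state",
                        help="path to the manifest used for stateful selection")
    parser.add_argument("--defer-diff-state",
                        help="path to the manifest used for deferral")
    return parser


def resolve_state_paths(argv):
    """Return (selection_path, deferral_path); deferral_path is None without --defer."""
    args = build_parser().parse_args(argv)
    if not args.defer:
        return args.state, None
    # Fall back to --state when no separate deferral path is given.
    return args.state, args.defer_diff_state or args.state
```

For example, `resolve_state_paths(["--defer", "--state", "ci", "--defer-diff-state", "prod"])` would keep the two paths separate, while omitting --defer-diff-state reuses the --state path for both.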
Describe alternatives you've considered
Not doing this. In theory, you could keep using the previous run's manifest for slimmer and slimmer state comparison. For any nodes that were deferred, it will be the "production" version of those nodes in your manifest.
I don't think this would work in the case where you change a model, and then remove the changes (revert it to main / prod state). It feels like a leaky approach, when we'd be better off providing a clearer delineation.
Who will this benefit?
Users/applications pursuing ever-slimmer CI
Are you interested in contributing this feature?
No response
Anything else?
Currently, we load up the --state manifest once, into the previous_state container:
dbt-core/core/dbt/task/runnable.py
Lines 79 to 83 in 7045e11
Then we pass the same previous_state.manifest into both node selection and deferral:
dbt-core/core/dbt/task/compile.py
Lines 94 to 106 in 7045e11
The idea here would be to allow users to configure different previous-state manifests, one for node selection and one for deferral.
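As a very rough illustration of what the loading step could look like with two manifests, here is a sketch. PreviousStates, load_manifest, and load_previous_states are hypothetical names, not dbt-core's actual classes or functions; the only thing assumed about dbt itself is that a state directory contains a manifest.json file.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PreviousStates:
    """Hypothetical container holding one manifest per purpose."""
    selection_manifest: dict
    deferral_manifest: dict


def load_manifest(state_dir):
    # Read the manifest.json artifact from a state directory.
    return json.loads((Path(state_dir) / "manifest.json").read_text())


def load_previous_states(state_dir, defer_state_dir=None):
    selection = load_manifest(state_dir)
    if defer_state_dir is None or Path(defer_state_dir) == Path(state_dir):
        # Default: keep today's behavior and share a single manifest.
        deferral = selection
    else:
        deferral = load_manifest(defer_state_dir)
    return PreviousStates(selection, deferral)
```

Node selection would then read selection_manifest and deferral would read deferral_manifest, so the existing behavior is unchanged unless a second path is supplied.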