[CT-2385] [Feature] Use different "state" for state comparison versus deferral #7300

Closed
@jtcohen6

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Scenario:

  • Have a recent run in production
  • Pull the artifact from that run
  • Change Model A → rerun only Model A without needing to build all upstream references
  • Change Model B (without changing Model A) → you should be able to just rerun Model B, while still using the production artifact to know the locations of all other upstream models

A challenge today is that dbt-core expects to use the same "previous state" manifest for both:

  1. State comparison: What’s changed? What should be selected to run? (= --select state:modified+)
  2. "Defer": rewriting upstream unbuilt references, for anything that doesn’t exist in this schema, wherever it exists in prod (= --defer)

Those are related concepts, and it’s often reasonable to provide the same input to both, but the conflation does make it hard for us to pursue a more-advanced use case.

Ideally, we might be able to do something more like:

  1. Given two project states, parse both projects, and get the "diff" between them (state:modified)
  2. Always use a stable production artifact for deferral (rewriting upstream references). This is how we know where the models actually live (as views/tables) in the prod environments.
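The two steps above can be sketched with plain dictionaries standing in for manifests. This is a minimal illustration, not dbt-core's implementation: the manifest shape (node name mapped to a checksum and a relation name), the helper names `modified_nodes` and `resolve_ref`, and the sample nodes are all assumptions made for the example.

```python
from typing import Dict, Set

# Hypothetical, simplified manifest: node name -> {"checksum": ..., "relation": ...}
Manifest = Dict[str, Dict[str, str]]


def modified_nodes(current: Manifest, comparison: Manifest) -> Set[str]:
    """Step 1: nodes that are new, or whose checksum differs from the comparison state."""
    return {
        name
        for name, node in current.items()
        if name not in comparison or comparison[name]["checksum"] != node["checksum"]
    }


def resolve_ref(
    name: str,
    selected: Set[str],
    built_locally: Set[str],
    current: Manifest,
    prod: Manifest,
) -> str:
    """Step 2: use the local relation if it is selected or already built here, else defer to prod."""
    if name in selected or name in built_locally or name not in prod:
        return current[name]["relation"]
    return prod[name]["relation"]


# Stable production artifact: used only for deferral.
prod = {
    "stg_orders": {"checksum": "s1", "relation": "prod.stg_orders"},
    "model_a": {"checksum": "a1", "relation": "prod.model_a"},
    "model_b": {"checksum": "b1", "relation": "prod.model_b"},
}
# State after the earlier change to Model A: used only for comparison.
last_ci = {
    "stg_orders": {"checksum": "s1", "relation": "ci.stg_orders"},
    "model_a": {"checksum": "a2", "relation": "ci.model_a"},
    "model_b": {"checksum": "b1", "relation": "ci.model_b"},
}
# Current project: Model B has now been edited as well.
current = {
    "stg_orders": {"checksum": "s1", "relation": "ci.stg_orders"},
    "model_a": {"checksum": "a2", "relation": "ci.model_a"},
    "model_b": {"checksum": "b2", "relation": "ci.model_b"},
}

selected = modified_nodes(current, last_ci)
built_locally = {"model_a"}  # Model A was rebuilt in the CI schema earlier

print(selected)                                                          # {'model_b'}
print(resolve_ref("stg_orders", selected, built_locally, current, prod))  # prod.stg_orders
print(resolve_ref("model_a", selected, built_locally, current, prod))     # ci.model_a
```

Comparing against `last_ci` instead of `prod` is what keeps the selection down to Model B alone, while `prod` still supplies the location of everything that was never built in the CI schema.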

I think the change could be as simple as dbt-core gaining the ability to use a different manifest for each of the two purposes above: state comparison and deferral.

If a custom --defer-diff-state path is not specified, deferral should keep using the same --state path by default. I expect this will continue to make sense for ~80% of cases, and it's a reasonable out-of-the-box behavior.

These are tricky concepts, and so the naming matters a lot! I'm not thrilled with this breakdown, and so I'm very open to thoughts/feedback:

  • --defer (boolean)
  • --state (path)
  • --defer-diff-state (path)
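To make the proposed default concrete, here is a small sketch of how the three flags could resolve into two paths. It uses `argparse` purely for illustration (it is not how dbt-core parses its CLI), and `build_parser` and `resolve_state_paths` are hypothetical names; the flag names are the ones proposed above.

```python
import argparse
from pathlib import Path
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser exposing only the three flags under discussion."""
    parser = argparse.ArgumentParser(prog="dbt-sketch")
    parser.add_argument("--defer", action="store_true")
    parser.add_argument("--state", type=Path, default=None)
    parser.add_argument("--defer-diff-state", type=Path, default=None)
    return parser


def resolve_state_paths(argv: List[str]) -> Tuple[Optional[Path], Optional[Path]]:
    """Return (comparison_path, deferral_path) for the given arguments."""
    args = build_parser().parse_args(argv)
    comparison_path = args.state

    if not args.defer:
        # Without --defer, no manifest is needed for deferral at all.
        return comparison_path, None

    # Proposed default: deferral reuses --state unless a separate path is given.
    if args.defer_diff_state is not None:
        deferral_path = args.defer_diff_state
    else:
        deferral_path = args.state
    return comparison_path, deferral_path


print(resolve_state_paths(["--defer", "--state", "ci-artifacts"]))
print(resolve_state_paths(
    ["--defer", "--state", "ci-artifacts", "--defer-diff-state", "prod-artifacts"]
))
```

In the first call both paths point at `ci-artifacts`, which matches today's behavior; in the second, only deferral reads from `prod-artifacts`.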

Describe alternatives you've considered

Not doing this. In theory, you could keep using the previous run's manifest for slimmer and slimmer state comparison. For any nodes that were deferred, it will be the "production" version of those nodes in your manifest.

I don't think this would work in the case where you change a model, and then remove the changes (revert it to main / prod state). It feels like a leaky approach, when we'd be better off providing a clearer delineation.

Who will this benefit?

Users/applications pursuing ever-slimmer CI

Are you interested in contributing this feature?

No response

Anything else?

Currently, we load up the --state manifest once, into the previous_state container:

    def set_previous_state(self):
        if self.args.state is not None:
            self.previous_state = PreviousState(
                path=self.args.state, current_path=Path(self.config.target_path)
            )

Then we pass the same previous_state.manifest into both node selection and deferral:

    def _get_deferred_manifest(self) -> Optional[WritableManifest]:
        if not self.args.defer:
            return None
        state = self.previous_state
        if state is None:
            raise DbtRuntimeError(
                "Received a --defer argument, but no value was provided to --state"
            )
        if state.manifest is None:
            raise DbtRuntimeError(f'Could not find manifest in --state path: "{self.args.state}"')
        return state.manifest

The idea here would be to let users configure a different previous-state manifest for each of the two: one for node selection and one for deferral.
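One possible shape for that change, sketched with stand-in classes so it runs on its own. `SketchState`, `SketchTask`, `SketchRuntimeError`, and the `previous_defer_state` attribute are all hypothetical; they mirror `PreviousState`, the task class, and `DbtRuntimeError` only loosely and are not dbt-core's actual API.

```python
from dataclasses import dataclass
from typing import Optional


class SketchRuntimeError(Exception):
    """Stand-in for DbtRuntimeError."""


@dataclass
class SketchState:
    """Stand-in for PreviousState: a path plus the manifest loaded from it, if any."""
    path: str
    manifest: Optional[dict]


@dataclass
class SketchTask:
    defer: bool
    previous_state: Optional[SketchState] = None        # used for state comparison
    previous_defer_state: Optional[SketchState] = None  # hypothetical: used only for deferral

    def get_deferred_manifest(self) -> Optional[dict]:
        if not self.defer:
            return None
        # Prefer the dedicated deferral state; otherwise fall back to the comparison state.
        if self.previous_defer_state is not None:
            state = self.previous_defer_state
        else:
            state = self.previous_state
        if state is None:
            raise SketchRuntimeError(
                "Received a --defer argument, but no state path was provided"
            )
        if state.manifest is None:
            raise SketchRuntimeError(f'Could not find manifest in state path: "{state.path}"')
        return state.manifest


ci = SketchState(path="ci-artifacts", manifest={"source": "ci"})
prod = SketchState(path="prod-artifacts", manifest={"source": "prod"})

# Separate states: selection would read `ci`, deferral reads `prod`.
print(SketchTask(defer=True, previous_state=ci, previous_defer_state=prod).get_deferred_manifest())
# Single state: deferral falls back to the comparison state, as it does today.
print(SketchTask(defer=True, previous_state=ci).get_deferred_manifest())
```

Node selection would keep reading `previous_state`, so existing invocations that pass only `--state` behave exactly as before.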

Labels: artifacts; enhancement (New feature or request); state (Stateful selection (state:modified, defer))
