[async] Support microbatching when using ExecutionMode.AIRFLOW_ASYNC #1270

Open
@tatiana

Description

Context

Incremental models in dbt are a materialization strategy designed to efficiently update your data warehouse tables by transforming and loading only the data that is new or has changed since the last run. Instead of processing the entire dataset every time, incremental models append or update only the new rows, significantly reducing the time and resources required for your data transformations.
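To make the idea concrete, here is a minimal sketch of the "only load what is new" logic in plain Python. It is not how dbt implements incremental models (dbt compiles this into SQL against the warehouse); the function name `incremental_load`, the `updated_at` column, and the in-memory lists are assumptions made purely for illustration.

```python
from datetime import datetime


def incremental_load(target, source, ts_key="updated_at"):
    """Append only the source rows that are newer than the latest row in target.

    Hypothetical helper: `target` and `source` are plain lists of dicts that
    stand in for warehouse tables.
    """
    # The "high-water mark" is the newest timestamp that was already loaded.
    high_water_mark = max((row[ts_key] for row in target), default=datetime.min)
    new_rows = [row for row in source if row[ts_key] > high_water_mark]
    target.extend(new_rows)
    return new_rows


target = [{"id": 1, "updated_at": datetime(2024, 1, 1)}]
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "updated_at": datetime(2024, 1, 3)},
]

# Only rows 2 and 3 are appended; row 1 was loaded by a previous run.
loaded = incremental_load(target, source)
```

Note that the caller is responsible for choosing the timestamp column and the comparison, which is exactly the kind of bookkeeping incremental models leave to the user.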

Even with all the benefits of incremental models as they exist today, there are limitations with this approach, such as:

  • burden is on YOU to calculate what’s “new” - what has already been loaded, what needs to be loaded, etc.
  • can be slow if you have many partitions to process (like when running in full-refresh mode), as it’s done in “one big” SQL statement: it can time out, and if it fails you end up needing to retry partitions that had already succeeded
  • if you want to specifically name a partition for your incremental model to process, you have to add additional “hack”y logic, likely using vars
  • data tests run on your entire model, rather than just the “new” data

dbt-labs/dbt-core#10624
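As a rough illustration of why splitting the work into batches helps with the "one big SQL statement" limitation, the sketch below divides a time range into daily batches and runs each one independently, so that only the failed batches need a retry. It is a conceptual sketch, not dbt's or Cosmos's implementation; `daily_batches`, `run_batches`, and `flaky_process` are hypothetical names, and the daily batch size is an assumption.

```python
from datetime import datetime, timedelta


def daily_batches(start, end):
    """Yield (batch_start, batch_end) pairs covering [start, end) one day at a time."""
    cursor = start
    while cursor < end:
        batch_end = min(cursor + timedelta(days=1), end)
        yield cursor, batch_end
        cursor = batch_end


def run_batches(batches, process):
    """Run each batch on its own and return the batches that failed."""
    failed = []
    for batch in batches:
        try:
            process(*batch)
        except Exception:
            # A failure affects this batch only; successful batches are kept.
            failed.append(batch)
    return failed


def flaky_process(batch_start, batch_end):
    """Hypothetical stand-in for a per-batch query; the 2 January batch fails."""
    if batch_start == datetime(2024, 1, 2):
        raise RuntimeError("simulated warehouse timeout")


batches = list(daily_batches(datetime(2024, 1, 1), datetime(2024, 1, 4)))
failed = run_batches(batches, flaky_process)

# Retrying touches only the failed batch, not the whole range.
still_failed = run_batches(failed, lambda batch_start, batch_end: None)
```

Because each batch is independent, batches could in principle also be mapped onto separate tasks, which is the kind of granularity an asynchronous execution mode would need to account for.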

Acceptance criteria

  • ExecutionMode.AIRFLOW_ASYNC can leverage dbt microbatching strategies

Labels

  • area:execution: Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc.
  • do-not-stale: Related to stale job and dosubot
