[async] Support microbatching when using ExecutionMode.AIRFLOW_ASYNC
#1270
Open
opened on Oct 21, 2024
Context
Incremental models in dbt are a materialization strategy designed to update data warehouse tables efficiently by transforming and loading only the data that is new or has changed since the last run. Instead of processing the entire dataset every time, incremental models append or update only the new rows, significantly reducing the time and resources required for data transformations.
Even with all the benefits of incremental models as they exist today, there are limitations with this approach, such as:
- The burden is on you to calculate what is "new": what has already been loaded, what still needs to be loaded, and so on.
- It can be slow if you have many partitions to process (for example, when running in full-refresh mode), because everything is done in one big SQL statement. That statement can time out, and if it fails you end up retrying partitions that had already succeeded.
- If you want to name a specific partition for your incremental model to process, you have to add extra "hacky" logic, likely using vars.
- Data tests run on your entire model, rather than just the "new" data.
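To make the contrast with "one big" SQL statement concrete, here is a minimal Python sketch of the general idea behind batch-wise processing: split a time range into independent windows, process each one separately, and retry only the windows that failed. This is an illustration under stated assumptions, not dbt's or Cosmos's implementation; the names `daily_batches`, `run_batches`, and the `process` callback are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, Iterator, List, Tuple

# A batch is a half-open time window [start, end).
Batch = Tuple[datetime, datetime]


def daily_batches(start: datetime, end: datetime) -> Iterator[Batch]:
    """Yield consecutive one-day windows covering [start, end)."""
    cursor = start
    while cursor < end:
        nxt = min(cursor + timedelta(days=1), end)
        yield (cursor, nxt)
        cursor = nxt


def run_batches(
    batches: Iterable[Batch],
    process: Callable[[datetime, datetime], None],
) -> List[Batch]:
    """Run `process` once per batch and return the batches that failed.

    `process` is a hypothetical callback standing in for whatever
    transforms the rows that fall inside one window.
    """
    failed: List[Batch] = []
    for batch_start, batch_end in batches:
        try:
            process(batch_start, batch_end)
        except Exception:
            # Remember only the failed window; successful ones are not redone.
            failed.append((batch_start, batch_end))
    return failed
```

Calling `run_batches` a second time with only the returned list retries the failed windows without reprocessing the ones that already succeeded, which is the retry behaviour the single-statement approach lacks.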
Acceptance criteria
- ExecutionMode.AIRFLOW_ASYNC can leverage dbt microbatching strategies