Feat!: Create physical tables as part of evaluation by izeigerman · Pull Request #5189 · TobikoData/sqlmesh

izeigerman · 2025-08-19T20:09:26Z

This update eliminates the need for a separate "creating physical tables" step in the plan application. The tables are now created as part of the model evaluation DAG.

This has the following benefits:

Fewer queries are submitted to the target engine, since table creation is coupled with model evaluation in a single CREATE OR REPLACE call.
Better alignment with the execution model of dbt projects, which assumes that, for each model in the DAG, all upstream models have both created their tables and inserted data into them.
Only the necessary tables are created. Previously, SQLMesh would preemptively create both the dev and non-dev tables for certain model snapshots. These tables are now created lazily, as needed.
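
To make the first benefit concrete, here is a minimal sketch of the difference in statement count. The function names, the column list, and the exact SQL text are assumptions for illustration only; the statements SQLMesh actually emits depend on the engine and the model kind.

```python
from typing import List


def two_step_statements(table: str, columns: str, query: str) -> List[str]:
    # Hypothetical "before" flow: create an empty table first, then fill it.
    return [
        f"CREATE TABLE IF NOT EXISTS {table} ({columns})",
        f"INSERT INTO {table} {query}",
    ]


def coupled_statements(table: str, query: str) -> List[str]:
    # Hypothetical "after" flow: creation and the first load share one statement.
    return [f"CREATE OR REPLACE TABLE {table} AS {query}"]


before = two_step_statements("db.orders", "id INT, total DOUBLE", "SELECT id, total FROM raw.orders")
after = coupled_statements("db.orders", "SELECT id, total FROM raw.orders")
```

In this toy comparison the coupled flow sends one statement per table instead of two.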

Note that the "creating physical tables" step will still show up if the plan was applied with either the --empty-backfill or the --skip-backfill flag, since there are no evaluations to perform.

Breaking changes:

Macros relying on @IF(@runtime_stage = 'creating', ...) will run every time a table is created for the first time even if data is inserted as part of the creation process (i.e. CREATE OR REPLACE TABLE...). Similarly, macros that rely on the evaluating runtime stage will no longer run as part of the first evaluation that also creates the table for the first time. Previously, there was a clear distinction between empty table creation (creating stage) and data insertion (evaluating stage). Since the two are now coupled, we had to relax the definition of the creating runtime stage to preserve some level of backwards compatibility.
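
The relaxed semantics described above can be modeled with a small sketch. This is a simplification, not SQLMesh code: `runtime_stage` and `should_run` are hypothetical helpers, and any other runtime stages are ignored.

```python
def runtime_stage(table_exists: bool) -> str:
    # If the table does not exist yet, it is created and populated in one
    # step, and that step is reported as 'creating'.
    return "evaluating" if table_exists else "creating"


def should_run(macro_stage: str, table_exists: bool) -> bool:
    # Mirrors a guard such as @IF(@runtime_stage = '<macro_stage>', ...).
    return runtime_stage(table_exists) == macro_stage


# First evaluation of a new table: 'creating' guards fire, 'evaluating' guards do not.
first_run = (should_run("creating", False), should_run("evaluating", False))
# Later evaluations against an existing table: only 'evaluating' guards fire.
later_run = (should_run("creating", True), should_run("evaluating", True))
```

Under this model, a macro guarded by the evaluating stage is skipped on the run that first creates the table, which is the behavior change projects need to check for.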

eakmanrq · 2025-08-19T21:11:22Z

sqlmesh/core/scheduler.py

+
+@dataclass(frozen=True)
+class EvaluateNode(BaseNode):
+    snapshot_name: str


This is already defined in BaseNode. The same applies to CreateNode and DummyNode.

sqlmesh/core/scheduler.py

erindru

Looks good to me! I guess we will find out soon how many projects were relying on create first -> then evaluate

examples/sushi/config.py

erindru · 2025-08-19T20:32:43Z

sqlmesh/core/engine_adapter/base.py

                **kwargs,
            )
        if self_referencing:
+            assert target_columns_to_types is not None


Should this be an explicit check?

Just sharing this thread for some more context.

We already have an explicit check for this on line 455. This is our 2nd entry into the self_referencing check so at this point we indeed assert that target_columns_to_types is not None.

sqlmesh/core/snapshot/evaluator.py

erindru · 2025-08-19T20:51:28Z

sqlmesh/core/snapshot/evaluator.py

-        for schema_name, catalog in unique_schemas:
+        table_exprs = [(gateway, exp.to_table(t)) for gateway, t in gateway_table_pairs]
+        unique_schemas = {
+            (gateway, t.args["db"], t.args.get("catalog"))


Nit: aren't these available as t.db and t.catalog?

They are, but the property accessors return strings, whereas looking them up through args results in Expression instances. Given we used to do this:

unique_schemas = {(t.args["db"], t.args.get("catalog")) for t in table_exprs if t and t.db}

I'm assuming the Expression instances are necessary to preserve identifier quotes, etc.
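
The point about quoting can be illustrated with a toy model. The `Identifier` and `Table` classes below are hypothetical stand-ins written for this sketch, not sqlglot's classes; they only show why a plain string accessor loses information that an identifier object keeps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    name: str
    quoted: bool = False

    def sql(self) -> str:
        # Re-emit the quotes only if the identifier was originally quoted.
        return f'"{self.name}"' if self.quoted else self.name


@dataclass(frozen=True)
class Table:
    name: Identifier
    db_ident: Optional[Identifier] = None

    @property
    def db(self) -> str:
        # String accessor: convenient, but the quoting flag is dropped.
        return self.db_ident.name if self.db_ident else ""


t = Table(Identifier("orders"), Identifier("My Schema", quoted=True))
from_string = f"CREATE SCHEMA IF NOT EXISTS {t.db}"
from_ident = f"CREATE SCHEMA IF NOT EXISTS {t.db_ident.sql()}"
```

Here `from_string` contains an unquoted `My Schema`, which most engines would reject, while `from_ident` keeps the quotes.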

erindru · 2025-08-20T01:22:55Z

sqlmesh/core/scheduler.py

-                                snapshot.name,
-                                (intervals[batch_idx_to_wait_for], batch_idx_to_wait_for),
-                            )
+                            EvaluateNode(


Making these implicit data structures explicit really helps when trying to understand what is going on

eakmanrq · 2025-08-20T16:19:23Z

sqlmesh/core/scheduler.py

+
+
+class SchedulingUnit(abc.ABC):
+    snapshot_name: str


If you make this a dataclass then you don't have to redefine snapshot_name in the classes that inherit from this.

I think when we talked you said you wanted this to be more of an interface but I don't really get the advantage we get from this framing. Either way works though.

I believe that a dataclass is a product type and we should be able to instantiate it. This directly conflicts with ABC, which is supposed to provide an interface and can't be instantiated. That's why I'm against mixing the two.
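
For context on this exchange, a minimal sketch of the pattern under discussion follows. Only `SchedulingUnit`, `EvaluateNode`, `CreateNode`, and `snapshot_name` come from the diff; the `batch_index` field and the example DAG are assumptions made for illustration.

```python
import abc
from dataclasses import dataclass, fields


class SchedulingUnit(abc.ABC):
    # Plain annotation: documents the attribute, but creates no dataclass field.
    snapshot_name: str


@dataclass(frozen=True)
class EvaluateNode(SchedulingUnit):
    snapshot_name: str  # redeclared because the base class is not a dataclass
    batch_index: int  # hypothetical field, not taken from the PR


@dataclass(frozen=True)
class CreateNode(SchedulingUnit):
    snapshot_name: str


# Frozen dataclasses are hashable, so nodes can serve as DAG keys.
dag = {
    CreateNode("db.orders"): set(),
    EvaluateNode("db.orders", 0): {CreateNode("db.orders")},
}
```

Because the base class is not a dataclass, its annotation is not picked up as a field, which is why each subclass repeats `snapshot_name`. Making the base a dataclass would remove the repetition, at the cost of mixing the two concepts.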

izeigerman added 15 commits August 19, 2025 12:56

Feat: Lazily create model tables during evaluation

81581a5

simplify create

727103b

propagate allow_destructive_snapshots into evaluate

79869df

simplify create 2

f4326da

refactor scheduler

1262d1a

improvements

8ace714

fix tests

d3717b3

move more of creation into the scheduler

3a5953d

more tests

d5e5acc

fix test

2cca48f

fix cli test

7effa65

fix more tests

ea49c00

fix plan stages tests

608bd46

more test fixes

088c38c

fix snapshot evaluator tests

8dadf8b

izeigerman requested a review from a team August 19, 2025 20:09

izeigerman added 3 commits August 19, 2025 13:29

fix the remaining test

58f5f9d

support concurrent creation of non-overlapping schemas

e139601

update docs

3317c3a

eakmanrq reviewed Aug 19, 2025

izeigerman added 2 commits August 19, 2025 15:14

preserve the semantics of the 'creating' runtime stage

6336f43

remove BaseNode

76aff43

erindru approved these changes Aug 20, 2025

erindru mentioned this pull request Aug 20, 2025

how to create a table if not exists ? #5191

Closed

eakmanrq approved these changes Aug 20, 2025

comments

9f60b52

eakmanrq reviewed Aug 20, 2025

comments

4e5d28f

izeigerman merged commit 014fe6a into main Aug 20, 2025
27 of 29 checks passed

izeigerman deleted the feat-couple-create-and-evaluate-stages branch August 20, 2025 18:19

xardasos mentioned this pull request Aug 22, 2025

Fix(risingwave): Recreate materialized views #5195

Merged

themisvaltinos mentioned this pull request Aug 26, 2025

Fix: Move before all statements execution before snapshot creation logic #5229

Merged

This was referenced Aug 29, 2025

Fix: Regression in WAP support #5264

Merged

Fix: Evaluation of metadata snapshots with audit changes #5267

Merged

themisvaltinos mentioned this pull request Sep 19, 2025

Fix: Pass a copy of the properties to avoid mutation of actual dict #5417

Merged

anismiles mentioned this pull request Dec 14, 2025

Regression: Cron-based start date inference incorrect for maiden plans #5629

Open

4 participants
