Feat!: Create physical tables as part of evaluation#5189
Conversation
|
|
||
| @dataclass(frozen=True) | ||
| class EvaluateNode(BaseNode): | ||
| snapshot_name: str |
There was a problem hiding this comment.
This is already defined in BaseNode. Same applies for CreateNode and Dummy Node.
erindru
left a comment
There was a problem hiding this comment.
Looks good to me! I guess we will find out soon how many projects were relying on create first -> then evaluate
| **kwargs, | ||
| ) | ||
| if self_referencing: | ||
| assert target_columns_to_types is not None |
There was a problem hiding this comment.
Should this be an explicit check?
There was a problem hiding this comment.
Just sharing this thread for some more context.
There was a problem hiding this comment.
We already have an explicit check for this on line 455. This is our 2nd entry into the self_referencing check so at this point we indeed assert that target_columns_to_types is not None.
| for schema_name, catalog in unique_schemas: | ||
| table_exprs = [(gateway, exp.to_table(t)) for gateway, t in gateway_table_pairs] | ||
| unique_schemas = { | ||
| (gateway, t.args["db"], t.args.get("catalog")) |
There was a problem hiding this comment.
Nit: aren't these available as t.db and t.catalog?
There was a problem hiding this comment.
They are, but the property accessors return strings, whereas looking them up through args results in Expression instances. Given we used to do this:
unique_schemas = {(t.args["db"], t.args.get("catalog")) for t in table_exprs if t and t.db}I'm assuming the Expression instances are necessary to preserve identifier quotes, etc.
| snapshot.name, | ||
| (intervals[batch_idx_to_wait_for], batch_idx_to_wait_for), | ||
| ) | ||
| EvaluateNode( |
There was a problem hiding this comment.
Making these implicit data structures explicit really helps when trying to understand what is going on
|
|
||
|
|
||
| class SchedulingUnit(abc.ABC): | ||
| snapshot_name: str |
There was a problem hiding this comment.
If you make this a dataclass then you don't have to redefine snapshot_name in the classes that inherit from this.
There was a problem hiding this comment.
I think when we talked you said you wanted this to be more of an interface but I don't really get the advantage we get from this framing. Either way works though.
There was a problem hiding this comment.
I believe that a dataclass is a product type and we should be able to instantiate it. This directly conflicts with ABC, which is supposed to provide an interface and can't be instantiated. That's why I'm against mixing the two.
This update eliminates the need for a separate "creating physical tables" step in the plan application. The tables are now created as part of the model evaluation DAG.
This has the following benefits:
CREATE OR REPLACEcallNote, that the "creating physical tables" step will still show up if the plan was applied with either the
--empty-backfillor the---skip-backfillflag, since there are no evaluations to perform.Breaking changes:
@IF(@runtime_stage = 'creating', ...)will run every time a table is created for the first time even if data is inserted as part of the creation process (i.e.CREATE OR REPLACE TABLE...). Similarly, macros that rely on theevaluatingruntime stage will no longer run as part of the first evaluation that also creates the table for the first time. Previously, there was a clear distinction between empty table creation (creatingstage) and data insertion (evaluatingstage). Since the two are now coupled, we had to relax the definition of thecreatingruntime stage to preserve some level of backwards compatibility.