feat(duckdb): Add transpilation support for Snowflake STARTS WITH ... CONNECT BY by fivetran-kwoodbeck · Pull Request #7709 · tobymao/sqlglot

fivetran-kwoodbeck · 2026-06-04T20:04:57Z

Adds Snowflake to DuckDB transpilation for START WITH ... CONNECT BY PRIOR hierarchical queries.

DuckDB rejects CONNECT BY verbatim. This adds connect_by_to_recursive_cte registered as a preprocessor on exp.Select, which rewrites the query into a WITH RECURSIVE CTE at generation time:

Anchors START WITH rows from the source table
Recursive arm: source table aliased _child_row joined to the CTE aliased _parent_row on the PRIOR predicate
Outer SELECT re-projects from _rootcte, applying WHERE post-hierarchy and forwarding ORDER BY, LIMIT, OFFSET

Falls through unchanged for NOCYCLE, multiple PRIORs, or non-equality predicates. CTE name collision is avoided via find_new_name. WHERE columns absent from SELECT are carried through the CTE so the outer filter can see them. LEVEL is injected as a depth counter unless already projected.

Tests in sqlglot-integration-tests cover: SELECT *, explicit column list with ORDER BY, PRIOR on left side, WHERE carry-through, existing CTE preservation, and name collision.

github-actions · 2026-06-04T20:37:48Z

SQLGlot Integration Test Results

❌ 4 regressions — see details below

Comparing:

this branch (sqlglot:transpile/snowflake-startswith @ sqlglot 0481e32)
baseline (main @ sqlglot 87b1da8)

By Dialect

dialect	main	feature branch	transitions	links
databricks -> databricks	9982/11820 passed (84.5%)	1366/1370 passed (99.7%)	4 pass -> fail	full result / delta

Overall

main: 192441 total, 153536 passed (pass rate: 79.8%)

sqlglot:transpile/snowflake-startswith: 106860 total, 106856 passed (pass rate: 100.0%)

Transitions:
4 pass -> fail

Dialect pair changes: 0 previous results not found, 2 current results not found

❌ 4 regressions (view logs)

georgesittas · 2026-06-08T11:33:18Z

+    existing_where = expression.args.get("where")
+    existing_with = expression.args.get("with_")
+
+    # WHERE columns absent from SELECT must be carried through the CTE so the outer filter can see them.


What if you have SELECT * FROM x WHERE c > 1 START WITH ...? Does the extra_cols logic result in duplicating c in the projection list? That may cause ambiguity issues if that's the case. Let's check this.

The extra_cols block is guarded by if not has_star else {} above so when the query uses SELECT *, extra_cols is always empty, so c is not added to the projection list twice. There's a specific test to cover this case (SELECT * FROM (SELECT * FROM emp ...) WHERE c > 1) and verifies there's no duplication.

georgesittas · 2026-06-11T11:26:09Z

Let me know when this is ready for another review.

georgesittas · 2026-06-12T14:05:05Z

+    connect_pred = connect.args.get("connect")
+    if not connect_pred:
+        return expression


This is a required arg:

Suggested change

connect_pred = connect.args.get("connect")

if not connect_pred:

return expression

connect_pred = connect.args["connect"]

TypeError: 'dict' object is not callable, I assume you're suggesting to just remove the if?

I meant both removing if and switching to [] syntax instead of get().

georgesittas · 2026-06-15T12:06:38Z

+    priors = list(connect_pred.find_all(exp.Prior))
+    if len(priors) != 1:
+        return expression
+


What is the difficulty with transpiling multiple PRIORs? Isn't it a matter of combining the predicates with AND?

For multiple PRIOR, we can add support at this spot fairly easily, but it makes cycle detection below quite a bit more complicated. We have a few options: 1) Remove cycle detection (Snowflake doesn't support it, so it's not strictly necessary to match Snowflake's behavior), 2) Simple fix to add in multiple PRIOR but leave cycle detection on one column only, 3) do the full correct cycle detection, probably another 30-40 lines of code to an already overly complex function.

I'd vote for Option 1 and call cycle detection out of scope for this PR.

Snowflake doesn't support it, so it's not strictly necessary to match Snowflake's behavior

I was under the impression that it does support it, through the NOCYCLE specifier. If it doesn't, then the whole cycle detection trick we implemented can be cleaned up, otherwise, if what I say is correct, we should only emit that code when NOCYCLE is present to match Snowflake 100%.

I think it's worth to outline what the solution for handling >1 PRIOR clauses will look like, simply for future reference. Other than that, I'm happy to pause here and fall back to unsupported for now. We can always revisit this later.

georgesittas · 2026-06-15T12:07:17Z

+    prior = priors[0]
+    if not isinstance(prior.this, exp.Column):
+        return expression


Can this ever not be a Column and still be valid Snowflake? If not, then we don't need this, as the input itself would be invalid and sqlglot doesn't need to validate it or take it into account.

Yes it can be non Column. If we have PRIOR (x + 1), it will put Prior(Add(...)) in the AST. The existing tests fail if this if is removed.

georgesittas · 2026-06-15T12:10:57Z

+    def _is_level_ref(e: exp.Expr) -> bool:
+        col = e.this if isinstance(e, exp.Alias) else e
+        return isinstance(col, exp.Column) and col.name.upper() == "LEVEL" and not col.table
+
+    has_level = any(_is_level_ref(e) for e in base_select_exprs)


You can also have SELECT level + 1 ..., for example. I don't think you'd detect level in that case.

Good point, I'll make it find_all to look for a LEVEL.

georgesittas · 2026-06-15T12:11:24Z

+        return isinstance(col, exp.Column) and col.name.upper() == "LEVEL" and not col.table
+
+    has_level = any(_is_level_ref(e) for e in base_select_exprs)
+    has_star = any(e.is_star for e in base_select_exprs)


Suggested change

has_star = any(e.is_star for e in base_select_exprs)

has_star = expression.is_star

georgesittas · 2026-06-15T12:12:14Z

+    # With SELECT *, all columns are already included so no extras are needed.
+    select_names = {e.alias_or_name for e in base_select_exprs}


Suggested change

# With SELECT *, all columns are already included so no extras are needed.

select_names = {e.alias_or_name for e in base_select_exprs}

# With SELECT *, all columns are already included so no extras are needed.

select_names = expression.named_selects

georgesittas · 2026-06-15T12:13:49Z

+    extra_cols = (
+        dict.fromkeys(
+            col.name
+            for col in (base_where.find_all(exp.Column) if base_where else [])
+            if (not col.table or col.table == source_table_alias) and col.name not in select_names
+        )
+        if not has_star
+        else {}
+    )


I suggest rewriting this into a proper branch instead of a ternary expression, it's not easily readable now.

georgesittas · 2026-06-15T12:15:32Z

+    # Anchor: seed LEVEL at 1 and start the ancestor path with the root's parent key value.
+    anchor_projs = [e for e in base_select_exprs if not _is_level_ref(e)] + [
+        exp.column(c) for c in extra_cols
+    ]


There are quite a few traversals of base_select_exprs. I think I can count 6. Is it possible to merge some of those loops? Seems wasteful.

There's a couple of optimizations, yes, but they're mostly doing different things. Will update.

fivetran-kwoodbeck force-pushed the transpile/snowflake-startswith branch from 90cc2e1 to 9c55e84 Compare June 5, 2026 20:19

georgesittas self-requested a review June 8, 2026 10:16

georgesittas self-assigned this Jun 8, 2026

georgesittas requested changes Jun 8, 2026

View reviewed changes

fivetran-kwoodbeck changed the title ~~feat(duckdb): Add transpilation support for Snowflake STARTS WITH~~ feat(duckdb): Add transpilation support for Snowflake STARTS WITH ... CONNECT BY Jun 8, 2026

fivetran-kwoodbeck force-pushed the transpile/snowflake-startswith branch from 9c55e84 to bb9744a Compare June 9, 2026 14:04

fivetran-kwoodbeck force-pushed the transpile/snowflake-startswith branch from bb9744a to c1f4d7f Compare June 11, 2026 14:03

georgesittas reviewed Jun 15, 2026

View reviewed changes

fivetran-kwoodbeck added 5 commits June 15, 2026 12:54

changes so far

fbe6cd5

refactored

2020f36

reverted makefile

d165764

many updates, as per PR feedback

3d499bd

updates, PR feedback

09ac7c9

fivetran-kwoodbeck force-pushed the transpile/snowflake-startswith branch from c1f4d7f to 09ac7c9 Compare June 15, 2026 16:55

	has_star = any(e.is_star for e in base_select_exprs)
	has_star = expression.is_star

		# With SELECT *, all columns are already included so no extras are needed.
		select_names = {e.alias_or_name for e in base_select_exprs}

Conversation

fivetran-kwoodbeck commented Jun 4, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SQLGlot Integration Test Results

❌ 4 regressions — see details below

By Dialect

Overall

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

georgesittas commented Jun 11, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fivetran-kwoodbeck Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Jun 4, 2026 •

edited

Loading

fivetran-kwoodbeck Jun 15, 2026 •

edited

Loading