Conversation
SQLGlot Integration Test ResultsComparing:
By Dialect
Overallmain: 14755 total, 13152 passed (pass rate: 89.1%), sqlglot version: sqlglot:snowflake_window_function_default_frame_transpilation: 14755 total, 13172 passed (pass rate: 89.3%), sqlglot version: Transitions: |
There was a problem hiding this comment.
Let's leave the logic of WINDOW_FUNCTIONS_WITH_NULL_ORDERING for a separate PR, and just focus on the snowflake -> duckdb.
@fivetran-felixhuang The logic we are using needs fixes (take a look at the regression tests).
- From the Snowflake Window frames , we can't arbitrarily remove the default window frame next to an
ORDER BY.
Valid SF
REGR_AVGX(a, b) OVER (PARTITION BY partition_id ORDER BY row_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
=>
Invalid SF
REGR_AVGX(a, b) OVER (PARTITION BY partition_id ORDER BY row_id
- From the Snowflake Window frames non-ranking funcs the default of
COUNTetc isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so by removing theROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGwe will create regressions between the results of sf -> sf and sf -> duckdb.
Let's make sure that we cover these cases ^ correctly and no regressions are presented.
the logic is now limited to NTH_VALUE, FIRST_VALUE and LAST_VALUE |
georgesittas
left a comment
There was a problem hiding this comment.
Besides Giorgos' comments this LGTM.
The default window frame in Snowflake is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
https://docs.snowflake.com/en/sql-reference/functions-window-syntax#usage-notes-for-window-frames)
This is not the same as the default window frame in DuckDB (BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) https://duckdb.org/docs/stable/sql/functions/window_functions#framing
This can cause issue during transpilation.
We want to add the default Snowflake window frame to AST when none specified so we can use it to generate the transpiled query with the correct window frame. And when generating Snowflake query from AST, we can omit the generation of window frame clause if it matches the default window frame (omit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING since it’s default)
In Snowflake, these window ranking functions allow and respect non-default window frames: FIRST_VALUE, LAST_VALUE, NTH_VALUE, DENSE_RANK, RANK. (Other window rankings like LEAD and LAG don't allow window frame other than the default one). For this PR, we will just work on FIRST_VALUE, LAST_VALUE and NTH_VALUE, since there is some further complication with DENSE_RANK and RANK (Snowflake respects the window frame of DENSE_RANK and RANK, but DuckDB ignores the frame, so in this sense these functions are only partially transpilable from Snowflake to Duckdb. We can handle this in a follow-up ticket if we need to support transpilation of these two)
We are introducing exp_supports_null_ordering and WINDOW_FUNCTIONS_WITH_NULL_ORDERING for BigQuery as the current implementation assumes BigQuery's window function with custom frame doesn't support NULL ordering, (NULLS FIRST/LAST) which is not a correct assumption. For example, the following is valid in BigQuery
FIRST_VALUE(val) OVER(
ORDER BY val ASC NULLS LAST
ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING
)
We have a follow-up ticket to investigate NULLS ordering further. For now exp_supports_null_ordering and WINDOW_FUNCTIONS_WITH_NULL_ORDERING are needed to avoid breaking logic around NULL ordering in ordered_sql in generator.py
/integration-tests