feat(snowflake)!: transpilation support MAP_INSERT (#7190)

fivetran-ashashankar wants to merge 1 commit into main from RD-1147669-transpile-MAP_INSERT
Conversation
SQLGlot Integration Test Results

Comparing: main vs RD-1147669-transpile-MAP_INSERT

Overall:
- main: 4003 total, 4003 passed (pass rate: 100.0%)
- RD-1147669-transpile-MAP_INSERT: 4003 total, 4003 passed (pass rate: 100.0%)
    # Snowflake's PARSE_JSON('NULL') uses uppercase, but DuckDB's JSON requires lowercase 'null'
    if isinstance(arg, exp.Literal) and arg.is_string and arg.this == "NULL":
        arg = exp.Literal.string("null")
We shouldn't handle static cases like this at parse time. For example, if the input arg is an identifier, we miss this handling entirely because the value isn't known at parse time. Also, this change is unrelated to this PR; let's remove it.
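A generation-time version of this normalization would avoid the pitfall above. The following is a minimal sketch with a stand-in `Literal` class (not sqlglot's `exp.Literal`) and a hypothetical helper name; the point is that non-literal arguments simply pass through untouched, and the lowercasing only fires on an actual string literal.

```python
# Minimal stand-in for sqlglot's exp.Literal (illustrative only).
class Literal:
    def __init__(self, this, is_string=True):
        self.this = this
        self.is_string = is_string


def normalize_json_null(arg):
    """Lowercase a literal 'NULL' for DuckDB's JSON parser.

    Running this in the DuckDB generator (rather than the Snowflake
    parser) means identifiers and other non-literal args are left
    alone instead of being silently mishandled at parse time.
    """
    if isinstance(arg, Literal) and arg.is_string and arg.this == "NULL":
        return Literal("null")
    return arg


assert normalize_json_null(Literal("NULL")).this == "null"
```

A column reference (any non-`Literal` object here) falls through unchanged, which is exactly the case the parse-time check could not handle.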
self.validate_all(
    "SELECT MAP_CONCAT(CAST({'a': 1, 'b': 2} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'c': 3})",
    read={
        "snowflake": "SELECT MAP_INSERT({'a':1,'b':2}::MAP(VARCHAR,NUMBER),'c',3)",
    },
    write={
        "duckdb": "SELECT MAP_CONCAT(CAST({'a': 1, 'b': 2} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'c': 3})",
    },
)
self.validate_all(
    "SELECT MAP_CONCAT(CAST({'a': 1} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'a': 99})",
    read={
        "snowflake": "SELECT MAP_INSERT({'a':1}::MAP(VARCHAR,NUMBER),'a',99,TRUE)",
    },
    write={
        "duckdb": "SELECT MAP_CONCAT(CAST({'a': 1} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'a': 99})",
    },
)
self.validate_all(
    "SELECT id, MAP_CONCAT(attrs, MAP {'new_key': 'new_value'}) AS attrs_with_insert FROM demo_maps",
    read={
        "snowflake": "SELECT id, MAP_INSERT(attrs, 'new_key', 'new_value') AS attrs_with_insert FROM demo_maps",
    },
    write={
        "duckdb": "SELECT id, MAP_CONCAT(attrs, MAP {'new_key': 'new_value'}) AS attrs_with_insert FROM demo_maps",
    },
)
D SELECT version();
┌─────────────┐
│ "version"() │
│ varchar │
├─────────────┤
│ v1.4.4 │
└─────────────┘
D SELECT MAP_CONCAT(CAST({'a': 1, 'b': 2} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'c': 3});
Invalid Input Error:
'value' type of map differs between arguments, expected 'MAP(VARCHAR, DECIMAL(38,0))', found 'MAP(VARCHAR, INTEGER)' instead
D SELECT MAP_CONCAT(CAST({'a': 1} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'a': 99});
Invalid Input Error:
'value' type of map differs between arguments, expected 'MAP(VARCHAR, DECIMAL(38,0))', found 'MAP(VARCHAR, INTEGER)' instead
These ^ don't work; let's always run the tests against the engine to confirm the expected SQL is actually valid.
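One way the generated SQL could be made valid for DuckDB (a sketch, not necessarily the PR's eventual fix) is to cast both `MAP_CONCAT` arguments to the same map type, so the engine sees matching key and value types. The helper name below is hypothetical.

```python
def map_concat_sql(base_map: str, inserted_map: str, map_type: str) -> str:
    """Build DuckDB MAP_CONCAT SQL with *both* sides cast to map_type.

    DuckDB rejects MAP_CONCAT when the value types differ (e.g.
    DECIMAL(38,0) vs INTEGER), so casting only the base map is not enough.
    """
    return (
        f"MAP_CONCAT(CAST({base_map} AS {map_type}), "
        f"CAST({inserted_map} AS {map_type}))"
    )


sql = map_concat_sql("{'a': 1, 'b': 2}", "MAP {'c': 3}", "MAP(TEXT, DECIMAL(38, 0))")
print(sql)
```

Whether DuckDB accepts the cast on the `MAP {...}` literal in every case is an assumption worth verifying against the engine, per the comment above.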
# Get the type of the original map by traversing nested MAP_INSERT expressions
def get_map_type(expr: exp.Expression) -> t.Optional[exp.DataType]:
    if expr.type:
        return expr.type
    # Traverse through nested MAP_INSERT to find the original typed expression
    if isinstance(expr, exp.MapInsert):
        return get_map_type(expr.this)
    return None


map_type = get_map_type(map_arg)

if value and map_type and map_type.expressions and len(map_type.expressions) > 1:
    # Extract the value type from MAP(key_type, value_type)
    value_type = map_type.expressions[1]
    # Cast NULL values and non-literals to avoid type conflicts
    # Literal numbers/booleans/strings can be inferred correctly by DuckDB
    if isinstance(value, exp.Null) or not isinstance(value, exp.Literal):
        value = exp.cast(value, value_type)
This logic fails for the following Snowflake input, right?

WITH map_cte AS (
  SELECT
    1 AS id,
    {'a':1,'b':2}::MAP(VARCHAR, NUMBER) AS data
)
SELECT
  id,
  data,
  MAP_INSERT(data, 'c', 3) AS updated_map
FROM map_cte;
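Right: when the map argument is a column reference coming out of a CTE, no type is attached to the node unless the type annotator has run, so `get_map_type` returns `None` and the value-casting branch is silently skipped. A self-contained sketch with stand-in classes (not sqlglot's) makes the failure mode concrete:

```python
from typing import Optional


class Expression:
    type: Optional[str] = None


class Column(Expression):
    """Stand-in for a CTE output column like `data` -- untyped until annotation."""


class Cast(Expression):
    """Stand-in for an explicit cast like `m::MAP(VARCHAR, NUMBER)`."""

    def __init__(self, to: str):
        self.type = to


class MapInsert(Expression):
    def __init__(self, this: Expression):
        self.this = this


def get_map_type(expr: Expression) -> Optional[str]:
    # Same shape as the helper in the diff.
    if expr.type:
        return expr.type
    if isinstance(expr, MapInsert):
        return get_map_type(expr.this)
    return None


# Explicit cast in the SQL: the type is recoverable.
assert get_map_type(MapInsert(Cast("MAP(VARCHAR, NUMBER)"))) == "MAP(VARCHAR, NUMBER)"
# CTE column: nothing to recover, so the cast-the-value branch never runs.
assert get_map_type(MapInsert(Column())) is None
```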
# Get the type of the original map by traversing nested MAP_INSERT expressions
def get_map_type(expr: exp.Expression) -> t.Optional[exp.DataType]:
    if expr.type:
        return expr.type
    # Traverse through nested MAP_INSERT to find the original typed expression
    if isinstance(expr, exp.MapInsert):
        return get_map_type(expr.this)
    return None
Why do we need this? map_arg.type should give you what you want. I don't see why we treat nested MAP_INSERT calls as a special case.
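A small illustration of that point, again with stand-in classes rather than sqlglot: an annotator types every node in the tree, including each nested `MAP_INSERT`, so reading `.type` on the outer node is already sufficient.

```python
class Node:
    """Stand-in expression node; not a sqlglot class."""

    def __init__(self, this=None):
        self.this = this
        self.type = None


def annotate(node, map_type):
    # Mimics the annotator: every node, including each nested MAP_INSERT,
    # gets a type (MAP_INSERT's result type equals its input map's type).
    if node.this is not None:
        annotate(node.this, map_type)
    node.type = map_type


typed_map = Node()                        # e.g. `m::MAP(VARCHAR, NUMBER)`
nested = Node(this=Node(this=typed_map))  # MAP_INSERT(MAP_INSERT(m, ...), ...)
annotate(nested, "MAP(VARCHAR, NUMBER)")

# Reading .type on the outer node is enough; no recursive helper required.
assert nested.type == "MAP(VARCHAR, NUMBER)"
```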
map_type = get_map_type(map_arg)

if value and map_type and map_type.expressions and len(map_type.expressions) > 1:
Moreover, I don't think this will be true in the general case. The reason is that we annotate Snowflake maps (and arrays, etc) as MAP, not MAP<K, V>, because these objects are polymorphic by default.
We should always add unit tests that cover the path where we infer types first, and then do sql(...), otherwise the code coverage is lacking.
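The suggested test shape could look like the sketch below. `generate_map_insert` is a hypothetical stand-in for the real generator logic; the point is that both paths get asserted: generation without inferred types (where Snowflake maps are just polymorphic `MAP`) and generation after types are inferred to a concrete `MAP(K, V)`.

```python
def generate_map_insert(map_type):
    """Stand-in for the generator branch under review (illustrative only)."""
    # With a concrete MAP(K, V), the inserted value can be cast to V.
    if map_type and "(" in map_type:
        key_value = map_type[map_type.index("(") + 1 : -1]
        value_type = key_value.split(",", 1)[1].strip()
        return f"MAP_CONCAT(m, MAP {{'c': CAST(3 AS {value_type})}})"
    # Polymorphic MAP (Snowflake's default annotation): no value type to
    # cast to, so the len(map_type.expressions) > 1 branch never fires.
    return "MAP_CONCAT(m, MAP {'c': 3})"


# Path 1: no inference, or polymorphic MAP -- no cast can be emitted.
assert generate_map_insert("MAP") == "MAP_CONCAT(m, MAP {'c': 3})"
# Path 2: types inferred first -- the cast branch runs.
assert "CAST(3 AS NUMBER)" in generate_map_insert("MAP(VARCHAR, NUMBER)")
```

Covering path 2 in the real suite would mean running `annotate_types` on the parsed expression before calling `sql(...)`, which is exactly the coverage the diff is currently missing.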
Snowflake to DuckDB transpilation support for MAP_INSERT.