
feat(snowflake)!: transpilation support MAP_INSERT #7190

Open
fivetran-ashashankar wants to merge 1 commit into main from RD-1147669-transpile-MAP_INSERT

Conversation

@fivetran-ashashankar (Collaborator)

Adds Snowflake-to-DuckDB transpilation support for MAP_INSERT.


github-actions bot commented Mar 2, 2026

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:RD-1147669-transpile-MAP_INSERT, sqlglot version: RD-1147669-transpile-MAP_INSERT)
  • baseline (main, sqlglot version: 29.0.2.dev29)

⚠️ Limited to dialects: duckdb

By Dialect

| dialect | main | sqlglot:RD-1147669-transpile-MAP_INSERT | transitions | links |
| --- | --- | --- | --- | --- |
| duckdb -> duckdb | 4003/4003 passed (100.0%) | 4003/4003 passed (100.0%) | No change | full result / delta |

Overall

main: 4003 total, 4003 passed (pass rate: 100.0%), sqlglot version: 29.0.2.dev29

sqlglot:RD-1147669-transpile-MAP_INSERT: 4003 total, 4003 passed (pass rate: 100.0%), sqlglot version: RD-1147669-transpile-MAP_INSERT

Transitions:
No change

@geooo109 left a comment


Let's always test the queries on the engine, also let's include descriptions in the PRs.

Also, we should handle the updateFlag argument of MAP_INSERT, right? E.g. by throwing an unsupported-argument warning.
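A minimal sketch of the pattern the reviewer is asking for (hypothetical helper and names; in sqlglot itself this would go through the generator's `self.unsupported(...)` hook rather than a plain warning):

```python
import warnings
from typing import Optional

def map_insert_to_duckdb(map_sql: str, key: str, value_sql: str,
                         update_flag: Optional[bool] = None) -> str:
    """Render a Snowflake MAP_INSERT(map, key, value[, updateFlag]) call
    as DuckDB MAP_CONCAT. Hypothetical stand-alone helper for illustration."""
    if update_flag is not None:
        # Stand-in for sqlglot's Generator.unsupported(...): surface the
        # dropped fourth argument instead of silently changing semantics.
        warnings.warn("MAP_INSERT's updateFlag is not supported in the DuckDB transpilation")
    return f"MAP_CONCAT({map_sql}, MAP {{'{key}': {value_sql}}})"
```

Whether the flag should warn or hard-error is a policy choice; the point is that the argument must not be dropped silently.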

Comment on lines +2862 to +2864
# Snowflake's PARSE_JSON('NULL') uses uppercase, but DuckDB's JSON requires lowercase 'null'
if isinstance(arg, exp.Literal) and arg.is_string and arg.this == "NULL":
arg = exp.Literal.string("null")


We shouldn't handle static cases at parse time. For example, if the input arg is an identifier, we miss the handling entirely because the value isn't known at parse time. Also, this change is irrelevant to this PR; let's remove it.

Comment on lines +2576 to +2602
self.validate_all(
"SELECT MAP_CONCAT(CAST({'a': 1, 'b': 2} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'c': 3})",
read={
"snowflake": "SELECT MAP_INSERT({'a':1,'b':2}::MAP(VARCHAR,NUMBER),'c',3)",
},
write={
"duckdb": "SELECT MAP_CONCAT(CAST({'a': 1, 'b': 2} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'c': 3})",
},
)
self.validate_all(
"SELECT MAP_CONCAT(CAST({'a': 1} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'a': 99})",
read={
"snowflake": "SELECT MAP_INSERT({'a':1}::MAP(VARCHAR,NUMBER),'a',99,TRUE)",
},
write={
"duckdb": "SELECT MAP_CONCAT(CAST({'a': 1} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'a': 99})",
},
)
self.validate_all(
"SELECT id, MAP_CONCAT(attrs, MAP {'new_key': 'new_value'}) AS attrs_with_insert FROM demo_maps",
read={
"snowflake": "SELECT id, MAP_INSERT(attrs, 'new_key', 'new_value') AS attrs_with_insert FROM demo_maps",
},
write={
"duckdb": "SELECT id, MAP_CONCAT(attrs, MAP {'new_key': 'new_value'}) AS attrs_with_insert FROM demo_maps",
},
)


D SELECT version();
┌─────────────┐
│ "version"() │
│   varchar   │
├─────────────┤
│ v1.4.4      │
└─────────────┘

D SELECT MAP_CONCAT(CAST({'a': 1, 'b': 2} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'c': 3});
Invalid Input Error:
'value' type of map differs between arguments, expected 'MAP(VARCHAR, DECIMAL(38,0))', found 'MAP(VARCHAR, INTEGER)' instead

D SELECT MAP_CONCAT(CAST({'a': 1} AS MAP(TEXT, DECIMAL(38, 0))), MAP {'a': 99});
Invalid Input Error:
'value' type of map differs between arguments, expected 'MAP(VARCHAR, DECIMAL(38,0))', found 'MAP(VARCHAR, INTEGER)' instead

These ^ don't work; let's always run the tests against the engine to determine whether they are correct.
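One way to satisfy DuckDB's check (a sketch only, not verified against the engine here): when the source map's type is known, cast the single-entry map built from the inserted key/value to that same type, so MAP_CONCAT sees matching value types. In string-building terms:

```python
from typing import Optional

def concat_with_cast(left_sql: str, insert_map_sql: str,
                     map_type: Optional[str]) -> str:
    """Hypothetical helper: wrap the inserted one-entry map in a cast to the
    source map's type (e.g. MAP(TEXT, DECIMAL(38, 0))), since DuckDB's
    MAP_CONCAT rejects arguments whose value types differ."""
    if map_type:
        insert_map_sql = f"CAST({insert_map_sql} AS {map_type})"
    return f"MAP_CONCAT({left_sql}, {insert_map_sql})"
```

This only helps when the type is actually known at generation time, which ties into the annotation concerns below.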

Comment on lines +3723 to +3740
# Get the type of the original map by traversing nested MAP_INSERT expressions
def get_map_type(expr: exp.Expression) -> t.Optional[exp.DataType]:
if expr.type:
return expr.type
# Traverse through nested MAP_INSERT to find the original typed expression
if isinstance(expr, exp.MapInsert):
return get_map_type(expr.this)
return None

map_type = get_map_type(map_arg)

if value and map_type and map_type.expressions and len(map_type.expressions) > 1:
# Extract the value type from MAP(key_type, value_type)
value_type = map_type.expressions[1]
# Cast NULL values and non-literals to avoid type conflicts
# Literal numbers/booleans/strings can be inferred correctly by DuckDB
if isinstance(value, exp.Null) or not isinstance(value, exp.Literal):
value = exp.cast(value, value_type)


This logic fails for:

Snowflake input:
WITH map_cte AS (
    SELECT 
        1 AS id,
        {'a':1,'b':2}::MAP(VARCHAR, NUMBER) AS data
)

SELECT 
    id,
    data,
    MAP_INSERT(data, 'c', 3) AS updated_map
FROM map_cte;

right?
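The gap can be seen with a pure-Python mock of the traversal (simplified stand-ins for sqlglot's expression classes, not the real ones): a column coming out of a CTE carries no `type` until the type annotator has run, so `get_map_type` returns None and no cast is emitted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:                       # simplified stand-in for exp.Expression
    type: Optional[str] = None

@dataclass
class MapInsert(Node):            # simplified stand-in for exp.MapInsert
    this: Optional[Node] = None

def get_map_type(expr: Node) -> Optional[str]:
    # Mirrors the PR's helper: use the annotated type if present,
    # otherwise walk through nested MAP_INSERT calls.
    if expr.type:
        return expr.type
    if isinstance(expr, MapInsert):
        return get_map_type(expr.this)
    return None

# The CTE column `data` has no annotated type before inference runs,
# so the cast logic downstream is skipped entirely.
cte_column = Node(type=None)
print(get_map_type(MapInsert(this=cte_column)))  # None
```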

Comment on lines +3723 to +3730
# Get the type of the original map by traversing nested MAP_INSERT expressions
def get_map_type(expr: exp.Expression) -> t.Optional[exp.DataType]:
if expr.type:
return expr.type
# Traverse through nested MAP_INSERT to find the original typed expression
if isinstance(expr, exp.MapInsert):
return get_map_type(expr.this)
return None


Why do we need this? map_arg.type should give you what you want. I don't see why we treat nested MAP_INSERT calls as a special case.


map_type = get_map_type(map_arg)

if value and map_type and map_type.expressions and len(map_type.expressions) > 1:


Moreover, I don't think this will be true in the general case. The reason is that we annotate Snowflake maps (and arrays, etc) as MAP, not MAP<K, V>, because these objects are polymorphic by default.

We should always add unit tests that cover the path where we infer types first, and then do sql(...), otherwise the code coverage is lacking.
