
Conversation

@peterxcli
Member

Which issue does this PR close?

Closes: #3164

Rationale for this change

Comet does not currently support Spark's map_contains_key function, so queries that use it fall back to Spark's JVM execution instead of running natively on DataFusion.

The MapContainsKey expression checks whether a given key exists in a map. It is implemented as a runtime-replaceable expression that internally uses ArrayContains on the map's keys to perform the lookup.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
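The rewrite described above (checking key membership via ArrayContains over the map's keys) can be illustrated with a minimal, self-contained Scala sketch. The `mapContainsKey` helper below is hypothetical and for illustration only; it is not Comet's actual implementation, just the same two-step semantics over plain Scala collections:

```scala
// Hypothetical illustration (not Comet code): map_contains_key(m, k)
// is equivalent to array_has(map_keys(m), k).
def mapContainsKey[K, V](m: Map[K, V], k: K): Boolean = {
  val keys = m.keys.toSeq // corresponds to map_keys(m)
  keys.contains(k)        // corresponds to array_has(keys, k)
}
```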

What changes are included in this PR?

How are these changes tested?

  • Generate map-type data and use it to verify the correctness of map_contains_key
  • Empty-map tests


Copilot AI review requested due to automatic review settings February 3, 2026 02:20

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@peterxcli peterxcli changed the title [Feature] Support Spark expression: map_contains_key feat: support map_contains_key expression Feb 3, 2026
codecov-commenter commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 37.50000% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.98%. Comparing base (f09f8af) to head (105cadd).
⚠️ Report is 918 commits behind head on main.

Files with missing lines Patch % Lines
...k/src/main/scala/org/apache/comet/serde/maps.scala 16.66% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3369      +/-   ##
============================================
+ Coverage     56.12%   59.98%   +3.85%     
- Complexity      976     1475     +499     
============================================
  Files           119      175      +56     
  Lines         11743    16172    +4429     
  Branches       2251     2681     +430     
============================================
+ Hits           6591     9701    +3110     
- Misses         4012     5119    +1107     
- Partials       1140     1352     +212     

☔ View full report in Codecov by Sentry.


```scala
val mapKeysExpr = scalarFunctionExprToProto("map_keys", mapExpr)

val mapContainsKeyExpr = scalarFunctionExprToProto("array_has", mapKeysExpr, keyExpr)
```
Contributor
👍

```scala
  100,
  schemaGenOptions,
  dataGenOptions)
}
```
Contributor

Minor: why are we disabling Comet here?

Member Author

Sorry, I didn't check what that option means; I was just copying from the "read map[int, int] from parquet" test.

But I think I'm going to remove this test and write a SQL test directly, as in #3369 (comment)


```scala
checkSparkAnswer(
  spark.sql("SELECT map_contains_key(c14, element_at(map_keys(c14), 1)) FROM t1"))
checkSparkAnswer(spark.sql("SELECT map_contains_key(c14, 999999) FROM t1"))
```
Contributor

checkSparkAnswerAndOperator checks that all the operators are Comet-native, whereas checkSparkAnswer only validates the result DataFrame. Any reason why we used the latter?
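For illustration, the stricter variant of the second check might look like this (a sketch only, assuming the suite's existing checkSparkAnswerAndOperator helper and the same t1 table):

```scala
// Sketch: same query, but also asserting the physical plan is fully Comet-native.
checkSparkAnswerAndOperator(spark.sql("SELECT map_contains_key(c14, 999999) FROM t1"))
```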

Member Author

ditto

@coderfender (Contributor) left a comment

Left test-related comments.


```scala
// Empty map with string keys
checkSparkAnswerAndOperator(spark.sql(
  "SELECT map_contains_key(map_from_arrays(CAST(array() AS array<string>), CAST(array() AS array<double>)), 'key') FROM t1"))
```
Contributor

We might want to add additional tests to verify complex keys and literals?
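A literal-map / literal-key variant along those lines might look like this (hypothetical additional test, sketched with the suite's existing helper):

```scala
// Sketch: map literal and key literal, no table column involved.
checkSparkAnswerAndOperator(spark.sql("SELECT map_contains_key(map(1, 'a', 2, 'b'), 2)"))
```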

@peterxcli (Member Author) commented Feb 3, 2026

I'm trying my best to replicate the Scala test suite in Spark:

but I noticed they also have some SQL tests for this function; let me merge all of them into the .slt file.

@mbutrovich
Contributor

Thanks @peterxcli! Can we add/migrate the tests to the new framework?

https://datafusion.apache.org/comet/contributor-guide/sql-file-tests.html


5 participants