[SPARK-30790][SQL] The dataType of map() should be map<null,null> #27542
Conversation
[SPARK-30790]map() should return map<null,null>
ok to test
docs/sql-migration-guide.md
Outdated
@@ -218,6 +218,8 @@ license: |
- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.
- Since Spark 3.0, when the `map` function is called without any parameters, it returns an empty map of `NullType`. In Spark version 2.4 and earlier, it returns an empty map of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.mapDefaultToStringType.enabled` to `true`.
an empty map with `NullType` as key/value type.
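The wording above can be checked directly. Below is a minimal sketch of the new behavior (assumptions: Spark 3.0+ on the classpath; the object and app names are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.map
import org.apache.spark.sql.types.{MapType, NullType}

object EmptyMapSchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("empty-map-demo").getOrCreate()
    // Since Spark 3.0, map() with no arguments defaults to NullType
    // for both the key and the value type.
    val dt = spark.range(1).select(map()).schema.head.dataType.asInstanceOf[MapType]
    assert(dt.keyType == NullType && dt.valueType == NullType)
    spark.stop()
  }
}
```

In 2.4 the same expression produced `map<string,string>`, which is what the legacy config restores.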
val LEGACY_MAP_DEFAULT_TO_STRING =
  buildConf("spark.sql.legacy.mapDefaultToStringType.enabled")
    .internal()
    .doc("When set to true, it returns an empty map of string type when the `map` " +
ditto
var expectedSchema = new StructType()
  .add("x", MapType(StringType, StringType, valueContainsNull = false), nullable = false)
assert(ds.select(map().as("x")).schema == expectedSchema)
test("SPARK-30790: Empty map of <NullType,NullType> for map function with no arguments") {
empty map with NullType as key/value type
can we move it closer to the array test?
docs/sql-migration-guide.md
Outdated
@@ -218,6 +218,8 @@ license: |
- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.
- Since Spark 3.0, when the `map` function is called without any parameters, it returns an empty map of `NullType`. In Spark version 2.4 and earlier, it returns an empty map of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.mapDefaultToStringType.enabled` to `true`.
Maybe we should say that map's keys and values have either `NullType` or `StringType`.
val schema = spark.range(1).select(map()).schema
assert(schema.nonEmpty && schema.head.dataType.isInstanceOf[MapType])
val actualKeyType = schema.head.dataType.asInstanceOf[MapType].keyType
val actualValueType = schema.head.dataType.asInstanceOf[MapType].valueType
assert(actualKeyType === expectedType && actualValueType === expectedType)
nit:
val schema = spark.range(1).select(map()).schema
val StructType(Array(StructField(_, MapType(keyType, valueType, _), _, _))) = schema
assert(keyType === expectedType)
assert(valueType === expectedType)
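The destructuring suggested in the nit works because `StructType`, `StructField`, and `MapType` are case classes, so their extractors can be nested in one pattern. A self-contained sketch (the schema literal is a hypothetical stand-in for `spark.range(1).select(map()).schema`, so no SparkSession is needed):

```scala
import org.apache.spark.sql.types._

object DestructureSchema {
  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for the schema produced by select(map()).
    val schema = StructType(Array(
      StructField("map()", MapType(NullType, NullType, valueContainsNull = false))))
    // One pattern match pulls the key and value types out of the nested structure.
    val StructType(Array(StructField(_, MapType(keyType, valueType, _), _, _))) = schema
    assert(keyType == NullType)
    assert(valueType == NullType)
  }
}
```

This replaces the two `asInstanceOf[MapType]` casts in the original test with a single irrefutable pattern.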
Test build #118254 has finished for PR 27542 at commit
Just to check: do we need independent legacy configs for arrays and maps?
That is exactly my doubt as well. In my opinion, we should have a single config for both of them.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
how about a single config?
Seems fine to me. I will update the code.
private lazy val keyToIndex = keyType match {
  // Binary type data is `byte[]`, which can't use `==` to check equality.
  case _: AtomicType | _: CalendarIntervalType if !keyType.isInstanceOf[BinaryType] =>
    new java.util.HashMap[Any, Int]()
  case _: AtomicType | _: CalendarIntervalType | _: NullType
This is done to handle this:
scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$)
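The `MatchError` above arises because a match over `DataType` that only lists `AtomicType` and `CalendarIntervalType` has no case for `NullType`. A simplified sketch of the dispatch (assumption: this is a stand-alone illustration, not the real builder code in Spark; the fallback branch is a placeholder):

```scala
import org.apache.spark.sql.types._

object KeyToIndexSketch {
  // Simplified dispatch: NullType must be listed alongside the atomic types,
  // otherwise keyToIndex(NullType) throws scala.MatchError.
  def keyToIndex(keyType: DataType): java.util.Map[Any, Int] = keyType match {
    // Binary type data is byte[], which can't use == to check equality,
    // so BinaryType is excluded from the hash-map path.
    case _: AtomicType | _: CalendarIntervalType | _: NullType
        if !keyType.isInstanceOf[BinaryType] =>
      new java.util.HashMap[Any, Int]()
    case _ =>
      // Placeholder for the non-hashable path (ordering-based lookup in Spark).
      new java.util.HashMap[Any, Int]()
  }

  def main(args: Array[String]): Unit = {
    // With map() now typed as map<null,null>, the key type reaching the
    // builder can be NullType, so this must not throw.
    assert(keyToIndex(NullType).isEmpty)
  }
}
```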
docs/sql-migration-guide.md
Outdated
- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.
- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.createEmptyCollectionUsingStringType.enabled` to `true`.
- Since Spark 3.0, when the `map` function is called without any parameters, it returns an empty map with `NullType` as key/value type. In Spark version 2.4 and earlier, it returns an empty map of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.createEmptyCollectionUsingStringType.enabled` to `true`.
can we combine the 2 items?
Since Spark 3.0, when the `array`/`map` function is called without any parameters, it returns an empty collection with `NullType` as element type. ...
Migration guide is updated.
Test build #118291 has finished for PR 27542 at commit
Test build #118296 has finished for PR 27542 at commit
val LEGACY_ARRAY_DEFAULT_TO_STRING =
  buildConf("spark.sql.legacy.arrayDefaultToStringType.enabled")
val LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE =
  buildConf("spark.sql.legacy.createEmptyCollectionUsingStringType.enabled")
I've sent a config naming proposal to the dev list, and `createEmptyCollectionUsingStringType` is a verb phrase, so I think we don't need the `.enabled` postfix.
.doc("When set to true, it returns an empty array of string type when the `array` " +
  "function is called without any parameters. Otherwise, it returns an empty " +
  "array of `NullType`")
.doc("When set to true, it returns an empty array of string type and an empty map with " +
let's match the migration guide:
When set to true, Spark returns an empty collection with `StringType` as element type
if the `array`/`map` function is called without any parameters. Otherwise, Spark returns
an empty collection with `NullType` as element type.
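The legacy behavior described in that doc string can be restored at runtime. A minimal sketch (assumptions: Spark 3.0+ with the final config name `spark.sql.legacy.createEmptyCollectionUsingStringType`, i.e. after the `.enabled` postfix was dropped; the object name is hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{MapType, StringType}

object LegacyEmptyCollection {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("legacy-demo").getOrCreate()
    // With the legacy flag on, empty array()/map() fall back to string
    // element types, matching Spark 2.4 behavior.
    spark.conf.set("spark.sql.legacy.createEmptyCollectionUsingStringType", "true")
    val dt = spark.sql("SELECT map()").schema.head.dataType.asInstanceOf[MapType]
    assert(dt.keyType == StringType && dt.valueType == StringType)
    spark.stop()
  }
}
```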
Test build #118315 has finished for PR 27542 at commit
retest this please
Test build #118322 has finished for PR 27542 at commit
thanks, merging to master/3.0!
### What changes were proposed in this pull request?
`spark.sql("select map()")` returns `{}`. After these changes it will return `map<null,null>`.

### Why are the changes needed?
After the changes introduced in #27521, it is important to maintain consistency when using `map()`.

### Does this PR introduce any user-facing change?
Yes. Now `map()` will give `map<null,null>` instead of `{}`.

### How was this patch tested?
UT added. Migration guide updated as well.

Closes #27542 from iRakson/SPARK-30790.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 926e3a1)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
After #27542, `map()` returns `map<null, null>` instead of `map<string, string>`. However, this breaks queries which union `map()` and other maps. The reason is, `TypeCoercion` rules and `Cast` think it's illegal to cast a null type map key to other types, as it makes the key nullable, but it's actually legal. This PR fixes it.

### Why are the changes needed?
To avoid breaking queries.

### Does this PR introduce any user-facing change?
Yes, now some queries that work in 2.x can work in 3.0 as well.

### How was this patch tested?
New test.

Closes #27926 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
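The query shape the follow-up PR fixes can be sketched as below (assumptions: Spark 3.0 with the #27926 fix applied; the object name and the literal map values are hypothetical). Unioning `map()` (typed `map<null,null>`) with a typed map requires casting the `NullType` key, which `TypeCoercion`/`Cast` previously rejected:

```scala
import org.apache.spark.sql.SparkSession

object UnionEmptyMap {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("union-map").getOrCreate()
    // Before the follow-up fix, coercing map()'s NullType key to int was
    // considered illegal and this query failed to analyze.
    val df = spark.sql("SELECT map() AS m UNION ALL SELECT map(1, 'a') AS m")
    assert(df.count() == 2)
    spark.stop()
  }
}
```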