[SPARK-30790][SQL] The dataType of map() should be map<null,null> #27542


Closed
wants to merge 4 commits

Conversation

iRakson
Contributor

@iRakson iRakson commented Feb 11, 2020

What changes were proposed in this pull request?

`spark.sql("select map()")` returns `{}`.

After these changes, it will return `map<null,null>`.

Why are the changes needed?

After the changes introduced in #27521, it is important to maintain consistency when using `map()`.

Does this PR introduce any user-facing change?

Yes. `map()` will now give `map<null,null>` instead of `{}`.

How was this patch tested?

A unit test was added, and the migration guide was updated as well.

[SPARK-30790]map() should return map<null,null>
@cloud-fan
Contributor

ok to test

@@ -218,6 +218,8 @@ license: |

- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.

- Since Spark 3.0, when the `map` function is called without any parameters, it returns an empty map of `NullType`. In Spark version 2.4 and earlier, it returns an empty map of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.mapDefaultToStringType.enabled` to `true`.
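For illustration, the legacy behavior described above could be restored in SQL like this (a sketch using the config name as it appears in this diff; note the config is renamed later in this PR):

```sql
-- Restore the Spark 2.4 behavior: map() returns an empty map of string type.
SET spark.sql.legacy.mapDefaultToStringType.enabled=true;
SELECT map();
```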

an empty map with NullType as key/value type.

val LEGACY_MAP_DEFAULT_TO_STRING =
buildConf("spark.sql.legacy.mapDefaultToStringType.enabled")
.internal()
.doc("When set to true, it returns an empty map of string type when the `map` " +
ditto

var expectedSchema = new StructType()
.add("x", MapType(StringType, StringType, valueContainsNull = false), nullable = false)
assert(ds.select(map().as("x")).schema == expectedSchema)
test("SPARK-30790: Empty map of <NullType,NullType> for map function with no arguments") {

empty map with NullType as key/value type


can we move it closer to the array test?

@@ -218,6 +218,8 @@ license: |

- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.

- Since Spark 3.0, when the `map` function is called without any parameters, it returns an empty map of `NullType`. In Spark version 2.4 and earlier, it returns an empty map of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.mapDefaultToStringType.enabled` to `true`.

Maybe we should say that map's keys and values have either NullType or StringType.

Comment on lines 3506 to 3510
val schema = spark.range(1).select(map()).schema
assert(schema.nonEmpty && schema.head.dataType.isInstanceOf[MapType])
val actualKeyType = schema.head.dataType.asInstanceOf[MapType].keyType
val actualValueType = schema.head.dataType.asInstanceOf[MapType].valueType
assert(actualKeyType === expectedType && actualValueType === expectedType)

nit:

    val schema = spark.range(1).select(map()).schema
    val StructType(Array(StructField(_, MapType(keyType, valueType, _), _, _))) = schema
    assert(keyType === expectedType)
    assert(valueType === expectedType)
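The suggested rewrite leans on Scala pattern matching to destructure the nested schema in one step. A minimal, self-contained sketch of the same idiom, using hypothetical stand-in case classes (not Spark's real `StructType`/`StructField`/`MapType`, which carry more fields):

```scala
// Hypothetical stand-ins for Spark's schema classes, just to show the idiom.
case class MapType(keyType: String, valueType: String, valueContainsNull: Boolean)
case class StructField(name: String, dataType: MapType, nullable: Boolean)
case class StructType(fields: Array[StructField])

object DestructureDemo {
  def main(args: Array[String]): Unit = {
    val schema = StructType(Array(
      StructField("map()", MapType("null", "null", valueContainsNull = true), nullable = false)))
    // A single pattern-match val extracts the key and value types from the nested schema,
    // replacing the repeated asInstanceOf casts in the original test.
    val StructType(Array(StructField(_, MapType(keyType, valueType, _), _))) = schema
    assert(keyType == "null" && valueType == "null")
    println(s"keyType=$keyType, valueType=$valueType")
  }
}
```

If the pattern did not match, the `val` definition would throw a `MatchError` at runtime, which in a test is an acceptable failure mode.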

@SparkQA

SparkQA commented Feb 11, 2020

Test build #118254 has finished for PR 27542 at commit 395eee8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Feb 12, 2020

Just to check: do we need independent legacy configs for arrays and maps?

@iRakson
Contributor Author

iRakson commented Feb 12, 2020

Just to check: do we need independent legacy configs for arrays and maps?

That is exactly my doubt as well. In my opinion, we should have a single config for both of them.

@cloud-fan
Contributor

How about a single config, `spark.sql.legacy.createEmptyCollectionUsingStringType`?

@iRakson
Contributor Author

iRakson commented Feb 12, 2020

How about a single config, `spark.sql.legacy.createEmptyCollectionUsingStringType`?

Seems fine to me. I will update the code.


private lazy val keyToIndex = keyType match {
// Binary type data is `byte[]`, which can't use `==` to check equality.
case _: AtomicType | _: CalendarIntervalType if !keyType.isInstanceOf[BinaryType] =>
new java.util.HashMap[Any, Int]()
case _: AtomicType | _: CalendarIntervalType | _: NullType

This is done to handle this error:
`scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$)`
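The `byte[]` caveat in the code comment above can be seen without Spark at all. A small stand-alone sketch (plain Scala, not Spark code) of why binary keys cannot rely on default equality in a `java.util.HashMap`, which is why the match above excludes `BinaryType` from the HashMap path:

```scala
object BinaryKeyDemo {
  def main(args: Array[String]): Unit = {
    val a = Array[Byte](1, 2, 3)
    val b = Array[Byte](1, 2, 3)
    // JVM arrays compare by reference, so == is false here...
    println(a == b)                        // false
    // ...even though the contents are equal.
    println(java.util.Arrays.equals(a, b)) // true
    // Consequently a plain HashMap treats a and b as distinct keys,
    // so duplicate binary keys would go undetected.
    val m = new java.util.HashMap[Any, Int]()
    m.put(a, 0)
    println(m.containsKey(b))              // false
  }
}
```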

@iRakson iRakson requested a review from cloud-fan February 12, 2020 09:50
- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.
- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.createEmptyCollectionUsingStringType.enabled` to `true`.

- Since Spark 3.0, when the `map` function is called without any parameters, it returns an empty map with `NullType` as key/value type. In Spark version 2.4 and earlier, it returns an empty map of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.createEmptyCollectionUsingStringType.enabled` to `true`.
can we combine the 2 items?

Since Spark 3.0, when the `array`/`map` function is called without any parameters, it returns an empty collection with `NullType` as element type. ...

Migration guide is updated.

@SparkQA

SparkQA commented Feb 12, 2020

Test build #118291 has finished for PR 27542 at commit 3319d3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 12, 2020

Test build #118296 has finished for PR 27542 at commit 1a066e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val LEGACY_ARRAY_DEFAULT_TO_STRING =
buildConf("spark.sql.legacy.arrayDefaultToStringType.enabled")
val LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE =
buildConf("spark.sql.legacy.createEmptyCollectionUsingStringType.enabled")
I've sent a config naming proposal to the dev list, and `createEmptyCollectionUsingStringType` is a verb phrase, so I think we don't need the `.enabled` postfix.

.doc("When set to true, it returns an empty array of string type when the `array` " +
"function is called without any parameters. Otherwise, it returns an empty " +
"array of `NullType`")
.doc("When set to true, it returns an empty array of string type and an empty map with " +
let's match the migration guide:

When set to true, Spark returns an empty collection with `StringType` as element type
if the `array`/`map` function is called without any parameters. Otherwise, Spark returns
an empty collection with `NullType` as element type.

@iRakson iRakson requested a review from cloud-fan February 12, 2020 18:06
@SparkQA

SparkQA commented Feb 12, 2020

Test build #118315 has finished for PR 27542 at commit e873527.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Feb 12, 2020

retest this please

@SparkQA

SparkQA commented Feb 13, 2020

Test build #118322 has finished for PR 27542 at commit e873527.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-30790]The dataType of map() should be map<null,null> [SPARK-30790][SQL] The dataType of map() should be map<null,null> Feb 13, 2020
@cloud-fan cloud-fan closed this in 926e3a1 Feb 13, 2020
@cloud-fan
Contributor

thanks, merging to master/3.0!

cloud-fan pushed a commit that referenced this pull request Feb 13, 2020
Closes #27542 from iRakson/SPARK-30790.

Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 926e3a1)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Mar 17, 2020
### What changes were proposed in this pull request?

After #27542, `map()` returns `map<null, null>` instead of `map<string, string>`. However, this breaks queries which union `map()` and other maps.

The reason is that the `TypeCoercion` rules and `Cast` consider it illegal to cast a null-type map key to another type, since doing so makes the key nullable; but it is actually legal. This PR fixes that.

### Why are the changes needed?

To avoid breaking queries.

### Does this PR introduce any user-facing change?

Yes, now some queries that work in 2.x can work in 3.0 as well.

### How was this patch tested?

new test

Closes #27926 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Mar 17, 2020
(cherry picked from commit d7b97a1)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
7 participants