[SPARK-30790][SQL] The dataType of map() should be map<null,null> #27542

Closed · wants to merge 4 commits
docs/sql-migration-guide.md — 2 changes: 1 addition & 1 deletion
@@ -216,7 +216,7 @@ license: |

- Since Spark 3.0, the `size` function returns `NULL` for the `NULL` input. In Spark version 2.4 and earlier, this function gives `-1` for the same input. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.sizeOfNull` to `true`.

- Since Spark 3.0, when the `array` function is called without any parameters, it returns an empty array of `NullType`. In Spark version 2.4 and earlier, it returns an empty array of string type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.arrayDefaultToStringType.enabled` to `true`.
- Since Spark 3.0, when the `array`/`map` function is called without any parameters, it returns an empty collection with `NullType` as element type. In Spark version 2.4 and earlier, it returns an empty collection with `StringType` as element type. To restore the behavior before Spark 3.0, you can set `spark.sql.legacy.createEmptyCollectionUsingStringType` to `true`.

- Since Spark 3.0, the interval literal syntax does not allow multiple from-to units anymore. For example, `SELECT INTERVAL '1-1' YEAR TO MONTH '2-2' YEAR TO MONTH'` throws parser exception.
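A minimal sketch of the `array`/`map` behaviour change above, assuming a Spark 3.0 build that includes this patch (the `local[*]` master, app name, and column aliases are illustrative only):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, map}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("empty-collection-default-types")
  .getOrCreate()

// With the Spark 3.0 default, parameterless array()/map() use NullType.
spark.range(1).select(array().as("a"), map().as("m")).printSchema()
// Roughly:
// root
//  |-- a: array (nullable = false)
//  |    |-- element: null (containsNull = false)
//  |-- m: map (nullable = false)
//  |    |-- key: null
//  |    |-- value: null (valueContainsNull = false)

spark.stop()
```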

@@ -46,7 +46,7 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
}

private val defaultElementType: DataType = {
if (SQLConf.get.getConf(SQLConf.LEGACY_ARRAY_DEFAULT_TO_STRING)) {
if (SQLConf.get.getConf(SQLConf.LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE)) {
StringType
} else {
NullType
@@ -145,6 +145,14 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
lazy val keys = children.indices.filter(_ % 2 == 0).map(children)
lazy val values = children.indices.filter(_ % 2 != 0).map(children)

private val defaultElementType: DataType = {
if (SQLConf.get.getConf(SQLConf.LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE)) {
StringType
} else {
NullType
}
}

override def foldable: Boolean = children.forall(_.foldable)

override def checkInputDataTypes(): TypeCheckResult = {
@@ -167,9 +175,9 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
override lazy val dataType: MapType = {
MapType(
keyType = TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(keys.map(_.dataType))
.getOrElse(StringType),
.getOrElse(defaultElementType),
valueType = TypeCoercion.findCommonTypeDifferentOnlyInNullFlags(values.map(_.dataType))
.getOrElse(StringType),
.getOrElse(defaultElementType),
valueContainsNull = values.exists(_.nullable))
}
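A standalone sketch of the fallback above; `commonType` here stands in for `TypeCoercion.findCommonTypeDifferentOnlyInNullFlags`, which returns `None` when `map()` has no key/value expressions, and the boolean flag stands in for the legacy SQL conf read by `defaultElementType`:

```scala
import org.apache.spark.sql.types.{DataType, NullType, StringType}

// Sketch only - not the actual Catalyst code path.
def resolveType(commonType: Option[DataType], legacyStringType: Boolean): DataType =
  commonType.getOrElse(if (legacyStringType) StringType else NullType)

resolveType(None, legacyStringType = false) // NullType   -> map() gets dataType map<null,null>
resolveType(None, legacyStringType = true)  // StringType -> map() gets dataType map<string,string>
```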

@@ -29,12 +29,11 @@ import org.apache.spark.unsafe.array.ByteArrayMethods
*/
class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Serializable {
assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map cannot be/contain map")
assert(keyType != NullType, "map key cannot be null type.")

private lazy val keyToIndex = keyType match {
// Binary type data is `byte[]`, which can't use `==` to check equality.
case _: AtomicType | _: CalendarIntervalType if !keyType.isInstanceOf[BinaryType] =>
new java.util.HashMap[Any, Int]()
case _: AtomicType | _: CalendarIntervalType | _: NullType
if !keyType.isInstanceOf[BinaryType] => new java.util.HashMap[Any, Int]()
case _ =>
// for complex types, use interpreted ordering to be able to compare unsafe data with safe
// data, e.g. UnsafeRow vs GenericInternalRow.

Contributor Author comment on the added `NullType` case: this is done to handle
scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$)
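A small sketch of why the `HashMap` fast path excludes `BinaryType`, and why the ordering-based path matters: equal byte arrays are not `==`, so only a content-aware comparison deduplicates binary keys. The added `NullType` case keeps an empty map's key type on the `HashMap` path instead of the ordering path, which, per the author's comment above, otherwise raises the `scala.MatchError`. Names below are illustrative, not the builder's own:

```scala
import java.util.{HashMap => JHashMap, TreeMap => JTreeMap}
import java.nio.ByteBuffer

// `==`/equals on byte[] is reference equality, so a plain HashMap sees two distinct keys.
val k1 = Array[Byte](1, 2, 3)
val k2 = Array[Byte](1, 2, 3)
val hash = new JHashMap[Any, Int]()
hash.put(k1, 0)
hash.put(k2, 1)
println(hash.size()) // 2 - equal binary keys are NOT collapsed

// A TreeMap with a content-aware comparator (a stand-in for the interpreted ordering
// used for complex and binary keys) does collapse them.
val byContent: java.util.Comparator[Array[Byte]] =
  (a, b) => ByteBuffer.wrap(a).compareTo(ByteBuffer.wrap(b))
val tree = new JTreeMap[Array[Byte], Int](byContent)
tree.put(k1, 0)
tree.put(k2, 1)
println(tree.size()) // 1 - equal contents map to the same key
```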
@@ -2007,12 +2007,12 @@ object SQLConf {
.booleanConf
.createWithDefault(false)

val LEGACY_ARRAY_DEFAULT_TO_STRING =
buildConf("spark.sql.legacy.arrayDefaultToStringType.enabled")
val LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE =
buildConf("spark.sql.legacy.createEmptyCollectionUsingStringType")
.internal()
.doc("When set to true, it returns an empty array of string type when the `array` " +
"function is called without any parameters. Otherwise, it returns an empty " +
"array of `NullType`")
.doc("When set to true, Spark returns an empty collection with `StringType` as element " +
"type if the `array`/`map` function is called without any parameters. Otherwise, Spark " +
"returns an empty collection with `NullType` as element type.")
.booleanConf
.createWithDefault(false)
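For illustration, a hedged spark-shell sketch of toggling the renamed flag in an existing `spark` session (the conf is `internal()`, so it is normally not set by users; schema strings are paraphrased):

```scala
// Restore the pre-3.0 behaviour: empty collections use StringType.
spark.conf.set("spark.sql.legacy.createEmptyCollectionUsingStringType", "true")
spark.sql("SELECT map() AS m, array() AS a").schema
// m: map<string,string>, a: array<string>

// Spark 3.0 default: empty collections use NullType.
spark.conf.set("spark.sql.legacy.createEmptyCollectionUsingStringType", "false")
spark.sql("SELECT map() AS m, array() AS a").schema
// m: map<null,null>, a: array<null>
```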

@@ -3499,13 +3499,6 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
}

test("SPARK-21281 use string types by default if map have no argument") {
val ds = spark.range(1)
var expectedSchema = new StructType()
.add("x", MapType(StringType, StringType, valueContainsNull = false), nullable = false)
assert(ds.select(map().as("x")).schema == expectedSchema)
}

test("SPARK-21281 fails if functions have no argument") {
val df = Seq(1).toDF("a")

@@ -3578,14 +3571,30 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
test("SPARK-29462: Empty array of NullType for array function with no arguments") {
Seq((true, StringType), (false, NullType)).foreach {
case (arrayDefaultToString, expectedType) =>
withSQLConf(SQLConf.LEGACY_ARRAY_DEFAULT_TO_STRING.key -> arrayDefaultToString.toString) {
withSQLConf(SQLConf.LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE.key ->
arrayDefaultToString.toString) {
val schema = spark.range(1).select(array()).schema
assert(schema.nonEmpty && schema.head.dataType.isInstanceOf[ArrayType])
val actualType = schema.head.dataType.asInstanceOf[ArrayType].elementType
assert(actualType === expectedType)
}
}
}

test("SPARK-30790: Empty map with NullType as key/value type for map function with no argument") {
Seq((true, StringType), (false, NullType)).foreach {
case (mapDefaultToString, expectedType) =>
withSQLConf(SQLConf.LEGACY_CREATE_EMPTY_COLLECTION_USING_STRING_TYPE.key ->
mapDefaultToString.toString) {
val schema = spark.range(1).select(map()).schema
assert(schema.nonEmpty && schema.head.dataType.isInstanceOf[MapType])
val actualKeyType = schema.head.dataType.asInstanceOf[MapType].keyType
val actualValueType = schema.head.dataType.asInstanceOf[MapType].valueType
assert(actualKeyType === expectedType)
assert(actualValueType === expectedType)
}
}
}
}

object DataFrameFunctionsSuite {