[native_datafusion] Spark SQL failure "select nested field from a complex map key using map_keys" #1754

@andygrove

Describe the bug

SELECT map_keys(map1) FROM tbl works.
SELECT map_keys(map1).id2 FROM tbl does not work.

Repro:

  ignore("read map[struct, struct] from parquet") {
    assume(usingDataSourceExec(conf))

    withTempPath { dir =>
      // create input file with Comet disabled
      withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
        val df = spark
          .range(5)
          .withColumn("id2", col("id"))
          .withColumn("id3", col("id"))
          // Spark does not allow null as a key but does allow null as a
          // value, and the entire map can be null
          .select(
            when(col("id") > 1, map(struct(col("id"), col("id2"), col("id3")), when(col("id") > 2,
              struct(col("id"), col("id2"), col("id3"))))).alias("map1"))
        df.write.parquet(dir.toString())
      }

      Seq("", "parquet").foreach { v1List =>
        withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> v1List) {
          val df = spark.read.parquet(dir.toString())
          df.createOrReplaceTempView("tbl")
          if (v1List.isEmpty) {
            checkSparkAnswer(df.select("map1"))
          } else {
            checkSparkAnswerAndOperator(df.select("map1"))
          }
          checkSparkAnswer(spark.sql("SELECT map_keys(map1).id2 FROM tbl"))
        }
      }
    }
  }

Error:

org.apache.comet.CometNativeException: Invalid argument error: column types must match schema types, 

Expected

Map(Field { name: "entries", data_type: Struct([
  Field { name: "key", data_type: Struct([
    Field { name: "id2", data_type: Int64 }
  ]) }, 
  Field { name: "value", data_type: Struct([
    Field { name: "id", data_type: Int64 }, 
    Field { name: "id2", data_type: Int64 }, 
    Field { name: "id3", data_type: Int64 }
  ]) }
]) }, false) 

Found

Map(Field { name: "key_value", data_type: Struct([
  Field { name: "key", data_type: Struct([
    Field { name: "id", data_type: Int64 }, 
    Field { name: "id2", data_type: Int64 }, 
    Field { name: "id3", data_type: Int64 }]) }, 
  Field { name: "value", data_type: Struct([
    Field { name: "id", data_type: Int64 }, 
    Field { name: "id2", data_type: Int64 }, 
    Field { name: "id3", data_type: Int64 }
  ]) }
]) }, false) at column index 0
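
Reading the two types side by side, they differ in two places: the key struct is narrowed to `id2` on the "Expected" side but carries all three fields on the "Found" side, and the entries field is named `entries` on one side and `key_value` on the other. The sketch below rebuilds just the first difference with Spark's own type classes, so the mismatch can be inspected without running Comet. It assumes the narrowed key comes from the requested (pruned) read schema and the full key from the file schema; that is my reading of the error message, not something confirmed in this issue. The object name `MapKeySchemaMismatch` is made up for illustration, and Spark's `MapType` has no notion of the entries field name, so the `entries` vs `key_value` difference is not modelled here.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper object, not part of Comet or Spark.
object MapKeySchemaMismatch {
  // Struct written by the repro: id, id2 and id3 are all longs.
  val fullStruct: StructType = StructType(Seq(
    StructField("id", LongType),
    StructField("id2", LongType),
    StructField("id3", LongType)))

  // Type of `map1` as stored in the Parquet file ("Found" side of the error):
  // key and value both carry all three fields.
  val fileMapType: MapType =
    MapType(fullStruct, fullStruct, valueContainsNull = true)

  // Type matching the "Expected" side of the error: the key struct keeps
  // only `id2`, the value struct is unchanged.
  // Assumption: this is the read schema requested for `map_keys(map1).id2`.
  val prunedMapType: MapType = MapType(
    StructType(Seq(StructField("id2", LongType))),
    fullStruct,
    valueContainsNull = true)

  def main(args: Array[String]): Unit = {
    println(s"file:   ${fileMapType.simpleString}")
    println(s"pruned: ${prunedMapType.simpleString}")
    // The value types agree; only the key struct differs.
    assert(fileMapType.valueType == prunedMapType.valueType)
    assert(fileMapType.keyType != prunedMapType.keyType)
  }
}
```

If that reading is right, the native reader would need to project the key struct down to the requested fields (and reconcile the entries field name) before the batch is checked against the expected schema.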

Labels: bug
