[SPARK-14000][SQL] case class with a tuple field can't work in Dataset #11816

Closed
wants to merge 2 commits into from
@@ -596,7 +596,10 @@ class Analyzer(
       exprs.exists(_.collect { case _: Star => true }.nonEmpty)
   }

-  private def resolveExpression(expr: Expression, plan: LogicalPlan, throws: Boolean = false) = {
+  protected[sql] def resolveExpression(
+      expr: Expression,
+      plan: LogicalPlan,
+      throws: Boolean = false) = {
     // Resolve expression in one round.
     // If throws == false or the desired attribute doesn't exist
     // (like try to resolve `a.b` but `a` doesn't exist), fail and return the origin one.
@@ -282,8 +282,14 @@ case class ExpressionEncoder[T](
     // If we have nested tuple, the `fromRowExpression` will contains `GetStructField` instead of
     // `UnresolvedExtractValue`, so we need to check if their ordinals are all valid.
     // Note that, `BoundReference` contains the expected type, but here we need the actual type, so
-    // we unbound it by the given `schema` and propagate the actual type to `GetStructField`.
-    val unbound = fromRowExpression transform {
+    // we unbound it by the given `schema` and propagate the actual type to `GetStructField`, after
+    // we resolve the `fromRowExpression`.
+    val resolved = SimpleAnalyzer.resolveExpression(
+      fromRowExpression,
+      LocalRelation(schema),
+      throws = true)
+
+    val unbound = resolved transform {
       case b: BoundReference => schema(b.ordinal)
     }
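
The unbinding step in this hunk can be illustrated without Spark. The following is a minimal sketch, assuming a toy expression tree: `Expr`, `BoundRef`, `Attr`, `GetField`, `UnbindSketch` and the `transform` helper are hypothetical stand-ins written for this example, not Catalyst classes. It replaces each bound reference with the schema attribute at the same ordinal, so the field access above it sees the actual type instead of the expected one.

```scala
// Toy model only: these types are hypothetical stand-ins for Catalyst's
// BoundReference, attribute and GetStructField classes.
sealed trait Expr {
  // Apply `rule` top-down: to this node first, then to the children of the result.
  def transform(rule: PartialFunction[Expr, Expr]): Expr = {
    val applied = rule.applyOrElse(this, identity[Expr])
    applied match {
      case GetField(child, ordinal) => GetField(child.transform(rule), ordinal)
      case leaf => leaf
    }
  }
}
final case class BoundRef(ordinal: Int, expectedType: String) extends Expr
final case class Attr(name: String, actualType: String) extends Expr
final case class GetField(child: Expr, ordinal: Int) extends Expr

object UnbindSketch {
  // Replace every bound reference with the schema attribute at the same ordinal.
  def unbind(expr: Expr, schema: Vector[Attr]): Expr =
    expr.transform { case b: BoundRef => schema(b.ordinal) }

  def main(args: Array[String]): Unit = {
    val schema = Vector(Attr("data", "struct<_1:int,_2:string>"))
    val fromRow = GetField(BoundRef(0, "struct<_1:int,_2:string>"), 1)
    println(unbind(fromRow, schema))
  }
}
```

In the sketch the schema lookup is the whole rule. In the hunk above, the same rewrite runs only after the expression has been resolved against `LocalRelation(schema)`.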

@@ -110,7 +110,12 @@ case class GetStructField(child: Expression, ordinal: Int, name: Option[String]

   override def dataType: DataType = childSchema(ordinal).dataType
   override def nullable: Boolean = child.nullable || childSchema(ordinal).nullable
-  override def toString: String = s"$child.${name.getOrElse(childSchema(ordinal).name)}"
+
+  override def toString: String = {

Review comment from the PR author (Contributor): Kind of an unrelated change: `toString` should be able to run even when this expression is unresolved.

+    val fieldName = if (resolved) childSchema(ordinal).name else s"_$ordinal"
+    s"$child.${name.getOrElse(fieldName)}"
+  }
+
   override def sql: String =
     child.sql + s".${quoteIdentifier(name.getOrElse(childSchema(ordinal).name))}"
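
The `toString` fallback in this hunk can be tried in isolation. This is a minimal sketch under stated assumptions: `FieldAccess` and `ToStringSketch` are hypothetical names, and an `Option` of field names stands in for the child schema, which is only available once the expression is resolved.

```scala
// Toy model only: `FieldAccess` is a hypothetical stand-in for GetStructField.
// `childFields` is None while the child is unresolved, mirroring an
// expression whose child schema cannot be consulted yet.
final case class FieldAccess(
    child: String,
    ordinal: Int,
    name: Option[String],
    childFields: Option[Vector[String]]) {

  def resolved: Boolean = childFields.isDefined

  // Prefer the explicit name, then the schema's name when resolved,
  // and otherwise a positional placeholder built from the zero-based ordinal.
  override def toString: String = {
    val fallback = childFields.map(_(ordinal)).getOrElse(s"_$ordinal")
    s"$child.${name.getOrElse(fallback)}"
  }
}

object ToStringSketch {
  def main(args: Array[String]): Unit = {
    // Unresolved: no schema to consult, so the placeholder is used.
    println(FieldAccess("data", 1, None, None))
    // Resolved: the name comes from the child schema.
    println(FieldAccess("data", 1, None, Some(Vector("_1", "_2"))))
  }
}
```

The placeholder is derived from the zero-based ordinal, so it need not match the real field name. Its only job is to keep `toString` from touching a schema that does not exist yet.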

13 changes: 11 additions & 2 deletions sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@@ -27,8 +27,6 @@ import org.apache.spark.sql.functions._
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

-case class OtherTuple(_1: String, _2: Int)
-
 class DatasetSuite extends QueryTest with SharedSQLContext {
   import testImplicits._

@@ -636,8 +634,19 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       Seq(OuterObject.InnerClass("foo")).toDS(),
       OuterObject.InnerClass("foo"))
   }
+
+  test("SPARK-14000: case class with tuple type field") {
+    checkDataset(
+      Seq(TupleClass((1, "a"))).toDS(),
+      TupleClass(1, "a")
+    )
+  }
 }

+case class OtherTuple(_1: String, _2: Int)
+
+case class TupleClass(data: (Int, String))
+
 class OuterClass extends Serializable {
   case class InnerClass(a: String)
 }