Skip to content

[SPARK-16853][SQL] fixes encoder error in DataSet typed select #14474

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

clockfly
Copy link
Contributor

@clockfly clockfly commented Aug 3, 2016

What changes were proposed in this pull request?

For DataSet typed select:

def select[U1: Encoder](c1: TypedColumn[T, U1]): Dataset[U1]

If type T is a case class or a tuple class that is not atomic, the resulting logical plan's schema will mismatch with Dataset[T] encoder's schema, which will cause encoder error and throw AnalysisException.

Before change:

scala> case class A(a: Int, b: Int)
scala> Seq((0, A(1,2))).toDS.select($"_2".as[A])
org.apache.spark.sql.AnalysisException: cannot resolve '`a`' given input columns: [_2];
..

After change:

scala> case class A(a: Int, b: Int)
scala> Seq((0, A(1,2))).toDS.select($"_2".as[A]).show
+---+---+
|  a|  b|
+---+---+
|  1|  2|
+---+---+

How was this patch tested?

Unit test.

@SparkQA
Copy link

SparkQA commented Aug 3, 2016

Test build #63151 has finished for PR 14474 at commit 2b2c1d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63199 has finished for PR 14474 at commit 952ec92.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cloud-fan commented Aug 4, 2016

LGTM, though the test hasn't passed.

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63204 has finished for PR 14474 at commit d9b5a40.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -184,6 +184,17 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
2, 3, 4)
}

test("SPARK-16853: select, case class and tuple") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about typed select that returns case class or tuple?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan

This follows existing test case name, for example, for select 2, current name is

test("select 2, primitive and tuple") {

https://github.com/apache/spark/pull/14474/files#diff-3836bd1fc5ae9c8b8ac4bdb2b9944159L196

@SparkQA
Copy link

SparkQA commented Aug 4, 2016

Test build #63206 has finished for PR 14474 at commit 3d90a68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 9d7a474 Aug 4, 2016
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@rxin
Copy link
Contributor

rxin commented Aug 5, 2016

Is this a bug fix? If yes, shouldn't we merge it in 2.0?

@cloud-fan
Copy link
Contributor

it's a bug since the very beginning, should we merge it to 1.6 too?

@zsxwing
Copy link
Member

zsxwing commented Sep 8, 2016

Should this one be merged into branch-2.0?

@cloud-fan
Copy link
Contributor

@clockfly can you create a new PR against 2.0? thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants