Skip to content

[SPARK-13456][SQL] fix creating encoders for case classes defined in Spark shell #11410

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

case classes defined in REPL are wrapped by line classes, and we have a trick for scala 2.10 REPL to automatically register the wrapper classes to OuterScope so that we can use when create encoders.
However, this trick doesn't work right after we upgrade to scala 2.11, and unfortunately the tests are only in scala 2.10, which makes this bug hidden until now.

This PR moves the encoder tests to scala 2.11 ReplSuite, and fixes this bug by another approach(the previous trick can't port to scala 2.11 REPL): make OuterScope smarter that can detect classes defined in REPL and load the singleton of line wrapper classes automatically.

How was this patch tested?

the migrated encoder tests in ReplSuite

@cloud-fan
Copy link
Contributor Author

cc @marmbrus @liancheng @rxin

|import org.apache.spark.sql.TypedColumn
|/** An `Aggregator` that adds up any numeric type returned by the given function. */
|class SumOf[I, N : Numeric](f: I => N) extends
| org.apache.spark.sql.expressions.Aggregator[I, N, N] {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import doesn't work here, not sure why

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented Feb 27, 2016

Test build #52115 has finished for PR 11410 at commit a2c4617.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 27, 2016

Test build #52116 has finished for PR 11410 at commit a0a8424.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Feb 27, 2016

Test build #52119 has finished for PR 11410 at commit a0a8424.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please.

if (outer == null) {
outerCls match {
case REPLClass(line) =>
val cls1 = Utils.classForName(line + "$read$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment here to explain the REPL use case.

@SparkQA
Copy link

SparkQA commented Feb 27, 2016

Test build #52129 has finished for PR 11410 at commit a9d832b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • //INSTANCE()method to get the single instance of class$read. Then call$iw()``

@SparkQA
Copy link

SparkQA commented Feb 27, 2016

Test build #52126 has finished for PR 11410 at commit a0a8424.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Feb 28, 2016

Test build #52137 has finished for PR 11410 at commit a9d832b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • //INSTANCE()method to get the single instance of class$read. Then call$iw()``

|}.toColumn
|
|val ds = Seq(1, 2, 3, 4).toDS()
|ds.select(simpleSum).collect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your solution works, but it might be good to add a test case where the scope that gets captured has a side effect (i.e. create a file so if you double execute the outer scope it will fail the second time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to add such a test, but couldn't figure it out. The outer scope is the line wrapper class that generated by scala REPL framework and I'm not sure how to insert side effect into it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just thinking you do something like create a file and then make sure that file handle gets used in the closure. If you got something wrong the second time the file is created it will fail.

@SparkQA
Copy link

SparkQA commented Mar 1, 2016

Test build #52228 has finished for PR 11410 at commit f44e479.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 2, 2016

Test build #52286 has finished for PR 11410 at commit a0510c4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • //INSTANCE()method to get the single instance of class$read. Then call$iw()``

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 2, 2016

Test build #52303 has finished for PR 11410 at commit a0510c4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • //INSTANCE()method to get the single instance of class$read. Then call$iw()``

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 8, 2016

Test build #52650 has finished for PR 11410 at commit 8b79215.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • |createFile();case class TestCaseClass(value: Int)

@yhuai
Copy link
Contributor

yhuai commented Mar 8, 2016

@retronym Will have time to do another round of review? Thank you!

@SparkQA
Copy link

SparkQA commented Mar 9, 2016

Test build #52715 has finished for PR 11410 at commit 9be89ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

obj = getter.invoke(obj)
getter = iwGetter(getter.getReturnType)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @retronym , now I loop until there is no $iw method, is this looks good to you?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks cleaner to me.

@yhuai
Copy link
Contributor

yhuai commented Mar 11, 2016

@retronym Thank you for the review! I'd like to merge this fix. Sounds good?

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 15, 2016

Test build #53157 has finished for PR 11410 at commit 9be89ff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@retronym
Copy link

Since offering review, I've found a more direct way to get to the object.

Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_71).
Type in expressions for evaluation. Or try :help.

scala> $intp.isettings.unwrapStrings = false
$intp.isettings.unwrapStrings: Boolean = false

scala> var foo = ""
foo: String = ""

scala> foo = "2"
foo: String = 2

scala> val last = $intp.lastRequest
last: $intp.Request = Request(line=foo = "2", 1 trees)

scala> last.fullAccessPath
res1: String = $line4.$read.$iw.$iw

scala> val iw = $line4.$read.$iw.$iw
iw: $line4.$read.$iw.$iw.type = $line4.$read$$iw$$iw$@23444a36

scala> iw.getClass.getMethods.find(_.getName.startsWith("$ires")).head.invoke(iw)
res2: Object = 2

scala> def nullIRes(x: Any) = { val cls = x.getClass; cls.getDeclaredFields.filter(_.getName.startsWith("$ires")).foreach{fld => fld.setAccessible(true); fld.set(x, null)}}
nullIRes: (x: Any)Unit

scala> $intp.interpret(s"nullIRes(${last.fullAccessPath})")
res4: scala.tools.nsc.interpreter.IR.Result = Success

scala> iw.getClass.getMethods.find(_.getName.startsWith("$ires")).head.invoke(iw)
res6: Object = null

@cloud-fan
Copy link
Contributor Author

Hi @retronym , your idea looks good, but in our situation, we only have a class name string and it's not that easy to get the line object. BTW the REPL classes are loaded dynamically in Spark, I think we have to use reflection to work on them. Any ideas?

@rxin
Copy link
Contributor

rxin commented Mar 18, 2016

ping @retronym again

@cloud-fan
Copy link
Contributor Author

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 18, 2016

Test build #53520 has finished for PR 11410 at commit 9be89ff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@retronym
Copy link

So long as you've got some defence from a test case, your current solution seems okay.

@SparkQA
Copy link

SparkQA commented Mar 18, 2016

Test build #53541 has finished for PR 11410 at commit 41c2cc5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 21, 2016

Test build #53645 has finished for PR 11410 at commit 47e0e39.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 21, 2016

Test build #53651 has finished for PR 11410 at commit 47e0e39.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 21, 2016

Test build #53658 has finished for PR 11410 at commit 47e0e39.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 21, 2016

Test build #53666 has finished for PR 11410 at commit 47e0e39.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

cc @rxin

@yhuai
Copy link
Contributor

yhuai commented Mar 21, 2016

Thanks. Merging to master.

@asfgit asfgit closed this in 43ebf7a Mar 21, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
…Spark shell

## What changes were proposed in this pull request?

case classes defined in REPL are wrapped by line classes, and we have a trick for scala 2.10 REPL to automatically register the wrapper classes to `OuterScope` so that we can use when create encoders.
However, this trick doesn't work right after we upgrade to scala 2.11, and unfortunately the tests are only in scala 2.10, which makes this bug hidden until now.

This PR moves the encoder tests to scala 2.11  `ReplSuite`, and fixes this bug by another approach(the previous trick can't port to scala 2.11 REPL): make `OuterScope` smarter that can detect classes defined in REPL and load the singleton of line wrapper classes automatically.

## How was this patch tested?

the migrated encoder tests in `ReplSuite`

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#11410 from cloud-fan/repl.
ghost pushed a commit to dbtsai/spark that referenced this pull request Mar 25, 2016
…ase class defined in REPL

## What changes were proposed in this pull request?

In apache#11410, we missed a corner case: define the inner class and use it in `Dataset` at the same time by using paste mode. For this case, the inner class and the `Dataset` are inside same line object, when we build the `Dataset`, we try to get outer pointer from line object, and it will fail because the line object is not initialized yet.

https://issues.apache.org/jira/browse/SPARK-13456?focusedCommentId=15209174&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209174 is an example for this corner case.

This PR make the process of getting outer pointer from line object lazy, so that we can successfully build the `Dataset` and finish initializing the line object.

## How was this patch tested?

new test in repl suite.

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#11931 from cloud-fan/repl.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants