-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-13456][SQL] fix creating encoders for case classes defined in Spark shell #11410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|import org.apache.spark.sql.TypedColumn | ||
|/** An `Aggregator` that adds up any numeric type returned by the given function. */ | ||
|class SumOf[I, N : Numeric](f: I => N) extends | ||
| org.apache.spark.sql.expressions.Aggregator[I, N, N] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import doesn't work here, not sure why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created https://issues.apache.org/jira/browse/SPARK-13611 for this issue
Test build #52115 has finished for PR 11410 at commit
|
Test build #52116 has finished for PR 11410 at commit
|
retest this please |
Test build #52119 has finished for PR 11410 at commit
|
retest this please. |
if (outer == null) { | ||
outerCls match { | ||
case REPLClass(line) => | ||
val cls1 = Utils.classForName(line + "$read$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment here to explain the REPL use case.
Test build #52129 has finished for PR 11410 at commit
|
Test build #52126 has finished for PR 11410 at commit
|
retest this please |
Test build #52137 has finished for PR 11410 at commit
|
|}.toColumn | ||
| | ||
|val ds = Seq(1, 2, 3, 4).toDS() | ||
|ds.select(simpleSum).collect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your solution works, but it might be good to add a test case where the scope that gets captured has a side effect (i.e. create a file so if you double execute the outer scope it will fail the second time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to add such a test, but couldn't figure it out. The outer scope is the line wrapper class that generated by scala REPL framework and I'm not sure how to insert side effect into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just thinking you do something like create a file and then make sure that file handle gets used in the closure. If you got something wrong the second time the file is created it will fail.
Test build #52228 has finished for PR 11410 at commit
|
Test build #52286 has finished for PR 11410 at commit
|
retest this please |
Test build #52303 has finished for PR 11410 at commit
|
retest this please |
Test build #52650 has finished for PR 11410 at commit
|
@retronym Will have time to do another round of review? Thank you! |
Test build #52715 has finished for PR 11410 at commit
|
obj = getter.invoke(obj) | ||
getter = iwGetter(getter.getReturnType) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @retronym , now I loop until there is no $iw
method, is this looks good to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks cleaner to me.
@retronym Thank you for the review! I'd like to merge this fix. Sounds good? |
retest this please |
Test build #53157 has finished for PR 11410 at commit
|
Since offering review, I've found a more direct way to get to the object.
|
Hi @retronym , your idea looks good, but in our situation, we only have a class name string and it's not that easy to get the line object. BTW the REPL classes are loaded dynamically in Spark, I think we have to use reflection to work on them. Any ideas? |
ping @retronym again |
retest this please. |
Test build #53520 has finished for PR 11410 at commit
|
So long as you've got some defence from a test case, your current solution seems okay. |
Test build #53541 has finished for PR 11410 at commit
|
Test build #53645 has finished for PR 11410 at commit
|
retest this please |
Test build #53651 has finished for PR 11410 at commit
|
retest this please |
Test build #53658 has finished for PR 11410 at commit
|
retest this please. |
Test build #53666 has finished for PR 11410 at commit
|
cc @rxin |
Thanks. Merging to master. |
…Spark shell ## What changes were proposed in this pull request? case classes defined in REPL are wrapped by line classes, and we have a trick for scala 2.10 REPL to automatically register the wrapper classes to `OuterScope` so that we can use when create encoders. However, this trick doesn't work right after we upgrade to scala 2.11, and unfortunately the tests are only in scala 2.10, which makes this bug hidden until now. This PR moves the encoder tests to scala 2.11 `ReplSuite`, and fixes this bug by another approach(the previous trick can't port to scala 2.11 REPL): make `OuterScope` smarter that can detect classes defined in REPL and load the singleton of line wrapper classes automatically. ## How was this patch tested? the migrated encoder tests in `ReplSuite` Author: Wenchen Fan <wenchen@databricks.com> Closes apache#11410 from cloud-fan/repl.
…ase class defined in REPL ## What changes were proposed in this pull request? In apache#11410, we missed a corner case: define the inner class and use it in `Dataset` at the same time by using paste mode. For this case, the inner class and the `Dataset` are inside same line object, when we build the `Dataset`, we try to get outer pointer from line object, and it will fail because the line object is not initialized yet. https://issues.apache.org/jira/browse/SPARK-13456?focusedCommentId=15209174&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209174 is an example for this corner case. This PR make the process of getting outer pointer from line object lazy, so that we can successfully build the `Dataset` and finish initializing the line object. ## How was this patch tested? new test in repl suite. Author: Wenchen Fan <wenchen@databricks.com> Closes apache#11931 from cloud-fan/repl.
What changes were proposed in this pull request?
case classes defined in REPL are wrapped by line classes, and we have a trick for scala 2.10 REPL to automatically register the wrapper classes to
OuterScope
so that we can use when create encoders.However, this trick doesn't work right after we upgrade to scala 2.11, and unfortunately the tests are only in scala 2.10, which makes this bug hidden until now.
This PR moves the encoder tests to scala 2.11
ReplSuite
, and fixes this bug by another approach(the previous trick can't port to scala 2.11 REPL): makeOuterScope
smarter that can detect classes defined in REPL and load the singleton of line wrapper classes automatically.How was this patch tested?
the migrated encoder tests in
ReplSuite