-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-13432][SQL] add the source file name and line into a generated Java code #11301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR is derived from an offline discussion with @sarutak . |
Jenkins, retest this please |
Test build #51664 has finished for PR 11301 at commit
|
ping @sarutak |
Thanks @kiszk ! I'll check this later. |
JIRA-13644 handles the following feature. This PR does not implement the following feature in a stack trace I also add one feature. This is to show a message points out the origin of a generated method when an exception occurs in the generated method at runtime. An example of a message (the first line is newly added)
Here is a part of generated code.
|
Closed by my mistake. Reopen this. |
Test build #51977 has finished for PR 11301 at commit
|
Test build #52059 has finished for PR 11301 at commit
|
Test build #52108 has finished for PR 11301 at commit
|
Test build #52120 has finished for PR 11301 at commit
|
The failed test |
Test build #52131 has finished for PR 11301 at commit
|
@sarutak , this is ready for your review now. |
O.K. I'll inspect this change. |
@@ -50,7 +50,7 @@ object ExpressionSet { | |||
class ExpressionSet protected( | |||
protected val baseSet: mutable.Set[Expression] = new mutable.HashSet, | |||
protected val originals: mutable.Buffer[Expression] = new ArrayBuffer) | |||
extends Set[Expression] { | |||
extends Set[Expression] with Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, constraints
in QueryPlan
is ExpressionSet
and SparkPlan
which is a subclass of QueryPlan
is serializable so ExpressionSet
should be also serializable strictly. But constraints
is lazy val and it's not accessed when the receiver object is a instance of SparkPlan
. In other word, constraints
is accessed only when the receiver object is a instance of LogicalPlan
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your explanation on how constraints
is accessed. From the implementation view, my understanding is that it is still necessary to declare Serializable
for ExpressionSet
. Is there another good idea to enable Serializable
only for LogicalPlan
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ExpressionSet
is really serialized only in the case of LogicalPlan
, we could move constraints
from QueryPlan
to LogicalPlan
but I'm not sure it's correct way.
Have you ever got any problem because ExpressionSet
is not Serializable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I got an exception regarding non-serializable in test suites in hive
when ExpressionSet
is not Serializable
. This is why I added Serialiable
to ExpressionSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O.K. I got it.
@davies Can you also check this? Because you have modified code changed in this PR many times. |
The information of callsite is helpful in general. The try-catch in generated code just added too much noise. You will see more information when the job fails, I'm not sure how helpful that logging is. cc @rxin |
…me API's call site
Test build #59847 has finished for PR 11301 at commit
|
Test build #59848 has finished for PR 11301 at commit
|
Test build #59851 has finished for PR 11301 at commit
|
(Hi all, i am just wondering where we are on this) |
I have less bandwidth for this since I am busy for others. Let me close for now. |
What changes were proposed in this pull request?
This PR adds the source file name and line into a comment of a Java code generated by Catalyst. It would be helpful to quickly identify the original source file from a position where an error occurs during a problem determination of a customer. It supports only DataFrame and SQL.
This PR consists of three parts:
This PR adds the information to a comment for existing operations. Other PRs will address the followings:
Here is an example. The original Java program.
Generated Java code
How was the this patch tested?
Unit test (add a test to keep Origin during SerDe)