
[SPARK-13432][SQL] add the source file name and line into a generated Java code #11301


Closed
wants to merge 29 commits into from

Conversation

@kiszk kiszk commented Feb 22, 2016

What changes were proposed in this pull request?

This PR adds the source file name and line number to a comment in the Java code generated by Catalyst. This helps quickly identify the position in the original source file where an error occurs, which is useful when diagnosing a customer problem. It currently supports only DataFrame and SQL.

This PR consists of three parts:

  • add line information to Origin
  • pass Origin from a master to executors
  • add Origin into a comment
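The three steps above can be sketched as follows. This is a minimal illustration, not Spark's actual code: Origin and commentFor are hypothetical stand-ins for Catalyst's internals, and only the comment format matches the generated code shown later in this description.

```java
public class OriginCommentSketch {
    // Hypothetical stand-in for Catalyst's Origin, extended with file and line.
    static final class Origin {
        final String file;
        final int line;
        final String callSite;
        Origin(String file, int line, String callSite) {
            this.file = file;
            this.line = line;
            this.callSite = callSite;
        }
    }

    // Render a codegen comment in the "expr @ callSite at file:line" style.
    // commentFor is an illustrative name, not a Spark API.
    static String commentFor(String expr, Origin o) {
        return "/* " + expr + " @ " + o.callSite + " at " + o.file + ":" + o.line + " */";
    }

    public static void main(String[] args) {
        Origin o = new Origin("Test.scala", 23, "filter");
        System.out.println(commentFor("input[0, string]", o));
        // prints: /* input[0, string] @ filter at Test.scala:23 */
    }
}
```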

This PR adds the information to a comment for existing operations. Other PRs will address the following:

  • Add a comment for Dataset
  • Insert a comment for other places

Here is an example. The original Scala program:

object Test {
  ...
  df.filter("v <= 3")
    .filter("v % 2 == 0")
    .show()
  ...
}

Generated Java code:

...
/* 031 */   protected void processNext() throws java.io.IOException {
/* 032 */     while (input.hasNext()) {
/* 033 */       InternalRow inputadapter_row = (InternalRow) input.next();
/* 034 */       /* input[0, string] @ filter at Test.scala:23 */
/* 035 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
/* 036 */       UTF8String inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getUTF8String(0));
/* 037 */       /* input[1, int] @ filter at Test.scala:23 */
/* 038 */       boolean inputadapter_isNull1 = inputadapter_row.isNullAt(1);
/* 039 */       int inputadapter_value1 = inputadapter_isNull1 ? -1 : (inputadapter_row.getInt(1));
/* 040 */       /* ((input[1, int] <= 3) && ((input[1, int] % 2) = 0)) @ filter at Test.scala:23 */
/* 041 */       /* (input[1, int] <= 3) @ filter at Dataset1.scala:22 */
...

How was this patch tested?

Unit test (added a test that Origin is kept during SerDe).
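A SerDe round-trip check like the one described can be sketched as follows. This is a hedged illustration using plain Java serialization; Origin and roundTrip are hypothetical stand-ins, not Spark's test code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerDeRoundTrip {
    // Minimal stand-in for Catalyst's Origin with the new line information.
    // The field names are illustrative, not Spark's actual definition.
    static class Origin implements Serializable {
        final String file;
        final int line;
        Origin(String file, int line) { this.file = file; this.line = line; }
    }

    // Serialize and deserialize an object, as an executor would receive it.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (T) ois.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Origin back = roundTrip(new Origin("Test.scala", 23));
        // The call-site information survives the round trip.
        System.out.println(back.file + ":" + back.line);
    }
}
```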

kiszk commented Feb 22, 2016

This PR is derived from an offline discussion with @sarutak .

kiszk commented Feb 22, 2016

Jenkins, retest this please

SparkQA commented Feb 22, 2016

Test build #51664 has finished for PR 11301 at commit 0678a66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

kiszk commented Feb 24, 2016

ping @sarutak

sarutak commented Feb 24, 2016

Thanks @kiszk ! I'll check this later.

@kiszk kiszk changed the title [SPARK-13432][SQL] add the source file name and line into a generated Java code [SPARK-13432][SQL] add the source file name and line into a generated Java code and stderr Feb 25, 2016
kiszk commented Feb 25, 2016

JIRA-13644 handles the following feature; this PR does not implement it in a stack trace.

I also added one feature: a message that points out the origin of a generated method when an exception occurs in that method at runtime.

An example of the message (the first line is newly added):

07:49:29.525 ERROR org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator: The method GeneratedIterator.processNext() is generated for filter at Test.scala:23
07:49:29.526 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in stage 2.0 (TID 4)
java.lang.NullPointerException:
        at ...

Here is a part of the generated code:

...
/* 031 */   protected void processNext() throws java.io.IOException {
/* 032 */     try {
/* 033 */       while (input.hasNext()) {
/* 034 */         InternalRow inputadapter_row = (InternalRow) input.next();
/* 035 */         /* input[0, string] @ filter at Test.scala:23 */
/* 036 */         boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
/* 037 */         UTF8String inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getUTF8String(0));
/* 038 */         /* input[1, int] @ filter at Test.scala:23 */
/* 039 */         boolean inputadapter_isNull1 = inputadapter_row.isNullAt(1);
/* 040 */         int inputadapter_value1 = inputadapter_isNull1 ? -1 : (inputadapter_row.getInt(1));
/* 041 */         /* ((input[1, int] <= 3) && ((input[1, int] % 2) = 0)) @ filter at Test.scala:23 */
/* 042 */         /* (input[1, int] <= 3) @ filter at Dataset1.scala:22 */
...
/* 068 */         if (shouldStop()) {
/* 069 */           return;
/* 070 */         }
/* 071 */       }
/* 072 */     } catch (final Throwable e) {
/* 073 */       org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(this.getClass());
/* 074 */       logger.error("The method processNext() is generated for " +
/* 075 */         "filter at Test.scala:22");
/* 076 */       throw e;
/* 077 */     }
...
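The catch-log-rethrow pattern in the generated code above can be sketched as follows. Spark's generated code catches Throwable and logs via slf4j; in this hedged, self-contained sketch, RuntimeException and System.err stand in for those, and withCallSite is an illustrative name.

```java
import java.util.function.Supplier;

public class CallSiteOnFailure {
    // Run the body and, if it fails, log which call site the generated
    // method came from before rethrowing the original exception.
    static <T> T withCallSite(String callSite, Supplier<T> body) {
        try {
            return body.get();
        } catch (RuntimeException e) {
            System.err.println("The method processNext() is generated for " + callSite);
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            withCallSite("filter at Test.scala:23", () -> {
                throw new NullPointerException();
            });
        } catch (NullPointerException e) {
            System.out.println("rethrown after logging the call site");
        }
    }
}
```

The key design point is that the exception itself is unchanged; only a log line tying the failing method to its call site is added before the rethrow.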

@kiszk kiszk closed this Feb 25, 2016
kiszk commented Feb 25, 2016

Closed this by mistake. Reopening.

@kiszk kiszk reopened this Feb 25, 2016
SparkQA commented Feb 25, 2016

Test build #51977 has finished for PR 11301 at commit 7671742.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 26, 2016

Test build #52059 has finished for PR 11301 at commit 8497208.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 27, 2016

Test build #52108 has finished for PR 11301 at commit f0d2b81.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 27, 2016

Test build #52120 has finished for PR 11301 at commit 1a1ff67.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

kiszk commented Feb 27, 2016

The ParquetHadoopFsRelationSuite failure is due to the lack of short type support in UnsafeRowParquetRecordReader. Once PR #11412 is merged, this failure will no longer occur.

SparkQA commented Feb 28, 2016

Test build #52131 has finished for PR 11301 at commit 02ebed2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

kiszk commented Feb 28, 2016

@sarutak , this is ready for your review now.

sarutak commented Feb 29, 2016

O.K. I'll inspect this change.

@@ -50,7 +50,7 @@ object ExpressionSet {
 class ExpressionSet protected(
     protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
     protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
-  extends Set[Expression] {
+  extends Set[Expression] with Serializable {
Member

Yes, strictly speaking, constraints in QueryPlan is an ExpressionSet, and SparkPlan, which is a subclass of QueryPlan, is serializable, so ExpressionSet should also be serializable. But constraints is a lazy val and is not accessed when the receiver object is an instance of SparkPlan. In other words, constraints is accessed only when the receiver object is an instance of LogicalPlan.

Member Author

Thank you for explaining how constraints is accessed. From the implementation point of view, my understanding is that it is still necessary to declare ExpressionSet as Serializable. Is there a good way to make it Serializable only for LogicalPlan?

Member

If ExpressionSet is really serialized only in the case of LogicalPlan, we could move constraints from QueryPlan to LogicalPlan, but I'm not sure that is the right way.
Have you ever hit a problem because ExpressionSet is not Serializable?

Member Author

Yes, I got a non-serializable exception in the Hive test suites when ExpressionSet is not Serializable. This is why I added Serializable to ExpressionSet.

Member

O.K. I got it.
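The failure mode discussed above can be reproduced in miniature. This is a hedged sketch, not Spark's code: ConstraintSet stands in for a non-Serializable ExpressionSet, Plan stands in for a serializable plan holding it, and unlike the real lazy val the field here is eagerly initialized, which is enough to show the mechanism.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableFieldDemo {
    // Stand-in for a class that is NOT Serializable (like ExpressionSet before this PR).
    static class ConstraintSet { }

    // Stand-in for a serializable plan that holds the set. Serializing it fails
    // because Java serialization walks into the non-serializable field.
    static class Plan implements Serializable {
        final ConstraintSet constraints = new ConstraintSet();
    }

    // Returns true if the object serializes, false on NotSerializableException.
    static boolean serializes(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // false until ConstraintSet (i.e. ExpressionSet) is made Serializable
        System.out.println(serializes(new Plan()));
    }
}
```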

sarutak commented Mar 2, 2016

@davies Can you also check this? You have modified the code changed in this PR many times.

davies commented Mar 2, 2016

The call-site information is helpful in general, but the try-catch in the generated code adds too much noise. You will already see more information when the job fails, so I'm not sure how helpful that logging is.

cc @rxin

SparkQA commented Jun 2, 2016

Test build #59847 has finished for PR 11301 at commit df5820b.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 2, 2016

Test build #59848 has finished for PR 11301 at commit df774eb.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 2, 2016

Test build #59851 has finished for PR 11301 at commit 597b732.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

(Hi all, I am just wondering where we are on this.)

kiszk commented May 16, 2017

I have less bandwidth for this since I am busy with other work. Let me close this for now.

@kiszk kiszk closed this May 16, 2017
6 participants