
[SPARK-13432][SQL] add the source file name and line into a generated Java code #11301


Closed
wants to merge 29 commits into from

Conversation

@kiszk kiszk commented Feb 22, 2016

What changes were proposed in this pull request?

This PR adds the source file name and line number to a comment in the Java code generated by Catalyst. This helps quickly identify the position in the original source file where an error occurs, which is useful when diagnosing a customer problem. It currently supports only DataFrame and SQL.

This PR consists of three parts:

  • add line information to Origin
  • pass Origin from a master to executors
  • add Origin into a comment
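The three steps above can be sketched as follows. This is a minimal illustration, not Spark's actual code: Origin and commentFor are hypothetical stand-ins for Catalyst's internals, and only the comment format matches the generated code shown later in this description.

```java
public class OriginCommentSketch {
    // Hypothetical stand-in for Catalyst's Origin, extended with file and line.
    static final class Origin {
        final String file;
        final int line;
        final String callSite;
        Origin(String file, int line, String callSite) {
            this.file = file;
            this.line = line;
            this.callSite = callSite;
        }
    }

    // Render a codegen comment in the "expr @ callSite at file:line" style.
    // commentFor is an illustrative name, not a Spark API.
    static String commentFor(String expr, Origin o) {
        return "/* " + expr + " @ " + o.callSite + " at " + o.file + ":" + o.line + " */";
    }

    public static void main(String[] args) {
        Origin o = new Origin("Test.scala", 23, "filter");
        System.out.println(commentFor("input[0, string]", o));
        // prints: /* input[0, string] @ filter at Test.scala:23 */
    }
}
```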

This PR adds the information to a comment for existing operations. Other PRs will address the following:

  • Add a comment for Dataset
  • Insert a comment for other places

Here is an example. The original Scala program:

object Test {
  ...
  df.filter("v <= 3")
    .filter("v % 2 == 0")
    .show()
  ...
}

Generated Java code:

...
/* 031 */   protected void processNext() throws java.io.IOException {
/* 032 */     while (input.hasNext()) {
/* 033 */       InternalRow inputadapter_row = (InternalRow) input.next();
/* 034 */       /* input[0, string] @ filter at Test.scala:23 */
/* 035 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
/* 036 */       UTF8String inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getUTF8String(0));
/* 037 */       /* input[1, int] @ filter at Test.scala:23 */
/* 038 */       boolean inputadapter_isNull1 = inputadapter_row.isNullAt(1);
/* 039 */       int inputadapter_value1 = inputadapter_isNull1 ? -1 : (inputadapter_row.getInt(1));
/* 040 */       /* ((input[1, int] <= 3) && ((input[1, int] % 2) = 0)) @ filter at Test.scala:23 */
/* 041 */       /* (input[1, int] <= 3) @ filter at Dataset1.scala:22 */
...

How was this patch tested?

Unit test (added a test that Origin is kept during SerDe).
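A SerDe round-trip check like the one described can be sketched as follows. This is a hedged illustration using plain Java serialization; Origin and roundTrip are hypothetical stand-ins, not Spark's test code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerDeRoundTrip {
    // Minimal stand-in for Catalyst's Origin with the new line information.
    // The field names are illustrative, not Spark's actual definition.
    static class Origin implements Serializable {
        final String file;
        final int line;
        Origin(String file, int line) { this.file = file; this.line = line; }
    }

    // Serialize and deserialize an object, as an executor would receive it.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (T) ois.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Origin back = roundTrip(new Origin("Test.scala", 23));
        // The call-site information survives the round trip.
        System.out.println(back.file + ":" + back.line);
    }
}
```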

kiszk commented Feb 22, 2016

This PR is derived from an offline discussion with @sarutak .

kiszk commented Feb 22, 2016

Jenkins, retest this please

SparkQA commented Feb 22, 2016

Test build #51664 has finished for PR 11301 at commit 0678a66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

kiszk commented Feb 24, 2016

ping @sarutak

sarutak commented Feb 24, 2016

Thanks @kiszk ! I'll check this later.

@kiszk kiszk changed the title [SPARK-13432][SQL] add the source file name and line into a generated Java code [SPARK-13432][SQL] add the source file name and line into a generated Java code and stderr Feb 25, 2016
kiszk commented Feb 25, 2016

JIRA-13644 handles the following feature; this PR does not implement it in a stack trace.

I also added one feature: a message that points out the origin of a generated method when an exception occurs in that method at runtime.

An example of the message (the first line is newly added):

07:49:29.525 ERROR org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator: The method GeneratedIterator.processNext() is generated for filter at Test.scala:23
07:49:29.526 ERROR org.apache.spark.executor.Executor: Exception in task 1.0 in stage 2.0 (TID 4)
java.lang.NullPointerException:
        at ...

Here is a part of the generated code:

...
/* 031 */   protected void processNext() throws java.io.IOException {
/* 032 */     try {
/* 033 */       while (input.hasNext()) {
/* 034 */         InternalRow inputadapter_row = (InternalRow) input.next();
/* 035 */         /* input[0, string] @ filter at Test.scala:23 */
/* 036 */         boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
/* 037 */         UTF8String inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getUTF8String(0));
/* 038 */         /* input[1, int] @ filter at Test.scala:23 */
/* 039 */         boolean inputadapter_isNull1 = inputadapter_row.isNullAt(1);
/* 040 */         int inputadapter_value1 = inputadapter_isNull1 ? -1 : (inputadapter_row.getInt(1));
/* 041 */         /* ((input[1, int] <= 3) && ((input[1, int] % 2) = 0)) @ filter at Test.scala:23 */
/* 042 */         /* (input[1, int] <= 3) @ filter at Dataset1.scala:22 */
...
/* 068 */         if (shouldStop()) {
/* 069 */           return;
/* 070 */         }
/* 071 */       }
/* 072 */     } catch (final Throwable e) {
/* 073 */       org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(this.getClass());
/* 074 */       logger.error("The method processNext() is generated for " +
/* 075 */         "filter at Test.scala:22");
/* 076 */       throw e;
/* 077 */     }
...
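The catch-log-rethrow pattern in the generated code above can be sketched as follows. Spark's generated code catches Throwable and logs via slf4j; in this hedged, self-contained sketch, RuntimeException and System.err stand in for those, and withCallSite is an illustrative name.

```java
import java.util.function.Supplier;

public class CallSiteOnFailure {
    // Run the body and, if it fails, log which call site the generated
    // method came from before rethrowing the original exception.
    static <T> T withCallSite(String callSite, Supplier<T> body) {
        try {
            return body.get();
        } catch (RuntimeException e) {
            System.err.println("The method processNext() is generated for " + callSite);
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            withCallSite("filter at Test.scala:23", () -> {
                throw new NullPointerException();
            });
        } catch (NullPointerException e) {
            System.out.println("rethrown after logging the call site");
        }
    }
}
```

The key design point is that the exception itself is unchanged; only a log line tying the failing method to its call site is added before the rethrow.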

@kiszk kiszk closed this Feb 25, 2016
kiszk commented Feb 25, 2016

Closed this by mistake. Reopening.

@kiszk kiszk reopened this Feb 25, 2016
SparkQA commented Feb 25, 2016

Test build #51977 has finished for PR 11301 at commit 7671742.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 26, 2016

Test build #52059 has finished for PR 11301 at commit 8497208.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 27, 2016

Test build #52108 has finished for PR 11301 at commit f0d2b81.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 27, 2016

Test build #52120 has finished for PR 11301 at commit 1a1ff67.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

kiszk commented Feb 27, 2016

The ParquetHadoopFsRelationSuite failure is due to the lack of short type support in UnsafeRowParquetRecordReader. Once PR #11412 is merged, this failure will no longer occur.

SparkQA commented Feb 28, 2016

Test build #52131 has finished for PR 11301 at commit 02ebed2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

kiszk commented Feb 28, 2016

@sarutak , this is ready for your review now.

sarutak commented Feb 29, 2016

O.K. I'll inspect this change.

@@ -50,7 +50,7 @@ object ExpressionSet {
 class ExpressionSet protected(
     protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
     protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
-  extends Set[Expression] {
+  extends Set[Expression] with Serializable {
Member

Yes, strictly speaking, constraints in QueryPlan is an ExpressionSet, and SparkPlan, which is a subclass of QueryPlan, is serializable, so ExpressionSet should also be serializable. But constraints is a lazy val and is not accessed when the receiver object is an instance of SparkPlan. In other words, constraints is accessed only when the receiver object is an instance of LogicalPlan.

Member Author

Thank you for explaining how constraints is accessed. From the implementation point of view, my understanding is that it is still necessary to declare ExpressionSet as Serializable. Is there a good way to make it Serializable only for LogicalPlan?

Member

If ExpressionSet is really serialized only in the case of LogicalPlan, we could move constraints from QueryPlan to LogicalPlan, but I'm not sure that is the right way.
Have you ever hit a problem because ExpressionSet is not Serializable?

Member Author

Yes, I got a non-serializable exception in the Hive test suites when ExpressionSet is not Serializable. This is why I added Serializable to ExpressionSet.

Member

O.K. I got it.
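The failure mode discussed above can be reproduced in miniature. This is a hedged sketch, not Spark's code: ConstraintSet stands in for a non-Serializable ExpressionSet, Plan stands in for a serializable plan holding it, and unlike the real lazy val the field here is eagerly initialized, which is enough to show the mechanism.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableFieldDemo {
    // Stand-in for a class that is NOT Serializable (like ExpressionSet before this PR).
    static class ConstraintSet { }

    // Stand-in for a serializable plan that holds the set. Serializing it fails
    // because Java serialization walks into the non-serializable field.
    static class Plan implements Serializable {
        final ConstraintSet constraints = new ConstraintSet();
    }

    // Returns true if the object serializes, false on NotSerializableException.
    static boolean serializes(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // false until ConstraintSet (i.e. ExpressionSet) is made Serializable
        System.out.println(serializes(new Plan()));
    }
}
```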

sarutak commented Mar 2, 2016

@davies Can you also check this? You have modified the code changed in this PR many times.

davies commented Mar 2, 2016

The call-site information is helpful in general, but the try-catch in the generated code adds too much noise. You will already see more information when the job fails, so I'm not sure how helpful that logging is.

cc @rxin

SparkQA commented Jun 2, 2016

Test build #59847 has finished for PR 11301 at commit df5820b.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 2, 2016

Test build #59848 has finished for PR 11301 at commit df774eb.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

SparkQA commented Jun 2, 2016

Test build #59851 has finished for PR 11301 at commit 597b732.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

(Hi all, I am just wondering where we are on this.)

kiszk commented May 16, 2017

I have less bandwidth for this since I am busy with other work. Let me close this for now.

@kiszk kiszk closed this May 16, 2017
6 participants