[SPARK-13432][SQL] add the source file name and line into a generated Java code #11301
Changes from all commits
```diff
@@ -98,7 +98,7 @@ abstract class AbstractSqlParser extends ParserInterface with Logging {
       case e: ParseException =>
         throw e.withCommand(command)
       case e: AnalysisException =>
-        val position = Origin(e.line, e.startPosition)
+        val position = Origin(None, e.line, e.startPosition)
         throw new ParseException(Option(command), e.message, position, position)
     }
   }
```

Comment: Just in case, please keep the mode of this file and the other files at 644 instead of 755.
```diff
@@ -150,7 +150,7 @@ case object ParseErrorListener extends BaseErrorListener {
       charPositionInLine: Int,
       msg: String,
       e: RecognitionException): Unit = {
-    val position = Origin(Some(line), Some(charPositionInLine))
+    val position = Origin(None, Some(line), Some(charPositionInLine))
     throw new ParseException(None, msg, position, position)
   }
 }
```
```diff
@@ -176,7 +176,7 @@ class ParseException(
     val builder = new StringBuilder
     builder ++= "\n" ++= message
     start match {
-      case Origin(Some(l), Some(p)) =>
+      case Origin(_, Some(l), Some(p)) =>
         builder ++= s"(line $l, pos $p)\n"
         command.foreach { cmd =>
           val (above, below) = cmd.split("\n").splitAt(l)
```
```diff
@@ -41,6 +41,7 @@ import org.apache.spark.util.Utils
 private class MutableInt(var i: Int)

 case class Origin(
+    var callSite: Option[String] = None,
     line: Option[Int] = None,
     startPosition: Option[Int] = None)
```
```diff
@@ -58,15 +59,15 @@ object CurrentOrigin {
   def reset(): Unit = value.set(Origin())

-  def setPosition(line: Int, start: Int): Unit = {
+  def setPosition(callSite: String, line: Int, start: Int): Unit = {
     value.set(
-      value.get.copy(line = Some(line), startPosition = Some(start)))
+      value.get.copy(callSite = Some(callSite), line = Some(line), startPosition = Some(start)))
   }

   def withOrigin[A](o: Origin)(f: => A): A = {
+    val current = get
     set(o)
-    val ret = try f finally { reset() }
-    reset()
+    val ret = try f finally { set(current) }
     ret
   }
 }
```
Comment: It might be a correct change, but I noticed that after this change we have another issue when we operate …. For example, if we have the following code and run it: …. Then we get generated code like the following: …. At line #72, it should not be …. This issue is because the origin is not reset properly.

Comment: I also supported ….

Comment: Thanks for the change. Ideally, the real call site should be displayed. In the case above, the comment should be displayed like ….

Comment: I ideally agree with you. As you can imagine, this is why an argument ….

Comment: I think it should be replaced with the real call site, but if that is difficult, how about just resetting the origin and leaving a TODO for future work?

Comment: This change can generate a comment for …:

```java
/* 120 */ while (!shouldStop() && (firstRow || rdd_input.hasNext())) {
/* 121 */ if (firstRow) {
/* 122 */ firstRow = false;
/* 123 */ } else {
/* 124 */ rdd_row = (InternalRow) rdd_input.next();
/* 125 */ }
/* 126 */ rdd_metricValue.add(1);
/* 127 */ /*** CONSUME: Filter (isnotnull(_1#0) && (_1#0 > 4)) */
/* 128 */ /* input[0, int] [org.apache.spark.sql.catalyst.expressions.BoundReference] @ filter at DF.scala:16 */
/* 129 */ boolean project_isNull4 = rdd_row.isNullAt(0);
/* 130 */ int project_value4 = project_isNull4 ? -1 : (rdd_row.getInt(0));
/* 131 */
/* 132 */ /* (isnotnull(input[0, int]) && (input[0, int] > 4)) [org.apache.spark.sql.catalyst.expressions.And] @ select at DF.scala:17 */
/* 133 */ boolean filter_isNull6 = false;
/* 134 */ boolean filter_value6 = false;
/* 135 */
/* 136 */ if (!false && !(!(project_isNull4))) {
/* 137 */ } else {
/* 138 */ /* (input[0, int] > 4) [org.apache.spark.sql.catalyst.expressions.GreaterThan] @ filter at DF.scala:16 */
/* 139 */ boolean filter_isNull9 = true;
/* 140 */ boolean filter_value9 = false;
/* 141 */
/* 142 */ if (!project_isNull4) {
/* 143 */ filter_isNull9 = false; // resultCode could change nullability.
/* 144 */ filter_value9 = project_value4 > 4;
/* 145 */
/* 146 */ }
/* 147 */ if (!filter_isNull9 && !filter_value9) {
/* 148 */ } else if (!false && !filter_isNull9) {
/* 149 */ filter_value6 = true;
/* 150 */ } else {
/* 151 */ filter_isNull6 = true;
/* 152 */ }
/* 153 */ }
/* 154 */ if (!(!filter_isNull6 && filter_value6)) continue;
/* 155 */ filter_metricValue1.add(1);
/* 156 */
/* 157 */ /*** CONSUME: Project [(_1#0 * 10) AS (_1 * 10)#1] */
/* 158 */
/* 159 */ /* (input[0, int] * 10) [org.apache.spark.sql.catalyst.expressions.Multiply] @ select at DF.scala:17 */
/* 160 */ boolean project_isNull5 = true;
/* 161 */ int project_value5 = -1;
```
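The effect of restoring the previous origin rather than resetting it can be seen in isolation. A minimal sketch of the same save-and-restore pattern, with `Origin` reduced to a `String` (toy code under those assumptions, not the Spark implementation):

```scala
// A minimal re-implementation of the save/restore pattern that the
// withOrigin change above introduces.
object CurrentOriginSketch {
  private val value = new ThreadLocal[String] { override def initialValue = "" }

  def withOrigin[A](o: String)(f: => A): A = {
    val current = value.get          // remember the caller's origin
    value.set(o)
    try f finally value.set(current) // restore instead of reset(), so
                                     // nested calls see the outer origin again
  }

  def demo(): Unit =
    withOrigin("select at DF.scala:17") {
      withOrigin("filter at DF.scala:16") {
        println(value.get) // filter at DF.scala:16
      }
      println(value.get)   // select at DF.scala:17, not wiped back to ""
    }
}
```

With the old `reset()` behavior, the inner call would have cleared the outer origin, which is consistent with the mis-attributed comments discussed in the thread above.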
```diff
@@ -442,6 +443,13 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
   override def toString: String = treeString

+  def toOriginString: String =
+    if (this.origin.callSite.isDefined) {
+      this.toString + " @ " + this.origin.callSite.get
+    } else {
+      this.toString
+    }
+
   /** Returns a string representation of the nodes in this tree */
   def treeString: String = generateTreeString(0, Nil, new StringBuilder).toString
```
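`toOriginString` produces the strings that appear in the generated-code comments shown earlier (e.g. `input[0, int] ... @ filter at DF.scala:16`). A toy sketch of the same logic outside `TreeNode` (the node text and call site are illustrative values):

```scala
// Toy types mirroring the logic of TreeNode.toOriginString above.
case class Origin(callSite: Option[String] = None)

class Node(val origin: Origin) {
  override def toString: String = "input[0, int]"
  def toOriginString: String =
    origin.callSite.map(cs => toString + " @ " + cs).getOrElse(toString)
}

println(new Node(Origin(Some("filter at DF.scala:16"))).toOriginString)
// input[0, int] @ filter at DF.scala:16
println(new Node(Origin(None)).toOriginString)
// input[0, int]
```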
```diff
@@ -26,10 +26,12 @@ import org.apache.spark.sql.catalyst.encoders.{encoderFor, ExpressionEncoder}
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, Origin}
 import org.apache.spark.sql.catalyst.util.usePrettyExpression
 import org.apache.spark.sql.execution.aggregate.TypedAggregateExpression
 import org.apache.spark.sql.functions.lit
 import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils.getCallSite

 private[sql] object Column {
```
```diff
@@ -46,6 +48,16 @@ private[sql] object Column {
       case expr => usePrettyExpression(expr).sql
     }
   }
+
+  @scala.annotation.varargs
+  def updateExpressionsOrigin(cols: Column*): Unit = {
+    // Update Expression.origin using the callSite of an operation
+    val callSite = org.apache.spark.util.Utils.getCallSite().shortForm
+    cols.map(col => col.expr.foreach(e => e.origin.callSite = Some(callSite)))
+    // Update CurrentOrigin for setting origin for LogicalPlan node
+    CurrentOrigin.set(
+      Origin(Some(callSite), CurrentOrigin.get.line, CurrentOrigin.get.startPosition))
+  }
 }
```
Comment: I think it's better not to set `CurrentOrigin` directly, because the origin state changes globally within the same thread. For example, say we execute the following two queries, one a `filter` and the other an `orderBy`: …. One piece of the code generated for query2 is as follows: …. When we write queries without `Column` objects, like …. But if we write queries with `Column` objects, call sites are associated with ….

Comment: Maybe we can use ….

Comment: The current version does not use …. For the first problem, I am afraid that the two queries …. For the second problem, I expected the same behavior; I will investigate it.

Comment: Since I was too busy, I did not have time to work on this PR, sorry. My current idea is to set …. My next step seems to be to pass ….
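The reviewer's concern about setting `CurrentOrigin` directly can be reproduced with a small sketch: because the value is per-thread rather than per-query, a call site recorded for one query leaks into nodes built later on the same thread (toy types and sample strings; not the Spark implementation):

```scala
object OriginLeakDemo extends App {
  // Toy per-thread origin, shaped like CurrentOrigin: the state is
  // global to the thread, not scoped to one query.
  object ThreadOrigin {
    private val value = new ThreadLocal[Option[String]] {
      override def initialValue: Option[String] = None
    }
    def set(cs: String): Unit = value.set(Some(cs))
    def get: Option[String] = value.get
  }

  // Query 1 records its call site and never clears it...
  ThreadOrigin.set("filter at DF.scala:16")

  // ...so a node built later for an unrelated query 2 on the same
  // thread picks up query 1's call site.
  println(ThreadOrigin.get) // Some(filter at DF.scala:16) -- wrong attribution
}
```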
Comment: Yes, `constraints` in `QueryPlan` is an `ExpressionSet`, and `SparkPlan`, which is a subclass of `QueryPlan`, is serializable, so strictly speaking `ExpressionSet` should also be serializable. But `constraints` is a lazy val, and it is not accessed when the receiver object is an instance of `SparkPlan`. In other words, `constraints` is accessed only when the receiver object is an instance of `LogicalPlan`.

Comment: Thank you for your explanation of how `constraints` is accessed. From the implementation point of view, my understanding is that it is still necessary to declare `Serializable` for `ExpressionSet`. Is there another good way to enable `Serializable` only for `LogicalPlan`?

Comment: If `ExpressionSet` is really serialized only in the case of `LogicalPlan`, we could move `constraints` from `QueryPlan` to `LogicalPlan`, but I'm not sure that is the right way. Have you ever hit a problem because `ExpressionSet` is not `Serializable`?

Comment: Yes, I got a non-serializable exception in the `hive` test suites when `ExpressionSet` is not `Serializable`. This is why I added `Serializable` to `ExpressionSet`.

Comment: O.K. I got it.
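The failure mode discussed here can be reproduced in isolation: once a `lazy val` has been materialized, its backing field is serialized together with its owner, so a serializable plan whose constraints have been computed drags the set along. A toy sketch with stand-in classes for `QueryPlan`/`ExpressionSet` (class names and the sample constraint are illustrative):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for ExpressionSet without `extends Serializable`.
class ExprSet(val exprs: Seq[String])

// Stand-in for a plan: serializable, with constraints as a lazy val.
class Plan extends Serializable {
  lazy val constraints = new ExprSet(Seq("isnotnull(_1#0)"))
}

val plan = new Plan
plan.constraints // materialize the lazy val, as only LogicalPlan does

// If constraints were never accessed, the backing field would still be
// null and serialization would succeed; once materialized, it fails.
val out = new ObjectOutputStream(new ByteArrayOutputStream)
try out.writeObject(plan)
catch { case e: NotSerializableException => println(e) } // names ExprSet
```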