
[SPARK-30764][SQL] Improve the readability of EXPLAIN FORMATTED style #27509

Closed · 3 commits
3 changes: 2 additions & 1 deletion python/pyspark/sql/dataframe.py
@@ -284,7 +284,8 @@ def explain(self, extended=None, mode=None):
== Physical Plan ==
* Scan ExistingRDD (1)
(1) Scan ExistingRDD [codegen id : 1]
Output: [age#0, name#1]
Output [2]: [age#0, name#1]
...

.. versionchanged:: 3.0.0
Added optional argument `mode` to specify the expected output format of plans.
@@ -20,7 +20,6 @@ package org.apache.spark.sql.catalyst.plans
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, TreeNode, TreeNodeTag}
import org.apache.spark.sql.catalyst.util.StringUtils.PlanStringConcat
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{DataType, StructType}

@@ -189,9 +188,19 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
val codegenIdStr =
getTagValue(QueryPlan.CODEGEN_ID_TAG).map(id => s"[codegen id : $id]").getOrElse("")
val operatorId = getTagValue(QueryPlan.OP_ID_TAG).map(id => s"$id").getOrElse("unknown")
s"""
|($operatorId) $nodeName $codegenIdStr
""".stripMargin
val baseStr = s"($operatorId) $nodeName $codegenIdStr"
val argumentString = argString(SQLConf.get.maxToStringFields)

if (argumentString.nonEmpty) {
s"""
|$baseStr
|Arguments: $argumentString
""".stripMargin
} else {
s"""
|$baseStr
""".stripMargin
}
}
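The rewritten method above emits the `Arguments` line only when `argString` is non-empty, so operators without arguments no longer print a dangling label. A minimal standalone sketch of that pattern (not Spark's actual classes; the operator metadata is hard-coded for illustration):

```scala
// Sketch of the conditional "Arguments" formatting used in
// QueryPlan.verboseStringWithOperatorId. Inputs are illustrative only.
object VerboseStringSketch {
  def verboseString(
      operatorId: String,
      nodeName: String,
      codegenIdStr: String,
      argumentString: String): String = {
    val baseStr = s"($operatorId) $nodeName $codegenIdStr"
    if (argumentString.nonEmpty) {
      // Append the Arguments line only when there is something to show.
      s"""
         |$baseStr
         |Arguments: $argumentString
       """.stripMargin
    } else {
      s"""
         |$baseStr
       """.stripMargin
    }
  }
}
```

With an empty argument string the block collapses to just the header line, which is what keeps the EXPLAIN FORMATTED output compact.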

/**
@@ -76,7 +76,7 @@ trait DataSourceScanExec extends LeafExecNode {

s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Output: ${producedAttributes.mkString("[", ", ", "]")}
|${ExplainUtils.generateFieldString("Output", producedAttributes)}
|${metadataStr.mkString("\n")}
 """.stripMargin
}

Reviewer comment (Member): These changes are only for DSV1. Could we make the corresponding changes when using DSV2? Opened the ticket https://issues.apache.org/jira/browse/SPARK-31480. Also, please check the output when the schema is very long, for example 250+ columns.
@@ -377,7 +377,7 @@ case class FileSourceScanExec(

s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Output: ${producedAttributes.mkString("[", ", ", "]")}
|${ExplainUtils.generateFieldString("Output", producedAttributes)}
|${metadataStr.mkString("\n")}
""".stripMargin
}
@@ -23,7 +23,6 @@ import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression}
import org.apache.spark.sql.catalyst.plans.QueryPlan
import org.apache.spark.sql.catalyst.trees.TreeNodeTag

object ExplainUtils {
/**
@@ -171,7 +170,7 @@ object ExplainUtils {
var currentCodegenId = -1
plan.foreach {
case p: WholeStageCodegenExec => currentCodegenId = p.codegenStageId
case p: InputAdapter => currentCodegenId = -1
case _: InputAdapter => currentCodegenId = -1
case other: QueryPlan[_] =>
if (currentCodegenId != -1) {
other.setTagValue(QueryPlan.CODEGEN_ID_TAG, currentCodegenId)
@@ -182,6 +181,17 @@
}
}

/**
* Generates a detailed field string whose format depends on the type of the input value.
*/
def generateFieldString(fieldName: String, values: Any): String = values match {
  case iter: Iterable[_] if iter.isEmpty => s"${fieldName}: []"
  case iter: Iterable[_] => s"${fieldName} [${iter.size}]: ${iter.mkString("[", ", ", "]")}"
  // A null never matches a typed pattern like `case str: String`, so an
  // explicit null guard here would be dead code; checking emptiness suffices.
  case str: String if str.isEmpty => s"${fieldName}: None"
  case str: String => s"${fieldName}: ${str}"
  case _ => throw new IllegalArgumentException(s"Unsupported type for argument values: $values")
}
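The helper's formatting rules can be exercised outside Spark. A standalone re-implementation (the field names and attribute strings below are made up for illustration):

```scala
// Standalone sketch of ExplainUtils.generateFieldString's formatting rules,
// runnable without a Spark dependency.
object FieldStringSketch {
  def generateFieldString(fieldName: String, values: Any): String = values match {
    case iter: Iterable[_] if iter.isEmpty =>
      s"$fieldName: []"                // empty collection
    case iter: Iterable[_] =>
      // Collections are prefixed with their size, e.g. "Output [2]: [...]".
      s"$fieldName [${iter.size}]: ${iter.mkString("[", ", ", "]")}"
    case str: String if str.isEmpty =>
      s"$fieldName: None"              // empty string, e.g. no join condition
    case str: String =>
      s"$fieldName: $str"
    case other =>
      throw new IllegalArgumentException(s"Unsupported type for argument values: $other")
  }
}
```

The size prefix is the visible change in this PR: `Output: [age#0, name#1]` becomes `Output [2]: [age#0, name#1]`, which makes wide schemas (hundreds of columns) much easier to scan.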

/**
* Given an input plan, returns an array of tuples comprising of:
* 1. Hosting operator id.
@@ -34,6 +34,7 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.plans.physical._
import org.apache.spark.sql.catalyst.trees.TreeNodeTag
import org.apache.spark.sql.execution.metric.SQLMetric
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.vectorized.ColumnarBatch

object SparkPlan {
@@ -512,10 +513,22 @@ trait LeafExecNode extends SparkPlan {
override final def children: Seq[SparkPlan] = Nil
override def producedAttributes: AttributeSet = outputSet
override def verboseStringWithOperatorId(): String = {
s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Output: ${producedAttributes.mkString("[", ", ", "]")}
""".stripMargin
val argumentString = argString(SQLConf.get.maxToStringFields)
val baseStr = s"(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}"
val outputStr = s"${ExplainUtils.generateFieldString("Output", producedAttributes)}"

if (argumentString.nonEmpty) {
s"""
|$baseStr
|$outputStr
|Arguments: $argumentString
""".stripMargin
} else {
s"""
|$baseStr
|$outputStr
""".stripMargin
}
}
}

@@ -531,10 +544,22 @@ trait UnaryExecNode extends SparkPlan {

override final def children: Seq[SparkPlan] = child :: Nil
override def verboseStringWithOperatorId(): String = {
s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Input: ${child.output.mkString("[", ", ", "]")}
""".stripMargin
val argumentString = argString(SQLConf.get.maxToStringFields)
val baseStr = s"(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}"
val inputStr = s"${ExplainUtils.generateFieldString("Input", child.output)}"

if (argumentString.nonEmpty) {
s"""
|$baseStr
|$inputStr
|Arguments: $argumentString
""".stripMargin
} else {
s"""
|$baseStr
|$inputStr
""".stripMargin
}
}
}

@@ -544,10 +569,24 @@ trait BinaryExecNode extends SparkPlan {

override final def children: Seq[SparkPlan] = Seq(left, right)
override def verboseStringWithOperatorId(): String = {
s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Left output: ${left.output.mkString("[", ", ", "]")}
|Right output: ${right.output.mkString("[", ", ", "]")}
""".stripMargin
val argumentString = argString(SQLConf.get.maxToStringFields)
val baseStr = s"(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}"
val leftOutputStr = s"${ExplainUtils.generateFieldString("Left output", left.output)}"
val rightOutputStr = s"${ExplainUtils.generateFieldString("Right output", right.output)}"

if (argumentString.nonEmpty) {
s"""
|$baseStr
|$leftOutputStr
|$rightOutputStr
|Arguments: $argumentString
""".stripMargin
} else {
s"""
|$baseStr
|$leftOutputStr
|$rightOutputStr
""".stripMargin
}
}
}
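Putting the pieces together, a binary node now prints a header, a sized field line per side, and an optional `Arguments` line. A hypothetical end-to-end sketch with the field-string helper inlined (operator id, node name, and outputs are invented for illustration, not taken from a real plan):

```scala
// Sketch of the BinaryExecNode verbose-string layout above. Builds the
// block from lines instead of stripMargin for easier assertion.
object BinaryNodeExplainSketch {
  // Inlined version of ExplainUtils.generateFieldString for collections.
  private def fieldString(name: String, values: Iterable[_]): String =
    if (values.isEmpty) s"$name: []"
    else s"$name [${values.size}]: ${values.mkString("[", ", ", "]")}"

  def verboseString(
      opId: Int,
      nodeName: String,
      leftOutput: Seq[String],
      rightOutput: Seq[String],
      argumentString: String): String = {
    val lines =
      Seq(
        s"($opId) $nodeName",
        fieldString("Left output", leftOutput),
        fieldString("Right output", rightOutput)) ++
      (if (argumentString.nonEmpty) Seq(s"Arguments: $argumentString") else Nil)
    lines.mkString("\n")
  }
}
```

As in the diff, the `Arguments` line is appended only when `argString` yields something, so argument-less operators stay three lines tall.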
@@ -31,18 +31,13 @@ trait BaseAggregateExec extends UnaryExecNode {
def resultExpressions: Seq[NamedExpression]

override def verboseStringWithOperatorId(): String = {
val inputString = child.output.mkString("[", ", ", "]")
val keyString = groupingExpressions.mkString("[", ", ", "]")
val functionString = aggregateExpressions.mkString("[", ", ", "]")
val aggregateAttributeString = aggregateAttributes.mkString("[", ", ", "]")
val resultString = resultExpressions.mkString("[", ", ", "]")
s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Input: $inputString
|Keys: $keyString
|Functions: $functionString
|Aggregate Attributes: $aggregateAttributeString
|Results: $resultString
|${ExplainUtils.generateFieldString("Input", child.output)}
|${ExplainUtils.generateFieldString("Keys", groupingExpressions)}
|${ExplainUtils.generateFieldString("Functions", aggregateExpressions)}
|${ExplainUtils.generateFieldString("Aggregate Attributes", aggregateAttributes)}
|${ExplainUtils.generateFieldString("Results", resultExpressions)}
""".stripMargin
}
}
@@ -86,8 +86,8 @@ case class ProjectExec(projectList: Seq[NamedExpression], child: SparkPlan)
override def verboseStringWithOperatorId(): String = {
s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Output : ${projectList.mkString("[", ", ", "]")}
|Input : ${child.output.mkString("[", ", ", "]")}
|${ExplainUtils.generateFieldString("Output", projectList)}
|${ExplainUtils.generateFieldString("Input", child.output)}
""".stripMargin
}
}
@@ -243,7 +243,7 @@ case class FilterExec(condition: Expression, child: SparkPlan)
override def verboseStringWithOperatorId(): String = {
s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Input : ${child.output.mkString("[", ", ", "]")}
|${ExplainUtils.generateFieldString("Input", child.output)}
|Condition : ${condition}
 """.stripMargin
}

Reviewer comment (Member): Can we remove the space before ":"?

(3) Filter [codegen id : 1]
Input [1]: [col.dots#22]
Condition : (isnotnull(col.dots#22) AND (col.dots#22 = 500))
@@ -92,7 +92,7 @@ case class ReusedExchangeExec(override val output: Seq[Attribute], child: Exchan
val reuse_op_str = ExplainUtils.getOpId(child)
s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${cdgen} [Reuses operator id: $reuse_op_str]
|Output : ${output}
|${ExplainUtils.generateFieldString("Output", output)}
""".stripMargin
}
}
@@ -73,7 +73,7 @@ case class CartesianProductExec(

s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Join condition: ${joinCondStr}
|${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
""".stripMargin
}

@@ -49,9 +49,9 @@ trait HashJoin {

s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Left keys: ${leftKeys}
|Right keys: ${rightKeys}
|Join condition: ${joinCondStr}
|${ExplainUtils.generateFieldString("Left keys", leftKeys)}
|${ExplainUtils.generateFieldString("Right keys", rightKeys)}
|${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
""".stripMargin
}

@@ -63,9 +63,9 @@ case class SortMergeJoinExec(
} else "None"
s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|Left keys : ${leftKeys}
|Right keys: ${rightKeys}
|Join condition : ${joinCondStr}
|${ExplainUtils.generateFieldString("Left keys", leftKeys)}
|${ExplainUtils.generateFieldString("Right keys", rightKeys)}
|${ExplainUtils.generateFieldString("Join condition", joinCondStr)}
""".stripMargin
}

Reviewer comment (Contributor): It might not be related to this PR, but can we do the same thing as https://github.com/apache/spark/pull/27368/files#diff-ddb517fe44ae649ddda3c733c2adcb76R70 for joins? Just for symmetry and future handiness.

Author reply: Make HashJoin extend BinaryExecNode, and ShuffledHashJoinExec/BroadcastHashJoinExec extend HashJoin, right? Yea, I can make it here together :-)

Reviewer reply (Contributor): No. I meant creating a trait for all physical joins. It'll make pattern matching easier although we don't have this requirement right now. We could do it in a follow-up.

Author reply: Oh, yea. I just recall the conversation, thanks for your explanation :-) I'll submit a follow-up PR for joins accordingly.

Author (@Eric5553, Feb 15, 2020): PR #27595 opened for this follow-up.
