Skip to content

[SPARK-34399][SQL] Add commit duration to SQL tab's graph node. #31522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 34 commits into from

Conversation

AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu AngersZhuuuu commented Feb 8, 2021

What changes were proposed in this pull request?

Since we have add log about commit time, I think this useful and we can make user know it directly in SQL tab's UI.

image

Why are the changes needed?

Make user can directly know commit duration.

Does this PR introduce any user-facing change?

User can see file commit duration in SQL tab's SQL plan graph

How was this patch tested?

Mannul tested

@AngersZhuuuu AngersZhuuuu marked this pull request as draft February 8, 2021 08:59
@SparkQA
Copy link

SparkQA commented Feb 8, 2021

Test build #135016 has finished for PR 31522 at commit d065bcd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu AngersZhuuuu marked this pull request as ready for review February 9, 2021 07:09
@SparkQA
Copy link

SparkQA commented Feb 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39648/

@SparkQA
Copy link

SparkQA commented Feb 9, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39648/

@SparkQA
Copy link

SparkQA commented Feb 9, 2021

Test build #135066 has finished for PR 31522 at commit 89f8201.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait UserDefinedExpression
  • case class SubqueryAdaptiveBroadcastExec(
  • case class PlanAdaptiveDynamicPruningFilters(
  • case class PlanAdaptiveSubqueries(

@AngersZhuuuu
Copy link
Contributor Author

AngersZhuuuu commented Feb 20, 2021

Gentle ping @HeartSaVioR @dongjoon-hyun @HyukjinKwon @maropu @cloud-fan Could you help to review this I think it's really help since always INSERT statement slow caused by commit file.

@SparkQA
Copy link

SparkQA commented Feb 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39868/

@SparkQA
Copy link

SparkQA commented Feb 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39868/

@SparkQA
Copy link

SparkQA commented Feb 20, 2021

Test build #135289 has finished for PR 31522 at commit 9ddd28c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

The file commit is a driver side thing, why do we need to update BasicWriteJobStatsTracker? I think we can follow BroadcastExchangeExec and simply call SQLMetrics.postDriverMetricUpdates

@AngersZhuuuu
Copy link
Contributor Author

AngersZhuuuu commented Feb 23, 2021

The file commit is a driver side thing, why do we need to update BasicWriteJobStatsTracker? I think we can follow BroadcastExchangeExec and simply call SQLMetrics.postDriverMetricUpdates

Since we compute DataWritingCommand's metrics in driver side and all metrics stored in BasicWriteJobStatsTracker , so I changed BasicWriteJobStatsTracker.

If we want to directly update this value, we need to add it in

lazy val metrics: Map[String, SQLMetric] = BasicWriteJobStatsTracker.metrics

Then pass the metrics parameter to FileFormatWriter.writter() then post this metric after complete file committing.

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45944/

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45944/

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Test build #141422 has finished for PR 31522 at commit 8107d20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Test build #141425 has finished for PR 31522 at commit 9b5aa94.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part")
NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
TASK_COMMIT_TIME ->
SQLMetrics.createTimingMetric(sparkContext, "time of committing the tasks"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it task commit time for short

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

NUM_PARTS_KEY -> SQLMetrics.createMetric(sparkContext, "number of dynamic part"),
TASK_COMMIT_TIME ->
SQLMetrics.createTimingMetric(sparkContext, "time of committing the tasks"),
JOB_COMMIT_TIME -> SQLMetrics.createTimingMetric(sparkContext, "time of committing the job")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it job commit time for short

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

@gengliangwang
Copy link
Member

@AngersZhuuuu Thanks. Please update the screenshot in the PR description as well.

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Test build #141495 has finished for PR 31522 at commit b5c9d63.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

@AngersZhuuuu Thanks. Please update the screenshot in the PR description as well.

DOne

@AngersZhuuuu
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Test build #141501 has started for PR 31522 at commit b5c9d63.

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46011/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46018/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46011/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46018/

@dongjoon-hyun
Copy link
Member

Retest this please

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46034/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46034/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Test build #141517 has finished for PR 31522 at commit b5c9d63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.2 (as it replaces the log we added in 3.2: #33279)

@cloud-fan cloud-fan closed this in 3ff8c9f Jul 23, 2021
@cloud-fan
Copy link
Contributor

oh it has conflicts. @AngersZhuuuu can you open a backport PR? thanks!

cloud-fan pushed a commit that referenced this pull request Jul 28, 2021
…b commit time

### What changes were proposed in this pull request?

This is follow-up of #31522.
It adds docs for the new metrics of task/job commit time

### Why are the changes needed?

So that users can understand the metrics better and know that the new metrics are only for file table writes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Build docs and preview:
![image](https://user-images.githubusercontent.com/1097932/127198210-2ab201d3-5fca-4065-ace6-0b930390380f.png)

Closes #33542 from gengliangwang/addDocForMetrics.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jul 28, 2021
…b commit time

### What changes were proposed in this pull request?

This is follow-up of #31522.
It adds docs for the new metrics of task/job commit time

### Why are the changes needed?

So that users can understand the metrics better and know that the new metrics are only for file table writes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Build docs and preview:
![image](https://user-images.githubusercontent.com/1097932/127198210-2ab201d3-5fca-4065-ace6-0b930390380f.png)

Closes #33542 from gengliangwang/addDocForMetrics.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit c9a7ff3)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
AngersZhuuuu added a commit to AngersZhuuuu/spark that referenced this pull request Jul 28, 2021
Since we have add log about commit time, I think this useful and we can make user know it directly in SQL tab's UI.

![image](https://user-images.githubusercontent.com/46485123/126647754-dc3ba83a-5391-427c-8a67-e6af46e82290.png)

Make user can directly know commit duration.

User can see file commit duration in SQL tab's SQL plan graph

Mannul tested

Closes apache#31522 from AngersZhuuuu/SPARK-34399.

Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@@ -788,6 +792,24 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
}
}

test("SPARK-34399: Add job commit duration metrics for DataWritingCommand") {
Copy link
Member

@HyukjinKwon HyukjinKwon Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just happened to see the flakiness of the PR here: https://github.com/apache/spark/runs/4378978842

sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 0 was not greater than 0
	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
	at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
	at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.$anonfun$new$87(SQLMetricsSuite.scala:810)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:305)
	at org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:303)
	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.withTable(SQLMetricsSuite.scala:44)
	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.$anonfun$new$86(SQLMetricsSuite.scala:800)
	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SQLMetricsSuite.scala:44)
	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:246)
	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:244)
	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.withSQLConf(SQLMetricsSuite.scala:44)
	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.$anonfun$new$85(SQLMetricsSuite.scala:800)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
	at org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite.$anonfun$test$5(AdaptiveTestUtils.scala:65)
	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
	at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SQLMetricsSuite.scala:44)
	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:246)
	at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:244)
	at org.apache.spark.sql.execution.metric.SQLMetricsSuite.withSQLConf(SQLMetricsSuite.scala:44)
	at org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite.$anonfun$test$4(AdaptiveTestUtils.scala:65)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
	at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
	at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
	at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
	at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
	at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
	at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
	at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)

Seems not super flaky though. I am noting it here in case other people see this failure more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants