[SPARK-36070][CORE] Log time cost info for writing rows out and committing the task #33279


Closed · wants to merge 2 commits

Conversation

@yaooqinn (Member) commented on Jul 9, 2021

What changes were proposed in this pull request?

We have a job with a stage of about 8,000 tasks. Most tasks finish in about 1 to 10 minutes, but three of them run extremely slowly despite having similar data sizes: each takes about an hour to finish, and their speculative attempts are just as slow.

The root cause is most likely a delay in the storage system, but from the logs it is not straightforward to tell where the performance issue occurs: in shuffle read, task execution, output writing, or task commit.

2021-07-09 03:05:17 CST SparkHadoopMapRedUtil INFO - attempt_20210709022249_0003_m_007050_37351: Committed
2021-07-09 03:05:17 CST Executor INFO - Finished task 7050.0 in stage 3.0 (TID 37351). 3311 bytes result sent to driver
2021-07-09 04:06:10 CST ShuffleBlockFetcherIterator INFO - Getting 9 non-empty blocks including 0 local blocks and 9 remote blocks
2021-07-09 04:06:10 CST TransportClientFactory INFO - Found inactive connection to

Why are the changes needed?

On the Spark side, we can record the time cost in the logs for easier bug hunting and performance tuning.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Passes GitHub Actions (GA).

@yaooqinn (Member, Author) commented on Jul 9, 2021

cc @cloud-fan @maropu @dongjoon-hyun, thanks

@SparkQA commented on Jul 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45355/

@yaooqinn yaooqinn changed the title [SPARK-36070][CORE] Log time cost info for writing rows out and committing the task. [SPARK-36070][CORE] Log time cost info for writing rows out and committing the task Jul 9, 2021
@SparkQA commented on Jul 9, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45358/

@SparkQA commented on Jul 9, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45355/

@SparkQA commented on Jul 9, 2021

Test build #140844 has finished for PR 33279 at commit 7950524.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented on Jul 9, 2021

Test build #140847 has finished for PR 33279 at commit fb3af2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn yaooqinn closed this in f5a6332 Jul 9, 2021
@yaooqinn (Member, Author) commented on Jul 9, 2021

Thanks, merged to master.

@yaooqinn yaooqinn deleted the SPARK-36070 branch July 9, 2021 16:55
  dataWriter.writeWithIterator(iterator)
  dataWriter.commit()
}
logInfo(s"$taskAttemptID finished to write and commit. Elapsed time: $timeCost ms.")
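The timing pattern in the reviewed fragment above, wrapping the write-and-commit block and logging the elapsed milliseconds, can be sketched outside Spark as follows. Note the assumptions: `timeTakenMs` here is a stand-in reimplementation of Spark's internal `Utils.timeTakenMs` helper, and the body of the timed block is a placeholder for the real `dataWriter` calls, not Spark's actual writer API.

```scala
// Minimal sketch of the patch's timing pattern (illustrative, not Spark code).
object TimedCommitSketch {
  // Stand-in for Spark's Utils.timeTakenMs: run `body`, return its result
  // together with the elapsed wall-clock time in milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val (result, timeCost) = timeTakenMs {
      Thread.sleep(50) // placeholder for writing rows out and committing the task
      "committed"
    }
    // Mirrors the logInfo line in the patch.
    println(s"finished to write and commit. Elapsed time: $timeCost ms.")
    assert(result == "committed")
    assert(timeCost >= 40)
  }
}
```

Because the timed block is call-by-name (`body: => T`), any existing block of statements can be wrapped without restructuring it, which is why the patch could add timing around the write-and-commit sequence with a one-line change.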
Contributor commented:
After some more thought, I think it's better to use SQL metrics for it. It's very hard to know max/min/avg by reading the logs.

@AngersZhuuuu I think you tried it before. Can you restore the work?

Contributor (replying to the comment above):

Yea, working on this
