Skip to content

[SPARK-3984] [SPARK-3983] Fix incorrect scheduler delay and display task deserialization time in UI #2832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

kayousterhout
Copy link
Contributor

This commit fixes the scheduler delay in the UI (which previously
included things that are not scheduler delay, like time to
deserialize the task and serialize the result), and also
adds information about time to deserialize tasks to the optional
additional metrics. Time to deserialize the task can be large relative
to task time for short jobs, and understanding when it is high can help
developers realize that they should try to reduce closure size (e.g, by including
less data in the task description).

cc @shivaram @etrain

@SparkQA
Copy link

SparkQA commented Oct 17, 2014

QA tests have started for PR 2832 at commit 81fb86b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 17, 2014

QA tests have finished for PR 2832 at commit 81fb86b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21828/
Test FAILed.

@andrewor14
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Oct 20, 2014

QA tests have started for PR 2832 at commit 81fb86b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 20, 2014

QA tests have finished for PR 2832 at commit 81fb86b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21927/
Test FAILed.

@shivaram
Copy link
Contributor

@kayousterhout - This is failing scalastyle checks -- Could you run style check locally ?

@kayousterhout
Copy link
Contributor Author

I'm holding off on this until I finish https://issues.apache.org/jira/browse/SPARK-4016 due to the concern that otherwise these new metrics will add confusion for the average user.

@JoshRosen
Copy link
Contributor

Now that I've merged #2867, this should be unblocked.

@kayousterhout
Copy link
Contributor Author

Thanks @JoshRosen -- fixing this up now!

@JoshRosen
Copy link
Contributor

Just a head's up: I merged #3031, the "hide accumulators column when empty" patch, so it's likely to cause conflicts here; you might want to merge / rebase.

This commit fixes the scheduler delay in the UI (which previously
included things that are not scheduler delay, like time to
deserialize the task and serialize the result), and also
adds finer-grained information to the summary table for each
stage about task launch overhead (which is useful for debugging
performance of short jobs, where the overhead is not-insignificant).
@kayousterhout
Copy link
Contributor Author

Updated this so the two new metrics are hideable!

@SparkQA
Copy link

SparkQA commented Nov 1, 2014

Test build #22666 has started for PR 2832 at commit 335be4b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 1, 2014

Test build #22666 has finished for PR 2832 at commit 335be4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22666/
Test PASSed.

@kayousterhout
Copy link
Contributor Author

@JoshRosen @andrewor14 does one of you have time to take a look at this?

@SparkQA
Copy link

SparkQA commented Nov 1, 2014

Test build #22710 has started for PR 2832 at commit 1f13afe.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 2, 2014

Test build #22710 has finished for PR 2832 at commit 1f13afe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22710/
Test PASSed.

@kayousterhout kayousterhout changed the title [SPARK-3984] [SPARK-3983] Improve UI task metrics. [SPARK-3984] [SPARK-3983] Fix incorrect scheduler delay and display task deserialization time in UI Nov 5, 2014
@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22920 has started for PR 2832 at commit 531575d.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22920 has finished for PR 2832 at commit 531575d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22920/
Test PASSed.

@kayousterhout
Copy link
Contributor Author

@pwendell as per our discussion, I changed this to eliminate the additional metric about task launch, so now this change just fixes the scheduler delay to be correct, and show task deserialization time in the UI. Does this look OK?

@pwendell
Copy link
Contributor

pwendell commented Nov 5, 2014

LGTM!

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22946 has started for PR 2832 at commit 0c1398e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22946 has finished for PR 2832 at commit 0c1398e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22946/
Test FAILed.

@kayousterhout
Copy link
Contributor Author

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22949 has started for PR 2832 at commit 0c1398e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22949 has finished for PR 2832 at commit 0c1398e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22949/
Test PASSed.

@andrewor14
Copy link
Contributor

LGTM too

asfgit pushed a commit that referenced this pull request Nov 5, 2014
…ask deserialization time in UI

This commit fixes the scheduler delay in the UI (which previously
included things that are not scheduler delay, like time to
deserialize the task and serialize the result), and also
adds information about time to deserialize tasks to the optional
additional metrics.  Time to deserialize the task can be large relative
to task time for short jobs, and understanding when it is high can help
developers realize that they should try to reduce closure size (e.g, by including
less data in the task description).

cc shivaram etrain

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #2832 from kayousterhout/SPARK-3983 and squashes the following commits:

0c1398e [Kay Ousterhout] Fixed ordering
531575d [Kay Ousterhout] Removed executor launch time
1f13afe [Kay Ousterhout] Minor spacing fixes
335be4b [Kay Ousterhout] Made metrics hideable
5bc3cba [Kay Ousterhout] [SPARK-3984] [SPARK-3983] Improve UI task metrics.

(cherry picked from commit a46497e)
Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
@asfgit asfgit closed this in a46497e Nov 5, 2014
@kayousterhout
Copy link
Contributor Author

Thanks for looking at this @andrewor14 and @pwendell ! I've merged it into master and 1.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants