[SPARK-3984] [SPARK-3983] Fix incorrect scheduler delay and display task deserialization time in UI #2832
QA tests have started for PR 2832 at commit
QA tests have finished for PR 2832 at commit
Test FAILed.
retest this please
QA tests have started for PR 2832 at commit
QA tests have finished for PR 2832 at commit
Test FAILed.
@kayousterhout - This is failing scalastyle checks. Could you run the style check locally?
I'm holding off on this until I finish https://issues.apache.org/jira/browse/SPARK-4016 due to the concern that otherwise these new metrics will add confusion for the average user.
Now that I've merged #2867, this should be unblocked.
Thanks @JoshRosen -- fixing this up now!
Just a heads-up: I merged #3031, the "hide accumulators column when empty" patch, so it's likely to cause conflicts here; you might want to merge / rebase.
This commit fixes the scheduler delay in the UI (which previously included things that are not scheduler delay, like time to deserialize the task and serialize the result), and also adds finer-grained information to the summary table for each stage about task launch overhead (which is useful for debugging performance of short jobs, where the overhead is not insignificant).
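To make the accounting change concrete, here is a minimal sketch in Python rather than Spark's own code. The function names, parameter names, and the exact "before" formula are assumptions for illustration only; the sketch just shows how subtracting task deserialization time and result serialization time shrinks what is reported as scheduler delay.

```python
def old_scheduler_delay(launch_ms, finish_ms, run_ms):
    # Assumed "before" behavior: any time that is not executor run time
    # is reported as scheduler delay.
    return max(0, (finish_ms - launch_ms) - run_ms)


def scheduler_delay(launch_ms, finish_ms, deserialize_ms, run_ms,
                    result_serialize_ms):
    # Sketch of the "after" behavior: task deserialization and result
    # serialization are accounted for separately, so they no longer
    # inflate the scheduler delay.
    accounted_ms = deserialize_ms + run_ms + result_serialize_ms
    return max(0, (finish_ms - launch_ms) - accounted_ms)


# Hypothetical task: 100 ms end to end, 20 ms deserializing the task,
# 60 ms running it, 10 ms serializing the result.
print(old_scheduler_delay(0, 100, 60))      # 40
print(scheduler_delay(0, 100, 20, 60, 10))  # 10
```

For a short task like this one, most of the old "delay" was really serialization work on the executor, which is the confusion the change is meant to remove.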
Force-pushed from 81fb86b to 335be4b.
Updated this so the two new metrics are hideable!
Test build #22666 has started for PR 2832 at commit
Test build #22666 has finished for PR 2832 at commit
Test PASSed.
@JoshRosen @andrewor14 does one of you have time to take a look at this?
Test build #22710 has started for PR 2832 at commit
Test build #22710 has finished for PR 2832 at commit
Test PASSed.
Test build #22920 has started for PR 2832 at commit
Test build #22920 has finished for PR 2832 at commit
Test PASSed.
@pwendell as per our discussion, I changed this to eliminate the additional metric about task launch, so now this change just fixes the scheduler delay to be correct and shows task deserialization time in the UI. Does this look OK?
LGTM!
Test build #22946 has started for PR 2832 at commit
Test build #22946 has finished for PR 2832 at commit
Test FAILed.
Jenkins, retest this please
Test build #22949 has started for PR 2832 at commit
Test build #22949 has finished for PR 2832 at commit
Test PASSed.
LGTM too
…ask deserialization time in UI

This commit fixes the scheduler delay in the UI (which previously included things that are not scheduler delay, like time to deserialize the task and serialize the result), and also adds information about time to deserialize tasks to the optional additional metrics. Time to deserialize the task can be large relative to task time for short jobs, and understanding when it is high can help developers realize that they should try to reduce closure size (e.g., by including less data in the task description).

cc shivaram etrain

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #2832 from kayousterhout/SPARK-3983 and squashes the following commits:

0c1398e [Kay Ousterhout] Fixed ordering
531575d [Kay Ousterhout] Removed executor launch time
1f13afe [Kay Ousterhout] Minor spacing fixes
335be4b [Kay Ousterhout] Made metrics hideable
5bc3cba [Kay Ousterhout] [SPARK-3984] [SPARK-3983] Improve UI task metrics.

(cherry picked from commit a46497e)
Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
Thanks for looking at this @andrewor14 and @pwendell! I've merged it into master and 1.2.
This commit fixes the scheduler delay in the UI (which previously
included things that are not scheduler delay, like time to
deserialize the task and serialize the result), and also
adds information about time to deserialize tasks to the optional
additional metrics. Time to deserialize the task can be large relative
to task time for short jobs, and understanding when it is high can help
developers realize that they should try to reduce closure size (e.g., by including
less data in the task description).
cc @shivaram @etrain
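
As a rough illustration of why closure size matters, the sketch below uses Python's `pickle` as a stand-in for task serialization. It is not how Spark serializes tasks, and the `heavy_task` and `light_task` dictionaries and the `serialized_size` helper are hypothetical. The point is only that a task description carrying its data inline is far larger, and so slower to serialize and deserialize, than one carrying a small reference.

```python
import pickle


def serialized_size(task_description):
    # Number of bytes needed to ship this task description (pickle stand-in).
    return len(pickle.dumps(task_description))


# Hypothetical task that embeds a large lookup table directly.
heavy_task = {"op": "lookup", "table": list(range(100_000))}

# Hypothetical task that refers to the same table by a small identifier.
light_task = {"op": "lookup", "table_id": "shared-table-1"}

print(serialized_size(heavy_task) > serialized_size(light_task))  # True
```

A high task deserialization time in the UI is a hint that tasks look more like `heavy_task` than `light_task`.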