SPARK-2298: Show stage attempt in UI #1384

Closed
wants to merge 2 commits into from

Conversation

tsudukim
Contributor

Added attempt ID column into stage page of webUI.
Added attemptId handling code into StageInfo, JsonProtocol.
Modified DAGScheduler to identify stages whose stageId is same but attemptId is different.
Modified testcode for stage attempt ID.

@AmplabJenkins

Can one of the admins verify this patch?

@tsudukim
Contributor Author

The attempt ID now shows up in the web UI. Submitted and Duration are now reported separately for each stage attempt.
[Screenshot: spark-2298]

@pwendell
Contributor

To make this a bit more concise, what about having one column on the left whose header is "ID: Attempt", with the two values separated by a colon? Currently the word "Stage" is redundant there.

@pwendell
Contributor

/cc @rxin who is interested in this.

@tsudukim
Contributor Author

@pwendell Thank you for your response. You mean like this?
[Screenshot: spark-2298-2]

@andrewor14
Contributor

Hm the latest screenshot looks a little funky to me.

Most stages will only have 1 attempt, so I think it makes sense to only show the attempt if this is not the first one. Something like:

ID
---------------
4
2
3 (Attempt 2)
1
0
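
The conditional display proposed above could be sketched roughly as follows. This is only an illustration, not code from the patch: `StageIdLabelSketch` and `stageLabel` are hypothetical names, and the sketch assumes attempt IDs are 0-based (0 for the first attempt) while the label shows a 1-based ordinal, so the first retry reads "Attempt 2".

```scala
object StageIdLabelSketch {
  // Assumption: attemptId is 0-based, so 0 means the first attempt.
  // The label shows a 1-based ordinal and is omitted for the first attempt.
  def stageLabel(stageId: Int, attemptId: Int): String =
    if (attemptId == 0) stageId.toString
    else s"$stageId (Attempt ${attemptId + 1})"

  def main(args: Array[String]): Unit = {
    // (stageId, attemptId) pairs in the order of the listing above.
    val rows = Seq((4, 0), (2, 0), (3, 1), (1, 0), (0, 0))
    rows.foreach { case (id, attempt) => println(stageLabel(id, attempt)) }
  }
}
```

Always showing the attempt, as the "ID: Attempt" column would, amounts to dropping the `if` branch; which form reads better is the open question in this thread.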

@tsudukim
Contributor Author

@andrewor14 Thank you for your comment.
I think it would be stranger if the display style of the ID/attempt changed depending on conditions.
Most stages will indeed have only one attempt, but since the pair of stage ID and stage attempt ID identifies the TaskSet, I'd like to show the attempt ID even for a stage's first attempt.

@rxin
Contributor

rxin commented Jul 12, 2014

@tsudukim The concept of TaskSet should be internal to Spark. Users shouldn't have to be aware of task sets. Users should only care about stage + attempt.

@tsudukim
Contributor Author

@rxin OK, thanks. Then the attempt ID is still required in the web UI so that users can see stage + attempt. Have I got that right?

@rxin
Contributor

rxin commented Jul 12, 2014

Yup - but let's avoid exposing the concept of TaskSet to users in the UI. That's only for internal engineering.

@tsudukim
Contributor Author

I'm wondering how best to show it. I gave it a shot. Does this look reasonable?
[Screenshot: spark-2298-3]

@@ -478,6 +479,7 @@ private[spark] object JsonProtocol {

  def stageInfoFromJson(json: JValue): StageInfo = {
    val stageId = (json \ "Stage ID").extract[Int]
    val attemptId = (json \ "Attempt ID").extract[Int]
Contributor

Is this backwards compatible?

Contributor Author

Ah, no it's not. How about this?

  val attemptId = (json \ "Attempt ID").extractOpt[Int].getOrElse(0)

Contributor

Could you add a backwards compatible test in JsonProtocolSuite?

Contributor Author

Sure.
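
The backward-compatibility concern in this review thread can be illustrated with a small, dependency-free sketch. It is not the code from the patch or from JsonProtocolSuite: a parsed JSON object is modeled as a plain `Map[String, Int]` instead of a json4s `JValue`, and `StageInfoSketch`, `BackwardCompatSketch`, and `stageInfoFromFields` are hypothetical names. The idea is that an event log written before the change has no "Attempt ID" field, so the reader falls back to 0 instead of failing.

```scala
// Hypothetical stand-in for StageInfo; the real class carries more fields.
final case class StageInfoSketch(stageId: Int, attemptId: Int)

object BackwardCompatSketch {
  // A parsed JSON object is modeled as a Map to keep the sketch self-contained.
  def stageInfoFromFields(fields: Map[String, Int]): StageInfoSketch = {
    val stageId = fields("Stage ID")
    // Mirrors extractOpt[Int].getOrElse(0): a missing field yields attempt 0.
    val attemptId = fields.get("Attempt ID").getOrElse(0)
    StageInfoSketch(stageId, attemptId)
  }

  def main(args: Array[String]): Unit = {
    val oldLog = Map("Stage ID" -> 3)                    // written before the change
    val newLog = Map("Stage ID" -> 3, "Attempt ID" -> 2) // written after the change
    assert(stageInfoFromFields(oldLog) == StageInfoSketch(3, 0))
    assert(stageInfoFromFields(newLog) == StageInfoSketch(3, 2))
    println("old and new log formats both parse")
  }
}
```

A backward-compatibility test along these lines would feed the parser a record without the new field and check that the default is applied.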

@pwendell
Contributor

@tsudukim @kayousterhout so I think in general here, our handling of stage re-submissions is broken in the UI. For instance, I looked in the JobProgressListener and we index many things on StageId that might better be indexed on (StageId, AttemptId). Also, we should probably pass the AttemptId when we start a task so that we understand which stage attempt to associate it with. I also don't understand exactly what happens when a stage gets re-attempted: do we send a "stage completed" event? It might be good to fix the way we deal with stage re-submissions to make this work better overall.
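
To make the indexing point concrete, here is a minimal sketch of per-attempt bookkeeping keyed on the pair of stage ID and attempt ID. It is an assumption-laden illustration, not the actual JobProgressListener: `StageKey`, `StageProgressSketch`, `onTaskEnd`, and `completed` are hypothetical names, and the only state tracked is a count of finished tasks.

```scala
import scala.collection.mutable

// Hypothetical composite key: one entry per (stage, attempt) pair.
final case class StageKey(stageId: Int, attemptId: Int)

final class StageProgressSketch {
  // Keyed on StageKey, so a re-submitted stage does not overwrite earlier attempts.
  private val completedTasks = mutable.HashMap.empty[StageKey, Int]

  // Assumes the caller knows which attempt the finished task belonged to.
  def onTaskEnd(stageId: Int, attemptId: Int): Unit = {
    val key = StageKey(stageId, attemptId)
    completedTasks(key) = completedTasks.getOrElse(key, 0) + 1
  }

  def completed(stageId: Int, attemptId: Int): Int =
    completedTasks.getOrElse(StageKey(stageId, attemptId), 0)
}

object StageProgressDemo {
  def main(args: Array[String]): Unit = {
    val progress = new StageProgressSketch
    progress.onTaskEnd(stageId = 3, attemptId = 0)
    progress.onTaskEnd(stageId = 3, attemptId = 1)
    progress.onTaskEnd(stageId = 3, attemptId = 1)
    // Each attempt keeps its own count instead of sharing one entry for stage 3.
    println(progress.completed(3, 0))
    println(progress.completed(3, 1))
  }
}
```

With a key of stage ID alone, the three task-end events above would collapse into a single counter, which is the kind of mixing between attempts described in the comment.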

@kayousterhout
Contributor

@rxin is this something you've thought about in your various schedule refactoring things?

@pwendell
Contributor

@tsudukim I created a JIRA to deal with the broader issue: https://issues.apache.org/jira/browse/SPARK-2501. If you want to take that on as well, let me know; it might make sense to wrap it into this patch.

@tsudukim
Contributor Author

@pwendell I agree that there is a lot of room for improvement in the handling of stageId and attemptId. It might be better to break these problems into sub-tasks, and I think this patch should be one of them. Or did you mean we should fix all of these problems in one patch?

@rxin
Contributor

rxin commented Jul 16, 2014

Let's hold off merging this one until we merge #1262. Then it will be easier to index the information based on stage + attempt.

@tsudukim
Contributor Author

@rxin OK. After that, I think I can make this patch better.

@tsudukim
Contributor Author

@rxin in #1262, can I expect the key of the stage data in JobProgressListener to become stageId + attemptId instead of stageId only?

tsudukim added 2 commits July 18, 2014 17:44
Added attempt ID column into stage page of webUI.
Added attemptId handling code into StageInfo, JsonProtocol.
Modified DAGScheduler to identify stages whose stageId is same but attemptId is different.
Modified testcode for stage attempt ID.
Modified format of stageId and attemtId.
Modified to backward compatible style.
Added backward compatibility test.
@tsudukim
Contributor Author

Modified the PR per your comments. Thank you!

@rxin
Contributor

rxin commented Jul 23, 2014

It turned out to be much trickier than I thought to add the attempt id. I submitted a PR here: #1545

That PR already modifies the UI, since that's the only way I could test.

@lianhuiwang
Contributor

I think we can add the job ID to the stage table in the UI, because the job ID is very useful when an application has many jobs: it distinguishes each job's stages.

@tsudukim
Contributor Author

@rxin Surely we can also fix them all in one patch. But modifying everything compatibly in one patch could be hard work, so I had thought to split it into several tasks and make #1384 the first step, showing only the attemptId to distinguish attempts.
You can take whichever approach is convenient for you.

@tsudukim
Contributor Author

@lianhuiwang That appears to be a different problem from SPARK-2298.
Is your aim the same as this ticket?
https://issues.apache.org/jira/browse/SPARK-1362
If so, how about creating another PR for it?

@lianhuiwang
Contributor

@tsudukim Yes, SPARK-2298 is what I want, but I think a simple way is to add a job ID column to the stage table in this PR. It is very easy to do.

@pwendell
Contributor

pwendell commented Sep 2, 2014

I think this was ultimately fixed by #1545 so we can close this issue. But feel free to open another PR if that one did not fix this.

@asfgit asfgit closed this in 1f98add Sep 2, 2014
7 participants