-
Notifications
You must be signed in to change notification settings - Fork 28.6k
SPARK-3014. Log a more informative messages in a couple failure scenario... #1934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
QA tests have started for PR 1934. This patch merges cleanly. |
} | ||
|
||
uiAddress = sparkContext.ui.appUIHostPort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this throw an NPE if sparkContext == null
? Also, it seems that this whole code block assumes sparkContext
is not null, which may not be true. Maybe we should put this code block in an else case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized, even in the old code it's bound to throw an NPE if sparkContext == null
, because we access it in L236 of the old code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I thought finishApplicationMaster actually handled the shutdown so we wouldn't get past there. Fixing this.
Tests actually passed, just not posted here |
Updated patch fixes an issue that @andrewor14 pointed out. |
|
||
if (sparkContext != null) { | ||
if (sparkContext == null) { | ||
logError(("Unable to retrieve SparkContext in spite of waiting for %d, maxNumTries = %d." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This language is quite specific to the internal implementation and might be a bit confusing for users. For instance, what does it mean to "retrieve" a SparkContext? It might be better to make it more consistent with the language in the two other log statements and say something like:
Spark context did not initialize properly after waiting for XXX.
Please check earlier log output for errors. Failing the application.
Also - I don't think this will print a unit for the wait time. I believe the unit is ms
so it might be good to add that.
Hey @sryza - left a minor comment on language, but this looks good overall. |
test this please |
QA tests have started for PR 1934 at commit
|
QA tests have finished for PR 1934 at commit
|
test this please |
QA tests have started for PR 1934 at commit
|
QA tests have finished for PR 1934 at commit
|
Hey @sryza, This looks like a good change; sorry for allowing it to sit unreviewed for so long. Do you mind updating this to address the merge conflicts? |
36eb7db
to
ae19cc1
Compare
QA tests have started for PR 1934 at commit
|
QA tests have finished for PR 1934 at commit
|
@@ -270,11 +270,9 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments, | |||
} | |||
|
|||
val sparkContext = sparkContextRef.get() | |||
assert(sparkContext != null || count >= numTries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove this assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the assertion fails, it's a consequence of user error, not a logic error in Spark's code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, actually misread it. Will add it back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On third thought, I don't really think the assert is useful. Adding it back in would require adding "|| finished", which basically makes it look exactly the same as the while loop condition above. It doesn't make the code any easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This assertion technically had a chance to fail before, if we haven't incremented count all the way and finished
becomes true, and the assertion failure is meaningless.
@JoshRosen any other comments on this |
This looks good to me. I tested |
Jenkins, test this please |
retest this please |
QA tests have started for PR 1934 at commit
|
QA tests have finished for PR 1934 at commit
|
Alright, LGTM. Feel free to merge this. |
...s