Commit 0b6775e

LantaoJin authored and jerryshao committed
[SPARK-29112][YARN] Expose more details when ApplicationMaster reporter faces a fatal exception
### What changes were proposed in this pull request?

In the `ApplicationMaster.Reporter` thread, fatal exception information is swallowed. It is better to expose it. We found that our Thrift server was shut down due to a fatal exception, but the log contained no useful information:

> 19/09/16 06:59:54,498 INFO [Reporter] yarn.ApplicationMaster:54 : Final app status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
> 19/09/16 06:59:54,500 ERROR [Driver] thriftserver.HiveThriftServer2:91 : Error starting HiveThriftServer2
> java.lang.InterruptedException: sleep interrupted
> at java.lang.Thread.sleep(Native Method)
> at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:160)
> at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:708)

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Manual test

Closes #25810 from LantaoJin/SPARK-29112.

Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: jerryshao <jerryshao@tencent.com>
1 parent eef5e6d commit 0b6775e

1 file changed: +5 -1 lines changed

resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala

Lines changed: 5 additions & 1 deletion
@@ -578,7 +578,11 @@ private[spark] class ApplicationMaster(
               e.getMessage)
           case e: Throwable =>
             failureCount += 1
-            if (!NonFatal(e) || failureCount >= reporterMaxFailures) {
+            if (!NonFatal(e)) {
+              finish(FinalApplicationStatus.FAILED,
+                ApplicationMaster.EXIT_REPORTER_FAILURE,
+                "Fatal exception: " + StringUtils.stringifyException(e))
+            } else if (failureCount >= reporterMaxFailures) {
               finish(FinalApplicationStatus.FAILED,
                 ApplicationMaster.EXIT_REPORTER_FAILURE, "Exception was thrown " +
                   s"$failureCount time(s) from Reporter thread.")
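
The effect of splitting the condition can be illustrated with a small standalone sketch. This is not the code from `ApplicationMaster.scala`: the object name `ReporterFailureSketch`, the function `failureReason`, and the helper `stringify` are hypothetical names introduced here, and `stringify` is only a stand-in for the `StringUtils.stringifyException` call in the diff (assumed to render the stack trace as a string). The sketch returns the reason string that would be handed to `finish(...)`, or `None` when the reporter would keep retrying.

```scala
import java.io.{PrintWriter, StringWriter}
import scala.util.control.NonFatal

object ReporterFailureSketch {
  // Hypothetical stand-in for StringUtils.stringifyException in the diff:
  // renders the throwable's stack trace into a single string.
  def stringify(e: Throwable): String = {
    val sw = new StringWriter()
    val pw = new PrintWriter(sw)
    e.printStackTrace(pw)
    pw.flush()
    sw.toString
  }

  // Mirrors the branching in the patched code:
  //  - fatal throwable: fail immediately and include the stack trace
  //  - non-fatal, but the failure limit is reached: fail with the count only
  //  - otherwise: None, meaning the reporter would keep going
  def failureReason(e: Throwable, failureCount: Int, maxFailures: Int): Option[String] =
    if (!NonFatal(e)) {
      Some("Fatal exception: " + stringify(e))
    } else if (failureCount >= maxFailures) {
      Some(s"Exception was thrown $failureCount time(s) from Reporter thread.")
    } else {
      None
    }

  def main(args: Array[String]): Unit = {
    // Non-fatal and below the limit: no failure is reported yet.
    println(failureReason(new RuntimeException("transient"), 1, 3))
    // Non-fatal at the limit: the count-based message is used.
    println(failureReason(new RuntimeException("transient"), 3, 3))
    // OutOfMemoryError is a VirtualMachineError, which NonFatal treats as fatal,
    // so the reason carries the stack trace even on the first failure.
    println(failureReason(new OutOfMemoryError("simulated"), 1, 3))
  }
}
```

Before the change, the fatal case shared the count-based message, which is why the log quoted in the commit message reports "Exception was thrown 1 time(s)" without naming the exception.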
