YARN-8470. Fix a NPE in identifyContainersToPreemptOnNode() #416

gg7 · 2018-09-11T15:49:00Z

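I encountered this issue while running Hadoop 3.1.0. The FSPreemptionThread crashed with a NullPointerException, which shut down the ResourceManager: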

2018-09-10 13:42:39,437 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Container container_1536156801471_0071_01_000055 completed with event FINISHED, but corresponding RMContainer doesn't exist.
2018-09-10 13:42:39,881 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81)

2018-09-10 13:42:39,886 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down the resource manager.
2018-09-10 13:42:39,891 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: a critical thread, FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81)

I'm guessing a better fix would be to synchronise the removal of applications, but this simple patch should be an improvement IMO.
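To illustrate the kind of guard I mean, here is a minimal, self-contained sketch. It is not the actual patch and does not use the real YARN classes: `Scheduler`, `App`, `Container`, and `identifyContainersToPreempt` below are hypothetical stand-ins. It assumes the NPE comes from dereferencing a scheduler lookup that returns null once the application has been removed concurrently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the real YARN classes.
class PreemptionNullGuardSketch {

  /** Simplified application attempt known to the scheduler. */
  static final class App {
    private final String applicationId;
    App(String applicationId) { this.applicationId = applicationId; }
    String getApplicationId() { return applicationId; }
  }

  /** Simplified container: its own id plus the id of the owning app attempt. */
  static final class Container {
    final String containerId;
    final String appAttemptId;
    Container(String containerId, String appAttemptId) {
      this.containerId = containerId;
      this.appAttemptId = appAttemptId;
    }
  }

  /** Simplified scheduler: the lookup returns null once an app has been removed. */
  static final class Scheduler {
    private final Map<String, App> liveApps = new HashMap<>();
    void addApp(String attemptId, App app) { liveApps.put(attemptId, app); }
    void removeApp(String attemptId) { liveApps.remove(attemptId); }
    App getSchedulerApp(String attemptId) { return liveApps.get(attemptId); }
  }

  /** Collects ids of containers whose owning app is still known to the scheduler. */
  static List<String> identifyContainersToPreempt(
      Scheduler scheduler, List<Container> containersOnNode) {
    List<String> candidates = new ArrayList<>();
    for (Container container : containersOnNode) {
      App app = scheduler.getSchedulerApp(container.appAttemptId);
      if (app == null) {
        // Assumption: the app was removed after the node's container list was
        // read. Skip the stale container instead of dereferencing null.
        continue;
      }
      // Without the guard above, this call would throw a NullPointerException.
      String appId = app.getApplicationId();
      candidates.add(container.containerId + " (" + appId + ")");
    }
    return candidates;
  }

  public static void main(String[] args) {
    Scheduler scheduler = new Scheduler();
    scheduler.addApp("attempt_1", new App("app_1"));
    scheduler.addApp("attempt_2", new App("app_2"));

    List<Container> containersOnNode = new ArrayList<>();
    containersOnNode.add(new Container("container_1", "attempt_1"));
    containersOnNode.add(new Container("container_2", "attempt_2"));

    // Simulate the application finishing while preemption is being evaluated.
    scheduler.removeApp("attempt_2");

    System.out.println(identifyContainersToPreempt(scheduler, containersOnNode));
  }
}
```

A guard like this only keeps the thread alive; the underlying race between application removal and the preemption scan remains, which is why synchronising the removal would be the more thorough fix.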

Signed-off-by: George G <git@gg7.io>

hadoop-yetus · 2019-07-19T17:21:12Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	36	Docker mode activated.
		_ Prechecks _
+1	dupname	0	No case conflicting files found.
+1	@author	0	The patch does not contain any @author tags.
-1	test4tests	0	The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
		_ trunk Compile Tests _
+1	mvninstall	1054	trunk passed
+1	compile	48	trunk passed
+1	checkstyle	36	trunk passed
+1	mvnsite	50	trunk passed
+1	shadedclient	754	branch has no errors when building and testing our client artifacts.
+1	javadoc	29	trunk passed
0	spotbugs	97	Used deprecated FindBugs config; considering switching to SpotBugs.
+1	findbugs	96	trunk passed
		_ Patch Compile Tests _
+1	mvninstall	42	the patch passed
+1	compile	42	the patch passed
+1	javac	42	the patch passed
-0	checkstyle	27	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 4 unchanged - 0 fixed = 6 total (was 4)
+1	mvnsite	45	the patch passed
+1	whitespace	0	The patch has no whitespace issues.
+1	shadedclient	701	patch has no errors when building and testing our client artifacts.
+1	javadoc	29	the patch passed
+1	findbugs	101	the patch passed
		_ Other Tests _
-1	unit	4760	hadoop-yarn-server-resourcemanager in the patch failed.
+1	asflicense	25	The patch does not generate ASF License warnings.
		7929

Reason	Tests
Failed junit tests	hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor

Subsystem	Report/Notes
Docker	Client=18.09.8 Server=18.09.8 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-416/1/artifact/out/Dockerfile
GITHUB PR	#416
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname	Linux 63935efd06ab 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	personality/hadoop.sh
git revision	trunk / `cd967c7`
Default Java	1.8.0_212
checkstyle	https://builds.apache.org/job/hadoop-multibranch/job/PR-416/1/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
unit	https://builds.apache.org/job/hadoop-multibranch/job/PR-416/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
Test Results	https://builds.apache.org/job/hadoop-multibranch/job/PR-416/1/testReport/
Max. process+thread count	903 (vs. ulimit of 5500)
modules	C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
Console output	https://builds.apache.org/job/hadoop-multibranch/job/PR-416/1/console
versions	git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by	Apache Yetus 0.10.0 http://yetus.apache.org