SPY-350 Don't let Master remove RUNNING Applications #21


Closed
wants to merge 3 commits

Conversation

markhamstra

SPARK-2425 introduce LOADING -> RUNNING ApplicationState transition and prevent Master from removing Application with RUNNING Executors


Conflicts:
	core/src/main/scala/org/apache/spark/deploy/master/Master.scala
	core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
	core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
…exited executors

There seem to be two issues.

1. When a job is done, the driver asks the executors to shut down. However, this clean exit was assigned the FAILED executor state by the Worker. I introduced an EXITED executor state for executors that exit voluntarily (covering both normal and abnormal exits, depending on the exit code).

2. When the Master is notified that an executor has exited, it launches another one to replace it, regardless of the reason the executor exited. When the reason is that the job has finished, the unnecessary replacement is killed shortly afterwards when the App disassociates. This launching and killing of unnecessary executors shows up in the logs and confuses users. I added a check of the executor's exit status so that unnecessary replacements are not launched (and subsequently killed) when executors exit cleanly.

One could ask the scheduler to tell the Master that the job is done so the Master wouldn't launch a replacement executor. However, there is a race condition between the App telling the Master the job is done and the Worker telling the Master that an executor has exited; there is no guarantee the former happens before the latter. Instead, I chose to check the exit code when an executor exits: if the exit code is 0, I assume the executor was asked to shut down by the driver and the Master will not launch a replacement.

Because of the race condition, it could also happen (although it didn't on my local cluster) that the Master detects the App disassociation event before the executor exits on its own. In that case, the executor is rightfully killed and labeled KILLED, while the App state shows FINISHED.
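
As a rough illustration of the behavior described above, here is a minimal sketch (not the actual Spark source; the object and method names are made up for this example):

```scala
// Minimal sketch of the proposed handling; names are illustrative.
object ExecutorLifecycleSketch {

  object ExecutorState extends Enumeration {
    // EXITED marks executors that shut down on their own (e.g. after the driver asked them to).
    val LAUNCHING, LOADING, RUNNING, KILLED, FAILED, LOST, EXITED = Value
  }

  // Worker/ExecutorRunner side: pick the state from the process exit code
  // instead of always reporting FAILED.
  def stateForExitCode(exitCode: Int): ExecutorState.Value =
    if (exitCode == 0) ExecutorState.EXITED else ExecutorState.FAILED

  // Master side: launch a replacement only when the exit was not a clean shutdown.
  def shouldRelaunch(exitStatus: Option[Int]): Boolean =
    exitStatus != Some(0)
}
```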

Author: Kan Zhang <kzhang@apache.org>

Closes apache#306 from kanzhang/SPARK-1118 and squashes the following commits:

cb0cc86 [Kan Zhang] [SPARK-937] adding EXITED executor state and not relaunching cleanly exited executors

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/master/Master.scala
	core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
if (appInfo.incrementRetryCount() < ApplicationState.MAX_NUM_RETRY) {
  schedule()
} else {
  logError("Application %s with ID %s failed %d times, removing it".format(


Is the convention in Spark to use "format" rather than string interpolation?

Author

Good point. This section of code is old enough that there is potential to backport the change all the way to pre-string-interpolation Spark 0.8, but realistically that's not going to happen, so I'll update to string interpolation.
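
For reference, the two styles being discussed look roughly like this (the values below are illustrative, not taken from the diff):

```scala
val appName = "example-app"                  // illustrative values only
val appId = "app-20140101000000-0000"
val retryCount = 10

// .format style, compatible with the pre-string-interpolation Spark 0.8 era:
val viaFormat =
  "Application %s with ID %s failed %d times, removing it".format(appName, appId, retryCount)

// Scala 2.10+ string interpolation, as suggested in the review:
val viaInterpolation =
  s"Application $appName with ID $appId failed $retryCount times, removing it"
```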

@jhartlaub

Would it be possible to create a unit test to exercise this situation?

val exitCode = process.waitFor()
- state = ExecutorState.FAILED
+ state = ExecutorState.EXITED


The state needs to take the exit code into account, because ExecutorExitCode defines other non-zero exit codes, and those should still map to ExecutorState.FAILED. In other words: if the exit code is non-zero, set state = ExecutorState.FAILED, otherwise set state = ExecutorState.EXITED, as sketched below.
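
A sketch of what that check might look like in ExecutorRunner (simplified; the specific non-zero codes in ExecutorExitCode are not enumerated here):

```scala
// Treat only a zero exit code as a voluntary, clean shutdown; every non-zero code
// (including the special codes defined in ExecutorExitCode) still maps to FAILED.
val exitCode = process.waitFor()
state = if (exitCode != 0) ExecutorState.FAILED else ExecutorState.EXITED
```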

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 17, 2014
…either from BlockManager or WAL in HDFS

As part of the initiative to prevent data loss on streaming driver failure, this sub-task implements a BlockRDD that is backed by HDFS. This BlockRDD can either read data from Spark's BlockManager or read the data from file segments in the write-ahead log in HDFS.

Most of this code has been written by @harishreedharan

Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Hari Shreedharan <hshreedharan@apache.org>

Closes apache#2931 from tdas/driver-ha-rdd and squashes the following commits:

209e49c [Tathagata Das] Better fix to style issue.
4a5866f [Tathagata Das] Addressed one more comment.
ed5fbf0 [Tathagata Das] Minor updates.
b0a18b1 [Tathagata Das] Fixed import order.
20aa7c6 [Tathagata Das] Fixed more line length issues.
29aa099 [Tathagata Das] Fixed line length issues.
9e47b5b [Tathagata Das] Renamed class, simplified+added unit tests.
6e1bfb8 [Tathagata Das] Tweaks testuite to create spark contxt lazily to prevent contxt leaks.
9c86a61 [Tathagata Das] Merge pull request alteryx#22 from harishreedharan/driver-ha-rdd
2878c38 [Hari Shreedharan] Shutdown spark context after tests. Formatting/minor fixes
c709f2f [Tathagata Das] Merge pull request alteryx#21 from harishreedharan/driver-ha-rdd
5cce16f [Hari Shreedharan] Make sure getBlockLocations uses offset and length to find the blocks on HDFS
eadde56 [Tathagata Das] Transferred HDFSBackedBlockRDD for the driver-ha-working branch
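
As a rough illustration of the read path described in that commit message, here is a hypothetical, simplified sketch with stand-in types (not the actual HDFSBackedBlockRDD code):

```scala
// Hypothetical sketch: try the BlockManager first, then fall back to reading the
// corresponding write-ahead-log segment from HDFS.
object WalBackedReadSketch {

  case class WalSegment(path: String, offset: Long, length: Int)

  trait SimpleBlockStore {
    def get(blockId: String): Option[Iterator[Array[Byte]]]
  }

  def readPartition(
      blockId: String,
      segment: WalSegment,
      blockStore: SimpleBlockStore,
      readFromWal: WalSegment => Iterator[Array[Byte]]): Iterator[Array[Byte]] =
    blockStore.get(blockId) match {
      case Some(data) => data                  // block still available in the BlockManager
      case None       => readFromWal(segment)  // recover the data from the WAL in HDFS
    }
}
```
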
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Jan 12, 2015
…ommands.

Adding support for defining a schema in foreign DDL commands. Now foreign DDL supports commands like:
```
CREATE TEMPORARY TABLE avroTable
USING org.apache.spark.sql.avro
OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
```
With this PR the user can define a schema instead of inferring it from the file, so DDL commands like the following are supported:
```
CREATE TEMPORARY TABLE avroTable(a int, b string)
USING org.apache.spark.sql.avro
OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
```

Author: scwf <wangfei1@huawei.com>
Author: Yin Huai <yhuai@databricks.com>
Author: Fei Wang <wangfei1@huawei.com>
Author: wangfei <wangfei1@huawei.com>

Closes apache#3431 from scwf/ddl and squashes the following commits:

7e79ce5 [Fei Wang] Merge pull request alteryx#22 from yhuai/pr3431yin
38f634e [Yin Huai] Remove Option from createRelation.
65e9c73 [Yin Huai] Revert all changes since applying a given schema has not been testd.
a852b10 [scwf] remove cleanIdentifier
f336a16 [Fei Wang] Merge pull request alteryx#21 from yhuai/pr3431yin
baf79b5 [Yin Huai] Test special characters quoted by backticks.
50a03b0 [Yin Huai] Use JsonRDD.nullTypeToStringType to convert NullType to StringType.
1eeb769 [Fei Wang] Merge pull request alteryx#20 from yhuai/pr3431yin
f5c22b0 [Yin Huai] Refactor code and update test cases.
f1cffe4 [Yin Huai] Revert "minor refactory"
b621c8f [scwf] minor refactory
d02547f [scwf] fix HiveCompatibilitySuite test failure
8dfbf7a [scwf] more tests for complex data type
ddab984 [Fei Wang] Merge pull request alteryx#19 from yhuai/pr3431yin
91ad91b [Yin Huai] Parse data types in DDLParser.
cf982d2 [scwf] fixed test failure
445b57b [scwf] address comments
02a662c [scwf] style issue
44eb70c [scwf] fix decimal parser issue
83b6fc3 [scwf] minor fix
9bf12f8 [wangfei] adding test case
7787ec7 [wangfei] added SchemaRelationProvider
0ba70df [wangfei] draft version
@mbautin

mbautin commented Jan 30, 2015

@markhamstra: do we still need this PR?

@markhamstra
Author

No, this has already been fully integrated and gone through another generation of development upstream, and the fix is already present in csd-1.1 and csd-1.2.

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 7, 2017
* Add kubernetes profile to travis yml file

* Fix long lines in CompressionUtils.scala
5 participants