
Commit

Add Tencent is evaluating. (kubeflow#521)
* fix a typo in design.md
* add tencent is evaluating
runzhliu authored and liyinan926 committed Jun 15, 2019
1 parent 7f1de86 commit c495bb1
Showing 2 changed files with 3 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/design.md
@@ -39,7 +39,7 @@ When a `SparkApplication` object gets updated (i.e., when the `UpdateFunc` callb

The controller is also responsible for updating the status of a `SparkApplication` object with the help of the Spark pod monitor, which watches Spark pods and update the `SparkApplicationStatus` field of corresponding `SparkApplication` objects based on the status of the pods. The Spark pod monitor watches events of creation, updates, and deletion of Spark pods, creates status update messages based on the status of the pods, and sends the messages to the controller to process. When the controller receives a status update message, it gets the corresponding `SparkApplication` object from the cache store and updates the the `Status` accordingly.

-As described in [API Definition](api.md), the `Status` field (of type `SparkApplicationStatus`) records the overall state of the application as well as the state of each executor pod. Note that the overall state of an application is determined by the driver pod state, except when submission fails, in which case no driver pod gets launched. Particulrly, the final application state is set to the termination state of the driver pod when applicable, i.e., `COMPLETED` if the driver pod completed or `FAILED` if the driver pod failed. If the driver pod gets deleted while running, the final application state is set to `FAILED`. If submission fails, the application state is set to `FAILED_SUBMISSION`. There are two terminal states: `COMPLETED` and `FAILED` which means that any Application in these states will never be retried by the Operator. All other states are non-terminal and based on the State as well as RestartPolicy (discussed below) can be retried.
+As described in [API Definition](api.md), the `Status` field (of type `SparkApplicationStatus`) records the overall state of the application as well as the state of each executor pod. Note that the overall state of an application is determined by the driver pod state, except when submission fails, in which case no driver pod gets launched. Particularly, the final application state is set to the termination state of the driver pod when applicable, i.e., `COMPLETED` if the driver pod completed or `FAILED` if the driver pod failed. If the driver pod gets deleted while running, the final application state is set to `FAILED`. If submission fails, the application state is set to `FAILED_SUBMISSION`. There are two terminal states: `COMPLETED` and `FAILED` which means that any Application in these states will never be retried by the Operator. All other states are non-terminal and based on the State as well as RestartPolicy (discussed below) can be retried.

As part of preparing a submission for a newly created `SparkApplication` object, the controller parses the object and adds configuration options for adding certain annotations to the driver and executor pods of the application. The annotations are later used by the mutating admission webhook to configure the pods before they start to run. For example,if a Spark application needs a certain Kubernetes ConfigMap to be mounted into the driver and executor pods, the controller adds an annotation that specifies the name of the ConfigMap to mount. Later the mutating admission webhook sees the annotation on the pods and mount the ConfigMap to the pods.
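
For readers of this hunk, the state rules in the `Status` paragraph above can be sketched in Go roughly as follows. This is a minimal illustration, not the operator's actual code: `DriverObservation`, `DeriveFinalState`, `IsTerminal`, and the `PodPhase` values are hypothetical names introduced here, and only the three state strings (`COMPLETED`, `FAILED`, `FAILED_SUBMISSION`) come from the documentation text.

```go
package main

import "fmt"

// PodPhase is a simplified, hypothetical stand-in for the driver pod's phase.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// AppState holds the application states named in docs/design.md.
type AppState string

const (
	StateCompleted        AppState = "COMPLETED"
	StateFailed           AppState = "FAILED"
	StateFailedSubmission AppState = "FAILED_SUBMISSION"
)

// DriverObservation is a hypothetical summary of what the controller knows
// about one application's submission and driver pod.
type DriverObservation struct {
	SubmissionFailed bool     // submission failed, so no driver pod was launched
	Phase            PodPhase // last observed driver pod phase
	Deleted          bool     // driver pod was deleted
}

// DeriveFinalState applies the rules from the Status paragraph. The boolean is
// false when none of the described rules yields a state yet.
func DeriveFinalState(o DriverObservation) (AppState, bool) {
	switch {
	case o.SubmissionFailed:
		return StateFailedSubmission, true
	case o.Deleted && o.Phase == PodRunning:
		// Driver pod deleted while running.
		return StateFailed, true
	case o.Phase == PodSucceeded:
		return StateCompleted, true
	case o.Phase == PodFailed:
		return StateFailed, true
	default:
		return "", false
	}
}

// IsTerminal reports whether a state is one of the two terminal states,
// i.e. one the operator never retries.
func IsTerminal(s AppState) bool {
	return s == StateCompleted || s == StateFailed
}

func main() {
	s, ok := DeriveFinalState(DriverObservation{SubmissionFailed: true})
	fmt.Println(s, ok, IsTerminal(s))
}
```

Note that under these rules `FAILED_SUBMISSION` is not terminal, so whether it is retried depends on the `RestartPolicy`, as the paragraph states.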

3 changes: 2 additions & 1 deletion docs/who-is-using.md
@@ -10,4 +10,5 @@
| Lyft |@kumare3| Evaluation | ML & Data Infrastructure |
| MapR Technologies |@sarjeet2013| Evaluation | ML/AI & Analytics Data Platform |
| Uber| @chenqin| Evaluation| Spark / ML|
-| HashmapInc| @prem0132 | Evaluation | Analytics Data Platform
+| HashmapInc| @prem0132 | Evaluation | Analytics Data Platform |
+| Tencent | @runzhliu | Evaluation | ML Analytics Platform |
