Robustness to driver pod taking time to create (#2315)
* Retry after driver pod not found if recent submission
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Add a test
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Make grace period configurable
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Update test
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Add an extra test with the driver pod
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Separate context to create and delete the driver pod
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Tidy
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Autoformat
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Update error message
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Add helm parameter
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Update internal/controller/sparkapplication/controller.go
Co-authored-by: Yi Chen <github@chenyicn.net>
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
* Newlines between helm tests
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
---------
Signed-off-by: Thomas Newton <thomas.w.newton@gmail.com>
Co-authored-by: Yi Chen <github@chenyicn.net>
 | controller.logLevel | string | `"info"` | Configure the verbosity of logging, can be one of `debug`, `info`, `error`. |
+| controller.driverPodCreationGracePeriod | string | `"10s"` | Grace period after a successful spark-submit when driver pod not found errors will be retried. Useful if the driver pod can take some time to be created. |
 | controller.maxTrackedExecutorPerApp | int | `1000` | Specifies the maximum number of Executor pods that can be tracked by the controller per SparkApplication. |
 | controller.uiService.enable | bool | `true` | Specifies whether to create service for Spark web UI. |
 | controller.uiIngress.enable | bool | `false` | Specifies whether to create ingress for Spark web UI. `controller.uiService.enable` must be `true` to enable ingress. |
charts/spark-operator-chart/values.yaml
@@ -51,6 +51,9 @@ controller:
   # -- Configure the verbosity of logging, can be one of `debug`, `info`, `error`.
   logLevel: info

+  # -- Grace period after a successful spark-submit when driver pod not found errors will be retried. Useful if the driver pod can take some time to be created.
+  driverPodCreationGracePeriod: 10s
+
   # -- Specifies the maximum number of Executor pods that can be tracked by the controller per SparkApplication.
command.Flags().DurationVar(&driverPodCreationGracePeriod, "driver-pod-creation-grace-period", 10*time.Second, "Grace period after a successful spark-submit when driver pod not found errors will be retried. Useful if the driver pod can take some time to be created.")
 		app.Status.AppState.ErrorMessage = "driver pod not found"
+		app.Status.TerminationTime = metav1.Now()
+		return nil
+	}
+	return fmt.Errorf("driver pod not found, while inside the grace period. Grace period of %v expires at %v", r.options.DriverPodCreationGracePeriod, app.Status.LastSubmissionAttemptTime.Add(r.options.DriverPodCreationGracePeriod))