Closed as not planned
Description
- ✋ I have searched the open/closed issues and my issue is not listed. (There is a similar issue, "Executor is FAILED even when it completes successfully" #1290, but it didn't resolve my problem.)
I submitted a simple SparkApplication. The executor pods completed successfully, but the executorState in the SparkApplication status is FAILED.
Reproduction Code [Required]
Steps to reproduce the behavior: just run kubectl apply -f spark-app.yaml with the following spark-app.yaml:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-test
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "ghcr.io/apache/spark-docker/spark:3.5.2"
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar"
  sparkVersion: "3.5.2"
  sparkConf:
    spark.rss.storage.type: MEMORY_LOCALFILE
    spark.executor.extraJavaOptions: "-DlogTarget=Console -DbootstrapServers=127.0.0.1:9092"
    spark.driver.extraJavaOptions: "-DlogTarget=Console -DbootstrapServers=127.0.0.1:9092"
    spark.log.level: "DEBUG"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    kubernetesMaster: https://172.18.0.4:5443
    env:
      - name: KUBERNETES_SERVICE_HOST
        value: "172.18.0.4"
      - name: KUBERNETES_SERVICE_PORT
        value: "5443"
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.5.2
    serviceLabels:
      spark-app-name: spark-test
    secrets:
      - name: spark-opaque-secret
        path: /var/run/secrets/kubernetes.io/serviceaccount
        secretType: Generic
  executor:
    env:
      - name: KUBERNETES_SERVICE_HOST
        value: "172.18.0.4"
      - name: KUBERNETES_SERVICE_PORT
        value: "5443"
    coreRequest: "500m"
    coreLimit: "500m"
    instances: 3
    memory: "500m"
    labels:
      version: 3.5.2
    secrets:
      - name: spark-opaque-secret
        path: /var/run/secrets/kubernetes.io/serviceaccount
        secretType: Generic
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
Expected behavior
Since the executor pods completed successfully, the executorState in the SparkApplication status should be COMPLETED.
Actual behavior
Output of kubectl get SparkApplication spark-test -o yaml:
status:
  applicationState:
    state: COMPLETED
  driverInfo:
    podName: spark-test-driver
    webUIAddress: 10.99.199.244:4040
    webUIPort: 4040
    webUIServiceName: spark-test-ui-svc
  executionAttempts: 1
  executorState:
    spark-pi-95401492b3821b19-exec-1: FAILED
    spark-pi-95401492b3821b19-exec-2: FAILED
    spark-pi-95401492b3821b19-exec-3: FAILED
  lastSubmissionAttemptTime: "2024-10-22T09:14:53Z"
  sparkApplicationId: spark-8ad0ef0fd6e34f438da301ce5e6ef585
  submissionAttempts: 1
  submissionID: 11f64e8c-dc5e-416e-9ada-6601fca39ece
  terminationTime: "2024-10-22T09:15:30Z"
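The mismatch in the status above can be checked mechanically. The following is only a minimal sketch: the status is written as a Python dict with values copied from the output above, and the helper name inconsistent_executors is made up for illustration; it is not part of the Spark operator.

```python
# Status values copied from the `kubectl get SparkApplication` output above.
status = {
    "applicationState": {"state": "COMPLETED"},
    "executorState": {
        "spark-pi-95401492b3821b19-exec-1": "FAILED",
        "spark-pi-95401492b3821b19-exec-2": "FAILED",
        "spark-pi-95401492b3821b19-exec-3": "FAILED",
    },
}


def inconsistent_executors(status):
    """Return executors reported FAILED although the application is COMPLETED.

    Hypothetical helper for this report, not an operator API.
    """
    app_state = status.get("applicationState", {}).get("state")
    if app_state != "COMPLETED":
        return []
    executors = status.get("executorState", {})
    return sorted(name for name, state in executors.items() if state == "FAILED")


if __name__ == "__main__":
    for name in inconsistent_executors(status):
        print(f"{name}: FAILED while applicationState is COMPLETED")
```

With the status from this report, all three executors are flagged.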
Logs of spark-operator-controller:
2024-10-22T09:14:47.287Z INFO sparkapplication/event_handler.go:168 SparkApplication created {"name": "spark-test", "namespace": "default", "state": ""}
2024-10-22T09:14:47.292Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": ""}
2024-10-22T09:14:47.292Z INFO sparkapplication/controller.go:633 Submitting SparkApplication {"name": "spark-test", "namespace": "default", "state": ""}
2024-10-22T09:14:47.302Z INFO sparkapplication/controller.go:659 Created web UI service for SparkApplication {"name": "spark-test", "namespace": "default"}
2024-10-22T09:14:47.302Z INFO sparkapplication/controller.go:716 Running spark-submit for SparkApplication {"name": "spark-test", "namespace": "default", "arguments": ["--master", "k8s://https://172.18.0.4:5443", "--deploy-mode", "cluster", "--class", "org.apache.spark.examples.SparkPi", "--name", "spark-test", "--conf", "spark.kubernetes.namespace=default", "--conf", "spark.kubernetes.container.image=ghcr.io/apache/spark-docker/spark:3.5.2", "--conf", "spark.kubernetes.container.image.pullPolicy=IfNotPresent", "--conf", "spark.kubernetes.submission.waitAppCompletion=false", "--conf", "spark.driver.extraJavaOptions=-DlogTarget=Console -DbootstrapServers=127.0.0.1:9092", "--conf", "spark.executor.extraJavaOptions=-DlogTarget=Console -DbootstrapServers=127.0.0.1:9092", "--conf", "spark.log.level=DEBUG", "--conf", "spark.rss.storage.type=MEMORY_LOCALFILE", "--conf", "spark.kubernetes.driver.pod.name=spark-test-driver", "--conf", "spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-test", "--conf", "spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true", "--conf", "spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=11f64e8c-dc5e-416e-9ada-6601fca39ece", "--conf", "spark.kubernetes.driver.container.image=ghcr.io/apache/spark-docker/spark:3.5.2", "--conf", "spark.driver.cores=1", "--conf", "spark.kubernetes.driver.limit.cores=1200m", "--conf", "spark.driver.memory=512m", "--conf", "spark.kubernetes.driver.master=https://172.18.0.4:5443", "--conf", "spark.kubernetes.driver.label.version=3.5.2", "--conf", "spark.kubernetes.driver.service.label.spark-app-name=spark-test", "--conf", "spark.kubernetes.driver.secrets.spark-opaque-secret=/var/run/secrets/kubernetes.io/serviceaccount", "--conf", "spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-test", "--conf", "spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true", "--conf", 
"spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=11f64e8c-dc5e-416e-9ada-6601fca39ece", "--conf", "spark.executor.instances=3", "--conf", "spark.kubernetes.executor.container.image=ghcr.io/apache/spark-docker/spark:3.5.2", "--conf", "spark.executor.cores=1", "--conf", "spark.kubernetes.executor.request.cores=500m", "--conf", "spark.kubernetes.executor.limit.cores=500m", "--conf", "spark.executor.memory=500m", "--conf", "spark.kubernetes.executor.label.version=3.5.2", "--conf", "spark.kubernetes.executor.secrets.spark-opaque-secret=/var/run/secrets/kubernetes.io/serviceaccount", "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar"]}
2024-10-22T09:14:52.824Z INFO sparkapplication/event_handler.go:60 Spark pod created {"name": "spark-test-driver", "namespace": "default", "phase": "Pending"}
2024-10-22T09:14:53.962Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "", "newState": "SUBMITTED"}
2024-10-22T09:14:53.967Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "SUBMITTED"}
2024-10-22T09:14:53.986Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "SUBMITTED", "newState": "SUBMITTED"}
2024-10-22T09:14:53.995Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "SUBMITTED"}
2024-10-22T09:14:55.366Z INFO sparkapplication/event_handler.go:84 Spark pod updated {"name": "spark-test-driver", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-10-22T09:14:55.373Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "SUBMITTED"}
2024-10-22T09:14:55.398Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "SUBMITTED", "newState": "RUNNING"}
2024-10-22T09:14:55.407Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:05.369Z INFO sparkapplication/event_handler.go:60 Spark pod created {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "phase": "Pending"}
2024-10-22T09:15:05.386Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:05.401Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:05.412Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:05.520Z INFO sparkapplication/event_handler.go:60 Spark pod created {"name": "spark-pi-95401492b3821b19-exec-2", "namespace": "default", "phase": "Pending"}
2024-10-22T09:15:05.531Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:05.594Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:05.615Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:05.878Z INFO sparkapplication/event_handler.go:60 Spark pod created {"name": "spark-pi-95401492b3821b19-exec-3", "namespace": "default", "phase": "Pending"}
2024-10-22T09:15:05.892Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:06.042Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:06.061Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:08.108Z INFO sparkapplication/event_handler.go:84 Spark pod updated {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-10-22T09:15:08.115Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:08.121Z INFO sparkapplication/event_handler.go:84 Spark pod updated {"name": "spark-pi-95401492b3821b19-exec-3", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-10-22T09:15:08.214Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:08.214Z INFO sparkapplication/event_handler.go:84 Spark pod updated {"name": "spark-pi-95401492b3821b19-exec-2", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-10-22T09:15:08.216Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:08.314Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:08.316Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:08.338Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:28.384Z INFO sparkapplication/event_handler.go:99 Spark pod deleted {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "phase": "Running"}
2024-10-22T09:15:28.392Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:28.475Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:28.477Z INFO sparkapplication/event_handler.go:99 Spark pod deleted {"name": "spark-pi-95401492b3821b19-exec-2", "namespace": "default", "phase": "Running"}
2024-10-22T09:15:28.483Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:28.512Z INFO sparkapplication/event_handler.go:99 Spark pod deleted {"name": "spark-pi-95401492b3821b19-exec-3", "namespace": "default", "phase": "Running"}
2024-10-22T09:15:28.565Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:28.576Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:28.730Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:28.731Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:28.768Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:30.321Z INFO sparkapplication/event_handler.go:84 Spark pod updated {"name": "spark-test-driver", "namespace": "default", "oldPhase": "Running", "newPhase": "Succeeded"}
2024-10-22T09:15:30.326Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:30.337Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "SUCCEEDING"}
2024-10-22T09:15:30.343Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "SUCCEEDING"}
2024-10-22T09:15:30.355Z INFO sparkapplication/event_handler.go:188 SparkApplication updated {"name": "spark-test", "namespace": "default", "oldState": "SUCCEEDING", "newState": "COMPLETED"}
2024-10-22T09:15:30.360Z INFO sparkapplication/controller.go:171 Reconciling SparkApplication {"name": "spark-test", "namespace": "default", "state": "COMPLETED"}
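To make the relevant events easier to spot, the log lines above can be filtered with a small script. This is a rough sketch that assumes each line has the shape "timestamp level source message {json fields}", as in the excerpt above; the names LINE and parse are made up here, and only four of the lines are embedded to keep the example short.

```python
import json
import re

# Four lines copied verbatim from the controller log above.
LOG = '''\
2024-10-22T09:15:28.384Z INFO sparkapplication/event_handler.go:99 Spark pod deleted {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "phase": "Running"}
2024-10-22T09:15:28.477Z INFO sparkapplication/event_handler.go:99 Spark pod deleted {"name": "spark-pi-95401492b3821b19-exec-2", "namespace": "default", "phase": "Running"}
2024-10-22T09:15:28.512Z INFO sparkapplication/event_handler.go:99 Spark pod deleted {"name": "spark-pi-95401492b3821b19-exec-3", "namespace": "default", "phase": "Running"}
2024-10-22T09:15:30.321Z INFO sparkapplication/event_handler.go:84 Spark pod updated {"name": "spark-test-driver", "namespace": "default", "oldPhase": "Running", "newPhase": "Succeeded"}
'''

# timestamp, level, source file:line, free-text message, trailing JSON object
LINE = re.compile(r"^(\S+)\s+(\w+)\s+(\S+)\s+(.*?)\s+(\{.*\})$")


def parse(line):
    """Split one controller log line into a dict, or return None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    ts, _level, _source, message, fields = match.groups()
    return {"ts": ts, "message": message, **json.loads(fields)}


events = [event for event in map(parse, LOG.splitlines()) if event]

# Pods whose deletion event still carried phase "Running".
deleted_while_running = [
    event["name"]
    for event in events
    if event["message"] == "Spark pod deleted" and event.get("phase") == "Running"
]

if __name__ == "__main__":
    print(deleted_while_running)
```

On the embedded lines this lists all three executor pods: each was deleted while its last observed phase was still Running, and none was ever logged as Succeeded.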
Status of driver pod:
status:
  containerStatuses:
  - image: ""
    imageID: ""
    lastState: {}
    name: ""
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: containerd://c8821e8da225211b84763e3a3509435e1b7771992dc25556563cff3f6b7fd0a9
        exitCode: 0
        finishedAt: "2024-10-22T09:15:28Z"
        reason: Completed
        startedAt: "2024-10-22T09:14:54Z"
  phase: Succeeded
The executor pods are first Running and are then deleted; the controller logs show them being deleted while still in the Running phase.
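To put numbers on this ordering, the snippet below compares the executor deletion timestamps with the moment the controller logged the driver pod as Succeeded. All timestamps are copied from the controller log in this report; the short keys such as "exec-1" are abbreviations of the full pod names, used here only for readability. The snippet shows the observed ordering and makes no claim about how the operator derives executorState.

```python
from datetime import datetime


def ts(value):
    """Parse the controller's timestamp format, e.g. 2024-10-22T09:15:28.384Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


# "Spark pod deleted" events for the three executors (phase was "Running").
executor_deleted = {
    "exec-1": ts("2024-10-22T09:15:28.384Z"),
    "exec-2": ts("2024-10-22T09:15:28.477Z"),
    "exec-3": ts("2024-10-22T09:15:28.512Z"),
}

# "Spark pod updated" event for spark-test-driver: Running -> Succeeded.
driver_succeeded = ts("2024-10-22T09:15:30.321Z")

# Seconds between each executor deletion and the driver being seen as Succeeded.
gaps = {
    name: (driver_succeeded - deleted_at).total_seconds()
    for name, deleted_at in executor_deleted.items()
}

if __name__ == "__main__":
    for name, gap in gaps.items():
        print(f"{name} was deleted {gap:.3f}s before the driver was reported Succeeded")
```

Every executor deletion was observed roughly two seconds before the controller saw the driver pod reach Succeeded, which matches the driver container's finishedAt of 09:15:28Z.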
Environment & Versions
- Spark Operator App version: 2.0.2
- Helm Chart Version: v3.15.3
- Kubernetes Version: v1.30.4
- Apache Spark version: v3.5.2
Additional context
If any information is missing, please @ me and I will provide it.