[BUG] executor pod successfully completed but executorState in SparkApplication is failed #2277

Closed as not planned
@chaosi-zju

Description

I submitted a simple SparkApplication. The executor pods completed successfully, but the executorState in the SparkApplication status is FAILED.

Reproduction Code [Required]

Steps to reproduce the behavior:

Run kubectl apply -f spark-app.yaml with the following manifest:

spark-app.yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-test
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "ghcr.io/apache/spark-docker/spark:3.5.2"
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar"
  sparkVersion: "3.5.2"
  sparkConf:
    spark.rss.storage.type: MEMORY_LOCALFILE
    spark.executor.extraJavaOptions: "-DlogTarget=Console -DbootstrapServers=127.0.0.1:9092"
    spark.driver.extraJavaOptions: "-DlogTarget=Console -DbootstrapServers=127.0.0.1:9092"
    spark.log.level: "DEBUG"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    kubernetesMaster: https://172.18.0.4:5443
    env:
      - name: KUBERNETES_SERVICE_HOST
        value: "172.18.0.4"
      - name: KUBERNETES_SERVICE_PORT
        value: "5443"
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.5.2
    serviceLabels:
      spark-app-name: spark-test
    secrets:
      - name: spark-opaque-secret
        path: /var/run/secrets/kubernetes.io/serviceaccount
        secretType: Generic
  executor:
    env:
      - name: KUBERNETES_SERVICE_HOST
        value: "172.18.0.4"
      - name: KUBERNETES_SERVICE_PORT
        value: "5443"
    coreRequest: "500m"
    coreLimit: "500m"
    instances: 3
    memory: "500m"
    labels:
      version: 3.5.2
    secrets:
      - name: spark-opaque-secret
        path: /var/run/secrets/kubernetes.io/serviceaccount
        secretType: Generic
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

Expected behavior

Since the executor pods completed successfully, the executorState in the SparkApplication should be COMPLETED.

Actual behavior

Output of kubectl get SparkApplication spark-test -o yaml:

status:
    applicationState:
      state: COMPLETED
    driverInfo:
      podName: spark-test-driver
      webUIAddress: 10.99.199.244:4040
      webUIPort: 4040
      webUIServiceName: spark-test-ui-svc
    executionAttempts: 1
    executorState:
      spark-pi-95401492b3821b19-exec-1: FAILED
      spark-pi-95401492b3821b19-exec-2: FAILED
      spark-pi-95401492b3821b19-exec-3: FAILED
    lastSubmissionAttemptTime: "2024-10-22T09:14:53Z"
    sparkApplicationId: spark-8ad0ef0fd6e34f438da301ce5e6ef585
    submissionAttempts: 1
    submissionID: 11f64e8c-dc5e-416e-9ada-6601fca39ece
    terminationTime: "2024-10-22T09:15:30Z"
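
The inconsistency in this status can be checked mechanically. Below is a minimal Python sketch, assuming the status has already been loaded into a dict (for example from the JSON form of the same kubectl output). The helper name find_state_mismatch is hypothetical and is not part of the Spark operator.

```python
# Status fields copied from the output above; only the fields needed
# for the check are included.
status = {
    "applicationState": {"state": "COMPLETED"},
    "executorState": {
        "spark-pi-95401492b3821b19-exec-1": "FAILED",
        "spark-pi-95401492b3821b19-exec-2": "FAILED",
        "spark-pi-95401492b3821b19-exec-3": "FAILED",
    },
}


def find_state_mismatch(status):
    """Return executors not marked COMPLETED although the app is COMPLETED.

    Hypothetical helper for illustration; not part of the operator.
    """
    app_state = status.get("applicationState", {}).get("state")
    if app_state != "COMPLETED":
        # Only a completed application is expected to have completed executors.
        return []
    return sorted(
        name
        for name, state in status.get("executorState", {}).items()
        if state != "COMPLETED"
    )


mismatched = find_state_mismatch(status)
print(mismatched)
```

For the status shown here, all three executors are reported as mismatched, which is the behavior this issue describes.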

Logs of spark-operator-controller:

2024-10-22T09:14:47.287Z        INFO    sparkapplication/event_handler.go:168   SparkApplication created        {"name": "spark-test", "namespace": "default", "state": ""}
2024-10-22T09:14:47.292Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": ""}
2024-10-22T09:14:47.292Z        INFO    sparkapplication/controller.go:633      Submitting SparkApplication     {"name": "spark-test", "namespace": "default", "state": ""}
2024-10-22T09:14:47.302Z        INFO    sparkapplication/controller.go:659      Created web UI service for SparkApplication     {"name": "spark-test", "namespace": "default"}
2024-10-22T09:14:47.302Z        INFO    sparkapplication/controller.go:716      Running spark-submit for SparkApplication       {"name": "spark-test", "namespace": "default", "arguments": ["--master", "k8s://https://172.18.0.4:5443", "--deploy-mode", "cluster", "--class", "org.apache.spark.examples.SparkPi", "--name", "spark-test", "--conf", "spark.kubernetes.namespace=default", "--conf", "spark.kubernetes.container.image=ghcr.io/apache/spark-docker/spark:3.5.2", "--conf", "spark.kubernetes.container.image.pullPolicy=IfNotPresent", "--conf", "spark.kubernetes.submission.waitAppCompletion=false", "--conf", "spark.driver.extraJavaOptions=-DlogTarget=Console -DbootstrapServers=127.0.0.1:9092", "--conf", "spark.executor.extraJavaOptions=-DlogTarget=Console -DbootstrapServers=127.0.0.1:9092", "--conf", "spark.log.level=DEBUG", "--conf", "spark.rss.storage.type=MEMORY_LOCALFILE", "--conf", "spark.kubernetes.driver.pod.name=spark-test-driver", "--conf", "spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-test", "--conf", "spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true", "--conf", "spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=11f64e8c-dc5e-416e-9ada-6601fca39ece", "--conf", "spark.kubernetes.driver.container.image=ghcr.io/apache/spark-docker/spark:3.5.2", "--conf", "spark.driver.cores=1", "--conf", "spark.kubernetes.driver.limit.cores=1200m", "--conf", "spark.driver.memory=512m", "--conf", "spark.kubernetes.driver.master=https://172.18.0.4:5443", "--conf", "spark.kubernetes.driver.label.version=3.5.2", "--conf", "spark.kubernetes.driver.service.label.spark-app-name=spark-test", "--conf", "spark.kubernetes.driver.secrets.spark-opaque-secret=/var/run/secrets/kubernetes.io/serviceaccount", "--conf", "spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-test", "--conf", "spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true", "--conf", 
"spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=11f64e8c-dc5e-416e-9ada-6601fca39ece", "--conf", "spark.executor.instances=3", "--conf", "spark.kubernetes.executor.container.image=ghcr.io/apache/spark-docker/spark:3.5.2", "--conf", "spark.executor.cores=1", "--conf", "spark.kubernetes.executor.request.cores=500m", "--conf", "spark.kubernetes.executor.limit.cores=500m", "--conf", "spark.executor.memory=500m", "--conf", "spark.kubernetes.executor.label.version=3.5.2", "--conf", "spark.kubernetes.executor.secrets.spark-opaque-secret=/var/run/secrets/kubernetes.io/serviceaccount", "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar"]}
2024-10-22T09:14:52.824Z        INFO    sparkapplication/event_handler.go:60    Spark pod created       {"name": "spark-test-driver", "namespace": "default", "phase": "Pending"}
2024-10-22T09:14:53.962Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "", "newState": "SUBMITTED"}
2024-10-22T09:14:53.967Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "SUBMITTED"}
2024-10-22T09:14:53.986Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "SUBMITTED", "newState": "SUBMITTED"}
2024-10-22T09:14:53.995Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "SUBMITTED"}
2024-10-22T09:14:55.366Z        INFO    sparkapplication/event_handler.go:84    Spark pod updated       {"name": "spark-test-driver", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-10-22T09:14:55.373Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "SUBMITTED"}
2024-10-22T09:14:55.398Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "SUBMITTED", "newState": "RUNNING"}
2024-10-22T09:14:55.407Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}

2024-10-22T09:15:05.369Z        INFO    sparkapplication/event_handler.go:60    Spark pod created       {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "phase": "Pending"}
2024-10-22T09:15:05.386Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:05.401Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:05.412Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:05.520Z        INFO    sparkapplication/event_handler.go:60    Spark pod created       {"name": "spark-pi-95401492b3821b19-exec-2", "namespace": "default", "phase": "Pending"}
2024-10-22T09:15:05.531Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:05.594Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:05.615Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:05.878Z        INFO    sparkapplication/event_handler.go:60    Spark pod created       {"name": "spark-pi-95401492b3821b19-exec-3", "namespace": "default", "phase": "Pending"}
2024-10-22T09:15:05.892Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:06.042Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:06.061Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:08.108Z        INFO    sparkapplication/event_handler.go:84    Spark pod updated       {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-10-22T09:15:08.115Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:08.121Z        INFO    sparkapplication/event_handler.go:84    Spark pod updated       {"name": "spark-pi-95401492b3821b19-exec-3", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-10-22T09:15:08.214Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:08.214Z        INFO    sparkapplication/event_handler.go:84    Spark pod updated       {"name": "spark-pi-95401492b3821b19-exec-2", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-10-22T09:15:08.216Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:08.314Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:08.316Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:08.338Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}

2024-10-22T09:15:28.384Z        INFO    sparkapplication/event_handler.go:99    Spark pod deleted       {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "phase": "Running"}
2024-10-22T09:15:28.392Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:28.475Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:28.477Z        INFO    sparkapplication/event_handler.go:99    Spark pod deleted       {"name": "spark-pi-95401492b3821b19-exec-2", "namespace": "default", "phase": "Running"}
2024-10-22T09:15:28.483Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:28.512Z        INFO    sparkapplication/event_handler.go:99    Spark pod deleted       {"name": "spark-pi-95401492b3821b19-exec-3", "namespace": "default", "phase": "Running"}
2024-10-22T09:15:28.565Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:28.576Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:28.730Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-10-22T09:15:28.731Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:28.768Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:30.321Z        INFO    sparkapplication/event_handler.go:84    Spark pod updated       {"name": "spark-test-driver", "namespace": "default", "oldPhase": "Running", "newPhase": "Succeeded"}
2024-10-22T09:15:30.326Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "RUNNING"}
2024-10-22T09:15:30.337Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "RUNNING", "newState": "SUCCEEDING"}
2024-10-22T09:15:30.343Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "SUCCEEDING"}
2024-10-22T09:15:30.355Z        INFO    sparkapplication/event_handler.go:188   SparkApplication updated        {"name": "spark-test", "namespace": "default", "oldState": "SUCCEEDING", "newState": "COMPLETED"}
2024-10-22T09:15:30.360Z        INFO    sparkapplication/controller.go:171      Reconciling SparkApplication    {"name": "spark-test", "namespace": "default", "state": "COMPLETED"}
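
To make the executor lifecycle in these logs easier to follow, here is a small Python sketch that extracts the pod events for one executor. It assumes the line layout seen in the excerpt above (timestamp, level, source location, message, then a JSON object); the helper name pod_events is hypothetical.

```python
import json
import re

# Three lines copied from the operator log above for one executor pod.
LOG = """\
2024-10-22T09:15:05.369Z        INFO    sparkapplication/event_handler.go:60    Spark pod created       {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "phase": "Pending"}
2024-10-22T09:15:08.108Z        INFO    sparkapplication/event_handler.go:84    Spark pod updated       {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-10-22T09:15:28.384Z        INFO    sparkapplication/event_handler.go:99    Spark pod deleted       {"name": "spark-pi-95401492b3821b19-exec-1", "namespace": "default", "phase": "Running"}
"""

# Assumed layout: timestamp, level, source location, message, JSON fields.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(.*?)\s+(\{.*\})$")


def pod_events(log_text, pod_name):
    """Return (message, phase) pairs for one pod, in log order."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        message, fields = match.group(4), json.loads(match.group(5))
        if not message.startswith("Spark pod") or fields.get("name") != pod_name:
            continue
        # "updated" events carry newPhase; "created"/"deleted" carry phase.
        events.append((message, fields.get("newPhase", fields.get("phase"))))
    return events


events = pod_events(LOG, "spark-pi-95401492b3821b19-exec-1")
for message, phase in events:
    print(message, "->", phase)
```

For exec-1 this yields created (Pending), updated (Running), and deleted (Running): the logs contain no event in which the executor pod reaches the Succeeded phase before it is deleted.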

Status of driver pod:

status:
  containerStatuses:
  - image: ""
    imageID: ""
    lastState: {}
    name: ""
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: containerd://c8821e8da225211b84763e3a3509435e1b7771992dc25556563cff3f6b7fd0a9
        exitCode: 0
        finishedAt: "2024-10-22T09:15:28Z"
        reason: Completed
        startedAt: "2024-10-22T09:14:54Z"
  phase: Succeeded

The executor pods first entered the Running phase and were then deleted; the operator logs above record each delete event with phase Running.

Environment & Versions

  • Spark Operator App version: 2.0.2
  • Helm Chart Version: v3.15.3
  • Kubernetes Version: v1.30.4
  • Apache Spark version: v3.5.2

Additional context

If any information is missing, please @ me and I will provide it.
