Ensure that driver is deleted prior to sparkapplication resubmission #1521

khorshuheng commented Apr 28, 2022

In rare circumstances, a Spark driver pod can be created even though the Spark submission attempt failed. The driver pod is not necessarily running; it could be stuck in the Pending state because, for example, the associated ConfigMap could not be created. In this situation, the current Spark operator neither attempts to resubmit the job nor increments the submission attempt count. As a result, the SparkApplication remains stuck in the submission failure state indefinitely.

This PR deletes the associated Spark resource when this situation arises, which allows the Spark operator to resubmit the job.
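The control flow described above can be illustrated with a small in-memory simulation. This is only a sketch in Python, not the operator's Go code: `SparkApp`, `FakeCluster`, `driver_pod_name`, `fake_submit`, and `retry_failed_submission` are all hypothetical names, the `<app>-driver` pod naming is an assumption, and the state strings are illustrative. It also assumes that a leftover driver pod is what blocks a new submission.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class SparkApp:
    """Hypothetical, simplified stand-in for a SparkApplication resource."""
    name: str
    state: str = "SUBMISSION_FAILED"
    submission_attempts: int = 1
    max_submission_attempts: int = 3


@dataclass
class FakeCluster:
    """In-memory stand-in for the Kubernetes API; tracks pod names only."""
    pods: Set[str] = field(default_factory=set)

    def delete_pod(self, name: str) -> None:
        # Deleting a pod that does not exist is treated as a no-op.
        self.pods.discard(name)


def driver_pod_name(app: SparkApp) -> str:
    # Assumed naming convention for this sketch.
    return f"{app.name}-driver"


def fake_submit(app: SparkApp, cluster: FakeCluster) -> bool:
    """Simulated submission: fails if a stale driver pod is still present."""
    pod = driver_pod_name(app)
    if pod in cluster.pods:
        return False
    cluster.pods.add(pod)
    return True


def retry_failed_submission(
    app: SparkApp,
    cluster: FakeCluster,
    submit: Callable[[SparkApp, FakeCluster], bool] = fake_submit,
) -> None:
    """Retry a failed submission, removing any leftover driver pod first."""
    if app.state != "SUBMISSION_FAILED":
        return
    if app.submission_attempts >= app.max_submission_attempts:
        app.state = "FAILED"
        return
    # Key step: delete the stale driver pod so it cannot block the retry.
    cluster.delete_pod(driver_pod_name(app))
    app.submission_attempts += 1
    app.state = "SUBMITTED" if submit(app, cluster) else "SUBMISSION_FAILED"


if __name__ == "__main__":
    # A driver pod was left behind by the failed first attempt.
    cluster = FakeCluster(pods={"pi-driver"})
    app = SparkApp(name="pi")
    retry_failed_submission(app, cluster)
    print(app.state, app.submission_attempts)
```

In this sketch, skipping the `delete_pod` call would leave the application in `SUBMISSION_FAILED` on every retry, which mirrors the stuck state the description refers to.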

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
liyinan926 merged commit d7a85bd into kubeflow:master on May 12, 2022
jbhalodia-slack pushed a commit to jbhalodia-slack/spark-operator that referenced this pull request Oct 4, 2024
…ubeflow#1521)

* Ensure that driver is deleted prior to sparkapplication resubmission

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>

* Update app version

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>

* Update chart version

Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>

Co-authored-by: Khor Shu Heng <khor.heng@gojek.com>