ihorlukianov (Contributor) commented Aug 11, 2025

Summary:

This PR introduces an option to configure the restartPolicy for batch jobs, allowing it to be set to Never instead of the current hardcoded OnFailure.

Problem:

Currently, the restartPolicy for batch jobs is hardcoded to OnFailure. In deployments that use sidecar containers (e.g., Istio), this can cause problems. When the main container fails, a wrapper such as scuttle may terminate the sidecar prematurely, and because OnFailure restarts the main container inside the same pod, subsequent attempts run without a working sidecar. As a result, the job cannot recover on its own.

Solution:

This change allows users to set the restartPolicy to Never. With Never, a failed container is not restarted within the same pod; the failed attempt is replaced by a new pod instead. Each retry therefore starts in a fresh, uncompromised environment with its own sidecar, which resolves the sidecar termination issue and lets the job recover properly.
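To make the difference concrete, here is a minimal sketch in Python of a Job manifest with a configurable restartPolicy. It is not the chart's template code: the helper `build_job_manifest`, its parameters, and the example names are hypothetical and exist only for illustration. It assumes the default stays OnFailure, matching the previously hardcoded value, and that only OnFailure and Never are accepted, since Kubernetes does not allow Always for Jobs.

```python
import json

# Kubernetes Jobs accept only these two pod restart policies.
ALLOWED_JOB_RESTART_POLICIES = ("OnFailure", "Never")


def build_job_manifest(name, image, restart_policy="OnFailure", backoff_limit=6):
    """Return a minimal batch/v1 Job manifest as a plain dict.

    Hypothetical helper for illustration only; it does not mirror the
    Helm chart's actual templates or values keys.
    """
    if restart_policy not in ALLOWED_JOB_RESTART_POLICIES:
        raise ValueError(
            f"restart_policy must be one of {ALLOWED_JOB_RESTART_POLICIES}, "
            f"got {restart_policy!r}"
        )
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries before the Job is marked as failed.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    # "OnFailure": the kubelet restarts the container in the same pod.
                    # "Never": the failed pod is left as-is and a new pod is created.
                    "restartPolicy": restart_policy,
                    "containers": [{"name": name, "image": image}],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder job name and image; substitute real values as needed.
    manifest = build_job_manifest(
        "example-job", "example/image:tag", restart_policy="Never"
    )
    print(json.dumps(manifest, indent=2))
```

Calling the helper without `restart_policy` keeps OnFailure, which mirrors the opt-in nature of the change: nothing differs unless the policy is set explicitly.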

Impact:

This change provides greater flexibility for users with complex deployment configurations, particularly those relying on service meshes or other sidecar patterns. The new configuration option is opt-in, so existing deployments will not be affected unless they explicitly configure the new policy.

potiuk (Member) commented Aug 20, 2025

Would it be possible to add Helm tests for it?

ihorlukianov (Contributor, Author) commented

@potiuk, added UTs covering this change. Thanks!

potiuk merged commit b48260d into apache:main on Aug 21, 2025 (72 checks passed)

potiuk (Member) commented Aug 21, 2025

Nice! Thanks for being responsive!

mangal-vairalkar pushed a commit to mangal-vairalkar/airflow that referenced this pull request Aug 30, 2025
* Allow setting restartPolicy for batch jobs in chart

* fix static checks

* Added UTs
Labels

area:helm-chart Airflow Helm Chart


2 participants