Retain pods for failed jobs for a longer period #102

Open
@Evesy

Description

2.0.0 introduced terminating a pod immediately upon job completion. This has proved useful when running many different pipelines with elastic agents: nodes' resources are freed up more quickly, which reduces the chance of nodes having to autoscale.

It has made troubleshooting failed jobs harder, though: because pods are cleaned up immediately, nothing is left to debug. It would be useful to be able to set an alternate grace period for pods whose assigned jobs have failed, so there is time to look around the pod.
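
To illustrate the behaviour I have in mind, here is a minimal sketch of the termination decision. It is not the plugin's code: the setting names (`DEFAULT_GRACE`, `FAILED_JOB_GRACE`), the 30-minute value, and both functions are hypothetical and only show how a separate grace period for failed jobs could sit alongside the current immediate cleanup.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical settings; the plugin does not necessarily expose these names.
DEFAULT_GRACE = timedelta(0)               # 2.0.0 behaviour: terminate on completion
FAILED_JOB_GRACE = timedelta(minutes=30)   # proposed: keep failed pods for a while


def termination_deadline(completed_at: datetime, job_failed: bool,
                         failed_grace: timedelta = FAILED_JOB_GRACE) -> datetime:
    """Return the earliest time at which the pod may be deleted."""
    grace = failed_grace if job_failed else DEFAULT_GRACE
    return completed_at + grace


def should_terminate(now: datetime, completed_at: datetime, job_failed: bool,
                     failed_grace: timedelta = FAILED_JOB_GRACE) -> bool:
    """True once the grace period for this pod has elapsed."""
    return now >= termination_deadline(completed_at, job_failed, failed_grace)


if __name__ == "__main__":
    finished = datetime.now(timezone.utc)
    # A successful job's pod is eligible for cleanup right away.
    print(should_terminate(finished, finished, job_failed=False))
    # A failed job's pod is kept until the failed-job grace period has passed.
    print(should_terminate(finished, finished, job_failed=True))
```

Successful jobs would keep the quick cleanup that frees node resources, and only failed jobs would hold on to their pod. Setting the failed-job grace period to zero would give the current behaviour.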
