Documentation: Best practices for zero-downtime plugin updates #167

@grosskur

I have an NRI plugin running as a DaemonSet. Ideally, I would like to set spec.updateStrategy.rollingUpdate.maxSurge to a positive number, and spec.updateStrategy.rollingUpdate.maxUnavailable to zero to have "make before break" semantics where on each node the new pod starts up and becomes ready, before the old pod is terminated.

However, for my NRI plugin, it's not safe to have two instances running at the same time and acting on the same container, since an action would then be performed twice on pod startup rather than once. So for now, I fully terminate the old pod before starting the new pod to prevent overlap. When the new pod comes up, it has to "catch up" and process any events that were missed.

It would be helpful if we could document any known patterns for achieving this kind of zero-downtime update scenario for an NRI plugin, in a way that avoids duplicate processing.

Labels: documentation (Improvements or additions to documentation)
