warn on job submit when job has services but no shutdown_delay #23326

Open
@tgross

Description

A proposal for discussion:

Way back in #2441, we added a configurable shutdown_delay to tasks so that there's a delay between deregistering services from Consul (and now Nomad services as well) and the shutdown of the task. As noted by @schmichael in #2441 (comment), we set this to 0 by default for backwards compatibility.

However, it's arguably a user error to ever deploy a job with service blocks without a shutdown_delay. Downstream services consume the service registration data via polling, blocking queries, or DNS, and all of these methods are racy if the upstream service isn't given time to gracefully drain connections. The service registration data is inherently "eventually consistent": a consumer only learns about a deregistration some time after it happens, so without a delay the task can already be shutting down while consumers are still sending it traffic. Our experience from user reports suggests this isn't obvious to everyone, so we could nudge users toward more reliable deployments by emitting a warning at job submit time.

There are a couple of caveats here:

  • Services are often set at the group level (and must be for Connect), whereas shutdown_delay is set on tasks. Typically each service is implemented by only one of the group's tasks, and the service.task field is not required for anything except script checks, so we can't reliably tell which task backs a given group service. So maybe this warning would only fire if no task in the group has a shutdown_delay?
  • Not all users will actually care: some have other systems in place to make sure downstream consumers handle this gracefully. For these users this warning will be noise.
