Description
A proposal for discussion:
Way back in #2441, we added a configurable `shutdown_delay` to tasks so that there's a delay between deregistering services from Consul (and now Nomad services as well) and the shutdown of the task. As noted by @schmichael in #2441 (comment), we set this to 0 by default for backwards compatibility.
However, it's arguably a user error to ever deploy a job with `service` blocks without a `shutdown_delay`. Downstream services consume the service registration data via polling, blocking queries, or DNS, but all of these methods are racy if the upstream service isn't given time to gracefully drain connections. The service registration data is inherently "eventually consistent", because the task shutdown takes a non-zero amount of time anyway. Our experience from user reports suggests this isn't obvious to all users, so we could nudge users toward more reliable deployments by emitting a warning at job submit time.
There are a couple of caveats here:
- Services are often set at the `group` level (and must be for Connect), whereas `shutdown_delay` is set on tasks. Typically each service is implemented by only one of the group's tasks, and the `service.task` field is not required for anything except script checks. So maybe this warning would only fire if no task in the group has a `shutdown_delay`?
- Not all users will care, because some have other systems in place to make sure downstream consumers handle this gracefully. For these users this warning will be noise.