Open
Description
If a node holding a primary shard leaves the cluster then one of the replica shards is immediately promoted to primary to replace the failed copy. Today, if a snapshot is ongoing when the promotion happens, the corresponding shard-level snapshot fails and the overall snapshot status is at best PARTIAL. This is a problem for graceful shutdowns (#70338), which ideally would not result in any such failures. In cases where a replica is promoted to replace a failed primary, it would be better to retry the shard-level snapshot on the new primary instead.
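To make the difference concrete, here is a toy model of the two behaviours, not Elasticsearch code. Every name in it (`ShardCopy`, `promote_replica`, `snapshot_shard`, `overall_status`, the `retry_on_promotion` flag, the node names) is hypothetical and exists only for this illustration. It also simplifies the outcome: any failed shard yields PARTIAL, whereas the real status is "at best" PARTIAL.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShardCopy:
    node: str          # hypothetical node name
    primary: bool
    alive: bool = True


def promote_replica(copies: List[ShardCopy]) -> Optional[ShardCopy]:
    """Promote the first live replica to primary; return it, or None if none is left."""
    for copy in copies:
        if copy.alive and not copy.primary:
            copy.primary = True
            return copy
    return None


def snapshot_shard(copies: List[ShardCopy], primary_leaves: bool,
                   retry_on_promotion: bool) -> str:
    """Return "SUCCESS" or "FAILED" for one shard-level snapshot in this toy model."""
    primary = next(c for c in copies if c.primary)
    if primary_leaves:
        # The node holding the primary leaves while the snapshot is running.
        primary.alive = False
        primary.primary = False
        new_primary = promote_replica(copies)
        if new_primary is None or not retry_on_promotion:
            # Behaviour described as current: the shard-level snapshot fails.
            return "FAILED"
        # Proposed behaviour: run the shard-level snapshot again on the new primary.
        primary = new_primary
    return "SUCCESS"


def overall_status(shard_results: List[str]) -> str:
    """Simplification: any failed shard makes the overall snapshot PARTIAL."""
    return "SUCCESS" if all(r == "SUCCESS" for r in shard_results) else "PARTIAL"


def make_copies() -> List[ShardCopy]:
    """One primary and one replica on two hypothetical nodes."""
    return [ShardCopy("node-1", primary=True), ShardCopy("node-2", primary=False)]
```

With `retry_on_promotion=False` the model reproduces the failure described above; with `retry_on_promotion=True` the shard-level snapshot completes on the promoted replica, so a graceful shutdown would not by itself degrade the snapshot. If no live replica remains, the shard still fails in either mode.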
This isn't the first time this idea has come up: