Description
Since #56911 we can create (or delete) snapshots concurrently, limited to 1000 operations at a time. But we don't have any limit on the number of shards these snapshots can contain, and in a cluster with many shards this can leave hundreds of thousands of shards waiting to be snapshotted.
I think we could introduce a limit on the maximum number of shards a cluster can snapshot at a time and reject any new snapshot creation that would cause this limit to be exceeded (without adding it to the cluster state as a new snapshot-in-progress entry).
This would also serve as a cheap back-pressure mechanism in case aggressive SLM policies are creating new snapshots faster than the cluster can snapshot the shards.
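
To make the proposal concrete, here is a minimal sketch of the admission check, written in Python for brevity rather than as actual Elasticsearch code. All names (`ShardSnapshotLimiter`, `max_shards_in_progress`, `SnapshotRejected`) are hypothetical, and the in-memory dict only stands in for the snapshots-in-progress entries in the cluster state. The point it illustrates is that a rejected snapshot is never recorded.

```python
from dataclasses import dataclass, field


class SnapshotRejected(Exception):
    """Raised when admitting a snapshot would exceed the shard limit."""


@dataclass
class ShardSnapshotLimiter:
    # Hypothetical cluster-wide cap on shards being snapshotted at once.
    max_shards_in_progress: int
    # Snapshot name -> number of shards it counts against the cap.
    # Stands in for the in-progress entries kept in the cluster state.
    in_progress: dict = field(default_factory=dict)

    def shards_in_progress(self) -> int:
        return sum(self.in_progress.values())

    def try_start(self, snapshot: str, shard_count: int) -> None:
        """Admit the snapshot, or reject it before anything is recorded."""
        if snapshot in self.in_progress:
            raise ValueError(f"snapshot [{snapshot}] is already in progress")
        total = self.shards_in_progress() + shard_count
        if total > self.max_shards_in_progress:
            # Rejected up front: no in-progress entry is created.
            raise SnapshotRejected(
                f"snapshot [{snapshot}] would bring shards in progress to "
                f"{total}, above the limit of {self.max_shards_in_progress}"
            )
        self.in_progress[snapshot] = shard_count

    def finish(self, snapshot: str) -> None:
        """Release the shards held by a completed or aborted snapshot."""
        self.in_progress.pop(snapshot, None)


if __name__ == "__main__":
    limiter = ShardSnapshotLimiter(max_shards_in_progress=1000)
    limiter.try_start("nightly-1", 800)
    try:
        limiter.try_start("nightly-2", 300)  # 800 + 300 exceeds 1000
    except SnapshotRejected as e:
        print("rejected:", e)
    limiter.finish("nightly-1")
    limiter.try_start("nightly-2", 300)  # fits once capacity is released
```

In this sketch the back-pressure falls out naturally: a policy that keeps creating snapshots just sees rejections until earlier snapshots finish and release their shards. Whether to count all shards of a snapshot or only those not yet completed is left open here.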