Description
Background
We currently run 12 vmagent shards, with 2 replicas of each shard.
When we apply a change to the VMAgent CRD, the operator restarts the VMAgent pods one by one. Each pod takes about 2 minutes to restart and become ready, so with 24 pods I spend about 50 minutes watching a rollout. I'm wondering whether the restart process could be changed to speed this up.
What I want
Since we have 2 replicas for each shard, we could restart one replica of every shard at the same time. The full steps would be:
- restart replica 0 of every shard
- wait for every replica 0 to become ready
- restart replica 1 of every shard
- wait for every replica 1 to become ready
If we change the deploy process to work this way, upgrades should complete much more quickly.
Furthermore, we could restart every shard concurrently during an upgrade if shardCount doesn't change.
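To make the proposed ordering concrete, here is a minimal Python sketch of the replica-by-replica rollout and a rough time estimate. It only models the grouping; it is not the operator's code. The function names `restart_waves` and `estimated_minutes` are hypothetical, and the estimate assumes that pods in the same wave restart fully in parallel and that each restart takes the roughly 2 minutes observed above.

```python
from typing import List, Tuple

# A pod is identified here by (shard index, replica index); this naming is
# an assumption for illustration, not the operator's real pod naming.
Pod = Tuple[int, int]


def restart_waves(shard_count: int, replica_count: int) -> List[List[Pod]]:
    """Group pods into waves: wave r holds replica r of every shard."""
    return [
        [(shard, replica) for shard in range(shard_count)]
        for replica in range(replica_count)
    ]


def estimated_minutes(step_count: int, minutes_per_restart: float) -> float:
    """Total time when each step (one pod, or one parallel wave) takes
    about minutes_per_restart."""
    return step_count * minutes_per_restart


shards, replicas, minutes_per_restart = 12, 2, 2

waves = restart_waves(shards, replicas)

# Current behaviour: every pod is its own step.
one_by_one = estimated_minutes(shards * replicas, minutes_per_restart)
# Proposed behaviour: every wave is one step (assumes full parallelism).
by_wave = estimated_minutes(len(waves), minutes_per_restart)

print(f"one by one: ~{one_by_one} min, by wave: ~{by_wave} min")
```

With the numbers from this issue, the one-by-one rollout comes out at about 48 minutes and the wave-based rollout at about 4 minutes, while each shard always keeps one replica running.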
cc @f41gh7