feat: support pause and resume reconciliation of a cluster #7435

Open
wants to merge 28 commits into
base: main

yipeng1030
Contributor

@yipeng1030 yipeng1030 commented May 28, 2024

support cluster pause and resume

  • Fixes [Features] Support Pause and Resume Reconcilation of a Cluster #6969

  • Pause Cluster, Component, and InstanceSet
    Annotate the cluster to pause it. The pause cascades to its Components (by annotating them) and to its InstanceSets (by reusing the Paused field). After pausing, the three controllers handle only delete operations.

  • Pause Reconfigure and Configuration
    Asynchronous reload methods are not paused currently (configconstraint.spec.reloadAction.*Trigger.sync = false).
    For synchronous reconfiguration, sending ops and modifying config are both carried out by the configuration operator rendering the configmap, and the configmap changes are applied to the engine by reconfigure_controller.go. The Configuration and Reconfigure controllers therefore need to be paused as well.
    When the cluster resumes, modifying the corresponding configmap and configuration annotations triggers a round of reconfiguration, so changes made during the pause are applied.

  • Pause Backup
    A backup is meant to record the true state of the cluster, so that a cluster restored from it serves exactly as before. The Spec of a paused cluster can differ from its real state, and there is essentially no way to fetch that state, so KubeBlocks does not support backing up a paused cluster.
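
As an illustration of the cascade described in the first bullet, here is a minimal Go sketch. It is not the code from this PR: the `Cluster`, `Component`, and `InstanceSet` structs, and the `isPaused` and `cascadePause` helpers, are simplified hypothetical stand-ins, and treating only the value "true" as paused is an assumption. Only the annotation key comes from the description.

```go
package main

import "fmt"

// pausedAnnotation is the annotation key given in the PR description.
const pausedAnnotation = "controller.kubeblocks.io/controller-paused"

// InstanceSet, Component, and Cluster are hypothetical, simplified
// stand-ins for the real KubeBlocks API types.
type InstanceSet struct {
	Name   string
	Paused bool // the description says the existing Paused field is reused
}

type Component struct {
	Name         string
	Annotations  map[string]string
	InstanceSets []*InstanceSet
}

type Cluster struct {
	Name        string
	Annotations map[string]string
	Components  []*Component
}

// isPaused reports whether the pause annotation is set to "true".
// Assumption: only the literal value "true" counts as paused.
func isPaused(annotations map[string]string) bool {
	return annotations[pausedAnnotation] == "true"
}

// cascadePause copies the cluster's pause state down to its components
// (via the annotation) and to their InstanceSets (via the Paused field).
func cascadePause(c *Cluster) {
	paused := isPaused(c.Annotations)
	for _, comp := range c.Components {
		if comp.Annotations == nil {
			comp.Annotations = map[string]string{}
		}
		if paused {
			comp.Annotations[pausedAnnotation] = "true"
		} else {
			delete(comp.Annotations, pausedAnnotation)
		}
		for _, its := range comp.InstanceSets {
			its.Paused = paused
		}
	}
}

func main() {
	its := &InstanceSet{Name: "mysql-0"}
	comp := &Component{Name: "mysql", InstanceSets: []*InstanceSet{its}}
	cluster := &Cluster{
		Name:        "demo",
		Annotations: map[string]string{pausedAnnotation: "true"},
		Components:  []*Component{comp},
	}
	cascadePause(cluster)
	fmt.Println(isPaused(comp.Annotations), its.Paused)
}
```

Removing the annotation from the cluster and calling `cascadePause` again clears the pause state on the children, which mirrors the resume path.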

pause a cluster:
kubectl annotate cluster CLUSTER_NAME controller.kubeblocks.io/controller-paused="true"
resume a cluster:
kubectl annotate cluster CLUSTER_NAME controller.kubeblocks.io/controller-paused-
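
To make the effect of the annotation concrete, the following Go sketch shows one way a controller could decide what to do with an object once the annotation is present. The `objectMeta` type, the `Deleting` flag (standing in for a non-nil deletion timestamp), and the `decide` function are hypothetical names used for illustration only. The rule it encodes, "a paused object is still deleted but otherwise left alone", is taken from the description above.

```go
package main

import "fmt"

// pausedAnnotation is the annotation key used by the kubectl commands above.
const pausedAnnotation = "controller.kubeblocks.io/controller-paused"

// objectMeta is a hypothetical stand-in for the metadata a controller reads.
type objectMeta struct {
	Annotations map[string]string
	Deleting    bool // stands in for a non-nil metadata.deletionTimestamp
}

type action string

const (
	actionDelete    action = "delete"
	actionSkip      action = "skip"
	actionReconcile action = "reconcile"
)

// decide returns what a paused-aware controller would do with the object:
// deletion is always handled, a paused object is otherwise skipped, and
// everything else is reconciled normally.
func decide(m objectMeta) action {
	switch {
	case m.Deleting:
		return actionDelete
	case m.Annotations[pausedAnnotation] == "true":
		return actionSkip
	default:
		return actionReconcile
	}
}

func main() {
	paused := map[string]string{pausedAnnotation: "true"}
	fmt.Println(decide(objectMeta{Annotations: paused}))
	fmt.Println(decide(objectMeta{Annotations: paused, Deleting: true}))
	fmt.Println(decide(objectMeta{}))
}
```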

@github-actions github-actions bot added the size/L Denotes a PR that changes 100-499 lines. label May 28, 2024
@yipeng1030 yipeng1030 changed the title Featue/support cluster pause and resume feat: support cluster pause and resume May 28, 2024
@CLAassistant

CLAassistant commented May 28, 2024

CLA assistant check
All committers have signed the CLA.

@weicao weicao changed the title feat: support cluster pause and resume feat: support pause and resume reconcilation of a cluster May 29, 2024
@free6om free6om added this to the Release 0.9.0 milestone May 29, 2024
@apecloud-bot apecloud-bot added the pre-approve Fork PR Pre Approve Test label May 31, 2024
}
}

if hasPaused {
Contributor

hasPaused is calculated from the component objects, so the dependencies between the components and those CM objects should be set explicitly.

@apecloud-bot apecloud-bot removed the pre-approve Fork PR Pre Approve Test label Jun 4, 2024
@apecloud-bot apecloud-bot added the pre-approve Fork PR Pre Approve Test label Jul 16, 2024
@yipeng1030 yipeng1030 marked this pull request as ready for review July 16, 2024 12:57
@apecloud-bot apecloud-bot added pre-approve Fork PR Pre Approve Test and removed pre-approve Fork PR Pre Approve Test labels Jul 17, 2024
@apecloud-bot apecloud-bot added pre-approve Fork PR Pre Approve Test and removed pre-approve Fork PR Pre Approve Test labels Jul 17, 2024

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 87.32394% with 18 lines in your changes missing coverage. Please review.

Project coverage is 61.38%. Comparing base (0ceeaa6) to head (cef893c).
Report is 3 commits behind head on main.

Files                                                    Patch %   Lines
...ers/apps/configuration/configuration_controller.go    0.00%     3 Missing and 1 partial ⚠️
...llers/apps/configuration/reconfigure_controller.go    0.00%     3 Missing and 1 partial ⚠️
pkg/controller/model/transform_utils.go                  0.00%     4 Missing ⚠️
controllers/apps/transform_utils.go                      94.23%    2 Missing and 1 partial ⚠️
controllers/apps/transformer_cluster_pause.go            91.17%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7435      +/-   ##
==========================================
- Coverage   64.92%   61.38%   -3.54%     
==========================================
  Files         345      437      +92     
  Lines       42942    52098    +9156     
==========================================
+ Hits        27879    31982    +4103     
- Misses      12619    17472    +4853     
- Partials     2444     2644     +200     
Flag Coverage Δ
unittests 61.38% <87.32%> (-3.54%) ⬇️

@apecloud-bot apecloud-bot added pre-approve Fork PR Pre Approve Test and removed pre-approve Fork PR Pre Approve Test labels Jul 18, 2024
@yipeng1030
Contributor Author

ready for review

@@ -131,6 +131,8 @@ func (r *ClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ct
&clusterHaltTransformer{},
// handle cluster deletion
&clusterDeletionTransformer{},
// handle cluster pause and resume
&clusterPauseTransformer{},
Collaborator

Should the pause and resume operations be executed before all transformers? According to your design, can a cluster that is being deleted be paused?

Contributor Author

In my design, deletion takes priority over pause, following the design of rollout pause for the k8s Deployment.
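
The ordering question in this thread can be illustrated with a small Go sketch of an ordered transformer chain. This is a simplified model, not the KubeBlocks transformer API: the `state` struct, the `transformer` interface with its `stop` return value, and the three transformer types are hypothetical. It only shows why placing the deletion step ahead of the pause step lets a paused cluster still be deleted.

```go
package main

import "fmt"

// state is a hypothetical, simplified reconcile context.
type state struct {
	Deleting bool
	Paused   bool
	Log      []string
}

// transformer is a hypothetical interface; returning true stops the chain.
type transformer interface {
	Transform(s *state) (stop bool)
}

// deletionTransformer handles deletion and ends the chain when it applies.
type deletionTransformer struct{}

func (deletionTransformer) Transform(s *state) bool {
	if !s.Deleting {
		return false
	}
	s.Log = append(s.Log, "delete")
	return true
}

// pauseTransformer ends the chain early when the cluster is paused.
type pauseTransformer struct{}

func (pauseTransformer) Transform(s *state) bool {
	if !s.Paused {
		return false
	}
	s.Log = append(s.Log, "paused: skip remaining steps")
	return true
}

// specTransformer stands in for all the normal reconciliation steps.
type specTransformer struct{}

func (specTransformer) Transform(s *state) bool {
	s.Log = append(s.Log, "apply spec")
	return false
}

// run executes the transformers in order until one asks to stop.
func run(s *state, chain []transformer) {
	for _, t := range chain {
		if t.Transform(s) {
			return
		}
	}
}

func main() {
	// Deletion is placed before pause, as in the diff above.
	chain := []transformer{deletionTransformer{}, pauseTransformer{}, specTransformer{}}
	s := &state{Deleting: true, Paused: true}
	run(s, chain)
	fmt.Println(s.Log)
}
```

If the pause step came first in this model, a cluster that is both paused and being deleted would never reach the deletion step, which is the situation the reviewer's question is probing.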

@apecloud-bot apecloud-bot added pre-approve Fork PR Pre Approve Test and removed pre-approve Fork PR Pre Approve Test labels Jul 19, 2024
@apecloud-bot apecloud-bot added pre-approve Fork PR Pre Approve Test and removed pre-approve Fork PR Pre Approve Test labels Jul 22, 2024
@yipeng1030 yipeng1030 changed the title feat: support pause and resume reconcilation of a cluster feat: support pause and resume reconciliation of a cluster Jul 22, 2024
@github-actions github-actions bot modified the milestones: Release 0.9.1, Release 0.9.2 Aug 8, 2024
@github-actions github-actions bot modified the milestones: Release 0.9.2, Release 0.8.5 Oct 8, 2024
Labels
area/user-interaction pre-approve Fork PR Pre Approve Test size/L Denotes a PR that changes 100-499 lines.
Development

Successfully merging this pull request may close these issues.

[Features] Support Pause and Resume Reconcilation of a Cluster
9 participants