(Support Case) fix rook check to allow manual fix when is upgrading from only #4599

Merged
camilamacedo86 merged 3 commits into main from fix-cust1
Jun 20, 2023

Conversation

@camilamacedo86 camilamacedo86 commented Jun 8, 2023

What this PR does / why we need it:

In certain edge cases, we may run into issues while migrating away from Rook. Specifically, after we run pvmigrate to migrate the PVCs and then migrate the object store, Ceph may transition to an unhealthy state. The problem appears to be tied to specific mgr modules:

[root@rook-ceph-operator-747c86774c-7v95s /]# ceph health detail 
HEALTH_ERR 2 mgr modules have failed
MGR_MODULE_ERROR 2 mgr modules have failed
    Module 'dashboard' has failed: error('No socket could be created',)
    Module 'prometheus' has failed: error('No socket could be created',) 

The proposed workaround keeps the migration and upgrade moving so that Rook can ultimately be deleted. This PR automates it: it repairs the Rook Ceph state and lets the migration proceed, given that Rook will be removed in the end. The automated fix is applied only during the checks that run while we are migrating away from Rook, when removing Rook is the intended outcome.

For that reason, we duplicate the check and make sure that only the migration process uses the new copy.

Which issue(s) this PR fixes:

Fixes # [sc-79289]

Special notes for your reviewer:

This automated fix is meant to avoid support calls in scenarios that don't need them. Note that we recheck the status thoroughly after applying the workaround and before continuing with the process.
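
The disable-then-recheck flow described above could look roughly like the sketch below. It is not the code merged in this PR: the function names `ceph_cmd`, `failed_mgr_modules`, and `rook_health_check_for_migration` are hypothetical, and requiring `HEALTH_OK` on the recheck is an assumption about what counts as healthy enough to continue.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: runs a ceph command through the rook-ceph-tools
# deployment, the same way the snippet in this PR does.
ceph_cmd() {
    kubectl -n rook-ceph exec deployment/rook-ceph-tools -- ceph "$@"
}

# Hypothetical helper: given the text of `ceph health detail`, prints the
# names of the mgr modules reported as failed, one per line.
failed_mgr_modules() {
    sed -n "s/.*Module '\([^']*\)' has failed.*/\1/p" <<<"$1"
}

# Hypothetical migration-only check: disable a failed dashboard or prometheus
# module, then read the status again before reporting success.
rook_health_check_for_migration() {
    local output module
    output="$(ceph_cmd health detail 2>&1)"
    for module in $(failed_mgr_modules "$output"); do
        case "$module" in
            dashboard|prometheus)
                echo "Disabling failed mgr module: $module"
                ceph_cmd mgr module disable "$module"
                ;;
        esac
    done
    # Recheck after the workaround (assumption: only HEALTH_OK passes).
    output="$(ceph_cmd health detail 2>&1)"
    echo "$output"
    [[ $output == HEALTH_OK* ]]
}
```

A caller in the migration path would run `rook_health_check_for_migration` and stop if it returns non-zero. Routing every call through `ceph_cmd` also makes it easy to stub out the cluster when trying the logic locally.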

Steps to reproduce

Does this PR introduce a user-facing change?

Fixes the Rook Ceph status check when migrating away from Rook: if the status is unhealthy with the errors `Module 'dashboard' has failed` and `Module 'prometheus' has failed`, the dashboard and prometheus modules are disabled so that the migration can continue.

Does this PR require documentation?

Comment on lines 400 to 408
echo ""
echo "$output"
echo ""

# Disable both mgr modules when the health output flags either one.
if [[ $output == *"Module 'dashboard' has failed"* ]] || [[ $output == *"Module 'prometheus'"* ]]; then
    echo "Disabling mgr modules to try to fix the status"
    kubectl -n rook-ceph exec deployment/rook-ceph-tools -- ceph mgr module disable prometheus
    kubectl -n rook-ceph exec deployment/rook-ceph-tools -- ceph mgr module disable dashboard
fi

This is the only delta relative to the existing check.

@replicatedhq replicatedhq deleted a comment from github-actions bot Jun 8, 2023
@camilamacedo86 camilamacedo86 added type::bug Something isn't working bug::normal labels Jun 8, 2023
@camilamacedo86 camilamacedo86 changed the title fix rook check to allow manual fix when is upgrading from only (Support Case) fix rook check to allow manual fix when is upgrading from only Jun 8, 2023
scripts/common/rook.sh Outdated Show resolved Hide resolved
github-actions bot commented Jun 9, 2023

rrpolanco
rrpolanco previously approved these changes Jun 9, 2023
scripts/common/rook.sh Outdated Show resolved Hide resolved
@camilamacedo86 camilamacedo86 left a comment

@rrpolanco I found a nit: c236023

Could you please take another look?

rrpolanco
rrpolanco previously approved these changes Jun 9, 2023
scripts/common/rook.sh Show resolved Hide resolved
scripts/common/rook.sh Show resolved Hide resolved
@camilamacedo86 camilamacedo86 merged commit 1c6f121 into main Jun 20, 2023
@camilamacedo86 camilamacedo86 deleted the fix-cust1 branch June 20, 2023 10:33
Labels
bug::normal type::bug Something isn't working
4 participants