Reminder: If any problems are encountered and the procedure or command output does not provide relevant guidance, see Relevant troubleshooting links for upgrade-related issues.
- (ncn-m001#) If a typescript session is already running in the shell, then first stop it with the exit command.
- (ncn-m001#) Start a typescript.
  script -af /root/csm_upgrade.$(date +%Y%m%d_%H%M%S).stage_1.txt
  export PS1='\u@\H \D{%Y-%m-%d} \t \w # '
If additional shells are opened during this procedure, then record those with typescripts as well. When resuming a procedure after a break, always be sure that a typescript is running before proceeding.
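When resuming after a break, one quick way to confirm that the current shell is being recorded is to look at its parent process. The following is a minimal sketch and not part of the documented procedure. The helper names `parent_comm` and `in_typescript` are hypothetical, and the check assumes a Linux node where the shell was launched directly by `script` (a nested shell started inside the typescript would not be detected).

```shell
# Hypothetical helpers; not part of the CSM tooling.

# Print the command name of a process (defaults to this shell's parent).
parent_comm() {
    local pid="${1:-$PPID}"
    # Prefer /proc on Linux; fall back to ps where /proc is unavailable.
    cat "/proc/${pid}/comm" 2>/dev/null || ps -o comm= -p "${pid}"
}

# Succeed only when the parent process is named "script".
in_typescript() {
    [ "$(parent_comm)" = "script" ]
}

if in_typescript; then
    echo "typescript is running"
else
    echo "no typescript detected; start one with script -af <file>"
fi
```

If the check reports that no typescript is running, start one as described above before continuing.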
Before starting the storage node image upgrade, access the Argo UI to view the progress of this stage. Note that progress for the current stage will not appear in Argo until the storage node image upgrade script has been started.
For more information, see Using the Argo UI and Using Argo Workflows.
(ncn-m001#) Run ncn-upgrade-worker-storage-nodes.sh with the --upgrade flag for all storage nodes to be upgraded. Provide the storage nodes in a comma-separated list, such as ncn-s001,ncn-s002,ncn-s003. This upgrades the storage nodes sequentially.
/usr/share/doc/csm/upgrade/scripts/upgrade/ncn-upgrade-worker-storage-nodes.sh ncn-s001,ncn-s002,ncn-s003 --upgrade
NOTE: It is possible to upgrade a single storage node at a time using the following command.
/usr/share/doc/csm/upgrade/scripts/upgrade/ncn-upgrade-worker-storage-nodes.sh ncn-s001 --upgrade
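When the list of storage nodes is long, the comma-separated argument can be assembled from individual host names rather than typed by hand. This is a minimal sketch that assumes the operator already knows the node names. The `join_nodes` helper is hypothetical, and the upgrade command is only printed for review, not executed.

```shell
# Hypothetical helper: join all arguments into one comma-separated string.
join_nodes() {
    local IFS=,
    echo "$*"
}

nodes="$(join_nodes ncn-s001 ncn-s002 ncn-s003)"

# Print the command for review instead of running it.
echo /usr/share/doc/csm/upgrade/scripts/upgrade/ncn-upgrade-worker-storage-nodes.sh "${nodes}" --upgrade
```

Passing a single host name to `join_nodes` yields that name unchanged, which matches the single-node form of the command.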
Storage node image upgrade troubleshooting
- The best troubleshooting tool for this stage is the Argo UI. Information about accessing this UI and about using Argo Workflows is above.
- If the upgrade is 'waiting for Ceph HEALTH_OK', the output from the commands ceph -s and ceph health detail should provide information.
- If a crash has occurred, dumping the Ceph crash data will return Ceph to a healthy state and allow the upgrade to continue. The crash should be evaluated to determine if there is an issue that should be addressed.
- Refer to storage troubleshooting documentation for Ceph related issues.
- Refer to troubleshoot Ceph image with tag:'<none>' if running podman images on a storage node shows an image with tag:<none>.
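The Ceph checks mentioned above can be gathered into one short diagnostic pass. This is a sketch under stated assumptions, not the documented recovery procedure. The `ceph_is_healthy` helper is hypothetical, and the block assumes it runs on a node where the `ceph` CLI is installed and authorized. It only reads cluster state; it does not archive or clear any crash data.

```shell
# Hypothetical helper: succeed when the first word of `ceph health` is HEALTH_OK.
ceph_is_healthy() {
    [ "$(ceph health 2>/dev/null | awk 'NR==1 {print $1}')" = "HEALTH_OK" ]
}

# Only query the cluster where the ceph CLI is available.
if command -v ceph >/dev/null 2>&1; then
    if ceph_is_healthy; then
        echo "Ceph reports HEALTH_OK"
    else
        ceph -s              # overall cluster status
        ceph health detail   # explanation of each active health check
        ceph crash ls        # crash reports recorded by the cluster, if any
    fi
fi
```

If `ceph crash ls` lists entries, evaluate them as described in the crash data procedure referenced above before deciding how to proceed.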
(ncn-m001#) Run the following commands to enable the rbd stats collection on the pools.
ceph config set mgr mgr/prometheus/rbd_stats_pools "kube,smf"
ceph config set mgr mgr/prometheus/rbd_stats_pools_refresh_interval 600
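To confirm that both settings were stored, the values can be read back with `ceph config get`. This is an optional check and not part of the documented procedure. The `show_rbd_stats_config` helper is hypothetical, and the block assumes the `ceph` CLI is available on the node.

```shell
# Hypothetical helper: print the current value of each rbd stats setting on the mgr.
show_rbd_stats_config() {
    local key
    for key in mgr/prometheus/rbd_stats_pools \
               mgr/prometheus/rbd_stats_pools_refresh_interval; do
        echo "${key} = $(ceph config get mgr "${key}")"
    done
}

# Only query the cluster where the ceph CLI is available.
if command -v ceph >/dev/null 2>&1; then
    show_rbd_stats_config
fi
```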
For any typescripts that were started during this stage, stop them with the exit command.
All the Ceph nodes have been rebooted into the new image.
This stage is completed. Continue to Stage 2.