Casm 5740 fm ha #6405
base: release/1.7
- initial place holder for FM on baremetal docs
Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>
…n_Baremetal.md Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>
| ## Note: | ||
|
|
||
| * Fabric Manager Nodes (`FMNs`) can be added only after the CSM upgrade has been completed. | ||
| * By default, Fabric Manager on baremetal is disabled. |
This could be rephrased to something like:
By default, Fabric Manager runs on Kubernetes as a pod.
|
The NCN add procedure document needs to be referenced from the top-level document and modified to support Fabric Manager as well. Things to note:
- Boot
- Add switch configuration
- Update firmware
|
|
||
| * Validate SHCD with respect to FMNs | ||
| * Map FMNs in the SHCD to the node type: `Management_FabricManager` when building the CCJ file | ||
| * Generate switch configuration for the node based on the new Role: `Management` , SubRole: `FabricManager` pairing |
Generating switch configuration does not happen at this stage. This step also needs to be removed
| [Slingshot Host Software](../../glossary.md#slingshot-host-software-shs) | ||
|
|
||
| ### FM | ||
| [Fabric Manager](...) |
The glossary needs to be updated with a description and linked here. That may be part of the next step; this comment is just a reminder so it is not overlooked. The same applies to FMN.
|
|
||
| Verify that the BMC of each FMN is configured with the correct root user credentials. | ||
|
|
||
| ### Perform CANU validation |
CANU validation is covered as part of the Add procedure, so this section can be removed.
|
|
||
| After creating the FMN base image, add FMN nodes to CSM by following the [NCN add procedure](../../operations/node_management/Add_Remove_Replace_NCNs/Add_Remove_Replace_NCNs.md) | ||
|
|
||
| **Note:** |
This note can be removed now since they are modified in the Add Procedure
|
|
||
| After completion of the NCN add procedure, SLS, HSM, and BSS will contain the corresponding FMN data. | ||
|
|
||
| The following checks can be used to verify that the updates have been correctly applied: |
These steps can be moved to final validation section
| cray bss bootparameters list --hosts Global --format json | ||
| ``` | ||
|
|
||
| ### Update Switch Configuration With CANU |
Updating the switch configuration is also handled as part of the Add procedure, so it should be removed from here.
| Take extreme care when manipulating ACLs, if CANU suggests moving a "permit any ..." rule be sure to create the new rule before removing the old one. It is possible to lose access to the switch if the ACLs are not applied in the correct order. | ||
|
|
||
|
|
||
| ## FMN Booting |
Node booting is also covered in the Add procedure, so it needs to be removed from here.
| Check NMN, CMN, HMN, CHN, metal and virtual IP configuration for both FMN nodes (`fmn001` and `fmn002`). | ||
|
|
||
| ```bash | ||
| ncn-m001:~/sav/csm-config # cray sls networks list |
| ncn-m001:~/sav/csm-config # cray sls networks list | |
| ncn-m001:~ # cray sls networks list |
|
|
||
| To install or upgrade Fabric Manager on the FMNs, refer to [FabricManager Install/ Upgrade](...) | ||
|
|
||
| ## Uninstall FMN Helm Chart |
This section could be removed from the CSM documentation. It makes sense for the Slingshot team to handle it.
|
|
||
| ## Introduction | ||
|
|
||
| The Fabric Manager (FM) bare-metal enablement within the Cray System Management (CSM) framework introduces dedicated Fabric Manager Nodes (FMNs) that manage and monitor Slingshot fabric operations outside of a Kubernetes environment. While the overall bare-metal Fabric Manager solution is described in the Slingshot Fabric Manager HA documentation <reference>, this CSM detail design document focuses specifically on the CSM-level enhancements required to integrate and support FMNs. |
Last line could be modified. This is not a CSM detail design document.
| --mac-lan1 b8:59:9f:d9:9d:e9 | ||
| ``` | ||
| * Optional: For FMNs (Fabric Manager Nodes), we need to pass on `--fmn-image-id` parameter with FMN base image ID generated in the [FMN base image creation stage] |
| * Optional: For FMNs (Fabric Manager Nodes), we need to pass on `--fmn-image-id` parameter with FMN base image ID generated in the [FMN base image creation stage] | |
| * Optional: For FMNs (Fabric Manager Nodes) where alias is fmn00*, we need to pass additional `--fmn-image-id` parameter with FMN base image ID generated in the [FMN base image creation stage] |
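The suggested wording above makes `--fmn-image-id` conditional on the node alias. As a rough illustration only, the condition could be sketched in shell as follows. The helper name `fmn_extra_args` is hypothetical and not part of CSM; the `fmn00*` pattern is taken from the suggestion, and how the actual add script consumes the option is not shown here.

```bash
#!/usr/bin/env bash
# Sketch only: fmn_extra_args is a hypothetical helper, not part of CSM.
# It prints the extra option for aliases matching fmn00* and nothing otherwise.
fmn_extra_args() {
    local node_alias="$1"
    local image_id="$2"
    case "${node_alias}" in
        fmn00*)
            # FMN alias: the FMN base image ID must be supplied.
            printf '%s %s\n' '--fmn-image-id' "${image_id}"
            ;;
        *)
            # Any other NCN alias: no extra argument is needed.
            ;;
    esac
}
```

Under these assumptions, `fmn_extra_args fmn001 "${FMN_IMAGE_ID}"` prints `--fmn-image-id` followed by the image ID, while an alias such as `ncn-w004` prints nothing.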
| ``` | ||
| * Optional: For FMNs (Fabric Manager Nodes), we need to pass on `--fmn-image-id` parameter with FMN base image ID generated in the [FMN base image creation stage] | ||
| (https://github.com/Cray-HPE/docs-csm/blob/CASM-5740-fm-ha/operations/fm_on_baremetal/Configure_FM_On_Baremetal.md#fmn-base-image-creation). |
The link is not displayed properly
| For Example: Base image id of FMN is `06135c73-bcd9-4d38-928f-ada20bdf6a6` | ||
| ```bash | ||
| cd /usr/share/doc/csm/scripts/operations/node_management/Add_Remove_Replace_NCNs/ |
The `--fmn-image-id` parameter needs to be added to this command as well.
| ``` | ||
|
|
||
| ## Add worker, storage, or master NCNs | ||
| 2. Optional: For adding FMNs (Fabric Manager Nodes) to CSM there is a new prompt added to confirm if the node getting added is an FMN or not: |
This prompt is not optional. It should be combined with the prompt above.
| Use this procedure to add a worker, storage, or master NCN. | ||
| Existing prompt to add number of NCN nodes: | ||
|
|
||
| ```text |
Two examples could be provided: one where the FMN prompt is answered no (the existing case), and another where it is answered yes (showing FMN details along with fmn-vips).
|
|
||
| ### Add FMN Nodes to CSM | ||
|
|
||
| After creating the FMN base image, add FMN nodes to CSM by following the [NCN add procedure](../../operations/node_management/Add_Remove_Replace_NCNs/Add_Remove_Replace_NCNs.md) |
This should point to the specific Add section instead of the top-level document, which covers the Add, Remove, and Replace procedures.
https://github.com/Cray-HPE/docs-csm/blob/CASM-5740-fm-ha/operations/node_management/Add_Remove_Replace_NCNs/Add_Remove_Replace_NCNs.md#add-worker-storage-master-or-fmnfabric-manager-node-ncns
|
|
||
| ### Add FMN Nodes to CSM | ||
|
|
||
| After creating the FMN base image, add FMN nodes to CSM by following the [NCN add procedure](../../operations/node_management/Add_Remove_Replace_NCNs/Add_Remove_Replace_NCNs.md) |
This could explicitly say to follow step 1 through step 6 of the Add procedure.
Also, a note could be added at the end of https://github.com/Cray-HPE/docs-csm/blob/CASM-5740-fm-ha/operations/node_management/Add_Remove_Replace_NCNs/Boot_NCN.md telling the reader to skip the subsequent steps for FMNs.
| **NOTE**: | ||
|
|
||
| * `FMNs` are considered Management nodes. | ||
| * FM cannot be disabled after it has been enabled. |
This could either be removed completely, since it does not need to be stated explicitly, or rephrased to:
After Fabric Manager is migrated from a Kubernetes pod to bare-metal infrastructure, it cannot be reverted.
|
|
||
| * Fabric Manager Nodes (`FMNs`) can be added only after the CSM upgrade has been completed. | ||
| * By default, Fabric Manager on baremetal is disabled. | ||
| * Once enabled, Fabric Manager on baremetal cannot be disabled. |
This could either be removed, since it does not need to be stated explicitly, or rephrased to:
After Fabric Manager is migrated from a Kubernetes pod to bare-metal infrastructure, it cannot be reverted.
Description
Checklist
.github/CODEOWNERS with the corresponding team in Cray-HPE.