Conversation

@ravikanth-nalla-hpe
Contributor

Description

Checklist

  • If I added any command snippets, the steps they belong to follow the prompt conventions (see example).
  • If I added a new directory, I also updated .github/CODEOWNERS with the corresponding team in Cray-HPE.
  • My commits or Pull-Request Title contain my JIRA information, or I do not have a JIRA.

ravikanth-nalla-hpe and others added 25 commits November 14, 2025 10:38
- initial place holder for FM on baremetal docs
Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>
…n_Baremetal.md

Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>
## Note:

* Fabric Manager Nodes (`FMNs`) can be added only after the CSM upgrade has been completed.
* By default, Fabric Manager on baremetal is disabled.
Collaborator

@sravani-sanigepalli sravani-sanigepalli Nov 25, 2025

This could be rephrased to something like:
By default, Fabric Manager runs on Kubernetes as a pod.

@sravani-sanigepalli
Collaborator

The NCN add procedure document needs to be referenced from the top-level document and modified to also support Fabric Manager:
NCN add procedure

Things to note:
The above procedure also covers booting the nodes, adding switch configuration, and updating firmware. These sections need to be reviewed to confirm whether they can be referenced from here directly instead of creating new sections in the main document. It is better to do it this way, since the current add procedure already has the complete flow set up.

Boot -

Boot NCN

Add switch configuration -

Add Switch Config

Update firmware -

Update Firmware

Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>

* Validate SHCD with respect to FMNs
* Map FMNs in the SHCD to the node type: `Management_FabricManager` when building the CCJ file
* Generate switch configuration for the node based on the new Role: `Management` , SubRole: `FabricManager` pairing
Collaborator

Switch configuration is not generated at this stage, so this step also needs to be removed.

[Slingshot Host Software](../../glossary.md#slingshot-host-software-shs)

### FM
[Fabric Manager](...)
Collaborator

The glossary needs to be updated with a description, which should then be linked here. That may be part of a later step; this comment is just so it is not overlooked. The same applies to FMN.

Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>

Verify that the BMC of each FMN is configured with the correct root user credentials.

### Perform CANU validation
Collaborator

CANU validation will be covered as part of the Add Procedure, so this section could be removed.


After creating the FMN base image, add FMN nodes to CSM by following the [NCN add procedure](../../operations/node_management/Add_Remove_Replace_NCNs/Add_Remove_Replace_NCNs.md)

**Note:**
Collaborator

This note can be removed now since they are modified in the Add Procedure


After completion of the NCN add procedure, SLS, HSM, and BSS will contain the corresponding FMN data.

The following checks can be used to verify that the updates have been correctly applied:
Collaborator

These steps can be moved to final validation section

cray bss bootparameters list --hosts Global --format json
```

### Update Switch Configuration With CANU
Collaborator

Updating the switch configuration is also handled as part of the Add Procedure. It should be removed from here.

Take extreme care when manipulating ACLs, if CANU suggests moving a "permit any ..." rule be sure to create the new rule before removing the old one. It is possible to lose access to the switch if the ACLs are not applied in the correct order.


## FMN Booting
Collaborator

Node booting is also covered by the Add Procedure. It needs to be removed from here.

Check NMN, CMN, HMN, CHN, metal and virtual IP configuration for both FMN nodes (`fmn001` and `fmn002`).

```bash
ncn-m001:~/sav/csm-config # cray sls networks list
Collaborator

Suggested change
- ncn-m001:~/sav/csm-config # cray sls networks list
+ ncn-m001:~ # cray sls networks list


For install/ upgrade Fabric Manager on the FMNs please refer [FabricManager Install/ Upgrade](...)

## Uninstall FMN Helm Chart
Collaborator

This section could be removed from the CSM documentation. It makes sense for the Slingshot team to handle it.


## Introduction

The Fabric Manager (FM) bare-metal enablement within the Cray System Management (CSM) framework introduces dedicated Fabric Manager Nodes (FMNs) that manage and monitor Slingshot fabric operations outside of a Kubernetes environment. While the overall bare-metal Fabric Manager solution is described in the Slingshot Fabric Manager HA documentation <reference>, this CSM detail design document focuses specifically on the CSM-level enhancements required to integrate and support FMNs.
Collaborator

The last line could be modified. This is not a CSM detail design document.

--mac-lan1 b8:59:9f:d9:9d:e9
```
* Optional: For FMNs (Fabric Manager Nodes), we need to pass on `--fmn-image-id` parameter with FMN base image ID generated in the [FMN base image creation stage]
Collaborator

Suggested change
- * Optional: For FMNs (Fabric Manager Nodes), we need to pass on `--fmn-image-id` parameter with FMN base image ID generated in the [FMN base image creation stage]
+ * Optional: For FMNs (Fabric Manager Nodes) where alias is fmn00*, we need to pass additional `--fmn-image-id` parameter with FMN base image ID generated in the [FMN base image creation stage]

```
* Optional: For FMNs (Fabric Manager Nodes), we need to pass on `--fmn-image-id` parameter with FMN base image ID generated in the [FMN base image creation stage]
(https://github.com/Cray-HPE/docs-csm/blob/CASM-5740-fm-ha/operations/fm_on_baremetal/Configure_FM_On_Baremetal.md#fmn-base-image-creation).
Collaborator

The link is not displayed properly

For Example: Base image id of FMN is `06135c73-bcd9-4d38-928f-ada20bdf6a6`
```bash
cd /usr/share/doc/csm/scripts/operations/node_management/Add_Remove_Replace_NCNs/
Collaborator

--fmn-image-id parameter needs to be added for this command also.

```
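
A minimal pre-flight sketch for the `--fmn-image-id` handling discussed in the comments above, under stated assumptions: the `fmn00*` alias pattern is taken from the review suggestion, FMN base image IDs are assumed to be canonical 8-4-4-4-12 UUIDs, and the helper names `is_uuid` and `fmn_extra_args` are hypothetical and not part of CSM. Under that assumption, the example ID quoted in the diff would be rejected, because its last group has only 11 characters.

```bash
# Hypothetical helpers; neither function ships with CSM.

# Succeeds only when $1 has the canonical 8-4-4-4-12 hexadecimal UUID layout.
# Assumption: FMN base image IDs follow this layout.
is_uuid() {
    printf '%s\n' "$1" | grep -Eq \
        '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Prints the extra argument pair for FMN aliases (fmn00*), nothing otherwise.
# Returns non-zero if an FMN alias is given without a well-formed image ID.
fmn_extra_args() {
    node_alias="$1"
    image_id="$2"
    case "$node_alias" in
        fmn00*)
            if is_uuid "$image_id"; then
                printf '%s %s\n' "--fmn-image-id" "$image_id"
            else
                echo "malformed FMN image ID: $image_id" >&2
                return 1
            fi
            ;;
        *)
            # Worker, storage, and master NCNs need no extra argument.
            ;;
    esac
}
```

For example, `fmn_extra_args fmn001 "$FMN_IMAGE_ID"` prints the argument pair to append to the add command, while `fmn_extra_args ncn-w004 ""` prints nothing and succeeds.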

## Add worker, storage, or master NCNs
2. Optional: For adding FMNs (Fabric Manager Nodes) to CSM there is a new prompt added to confirm if the node getting added is an FMN or not:
Collaborator

This is not an optional prompt. It should be combined with the prompt above.

Use this procedure to add a worker, storage, or master NCN.
Existing prompt to add number of NCN nodes:

```text
Collaborator

Two examples can be provided: one answering no to the FMN prompt (the existing case) and another answering yes (showing FMN details along with fmn-vips).


### Add FMN Nodes to CSM

After creating the FMN base image, add FMN nodes to CSM by following the [NCN add procedure](../../operations/node_management/Add_Remove_Replace_NCNs/Add_Remove_Replace_NCNs.md)
Collaborator

@sravani-sanigepalli sravani-sanigepalli Dec 8, 2025

This should point to specific add section instead of top level document which actually handles Add, Remove and Replace procedures.
https://github.com/Cray-HPE/docs-csm/blob/CASM-5740-fm-ha/operations/node_management/Add_Remove_Replace_NCNs/Add_Remove_Replace_NCNs.md#add-worker-storage-master-or-fmnfabric-manager-node-ncns


### Add FMN Nodes to CSM

After creating the FMN base image, add FMN nodes to CSM by following the [NCN add procedure](../../operations/node_management/Add_Remove_Replace_NCNs/Add_Remove_Replace_NCNs.md)
Collaborator

We can explicitly say to follow step 1 through step 6 of the Add Procedure.
Also, a note could be added at the end of https://github.com/Cray-HPE/docs-csm/blob/CASM-5740-fm-ha/operations/node_management/Add_Remove_Replace_NCNs/Boot_NCN.md
to skip the subsequent steps for FMNs.

**NOTE**:

* `FMNs` are considered Management nodes.
* FM cannot be disabled after it has been enabled.
Collaborator

@sravani-sanigepalli sravani-sanigepalli Dec 8, 2025

This can be removed completely, since it does not need to be stated explicitly, or rephrased to:
After Fabric Manager is migrated from a Kubernetes pod to bare-metal infrastructure, it cannot be reverted.


* Fabric Manager Nodes (`FMNs`) can be added only after the CSM upgrade has been completed.
* By default, Fabric Manager on baremetal is disabled.
* Once enabled, Fabric Manager on baremetal cannot be disabled.
Collaborator

This could either be removed, since it does not need to be stated explicitly, or rephrased to:
After Fabric Manager is migrated from a Kubernetes pod to bare-metal infrastructure, it cannot be reverted.

3 participants