Change nncp to match new label added to nodes #507

Open: tssala23 wants to merge 1 commit into main

tssala23
Contributor

There are two types of worker nodes on this cluster: non-GPU and GPU. The GPU nodes have only one interface, so their networking is configured slightly differently. This change restricts this NNCP to the non-GPU nodes.
The non-GPU nodes were labeled with:
kubectl label node <non-gpu-wrk> gpu=false --as system:admin
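
For illustration, a policy restricted to those nodes could select on the new label through `spec.nodeSelector`. This is a minimal sketch, not the manifest from this PR: the policy name, interface name, and desired state are hypothetical placeholders, and only the `gpu=false` label comes from the command above.

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: example-non-gpu-policy   # hypothetical name, not taken from this PR
spec:
  nodeSelector:
    gpu: "false"   # quoted on purpose: label values are strings, and a bare false parses as a boolean
  desiredState:
    interfaces:
      - name: eth1       # hypothetical interface; substitute the real one
        type: ethernet
        state: up
```

Nodes without the `gpu=false` label, including the GPU nodes, would be left untouched by a policy written this way.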

Signed-off-by: tssala23 <tsalawu@redhat.com>
Member

@larsks larsks left a comment

Are we running the nvidia node discovery operator on this cluster? I think we should be if we have GPU nodes.

@jtriley
Contributor

jtriley commented Sep 10, 2024

> Are we running the nvidia node discovery operator on this cluster? I think we should be if we have GPU nodes.

@tssala23 To get the nvidia-gpu and node-feature-discovery operators working on a cluster:

  1. Include node-feature-discovery bundle in your cluster-scope overlay's top-level kustomization.yaml
  2. Include nvidia-gpu-operator bundle in your cluster-scope overlay's top-level kustomization.yaml
  3. Create a cluster overlay in the nfd-discovery top-level directory
  4. Create a cluster overlay in the nvidia-gpu-operator top-level directory
  5. Add a node-feature-discovery argocd app to the nerc-ocp-apps repo
  6. Add an nvidia-gpu-operator argocd app to the nerc-ocp-apps repo
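
Steps 1 and 2 amount to listing the two bundles as resources in the overlay's kustomization.yaml. A rough sketch follows; the relative paths are assumptions about the repository layout and should be adjusted to wherever the bundles actually live.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Assumed relative paths, for illustration only
  - ../../bundles/node-feature-discovery
  - ../../bundles/nvidia-gpu-operator
```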

For the argocd apps, see:

https://github.com/OCP-on-NERC/nerc-ocp-apps/blob/main/clusters/nerc-ocp-test/kustomization.yaml#L40-L54

@joachimweyl
Contributor

@tssala23 what are the next steps on this?

@tssala23 tssala23 self-assigned this Oct 22, 2024
@tssala23
Contributor Author

@joachimweyl This can go in the icebox for now. We implemented the change without using NNCP, and the NNCPs don't currently work properly. I do want to come back to it at some point, but it's not important now.

4 participants