Create links for ephemeral storage devices (NVMe) #1131
Comments
Is the use-case here to use NVMe drives as a PV for a pod? Or is it just to use the increased performance and storage that you're paying for on storage-capable instance type variants (d, i, etc.)? I have a proposed solution for the latter, which I think could be included in the bootstrap script as an option to RAID-0 the instance storage volumes and mount them. WDYT of something like this?
which would look for unmounted NVMe instance storage drives, place them in a RAID-0 array with mdadm, and create an XFS file-system on top. Any state in …
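For concreteness, here's a rough sketch of what such a bootstrap option could do. The device detection, mount point, and mdadm invocation below are illustrative assumptions, not the eventual implementation:

```bash
#!/usr/bin/env bash
# Sketch only: assemble NVMe instance-store disks into a RAID-0 array and put an
# XFS filesystem on it. Mount point and detection logic are assumptions.
set -euo pipefail

MOUNT_POINT="/mnt/k8s-disks"   # hypothetical mount point

# Find NVMe devices whose model identifies them as EC2 instance-store volumes
# (assumes they are not otherwise in use).
mapfile -t disks < <(lsblk -dpno NAME,MODEL | awk '/Instance Storage/ {print $1}')

if [ "${#disks[@]}" -eq 0 ]; then
  # Fail gracefully: no instance-store disks, keep using the EBS root volume.
  echo "no NVMe instance-store disks found; nothing to do"
  exit 0
fi

# Stripe all detected disks into a single RAID-0 array, then format and mount it.
mdadm --create /dev/md0 --level=0 --raid-devices="${#disks[@]}" "${disks[@]}"
mkfs.xfs /dev/md0
mkdir -p "${MOUNT_POINT}"
mount /dev/md0 "${MOUNT_POINT}"
```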
Yeah, I think that's possible. Would probably just change the arg to accept a parameter rather than a boolean.
👍 Yep, this is basically the general purpose case that I was solving for.
In Karpenter's case, yes, since we're able to know how much storage would be available at bin-packing time. I'm not sure if this would work with CAS. I think we'd also want to fail gracefully, so if there are no nvme instance storage disks, and the raid0 arg was enabled, then it wouldn't do anything and continue to use the EBS volume.
hmmm... this might be difficult. Do you mean labels on the node or tags on the instance? Karpenter already adds labels on the node for local nvme storage. Would that be enough so that workloads can select based on nvme storage available (or not)?
RE labeling, I was thinking a node label if the command was successful. If the arg was set but there were no NVMe drives, no label(s). So effectively I could spin up both types of node, RAID-0 & separate disks, and assign workloads accordingly. Should this implementation wait on the AMI supporting multiple EBS volumes like Bottlerocket? Or even be implemented in tandem?
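As a minimal sketch of that "label on success" idea (the label key and the KUBELET_EXTRA_ARGS plumbing here are assumptions for illustration, not the AMI's actual behavior):

```bash
#!/usr/bin/env bash
# Hypothetical: only advertise a node label when the RAID-0 array from the
# previous step was actually assembled.
if [ -b /dev/md0 ]; then
  # kubelet applies --node-labels at registration, so the label only appears on
  # nodes where instance-store RAID-0 setup succeeded; nodes without NVMe disks
  # register without it, and workloads can select accordingly.
  KUBELET_EXTRA_ARGS="${KUBELET_EXTRA_ARGS:-} --node-labels=example.com/instance-storage-raid0=true"
fi
```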
I suppose we could label the nodes, but it would be weird for systems like Karpenter that precompute the scheduling decision. The existence of the label may cause pods to not schedule even when kube-scheduler and Karpenter think they should. It would recover, it just wouldn't be optimal. Karpenter should still cover this case with the nvme labels: https://karpenter.sh/v0.23.0/concepts/scheduling/#selecting-nodes:~:text=karpenter.k8s.aws/instance%2Dlocal%2Dnvme As discussed offline, I think this can happen independently, since the multi-EBS volume setup breaks backwards compatibility and the RAID setup is completely backwards compatible.
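For reference, a workload could target nodes carrying that Karpenter label with a required node affinity along these lines (pod name and image are placeholders, not from this issue):

```bash
# Illustration only: require a node that has the Karpenter-applied
# instance-local NVMe label, regardless of its value.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nvme-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: karpenter.k8s.aws/instance-local-nvme
                operator: Exists
  containers:
    - name: app
      image: public.ecr.aws/amazonlinux/amazonlinux:2
      command: ["sleep", "infinity"]
EOF
```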
@bwagner5 correct me if I'm wrong, but labels only have an additive scheduling impact for Karpenter? Pods will schedule to nodes with unknown labels, but adding a label such as …
If there's an anti-affinity to the label, then that would affect scheduling, or a node selector to labels that Karpenter doesn't know about because they get applied at startup, so Karpenter would never provision a node since it doesn't know the requirement will be fulfilled.
OK, fair enough. Is it documented for Karpenter that you shouldn't use dynamic labels? I think we've stopped using these now but I'd have to check.
I don't think we explicitly call it out in the Karpenter docs. Maybe we should add that. I don't hear of many people doing dynamic labels at startup, though. I would guess the practice is known to be bad, since any node autoscaler will fall over in some regard when dynamic labels are at play, because the scheduling simulation can't be accurate.
What would you like to be added:
I'd like this AMI to mirror the Bottlerocket behaviour and link ephemeral storage devices as part of bootstrap (see bottlerocket-os/bottlerocket#1173).
Why is this needed:
I'd like to be able to easily make use of NVMe drives using sig-storage-local-static-provisioner.