Partitionable model with a common shared shape #31
```diff
@@ -59,14 +59,10 @@ type ResourcePoolSpec struct {
 	// vendor of the driver.
 	DriverName string `json:"driverName" protobuf:"bytes,3,name=driverName"`
 
-	// SharedCapacity defines the set of shared capacity consumable by
-	// devices in this ResourceSlice.
-	//
-	// Must not have more than 128 entries.
-	//
-	// +listType=atomic
-	// +optional
-	SharedCapacity []SharedCapacity `json:"sharedCapacity,omitempty"`
+	// DeviceShape defines the common shape of all devices in this pool.
+	//
+	// +required
+	DeviceShape DeviceShape `json:"deviceShape"`
 
 	// Devices lists all available devices in this pool.
 	//
```
```diff
@@ -81,15 +77,53 @@ type ResourcePoolSpec struct {
 const ResourcePoolMaxSharedCapacity = 128
 const ResourcePoolMaxDevices = 128
 
+type DeviceShape struct {
+	// Attributes defines the attributes of this device shape.
+	// The name of each attribute must be unique.
+	//
+	// Must not have more than 32 entries.
+	//
+	// +listType=atomic
+	// +optional
+	Attributes []DeviceAttribute `json:"attributes,omitempty" protobuf:"bytes,3,opt,name=attributes"`
```
|
**Contributor:** So these become shared attributes that all partitions share? No ability to define the common set of attributes just for, e.g., …?

**Author:** Yeah, those still get layered in on a per-partition basis.

**Contributor:** Sure, but I want to be able to define, e.g., … and then layer them in as appropriate.

**Author:** In other words, partitions individually have attributes. So, in the "device" that is presented to the claim side of the model, you have … I do NOT go so far as to define a "PartitionShape" which common-izes attributes from a particular type of partition. That is possible, but there are diminishing returns. Option 5 does effectively do that (and more).

**Contributor:** This is a byproduct of nesting. If everything is top-level and referenceable by a name, then it "just works" and is fully flexible.

**Author:** The goal of the "shape" is to capture the shared capacity / shared capacity consumed model, as well as the common attributes, once, and re-use it across all devices. That's where the big size win comes in, not just with attributes but with the partition model. I am not clear on how you plan to capture both of those; it would be useful to see.

**Author:** I tried top-level shared capacity groups in Option 2; it actually made things LARGER. Maybe I did it wrong? I don't think I did. I think it is because they are groups per-GPU, so we list them over and over. Whereas with the nesting, we don't need to list them, because they are implicit in the structure. Maybe we can do that implicit duplication with shared groups at the top level? I am not sure how we would know which groups to treat that way and which to treat as independent. Maybe a flag? We also need to capture the repeated "consumption" models, rather than repeat them on a per-device basis, to get the full effect. I think that's much harder to understand, frankly, and the only thing I see us gaining is a theoretical ability to have an "independent" shared group that spans the slice rather than a single device. That is something we could always add later at the slice level, if we needed it. We have no need for it at this point, that I know of.

**Contributor:** This is somewhat off topic, but something you said here made me think of this. I haven't discussed it much, but MIG actually supports a second level of nesting. I think it could still be modeled with this Option 4 (the second level of nesting still pulls from the same pool of resources). However, we want to limit selection of devices from this second level of nesting to a single claim (because they all share access to the same physical memory). Meaning if someone requests, e.g., a …, i.e. the following, but guarantee that these devices are pulled from a MIG device that has not had other 2nd-level devices pulled from it yet. We currently support this in our "classic" DRA driver (because we control the API through our CRD), but I don't see an easy way to support this restriction with structured parameters.

**Contributor:** Any thoughts on this?

**Author:** So, do these nested partitions really need to be visible in K8s? Are they offered to different containers? Could this be part of the config? Since they are not consumable across claims, I don't see how K8s even knowing about them matters; the scheduler wouldn't have any flexibility but to give a whole MIG anyway. The only flexibility would be association with different containers. But maybe the driver can handle that for these (don't associate the MIG with any specific container).
```diff
+	// Partitions defines the set of partitions into which this device shape
+	// may be allocated. If not populated, then the device shape is always
+	// consumed in its entirety.
+	//
+	// +listType=atomic
+	// +optional
+	Partitions []DevicePartition `json:"partitions,omitempty"`
```
|
**Contributor:** In the "simple" case, when I just have a bunch of devices that are not partitionable, what does my ResourceSlice look like? Do I need to still fill this in with just 1 device (which is actually not partitionable)?

**Contributor:** Edit: Never mind, I see now. This is not a replacement for …

**Author:** Exactly.
```diff
+	// SharedCapacity defines the set of shared capacity consumable by
+	// partitions in this DeviceShape. Not meaningful for non-partitioned
+	// devices.
+	//
+	// Must not have more than 128 entries.
+	//
+	// +listType=atomic
+	// +optional
+	SharedCapacity []SharedCapacity `json:"sharedCapacity,omitempty"`
+}
```
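To illustrate what the shape-level `SharedCapacity` pool buys: each partition declares consumed entries drawn from the pool, and an allocator checks that the sum across allocated partitions never exceeds it. This is a minimal sketch only; the actual field layout of `SharedCapacity` (here a name plus an integer quantity) is not shown in this diff and is an assumption.

```go
package main

import "fmt"

// SharedCapacity is a simplified stand-in for the proposed type; the
// real field layout (e.g. a resource.Quantity) is an assumption here.
type SharedCapacity struct {
	Name     string
	Capacity int64
}

// canAllocate reports whether the partitions named in want fit within
// the shape-level shared capacity pool, given each partition's declared
// SharedCapacityConsumed entries.
func canAllocate(pool []SharedCapacity, consumed map[string][]SharedCapacity, want []string) bool {
	remaining := map[string]int64{}
	for _, c := range pool {
		remaining[c.Name] = c.Capacity
	}
	for _, p := range want {
		for _, c := range consumed[p] {
			left, ok := remaining[c.Name]
			if !ok || left < c.Capacity {
				return false // unknown capacity name, or pool exhausted
			}
			remaining[c.Name] = left - c.Capacity
		}
	}
	return true
}

func main() {
	// A GPU shape with 7 compute slices shared by its MIG-style partitions
	// (hypothetical names, for illustration only).
	pool := []SharedCapacity{{Name: "compute-slices", Capacity: 7}}
	consumed := map[string][]SharedCapacity{
		"mig-3g": {{Name: "compute-slices", Capacity: 3}},
		"mig-4g": {{Name: "compute-slices", Capacity: 4}},
	}
	fmt.Println(canAllocate(pool, consumed, []string{"mig-3g", "mig-4g"})) // true: 3+4 <= 7
	fmt.Println(canAllocate(pool, consumed, []string{"mig-4g", "mig-4g"})) // false: 4+4 > 7
}
```

Because the pool lives on the shape rather than on each device, the consumption model is stated once and is implicitly repeated for every device in the pool, which is where the size win discussed above comes from.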
```diff
-// Device represents one individual hardware instance that can be selected based
-// on its attributes.
-type Device struct {
-	// Name is unique identifier among all devices managed by
-	// the driver on the node. It must be a DNS label.
+type DevicePartition struct {
+	// Name is unique identifier among all partitions for this device. The
+	// device name as recorded in the allocation will be the concatenation
+	// of the device name and the partition name with a '-' separator.
+	//
+	// NOTE: may need a better naming scheme
+	//
+	// It must be a DNS label.
 	Name string `json:"name" protobuf:"bytes,1,name=name"`
 
-	// Attributes defines the attributes of this device.
-	// The name of each attribute must be unique.
+	// Attributes defines the attributes of this partition.
+	// The name of each attribute must be unique. The values
+	// in here are overlaid on top of the values in the device
+	// shape (overwriting them if the names are the same).
+	//
+	// NOTE: probably can get away with fewer
+	//
 	// Must not have more than 32 entries.
 	//
```
```diff
@@ -98,7 +132,7 @@ type Device struct {
 	Attributes []DeviceAttribute `json:"attributes,omitempty" protobuf:"bytes,3,opt,name=attributes"`
 
 	// SharedCapacityConsumed defines the set of shared capacity consumed by
-	// this device.
+	// this partition.
 	//
 	// Must not have more than 32 entries.
 	//
@@ -107,6 +141,25 @@ type Device struct {
 	SharedCapacityConsumed []SharedCapacity `json:"sharedCapacityConsumed,omitempty"`
```
**Contributor:** Am I limited to the …?

**Author:** In the canonical example, the …

**Contributor:** Yes, that is what I'm saying. I don't necessarily see it for GPUs, but I'm sure there are other devices that will want the ability to do this.

**Author:** Then we can add something later for that...
```diff
 }
 
+type Device struct {
+	// Name is unique identifier among all devices managed by
+	// the driver on the node. It must be a DNS label.
+	Name string `json:"name" protobuf:"bytes,1,name=name"`
+
+	// Attributes defines the attributes of this device.
+	// The name of each attribute must be unique. The values
+	// in here are overlaid on top of the values in the device
+	// shape (overwriting them if the names are the same).
+	//
+	// Must not have more than 32 entries.
+	//
+	// NOTE: probably can get away with fewer
+	//
+	// +listType=atomic
+	// +optional
+	Attributes []DeviceAttribute `json:"attributes,omitempty" protobuf:"bytes,3,opt,name=attributes"`
+}
 
 const ResourcePoolMaxAttributesPerDevice = 32
 const ResourcePoolMaxSharedCapacityConsumedPerDevice = 32
```
**Contributor:** I'm still struggling to see how I share attributes across top-level GPUs. Is that not supported in this model? Or do I need to put all of my top-level GPUs into one DeviceShape to take advantage of this?

**Author:** There is exactly one shape for any slice. Check out the example YAML.

Cardinality:

What you cannot do, as it exists here, is have a per-device, per-partition attribute. But otherwise it is as expressive as the fully denormalized model. If we want to pack things in tighter, we could allow multiple shapes per slice, but I don't see any value in that.
**Contributor:** I need each MIG device to have a reference to its parent GPU's UUID (so that they can be aligned on that). How does one do that with this model? Likewise for PCIEroot, etc.

**Author:** The device as seen by the scheduler will have all the attributes of the shape, the partition, and the device. So the device UUID will be there for every MIG. Now, if we need to have a separate UUID for the MIG device and for the parent device, it is a bit more tricky, because then we may have name conflicts. So, we may need a little more attribute-manipulation magic here, and we may need a standard attribute to mean "partition". I prefer to see if we can get by with published conventions rather than manipulations. Another alternative is qualifying attributes by shape, partition, and device, but that seems very hard on the user.
**Author:** The idea is that the scheduler sees an unrolled / flattened / projected version of this as a flat list of devices. So from the claim standpoint, it is equivalent to the current model.
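The unrolling described above (shape × partition × device projected into a flat device list, with `<device>-<partition>` names and attribute overlay) can be sketched as follows. The types here are simplified stand-ins rather than the proposed schema, and the overlay order (shape, then partition, then device) is an assumption; the diff only states that partition and device attributes overwrite shape attributes on a name collision.

```go
package main

import "fmt"

// Simplified stand-ins for the API types under discussion; field
// layouts here are illustrative, not the actual proposed schema.
type Attr struct{ Name, Value string }

type Partition struct {
	Name       string
	Attributes []Attr
}

type Shape struct {
	Attributes []Attr
	Partitions []Partition
}

type Device struct {
	Name       string
	Attributes []Attr
}

// overlay merges layers of attributes; later layers overwrite earlier
// ones when names collide, matching the comment semantics in the diff.
func overlay(layers ...[]Attr) map[string]string {
	out := map[string]string{}
	for _, layer := range layers {
		for _, a := range layer {
			out[a.Name] = a.Value
		}
	}
	return out
}

// flatten projects the nested structure into the flat list of
// allocatable devices the scheduler would see: one entry per
// device/partition pair named "<device>-<partition>", or one entry per
// device when the shape has no partitions (consumed in its entirety).
func flatten(shape Shape, devices []Device) map[string]map[string]string {
	out := map[string]map[string]string{}
	for _, d := range devices {
		if len(shape.Partitions) == 0 {
			out[d.Name] = overlay(shape.Attributes, d.Attributes)
			continue
		}
		for _, p := range shape.Partitions {
			out[d.Name+"-"+p.Name] = overlay(shape.Attributes, p.Attributes, d.Attributes)
		}
	}
	return out
}

func main() {
	// Hypothetical attribute names, for illustration only.
	shape := Shape{
		Attributes: []Attr{{"model", "a100"}},
		Partitions: []Partition{
			{Name: "mig-1g", Attributes: []Attr{{"memory", "5Gi"}}},
			{Name: "mig-2g", Attributes: []Attr{{"memory", "10Gi"}}},
		},
	}
	devices := []Device{{Name: "gpu-0", Attributes: []Attr{{"uuid", "GPU-abc"}}}}
	flat := flatten(shape, devices)
	fmt.Println(flat["gpu-0-mig-1g"]["model"], flat["gpu-0-mig-1g"]["memory"], flat["gpu-0-mig-1g"]["uuid"])
	// → a100 5Gi GPU-abc
}
```

Note how this also answers the parent-UUID question above: because device attributes are overlaid onto every partition entry, the parent GPU's `uuid` appears on each flattened MIG device automatically.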