Add in changes to support MPS settings#4744
Conversation
| /// GPU sharing in the device plugin. | ||
| fn run() -> Result<()> { | ||
| migrate(AddPrefixesMigration(vec![ | ||
| "settings.kubelet-device-plugins.nvidia.mps", |
There was a problem hiding this comment.
You might have to add "configuration-files.nvidia-mps-control-daemon-exec-start-conf",.
There was a problem hiding this comment.
I was wondering if you have to add a migration for "settings.kubelet-device-plugins.nvidia.device-sharing-strategy" as well because the allowed value can also be mps now.
fn run() -> Result<()> {
migrate(RestrictListsMigration(vec![ListRestriction {
setting: "settings.kubelet-device-plugins.nvidia.device-sharing-strategy",
allowed_vals: &["time-slicing"],
}]))
}
There was a problem hiding this comment.
I added a custom function for this since we don't have a helper for Enum variants.
There was a problem hiding this comment.
Check the setting - KubernetesCPUManagerPolicyOption (https://github.com/bottlerocket-os/bottlerocket-settings-sdk/pull/104/changes) in settings-sdk and this migration - https://github.com/bottlerocket-os/bottlerocket/blob/develop/sources/settings-migrations/v1.51.0/kubernetes-beta-cpu-manager-policy-options/src/main.rs
There was a problem hiding this comment.
It seems like the custom function would still allow "mps" upon downgrade.
There was a problem hiding this comment.
I saw this after re-writing my code, this is a nice find! My implementation is a bit more targeted and specific to this one use case but that example would have worked as well. would have worked as well.
sources/settings-migrations/v1.54.0/kubelet-device-plugins-mps-settings/src/main.rs
Outdated
Show resolved
Hide resolved
Add the additional services and configuration files for the nvidia-mps-control-daemon Signed-off-by: Matthew Yeazel <yeazelm@amazon.com>
|
^ Updated with core and kernel kit releases, Bottlerocket SDK update, and comments for migrations. |
Signed-off-by: Matthew Yeazel <yeazelm@amazon.com>
|
^ Updated the code to break into two migrations. The first migration's datastore would be lost when the second one ran. This keeps the two separate and ensures both complete. |
|
Just a note, if we wanted to do 2 sets of migrations in 1, this works as well: But I think I like the 2 separate migrations since they are different trees anyways. |
Issue number:
Closes # #4673
Description of changes:
Update the shared defaults configuration files and services to accomodate the
nvidia-mps-control-daemon. Add a migration for the new settings.Testing done:
cargo makecompletes. For full testing of the feature, look at bottlerocket-os/bottlerocket-core-kit#789Migration testing passed.
From the console when downgrading:
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.