-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc #1919
libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc #1919
Conversation
e7d9fe1
to
d856edc
Compare
@@ -232,6 +251,11 @@ func findIntelRdtMountpointDir() (string, error) { | |||
return "", fmt.Errorf("Error found less than 3 fields post '-' in %q", text) | |||
} | |||
|
|||
// Check if MBA Software Controller is enabled through mount option "-o mba_MBps" | |||
if strings.Contains(postSeparatorFields[2], "mba_MBps") { | |||
isMbaScEnabled = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit confused by the name, why do we call "mega bytes per second" limit control for MBA "software controller"? Is percentage limit controlled by hardware controller or something?
Second, how do we use this flag in runc? I don't see it except for setting isMbaEnabled flag, doesn't seem to be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hqhq Thank you for kind code review.
I'm a little bit confused by the name, why do we call "mega bytes per second" limit control for MBA "software controller"? Is percentage limit controlled by hardware controller or something?
The name "MBA Software Controller" stands for an enhancement of MBA feature in kernel. The motivation of "MBA Software Controller" in kernel is to mitigate some limitations in MBA which described in kernel documentation (search "Software Controller" in https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt for more details) and to make the interface more user friendly - we could specify memory bandwidth limit in "MBps" (Mega Bytes per second), it is more straightforward than memory bandwidth limit in "percentages" from end user's perspective.
The kernel underneath uses a software feedback mechanism or a "Software Controller" which reads the actual bandwidth using Memory Bandwidth Monitoring (MBM, an Intel RDT monitoring feature) counters and adjust the memory bandwidth percentages to ensure: "actual memory bandwidth < user specified memory bandwidth". In other words, MBA Software Controller also makes use of percentage limit control in hardware internally.
MBA Software Controller depends on MBA and MBM hardware capabilities, If both MBA and MBM are enabled by hardware, we could enable MBA Software Controller through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
Second, how do we use this flag in runc? I don't see it except for setting isMbaEnabled flag, doesn't seem to be enough.
In this PR, we add new flag "isMbaScEnabled" to indicate if MBA Software Controller is enabled. In step 2 of intelrdt.init(), this flag is set if mount option "mba_MBps" for resctrl filesystem is found during parsing /proc/self/mountinfo. In step 3 of intelrdt.init(), if "isMbaScEnabled" is true, we could mark "isMbaEnabled" true immediately. We don't need to double check if MBA is enabled through checking if /sys/fs/resctrl/info/MB/ is available.
In runc, both memory bandwidth schemata of original MBA and MBA Software Controller are in unified
format: MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
. the only difference is that the unit of memory bandwidth is specified in "percentages" by default, and in "MBps" if MBA Software Controller is enabled. So we deal with both original MBA and MBA Software Controller cases with exactly the same logic in runc.
In addition, we export function IsMbaScEnabled() to check if flag "isMbaScEnabled" is true or false. Currently it is only called by TestIntelRdtSetMemBwScSchema() in MBA Software Controller unit test in runc. But it will be useful to the caller in upper layer software (e.g., docker) in future. For example, the caller could check if MBA Software Controller is enabled. If yes, the memory bandwidth will be passed in "MBps" unit, otherwise the memory bandwidth will be passed in "percentages" unit.
I have rebased the code:
|
d856edc
to
0f11185
Compare
Rejected -- please don't bump the runtime-spec to a non-released version -- while Go does encourage doing this we absolutely do not want to be using a spec that's not released. You're only updating it for a comment change, so it's really not necessary. (As an aside I really don't like the format for |
@cyphar Thank you for code review.
The vendor update in commit 64daa60 only changes the code comments to avoid confusion to Intel RDT users. I am OK if we only add the second commit this time because there is no real code dependency. BTW, do you have any suggestion that how we could add a new "version tag" in runtime-spec which includes the minor change? We could vendor update to a new runtime-spec "version tag" in runc in future? Thank you.
More and more Cloud Service Provider (CSP) customers are deploying Intel RDT features in their servers. The native Intel RDT support in OCI container is an imperative demand from customers. But the container user scenarios and usages on top of kernel resctrl filesystem are varied. The format of "intelRdt" in runtime-spec config is "passing through" the schemata format in kernel resctrl filesystem. Our goal is to provide the maximum flexibility to Intel RDT end users in OCI container user scenarios. |
It also changes some of the types for VMs -- which is unrelated to this change. This is more about the principle than anything else, we don't use non-released spec versions no matter how minor the actual changes are.
Sure, but it would require asking one of the folks from @opencontainers/runtime-spec-maintainers to send out a version request email and get all the other maintainers to vote on it. It's not hard, but I doubt it's worth the effort for a change in a comment. We can do it some time in the future, but I think that's a separate topic to this PR.
I understand why it was done this way, but having an opaque format makes versioning, introspection, reasoning about configuration, etc much harder. As I said, this is water under the bridge and we have to support this format "forever" now. But that doesn't mean I am a fan of it -- the only reason I don't have more of a problem with it is that it's an Intel-specific thing and thus is a very niche system (compared to an architecture-agnostic cgroup setting for instance). |
@cyphar Really appreciated your code review and comments.
I understood now. I will remove the vendor update commit in this PR.
Thank you for the suggestion. |
0f11185
to
95af9ef
Compare
…ller in runc MBA Software Controller feature is introduced in Linux kernel v4.18. It is a software enhancement to mitigate some limitations in MBA which describes in kernel documentation. It also makes the interface more user friendly - we could specify memory bandwidth in "MBps" (Mega Bytes per second) as well as in "percentages". The kernel underneath would use a software feedback mechanism or a "Software Controller" which reads the actual bandwidth using MBM counters and adjust the memory bandwidth percentages to ensure: "actual memory bandwidth < user specified memory bandwidth". We could enable this feature through mount option "-o mba_MBps": mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl In runc, we handle both memory bandwidth schemata in unified format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." The unit of memory bandwidth is specified in "percentages" by default, and in "MBps" if MBA Software Controller is enabled. For more information about Intel RDT and MBA Software Controller: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
This PR is the runc part of Intel RDT/MBA Software Controller support.
The runtime-spec part: opencontainers/runtime-spec#992
MBA Software Controller feature is introduced in Linux kernel v4.18.
It is a software enhancement to mitigate some limitations in MBA which
describes in kernel documentation. It also makes the interface more user
friendly - we could specify memory bandwidth in "MBps" (Mega Bytes per
second) as well as in "percentages".
The kernel underneath would use a software feedback mechanism or a
"Software Controller" which reads the actual bandwidth using MBM
counters and adjust the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".
We could enable this feature through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
In runc, we handle both memory bandwidth schemata in unified format:
"MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The unit of memory bandwidth is specified in "percentages" by default,
and in "MBps" if MBA Software Controller is enabled.
For more information about Intel RDT and MBA Software Controller:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
Signed-off-by: Xiaochen Shen xiaochen.shen@intel.com