Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc #1919

Merged

Conversation

xiaochenshen
Copy link
Contributor

@xiaochenshen xiaochenshen commented Oct 27, 2018

This PR is the runc part of Intel RDT/MBA Software Controller support.
The runtime-spec part: opencontainers/runtime-spec#992

MBA Software Controller feature is introduced in Linux kernel v4.18.
It is a software enhancement to mitigate some limitations in MBA which
describes in kernel documentation. It also makes the interface more user
friendly - we could specify memory bandwidth in "MBps" (Mega Bytes per
second) as well as in "percentages".

The kernel underneath would use a software feedback mechanism or a
"Software Controller" which reads the actual bandwidth using MBM
counters and adjust the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".

We could enable this feature through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

In runc, we handle both memory bandwidth schemata in unified format:
"MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The unit of memory bandwidth is specified in "percentages" by default,
and in "MBps" if MBA Software Controller is enabled.

For more information about Intel RDT and MBA Software Controller:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

Signed-off-by: Xiaochen Shen xiaochen.shen@intel.com

@@ -232,6 +251,11 @@ func findIntelRdtMountpointDir() (string, error) {
return "", fmt.Errorf("Error found less than 3 fields post '-' in %q", text)
}

// Check if MBA Software Controller is enabled through mount option "-o mba_MBps"
if strings.Contains(postSeparatorFields[2], "mba_MBps") {
isMbaScEnabled = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little bit confused by the name, why do we call "mega bytes per second" limit control for MBA "software controller"? Is percentage limit controlled by hardware controller or something?
Second, how do we use this flag in runc? I don't see it except for setting isMbaEnabled flag, doesn't seem to be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hqhq Thank you for kind code review.

I'm a little bit confused by the name, why do we call "mega bytes per second" limit control for MBA "software controller"? Is percentage limit controlled by hardware controller or something?

The name "MBA Software Controller" stands for an enhancement of MBA feature in kernel. The motivation of "MBA Software Controller" in kernel is to mitigate some limitations in MBA which described in kernel documentation (search "Software Controller" in https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt for more details) and to make the interface more user friendly - we could specify memory bandwidth limit in "MBps" (Mega Bytes per second), it is more straightforward than memory bandwidth limit in "percentages" from end user's perspective.

The kernel underneath uses a software feedback mechanism or a "Software Controller" which reads the actual bandwidth using Memory Bandwidth Monitoring (MBM, an Intel RDT monitoring feature) counters and adjust the memory bandwidth percentages to ensure: "actual memory bandwidth < user specified memory bandwidth". In other words, MBA Software Controller also makes use of percentage limit control in hardware internally.

MBA Software Controller depends on MBA and MBM hardware capabilities, If both MBA and MBM are enabled by hardware, we could enable MBA Software Controller through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

Second, how do we use this flag in runc? I don't see it except for setting isMbaEnabled flag, doesn't seem to be enough.

In this PR, we add new flag "isMbaScEnabled" to indicate if MBA Software Controller is enabled. In step 2 of intelrdt.init(), this flag is set if mount option "mba_MBps" for resctrl filesystem is found during parsing /proc/self/mountinfo. In step 3 of intelrdt.init(), if "isMbaScEnabled" is true, we could mark "isMbaEnabled" true immediately. We don't need to double check if MBA is enabled through checking if /sys/fs/resctrl/info/MB/ is available.

In runc, both memory bandwidth schemata of original MBA and MBA Software Controller are in unified format: MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;.... the only difference is that the unit of memory bandwidth is specified in "percentages" by default, and in "MBps" if MBA Software Controller is enabled. So we deal with both original MBA and MBA Software Controller cases with exactly the same logic in runc.

In addition, we export function IsMbaScEnabled() to check if flag "isMbaScEnabled" is true or false. Currently it is only called by TestIntelRdtSetMemBwScSchema() in MBA Software Controller unit test in runc. But it will be useful to the caller in upper layer software (e.g., docker) in future. For example, the caller could check if MBA Software Controller is enabled. If yes, the memory bandwidth will be passed in "MBps" unit, otherwise the memory bandwidth will be passed in "percentages" unit.

@xiaochenshen
Copy link
Contributor Author

xiaochenshen commented Nov 11, 2018

I have rebased the code:

  1. Remove commit 8c4009c "[Don't merge] vendor: update runtime-spec for Intel RDT/MBA Software Controller support"
  2. Add a commit to update vendor runtime-spec for config-linux: documentation change for Intel RDT/MBA Software Controller support runtime-spec#992 has been merged.

@xiaochenshen xiaochenshen force-pushed the rdt-mba-software-controller branch from d856edc to 0f11185 Compare November 12, 2018 03:37
@cyphar
Copy link
Member

cyphar commented Nov 13, 2018

Rejected -- please don't bump the runtime-spec to a non-released version -- while Go does encourage doing this we absolutely do not want to be using a spec that's not released. You're only updating it for a comment change, so it's really not necessary.

(As an aside I really don't like the format for intelrdt because of how much magic is involved. But whatever, that's water under the bridge now.)

Rejected with PullApprove

@xiaochenshen
Copy link
Contributor Author

@cyphar Thank you for code review.

Rejected -- please don't bump the runtime-spec to a non-released version -- while Go does encourage doing this we absolutely do not want to be using a spec that's not released. You're only updating it for a comment change, so it's really not necessary.

The vendor update in commit 64daa60 only changes the code comments to avoid confusion to Intel RDT users. I am OK if we only add the second commit this time because there is no real code dependency.

BTW, do you have any suggestion that how we could add a new "version tag" in runtime-spec which includes the minor change? We could vendor update to a new runtime-spec "version tag" in runc in future? Thank you.

(As an aside I really don't like the format for intelrdt because of how much magic is involved. But whatever, that's water under the bridge now.)

More and more Cloud Service Provider (CSP) customers are deploying Intel RDT features in their servers. The native Intel RDT support in OCI container is an imperative demand from customers. But the container user scenarios and usages on top of kernel resctrl filesystem are varied. The format of "intelRdt" in runtime-spec config is "passing through" the schemata format in kernel resctrl filesystem. Our goal is to provide the maximum flexibility to Intel RDT end users in OCI container user scenarios.

@cyphar
Copy link
Member

cyphar commented Nov 13, 2018

@xiaochenshen

The vendor update in commit 64daa60 only changes the code comments to avoid confusion to Intel RDT users.

It also changes some of the types for VMs -- which is unrelated to this change. This is more about the principle than anything else, we don't use non-released spec versions no matter how minor the actual changes are.

BTW, do you have any suggestion that how we could add a new "version tag" in runtime-spec which includes the minor change? We could vendor update to a new runtime-spec "version tag" in runc in future? Thank you.

Sure, but it would require asking one of the folks from @opencontainers/runtime-spec-maintainers to send out a version request email and get all the other maintainers to vote on it. It's not hard, but I doubt it's worth the effort for a change in a comment.

We can do it some time in the future, but I think that's a separate topic to this PR.

The format of "intelRdt" in runtime-spec config is "passing through" the schemata format in kernel resctrl filesystem. Our goal is to provide the maximum flexibility to Intel RDT end users in OCI container user scenarios.

I understand why it was done this way, but having an opaque format makes versioning, introspection, reasoning about configuration, etc much harder. As I said, this is water under the bridge and we have to support this format "forever" now. But that doesn't mean I am a fan of it -- the only reason I don't have more of a problem with it is that it's an Intel-specific thing and thus is a very niche system (compared to an architecture-agnostic cgroup setting for instance).

@xiaochenshen
Copy link
Contributor Author

@cyphar Really appreciated your code review and comments.

It also changes some of the types for VMs -- which is unrelated to this change. This is more about the principle than anything else, we don't use non-released spec versions no matter how minor the actual changes are.

I understood now. I will remove the vendor update commit in this PR.

We can do it some time in the future, but I think that's a separate topic to this PR.

Thank you for the suggestion.

@xiaochenshen xiaochenshen force-pushed the rdt-mba-software-controller branch from 0f11185 to 95af9ef Compare November 13, 2018 15:16
…ller in runc

MBA Software Controller feature is introduced in Linux kernel v4.18.
It is a software enhancement to mitigate some limitations in MBA which
describes in kernel documentation. It also makes the interface more user
friendly - we could specify memory bandwidth in "MBps" (Mega Bytes per
second) as well as in "percentages".

The kernel underneath would use a software feedback mechanism or a
"Software Controller" which reads the actual bandwidth using MBM
counters and adjust the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".

We could enable this feature through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

In runc, we handle both memory bandwidth schemata in unified format:
"MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The unit of memory bandwidth is specified in "percentages" by default,
and in "MBps" if MBA Software Controller is enabled.

For more information about Intel RDT and MBA Software Controller:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
@xiaochenshen
Copy link
Contributor Author

@cyphar @hqhq @crosbymichael @mrunalp @rjnagal @vmarmol @dqminh

Per @cyphar's suggestion, I have removed the runtime-spec vendor update commit in this PR.
Could you help code review commit 95af9ef at your convenience? Thank you!

@cyphar
Copy link
Member

cyphar commented Nov 26, 2018

LGTM. I don't have the hardware to test this (and to be brutally honest I don't really like the kernel interfaces), but it's all based on upstream kernel features and I'm sure that you folks have tested it on your own hardware.

Approved with PullApprove

@crosbymichael
Copy link
Member

crosbymichael commented Nov 26, 2018

LGTM

Approved with PullApprove

@crosbymichael crosbymichael merged commit 4932620 into opencontainers:master Nov 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants