libcontainer: add support for Intel RDT/CAT in runc #1279

Merged

xiaochenshen
Contributor

@xiaochenshen xiaochenshen commented Jan 17, 2017

v5 commits:
4d2756c libcontainer: add test cases for Intel RDT/CAT
692f6e1 libcontainer: add support for Intel RDT/CAT in runc
af3b0d9 libcontainer/SPEC.md: add documentation for Intel RDT/CAT

Changes in v5:
Reworked code according to @crosbymichael and @hqhq's comments:

  1. libcontainer: add support for Intel RDT/CAT in runc #1279 (comment)
  2. libcontainer: add support for Intel RDT/CAT in runc #1279 (comment)
  3. libcontainer: add support for Intel RDT/CAT in runc #1279 (comment)
  • Dropped the "refactor resource manager interface" design: removed the generic resourceManagers, isolated the IntelRdt functions so they no longer touch cgroups code, and removed the static map with static keys on the resourceManagers set.
  • Simplified the IntelRdtManager implementation by removing the "dead and empty" methods GetPids(), GetAllPids(), and Freeze(). The required method set remains: Apply(), GetStats(), Destroy(), GetPath(), and Set().
  • Re-implemented GetStats(): cgroups.Stats and intelrdt.Stats are now separate, which keeps the Stats types safe.
  • Updated the tests to match the functional rework.
  • Added read-only L3 cache info (CbmMask, MinCbmBits, and NumClosids) to intelrdt.Stats. Users can run "runc events" to fetch this information, which describes the L3 cache capabilities of the platform and helps in choosing a correct L3 cache schema on different Intel Xeon platforms.
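
The reduced method set listed above could look roughly like the interface below. This is only a sketch for orientation: the method signatures, the `Stats` and `Config` types, and `fakeManager` are assumptions made for illustration and are not copied from the PR.

```go
package main

import "fmt"

// Stats is a hypothetical holder for the read-only L3 cache info
// named above (CbmMask, MinCbmBits, NumClosids).
type Stats struct {
	CbmMask    uint64
	MinCbmBits uint64
	NumClosids uint64
}

// Config is a hypothetical stand-in for the container configuration.
type Config struct {
	L3CacheSchema string
}

// Manager lists the five methods the v5 rework kept.
// The signatures are assumed, not taken from the PR.
type Manager interface {
	Apply(pid int) error
	GetStats() (*Stats, error)
	Destroy() error
	GetPath() string
	Set(c *Config) error
}

// fakeManager is an in-memory implementation that only shows the shape.
type fakeManager struct {
	path   string
	pids   []int
	schema string
}

func (m *fakeManager) Apply(pid int) error {
	m.pids = append(m.pids, pid)
	return nil
}

// GetStats returns made-up values; a real manager would read them from resctrl.
func (m *fakeManager) GetStats() (*Stats, error) {
	return &Stats{CbmMask: 0xfffff, MinCbmBits: 1, NumClosids: 16}, nil
}

func (m *fakeManager) Destroy() error {
	m.pids = nil
	return nil
}

func (m *fakeManager) GetPath() string { return m.path }

func (m *fakeManager) Set(c *Config) error {
	if c == nil {
		return fmt.Errorf("nil config")
	}
	m.schema = c.L3CacheSchema
	return nil
}

func main() {
	var m Manager = &fakeManager{path: "/sys/fs/resctrl/demo"}
	_ = m.Apply(1234)
	_ = m.Set(&Config{L3CacheSchema: "L3:0=ffff0;1=3ff"})
	fmt.Println(m.GetPath())
}
```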

v4 commits:
b1c8366 libcontainer: add support for Intel RDT/CAT in runc
48d8ffe libcontainer/SPEC.md: add documentation for Intel RDT/CAT
8851a0d libcontainer: refactor resource manager interface
8064819 vendor: specs-go: update specs for Intel RDT/CAT

Changes in v4:
Rebased to latest master branch.


v3 commits:
be25a20 libcontainer: add support for Intel RDT/CAT in runc
96fd05f libcontainer/SPEC.md: add documentation for Intel RDT/CAT
d5ac70d libcontainer: refactor resource manager interface
c302b70 vendor: specs-go: update specs for Intel RDT/CAT

Changes in v3:
Addressed comment #1279 (comment) from @yummypeng :

  • Add parameter path to record IntelRdtPath in NewIntelRdtManager.

v2 commits (for keeping records in code review):
ab32d21 libcontainer: add support for Intel RDT/CAT in runc
5972fd5 libcontainer/SPEC.md: add documentation for Intel RDT/CAT
7f8b321 libcontainer: refactor resource manager interface
9b03a51 vendor: specs-go: update specs for Intel RDT/CAT

Changes in v2:
Addressed the comments from @hqhq and @yummypeng :

  • Added intelrdt config validator.
  • Removed resctrl filesystem re-mount logic in isIntelRdtMounted().
  • Optimized IsIntelRdtEnabled() to avoid unnecessary time-consuming mount points parsing.
  • Changed Intel RDT related json items to snake_case style.
  • Fixed some redundant codes with regard to intelRdtPath.
  • Fixed potential null pointer issue in IntelRdtManager.Set().
  • Fixed some typos (Sets, num_closids and etc.)
  • Improved libcontainer/SPEC.md: added minimal supported kernel version, provided a more sophisticated config example.

v1 commits (for keeping records in code review):
4752fd2 libcontainer: add support for Intel RDT/CAT in runc
c8486a7 libcontainer/SPEC.md: add documentation for Intel RDT/CAT
aefd02a libcontainer: refactor resource manager interface
8e8e5d9 vendor: specs-go: update specs for Intel RDT/CAT


This PR fixes issue #433

About Intel RDT/CAT feature:
Intel platforms with new Xeon CPUs support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.

This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).

More information about Intel RDT/CAT can be found in section 17.17
of the Intel Software Developer Manual.

About Intel RDT/CAT kernel interface:
In Linux kernel 4.10 or newer, the interface is defined and exposed via the
"resource control" filesystem, which is a "cgroup-like" interface.

Compared with cgroups, it has a similar process management lifecycle and
similar interfaces in a container. But unlike the cgroups hierarchy, it has a
single-level filesystem layout.

Intel RDT "resource control" filesystem hierarchy:

mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|       |-- cbm_mask
|       |-- min_cbm_bits
|       |-- num_closids
|-- cpus
|-- schemata
|-- tasks
|-- <container_id>
    |-- cpus
    |-- schemata
    |-- tasks

For runc, we can make use of the tasks and schemata files to configure L3 cache
resource constraints.

The file tasks has a list of tasks that belong to this group (e.g., the
"<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub-group, it
is in the root group.

The file schemata has allocation bitmasks/values for L3 cache on each socket,
which contains L3 cache id and capacity bitmask (CBM).

	Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."

For example, on a two-socket machine, the L3 schema line could be L3:0=ff;1=c0,
which means the CBM of L3 cache id 0 is 0xff and the CBM of L3 cache id 1 is 0xc0.
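
To make the schema format concrete, here is a minimal parsing sketch. The function name `parseL3Schema` and the choice of a map from cache id to CBM are assumptions for illustration; the PR itself may handle the line differently (for example by passing it to the kernel unparsed).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseL3Schema turns a line such as "L3:0=ff;1=c0" into a map from
// L3 cache id to capacity bitmask. CBMs are read as hexadecimal.
func parseL3Schema(line string) (map[int]uint64, error) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with \"L3:\"", line)
	}
	out := make(map[int]uint64)
	for _, part := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed entry %q", part)
		}
		id, err := strconv.Atoi(strings.TrimSpace(kv[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", part, err)
		}
		cbm, err := strconv.ParseUint(strings.TrimSpace(kv[1]), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %v", part, err)
		}
		out[id] = cbm
	}
	return out, nil
}

func main() {
	m, err := parseL3Schema("L3:0=ff;1=c0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cache 0: %#x, cache 1: %#x\n", m[0], m[1])
}
```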

A valid L3 cache CBM is a contiguous set of bits, and the number of bits that
can be set is bounded by the maximum CBM length, which varies among supported
Intel Xeon platforms. In the Intel RDT "resource control" filesystem layout,
the CBM in a group should be a subset of the CBM in root. The kernel checks
validity on write. For example, 0xfffff in root indicates that the maximum CBM
length is 20 bits, which maps to the entire L3 cache capacity. Some valid CBM
values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00, etc.
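
The two rules above (contiguous bits, subset of the root CBM) can be checked in user space before writing to schemata. The following is a sketch under that reading; `isContiguous` and `validCBM` are hypothetical names, and the kernel remains the final authority (for instance, it also enforces min_cbm_bits, which this sketch ignores).

```go
package main

import (
	"fmt"
	"math/bits"
)

// isContiguous reports whether the set bits of cbm form one unbroken run.
func isContiguous(cbm uint64) bool {
	if cbm == 0 {
		return false
	}
	// Drop trailing zeros; a contiguous run then looks like 0b0..01..1,
	// and adding 1 to such a value clears every set bit.
	shifted := cbm >> uint(bits.TrailingZeros64(cbm))
	return shifted&(shifted+1) == 0
}

// validCBM checks that cbm is contiguous and a subset of the root mask.
func validCBM(cbm, root uint64) bool {
	return isContiguous(cbm) && cbm&^root == 0
}

func main() {
	const root = 0xfffff // 20-bit root CBM from the example above
	for _, cbm := range []uint64{0xf, 0xf0, 0x3ff, 0x1f00, 0x5, 0x100000} {
		fmt.Printf("%#x valid: %v\n", cbm, validCBM(cbm, root))
	}
}
```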

For more information about Intel RDT/CAT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:

Consider a two-socket machine with two L3 caches where the default CBM is
0xfffff and the max CBM length is 20 bits. With this configuration, tasks
inside the container only have access to the "upper" 80% of L3 cache id 0 and
the "lower" 50% of L3 cache id 1:

"linux": {
	"intelRdt": {
		"l3CacheSchema": "L3:0=ffff0;1=3ff"
	}
}
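
The 80% and 50% figures follow from counting set bits: 0xffff0 has 16 of the 20 root bits set, and 0x3ff has 10 of 20. A small sketch of that arithmetic, with `cacheShare` as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"math/bits"
)

// cacheShare returns the percentage of the L3 cache covered by cbm,
// relative to the root (full) mask. It returns 0 for an empty root mask.
func cacheShare(cbm, root uint64) float64 {
	total := bits.OnesCount64(root)
	if total == 0 {
		return 0
	}
	return 100 * float64(bits.OnesCount64(cbm&root)) / float64(total)
}

func main() {
	const root = 0xfffff // default CBM, 20 bits
	fmt.Printf("cache id 0: %.0f%%\n", cacheShare(0xffff0, root))
	fmt.Printf("cache id 1: %.0f%%\n", cacheShare(0x3ff, root))
}
```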

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>

@xiaochenshen
Contributor Author

@cyphar @crosbymichael @hqhq @mrunalp @vishh
/cc @opencontainers/runc-maintainers

This PR will obsolete #1198

To address @crosbymichael's and @cyphar's comments #1198 (comment) and #1198 (comment), the design has been updated:

It adds a new "ResourceManager" structure as the base interface for all resource managers, such as the cgroups manager and the incoming IntelRdt manager.

All registered resource managers are consolidated in the linuxContainer structure. We can apply unified operations (e.g., Apply(), Set(), Destroy()) across all of the registered resource managers.

Currently, the cgroups manager is the only resource manager in libcontainer. Linux kernel 4.10 will introduce the Intel RDT/CAT feature, whose kernel interface is exposed via the "resource control" filesystem, a cgroup-like interface. To support Intel RDT/CAT in libcontainer, we need a new resource manager (the IntelRdt manager) outside cgroups.

@crosbymichael
Member

Thanks. Design looks much better than before.

@xiaochenshen xiaochenshen force-pushed the rdt-cat-resource-manager-v1 branch 2 times, most recently from 04e7cec to aaa98cb Compare February 9, 2017 04:47
@xiaochenshen
Contributor Author

@opencontainers/runc-maintainers
Any comments for this PR? Thank you.

@xiaochenshen xiaochenshen force-pushed the rdt-cat-resource-manager-v1 branch from aaa98cb to abf51f3 Compare February 9, 2017 05:00
@xiaochenshen
Contributor Author

@LK4D4 @cyphar @crosbymichael @hqhq @mrunalp @vishh
/cc @opencontainers/runc-maintainers

Could someone point me to the simplest steps to build runc in my forked repo (github.com/xiaochenshen/runc)? Thank you.

When I tried to rebase the code against the latest runc master branch to resolve the conflict, I found that Godeps was recently replaced by the vndr tool. Building with "make" in github.com/opencontainers/runc works, but in my forked directory I get the following errors. They look vndr-related.

# make
go build -i -ldflags "-X main.gitCommit="291bf601105c97dc1aa631fdfb7fca63c947319b" -X main.version=1.0.0-rc2" -tags "seccomp" -o runc .
# github.com/xiaochenshen/runc
./restore.go:111: cannot use spec (type *"github.com/xiaochenshen/runc/vendor/github.com/opencontainers/runtime-spec/specs-go".Spec) as type *"github.com/opencontainers/runc/vendor/github.com/opencontainers/runtime-spec/specs-go".Spec in field value
./utils_linux.go:154: cannot use spec (type *"github.com/xiaochenshen/runc/vendor/github.com/opencontainers/runtime-spec/specs-go".Spec) as type *"github.com/opencontainers/runc/vendor/github.com/opencontainers/runtime-spec/specs-go".Spec in field value
make: *** [runc] Error 2

@cyphar
Member

cyphar commented Mar 7, 2017

@xiaochenshen Make sure that this project is cloned properly inside your GOPATH. In particular you should have something like this set up:

% mkdir ~/yourgopath
% export GOPATH=~/yourgopath
% mkdir -p $GOPATH/src/github.com/opencontainers/runc
% git clone https://github.com/opencontainers/runc $GOPATH/src/github.com/opencontainers/runc
# ...
% cd $GOPATH/src/github.com/opencontainers/runc
% make
# should work

@xiaochenshen
Contributor Author

xiaochenshen commented Mar 8, 2017

@cyphar Thank you for the guide. It works with github.com/opencontainers/runc.
But the issue happens only in my forked repo github.com/xiaochenshen/runc, which is also under $GOPATH ($GOPATH/src/github.com/xiaochenshen/runc).
Should I modify the vendor.conf file?

Or should I git clone my repo https://github.com/xiaochenshen/runc into the directory $GOPATH/src/github.com/opencontainers/runc? That looks a bit odd:

git clone https://github.com/xiaochenshen/runc $GOPATH/src/github.com/opencontainers/runc

@xiaochenshen xiaochenshen force-pushed the rdt-cat-resource-manager-v1 branch 2 times, most recently from 2d4d43f to e414983 Compare March 8, 2017 08:06
@xiaochenshen xiaochenshen force-pushed the rdt-cat-resource-manager-v1 branch from e414983 to d1b743f Compare March 10, 2017 10:08
@xiaochenshen
Contributor Author

@crosbymichael /cc @opencontainers/runc-maintainers

To address your suggestion in opencontainers/runtime-spec#630 (review):
The IntelRdt struct has been moved out of Resources and into Linux.

This PR has been updated accordingly.

@xiaochenshen xiaochenshen force-pushed the rdt-cat-resource-manager-v1 branch from d1b743f to f9f1347 Compare March 10, 2017 10:18
@xiaochenshen xiaochenshen force-pushed the rdt-cat-resource-manager-v1 branch from f9f1347 to 4752fd2 Compare April 5, 2017 18:24
@rh-atomic-bot

269/277 passed on RHEL - Failed.
261/275 passed on CentOS - Failed.
275/276 passed on Fedora - Failed.
Log - https://aos-ci.s3.amazonaws.com/opencontainers/runc/runc-integration-tests-prs/314/fullresults.xml

@xiaochenshen
Contributor Author

xiaochenshen commented Apr 5, 2017

ping @crosbymichael @cyphar @mrunalp @hqhq
/cc @rjnagal @vmarmol @dqminh @avagin

opencontainers/runtime-spec#630 plus this PR fix issue #433, and opencontainers/runtime-spec#630 has been merged.

Could you review this PR at your convenience? Thank you!
BTW, I just rebased the code to fix a conflict with the rootless cgroup manager.

// restore the object later
GetPaths() map[string]string

// Set the resource as configured

typo: Set->Sets

Contributor Author

Thanks for finding the typo.

@@ -432,9 +451,14 @@ func (c *linuxContainer) newSetnsProcess(p *Process, cmd *exec.Cmd, parentPipe,
if err != nil {
return nil, err
}
intelRdtPath, err := intelrdt.GetIntelRdtPath(c.ID())

Since you have already done this in the above state, err := c.currentState(), there is no need to do it again. Just reference state.IntelRdtPath in the following return &setnsProcess{...}.

Contributor Author

Yes, you are right. I will fix it.

if path == "" {
t.Fatal("intel rdt path should not be empty")
}
if intelRdtPath := path; intelRdtPath != expectedIntelRdtPath {

I guess that maybe you want to be consistent with the previous code, but I still think there is no need to define a new variable named intelRdtPath here. Maybe you can change path to intelRdtPath for clarity.

Contributor Author

Yes, I will fix it.

return false
}

// If not mounted, we try to mount again:

Why? I think it's users' responsibility to guarantee that resctrl is mounted before they use it.

Contributor Author

I agree with you now. I will remove this.

I was thinking that it would do no harm. Unlike cgroups, the Intel RDT resource control filesystem will likely not be mounted at boot by default in popular Linux distributions.

Contributor

Shouldn't this be part of default mount info as we did for cgroups in https://github.com/opencontainers/runc/blob/master/libcontainer/specconv/example.go#L111-L116 and be handled as bind mount in mountToRootfs?

Contributor Author

@hqhq
Unlike cgroups and the cgroup filesystem, Intel RDT and the resource control (resctrl) filesystem depend heavily on hardware/CPU support.

The resctrl filesystem will likely not be mounted to rootfs if either (1) the h/w doesn't support Intel RDT or (2) the kernel is older than 4.10 and so doesn't support Intel RDT.

Contributor

I mean resctrl should also be bind mounted to container if we mount it on host by default, I didn't notice you're going to remove this, I'm OK with that, let users mount this on host and add that mount info to config.json specifically would be better.

Contributor Author

@hqhq

let users mount this on host and add that mount info to config.json specifically would be better.

I agree that, if Intel RDT is supported by the h/w and kernel, it is the user's responsibility to mount the resctrl filesystem on the host and then add the mount info to config.json accordingly.

In most cases (e.g., non-Intel Xeon platforms), Intel RDT is not supported and the resctrl filesystem is not enabled. In my opinion, we should not add "fixed" resctrl filesystem mount info to specs.Mount in libcontainer/specconv/example.go by default.

if err != nil {
return nil, err
}
numClosid, err := getIntelRdtParamUint(path, "num_closid")

typo: num_closid -> num_closids

Contributor Author

Thanks for finding this typo.


// Returns Intel RDT "resource control" filesystem path to save in
// a state file and to be able to restore the object later
GetPath() string

If GetPaths() is not a suitable method in IntelRdtManager, is it possible that we delete it from Manager interface and just add it to cgroups.Manager ?

Contributor Author

Currently I have kept GetPaths() here intentionally. In the future, we may add more Intel RDT resource constraints besides L3 cache. If it is really not needed at that time, I will reorganize the Manager interface as you suggested.


m.mu.Lock()
defer m.mu.Unlock()
path, err := d.join(m.Id)

Maybe I am wrong, but have you ever thought about the scenario of docker-in-docker?

Contributor Author

@yummypeng

For the nested containers case, I added comments in:
#1279 (comment)

@xiaochenshen
Contributor Author

@yummypeng Thank you for your help on code review. I will check up the issues one by one later.

@yummypeng

I tried adding the intelRdt configuration to my config.json and then running a container in an environment without intelRdt support, and unexpectedly the container started successfully. It seems a validation of the Spec is needed, maybe in https://github.com/opencontainers/runc/blob/master/libcontainer/configs/validate/validator.go#L24.
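
A validation along the lines suggested here might look like the sketch below. The `Config` and `IntelRdt` types are simplified stand-ins, `validateIntelRdt` is a hypothetical name, and the `enabled` flag is passed in (rather than probed) so the check can be exercised without RDT hardware; none of this is taken from the validator that was eventually added.

```go
package main

import (
	"errors"
	"fmt"
)

// IntelRdt and Config are simplified stand-ins for the real config types.
type IntelRdt struct {
	L3CacheSchema string
}

type Config struct {
	IntelRdt *IntelRdt
}

// validateIntelRdt rejects a config that requests Intel RDT on a host
// where it is not available. A nil IntelRdt section is always accepted.
func validateIntelRdt(c *Config, enabled bool) error {
	if c == nil || c.IntelRdt == nil {
		return nil
	}
	if !enabled {
		return errors.New("intelRdt is specified in config, but Intel RDT is not supported or enabled")
	}
	if c.IntelRdt.L3CacheSchema == "" {
		return errors.New("intelRdt is specified in config, but l3CacheSchema is empty")
	}
	return nil
}

func main() {
	cfg := &Config{IntelRdt: &IntelRdt{L3CacheSchema: "L3:0=ffff0;1=3ff"}}
	fmt.Println("without RDT:", validateIntelRdt(cfg, false))
	fmt.Println("with RDT:   ", validateIntelRdt(cfg, true))
}
```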

// e.g., 0xfffff in root indicates the max bits of CBM is 20 bits,
// which mapping to entire L3 cache capacity. Some valid CBM values
// to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.
l3CacheSchema := container.IntelRdt.L3CacheSchema

This requires a check if container.IntelRdt == nil, otherwise this will panic.

Contributor Author

Yes, you are right. I will add a check to guard against a nil pointer.

@yummypeng

Since resctrl is a "cgroup-like" interface, why not mount it into container like cgroup does?

@xiaochenshen
Contributor Author

@yummypeng

Since resctrl is a "cgroup-like" interface, why not mount it into container like cgroup does?

There are discussions about that in obsolete #1198. You can refer to @crosbymichael and @cyphar 's comments: #1198 (comment) and #1198 (comment).

isFlagSet, err := parseCpuInfoFile("/proc/cpuinfo")
if err != nil {
return false
}
Contributor

Parsing mount points is time-consuming. I think we can return early when isFlagSet is false, which is the common case.

Contributor Author

Thanks, I agree with you.

@@ -0,0 +1,34 @@
// +build linux
Contributor

Is this gonna break Solaris and Windows?

Contributor Author

@hqhq Which is better in your opinion? Thank you.
(1) Just remove this line (+// +build linux) in resource_manager_linux.go.
(2) Add new file resource_manager_unsupported.go for !linux cases:

// +build !linux

package resourcemanager

Contributor

After a second check, I think the current implementation is fine: it's only imported by packages that are wrapped by the linux build flag. Sorry for the noise.

}

func (raw *intelRdtData) join(id string) (string, error) {
path := filepath.Join(raw.root, id)
Contributor

What about nested container cases? You can refer to the cgroup join function to see how cgroup handle this.

Contributor Author

@xiaochenshen xiaochenshen May 3, 2017

@hqhq

I was wondering whether Intel RDT/CAT can support the nested containers case. Any more suggestions or comments?

  1. Unlike the cgroups hierarchy, the Intel RDT resource control filesystem supports only a single-level filesystem layout (see the section "Intel RDT "resource control" filesystem hierarchy" in this PR).

  2. The other limitation is that Intel RDT/CAT supports only a limited number of groups (directories), which is given in info/L3/num_closids. It is a h/w limitation by nature.

events.go Outdated
type intelRdt struct {
// The read-only default "schema" in root, for reference
L3CacheSchemaRoot string `json:"l3CacheSchemaRoot,omitempty"`
L3CacheSchema string `json:"l3CacheSchema,omitempty"`
Contributor

json variables should be snake_case.

Contributor Author

Got it. I will fix it.

hardware and kernel support Intel RDT/CAT.

In Linux kernel, it is exposed via "resource control" filesystem, which is a
"cgroup-like" interface.
Contributor

Can you specify the minimum linux kernel version that supports intelrdt feature?

Contributor Author

No problem - 4.10 kernel.

+Linux kernel 4.10 introduces Intel RDT/CAT support, it is exposed via "resource control" filesystem...

// Check if Intel RDT is enabled
func IsIntelRdtEnabled() bool {
// We have checked the flag before
if isIntelRdtEnabled {
Member

isn't this racy? you should probably use a sync.Once for initialization of this bool

Contributor Author

@crosbymichael
Thank you for the good suggestion. I will fix it in a follow-up PR.

Contributor Author

I opened a new PR in #1589 to fix this issue.

@crosbymichael
Member

crosbymichael commented Sep 6, 2017

LGTM

Can you fix my comment in a follow up PR?

Approved with PullApprove

@mrunalp
Contributor

mrunalp commented Sep 6, 2017

LGTM. We can fix up smaller issues in follow-on PRs. Thanks!

Approved with PullApprove

@mrunalp
Contributor

mrunalp commented Sep 6, 2017

LGTM

@mrunalp mrunalp merged commit 5274430 into opencontainers:master Sep 6, 2017
@xiaochenshen
Contributor Author

@crosbymichael @mrunalp

Thank you for your code review.
I will open a new PR to fix remaining issues.

xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Sep 8, 2017
This is the follow-up PR of opencontainers#1279 to fix remaining issues:

Use init() to avoid race condition in IsIntelRdtEnabled().
Also rename some variables and functions.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 9, 2017
@xiaochenshen
Contributor Author

xiaochenshen commented Sep 13, 2017

@crosbymichael @hqhq @cyphar @mrunalp
/cc @runc-maintainers @yummypeng

I plan to add Intel RDT/CAT support to Docker. More information can be found in #433.
But I am confused by the "vendor" configuration among the Docker-related projects (containerd, moby and docker-cli). I ran into a lot of dependency issues because recent runc changes have not been merged into these projects yet.

Could someone share tips or documents about the "vendor" tools?
Thank you in advance.

@cyphar
Member

cyphar commented Sep 13, 2017

There is a vendor.conf in the root of Docker's source tree. You need to just update the commit ID in that file, and then run vndr github.com/opencontainers/runc in the source tree (the tool is available at https://github.com/lk4d4/vndr).

@xiaochenshen
Contributor Author

@cyphar
Got it. Thank you.

xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Oct 31, 2017
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature
of Intel Resource Director Technology (RDT) which is supported on some
Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate
throttle over memory bandwidth for the software. A user controls the
resource by indicating the percentage of maximum memory bandwidth.

Hardware details of Intel RDT/MBA can be found in section 17.18 of
Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm

In Linux kernel 4.12 and newer, Intel RDT/MBA is enabled by the kernel
config CONFIG_INTEL_RDT. If the hardware supports it, the CPU flags 'rdt_a'
and 'mba' will be set in /proc/cpuinfo.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks

For MBA support in `runc`, we will reuse the infrastructure and code
base of Intel RDT/CAT, which was implemented in opencontainers#1279. We can also
make use of the `tasks` and `schemata` files for memory bandwidth
resource constraints.

The file `tasks` has a list of tasks that belong to this group (e.g.,
the "<container_id>" group). Tasks can be added to a group by writing the
task ID to the "tasks" file (which will automatically remove them from
the previous group to which they belonged). New tasks created by
fork(2) and clone(2) are added to the same group as their parent.

The file `schemata` has a list of all the resources available to this
group. Each resource (L3 cache, memory bandwidth) has its own line and
format.

Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which
contains L3 cache id and memory bandwidth percentage.
    Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."

The minimum bandwidth percentage value for each CPU model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the CPU model and
can be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are
rounded to the next control step available on the hardware.
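
The control-step rule can be worked through in code. The sketch below reads "rounded to the next control step" as rounding up to the nearest value of the form min_bw + N * bw_gran; that reading, the helper name `nextStep`, and the decision to reject values outside [min_bw, 100] are all assumptions of this example rather than statements about the kernel's exact behavior. The result is not capped, so a granularity that does not divide 100 - min_bw evenly can yield a step above 100.

```go
package main

import (
	"errors"
	"fmt"
)

// nextStep rounds pct up to the nearest value of the form minBw + n*gran.
// Inputs are percentages; values outside [minBw, 100] are rejected.
func nextStep(pct, minBw, gran int) (int, error) {
	if gran <= 0 {
		return 0, errors.New("granularity must be positive")
	}
	if pct < minBw || pct > 100 {
		return 0, fmt.Errorf("bandwidth %d%% is outside [%d, 100]", pct, minBw)
	}
	n := (pct - minBw + gran - 1) / gran // ceiling division
	return minBw + n*gran, nil
}

func main() {
	// min_bandwidth = 10 and bandwidth_gran = 10 are illustrative values.
	for _, pct := range []int{20, 25, 70} {
		step, err := nextStep(pct, 10, 10)
		if err != nil {
			panic(err)
		}
		fmt.Printf("requested %d%% -> %d%%\n", pct, step)
	}
}
```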

For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches where the minimum
memory bandwidth is 10% and the memory bandwidth granularity is 10%.
Tasks inside the container may use a maximum memory bandwidth of 20%
on socket 0 and 70% on socket 1.

"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=20;1=70"
    }
}

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Oct 31, 2017
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature
of Intel Resource Director Technology (RDT) which is supported on some
Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate
throttle over memory bandwidth for the software. A user controls the
resource by indicating the percentage of maximum memory bandwidth.

Hardware details of Intel RDT/MBA can be found in section 17.18 of
Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm

In Linux 4.12 kernel and newer, Intel RDT/MBA is enabled by kernel
config CONFIG_INTEL_RDT. If hardware support, CPU flags `rdt_a` and
`mba` will be set in /proc/cpuinfo.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks

For MBA support for `runc`, we will reuse the infrastructure and code
base of Intel RDT/CAT which implemented in opencontainers#1279. We could also make
use of `tasks` and `schemata` configuration for memory bandwidth
resource constraints.

The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the
task ID to the "tasks" file (which will automatically remove them from
the previous group to which they belonged). New tasks created by
fork(2) and clone(2) are added to the same group as their parent.

The file `schemata` has a list of all the resources available to this
group. Each resource (L3 cache, memory bandwidth) has its own line and
format.

Memory bandwidth schema:
It holds the memory bandwidth allocation for each socket, given as an
L3 cache id and a memory bandwidth percentage.
    Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
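
The format above can be read with a few string operations. The following is
a sketch only: `parseMemBwSchema` is a hypothetical helper, not runc's
parser, and it assumes a single "MB:" line with integer cache ids and
integer percentages.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemBwSchema turns a line such as "MB:0=20;1=70" into a map from
// L3 cache id to bandwidth percentage. (Hypothetical helper.)
func parseMemBwSchema(line string) (map[int]int, error) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "MB:") {
		return nil, fmt.Errorf("schema %q does not start with \"MB:\"", line)
	}
	out := map[int]int{}
	for _, item := range strings.Split(strings.TrimPrefix(line, "MB:"), ";") {
		parts := strings.SplitN(item, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", item)
		}
		id, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %w", item, err)
		}
		bw, err := strconv.Atoi(strings.TrimSpace(parts[1]))
		if err != nil {
			return nil, fmt.Errorf("bad bandwidth in %q: %w", item, err)
		}
		out[id] = bw
	}
	return out, nil
}

func main() {
	alloc, err := parseMemBwSchema("MB:0=20;1=70")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("cache 0: %d%%, cache 1: %d%%\n", alloc[0], alloc[1])
}
```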

The minimum bandwidth percentage value for each CPU model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the CPU model and
can be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are
rounded to the next control step available on the hardware.
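
The control-step arithmetic can be sketched as below. This is an
illustration under stated assumptions, not the kernel's implementation: it
reads "next" as rounding up, caps the result at 100%, and the function name
`nextControlStep` is made up for the example.

```go
package main

import "fmt"

// nextControlStep rounds a requested percentage up to the nearest value
// of the form minBw + N*gran and caps it at 100. Rounding up and the
// cap are assumptions of this sketch.
func nextControlStep(requested, minBw, gran int) int {
	if gran <= 0 || requested <= minBw {
		return minBw
	}
	n := (requested - minBw + gran - 1) / gran // ceiling division
	step := minBw + n*gran
	if step > 100 {
		step = 100
	}
	return step
}

func main() {
	// Example values: min_bandwidth = 10, bandwidth_gran = 10.
	for _, req := range []int{5, 20, 25, 95} {
		fmt.Printf("requested %d%% -> step %d%%\n", req, nextControlStep(req, 10, 10))
	}
}
```

On a real system, the two parameters would come from
"info/MB/min_bandwidth" and "info/MB/bandwidth_gran".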

For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches where the minimum
memory bandwidth is 10% and the memory bandwidth granularity is 10%.
Tasks inside the container may use a maximum memory bandwidth of 20%
on socket 0 and 70% on socket 1.

"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=20;1=70"
    }
}
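
For illustration, the fragment above can be decoded in Go as follows. The
struct types here are local stand-ins defined only for this sketch (they
mirror the JSON keys shown and are not runc's own types), and the fragment
is wrapped in braces so that it forms a complete JSON document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local illustrative types that mirror the JSON keys in the example.
type intelRdt struct {
	MemBwSchema string `json:"memBwSchema"`
}

type linuxSection struct {
	IntelRdt *intelRdt `json:"intelRdt"`
}

type config struct {
	Linux *linuxSection `json:"linux"`
}

// memBwSchemaFromConfig returns the memBwSchema string, or "" when the
// linux.intelRdt section is absent. (Hypothetical helper.)
func memBwSchemaFromConfig(data []byte) (string, error) {
	var c config
	if err := json.Unmarshal(data, &c); err != nil {
		return "", err
	}
	if c.Linux == nil || c.Linux.IntelRdt == nil {
		return "", nil
	}
	return c.Linux.IntelRdt.MemBwSchema, nil
}

func main() {
	doc := []byte(`{"linux": {"intelRdt": {"memBwSchema": "MB:0=20;1=70"}}}`)
	schema, err := memBwSchemaFromConfig(doc)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("schema line for the container group:", schema)
}
```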

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Nov 8, 2017
xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Sep 5, 2018
xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Sep 5, 2018
xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Sep 11, 2018
xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Oct 14, 2018
xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Oct 16, 2018
verm666 pushed a commit to verm666/runc that referenced this pull request Oct 16, 2018