Skip to content

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Jun 18, 2025

For changelog, see https://github.com/opencontainers/cgroups/releases/tag/v0.0.3

This fixes three runc issues:

  1. Since cgroups v0.0.2, Cgroups.Resources may be nil after loading, but cgroup v1 fs manager is not expecting it.

  2. JSON incompatibility introduced in cgroups v0.0.2 (see Remove annotations from Resources cgroups#22).

  3. Bad CPU shares to CPU weight conversion (see Conversion of cgroup v1 CPU shares to v2 CPU weight causes workloads to have low CPU priority #4772).

Due to item 3, modify some tests accordingly (see #4772).

Fixes: #4772

@kolyshkin kolyshkin force-pushed the cgroups-v003 branch 4 times, most recently from 11b933a to f1abb32 Compare June 19, 2025 00:41
@kolyshkin kolyshkin requested a review from Copilot June 19, 2025 01:16
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR bumps the cgroups dependency to v0.0.3 to address JSON incompatibility and CPU shares conversion issues. The changes include updating dependency versions in go.mod, revising the CPU shares to CPU weight conversion logic in the test helper, and adjusting corresponding tests accordingly.

Reviewed Changes

Copilot reviewed 4 out of 8 changed files in this pull request and generated no comments.

File Description
tests/integration/update.bats Updated test comments and assertion functions to reflect revised conversion logic.
tests/integration/helpers.bash Modified CPU shares conversion, using a multi-line awk script to compute weight.
libcontainer/factory_linux.go Added a safeguard to initialize Resources when nil for cgroup v1 compatibility.
go.mod Bumped the dependency version for github.com/opencontainers/cgroups to v0.0.3.

@kolyshkin kolyshkin changed the title deps: bump cgroups to v0.0.3 deps: bump cgroups to v0.0.3, fix a few regressions Jun 19, 2025
Since opencontainers/cgroups v0.0.2 (commit b206a01), all stuct
Resources fields are annotated with "omitempty" attribute.
As a result, the loaded configuration may have Resources == nil.

It is totally OK (rootless containers may have no resources configured)
except since commit 6c5441e, cgroup v1 fs manager requires Resources to
be set in the call to NewManager (this is a cgroup v1 deficiency,
or maybe our implementation deficiency, or both).

To work around this, let's add code to ensure Resources is never nil
after loading from state.json.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For changelog, see https://github.com/opencontainers/cgroups/releases/tag/v0.0.3

This fixes two runc issues:

1. JSON incompatibility introduced in cgroups v0.0.2 (see
   opencontainers/cgroups#22).

2. Bad CPU shares to CPU weight conversion (see
   opencontainers#4772).

Due to item 2, modify some tests accordingly.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin requested review from a team, cyphar and lifubang June 19, 2025 17:28
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thx!

@thaJeztah thaJeztah merged commit ff2494b into opencontainers:main Jun 19, 2025
31 checks passed
jaredledvina added a commit to DataDog/runc that referenced this pull request Sep 23, 2025
Signed-off-by: Jared Ledvina <jared.ledvina@datadoghq.com>
jaredledvina added a commit to DataDog/runc that referenced this pull request Sep 23, 2025
Signed-off-by: Jared Ledvina <jared.ledvina@datadoghq.com>
jaredledvina added a commit to DataDog/runc that referenced this pull request Sep 23, 2025
Signed-off-by: Jared Ledvina <jared.ledvina@datadoghq.com>
@kolyshkin kolyshkin added the backport/1.3-done A PR in main branch which has been backported to release-1.3 label Sep 24, 2025
@kolyshkin kolyshkin mentioned this pull request Oct 1, 2025
dd-mergequeue bot pushed a commit to DataDog/datadog-agent that referenced this pull request Dec 15, 2025
… >= 1.3.2 (#43760)

### What does this PR do?
 - Auto-detect which conversion formula the runtime uses by comparing the actual cgroup weight with expected values from Docker Inspect
  - Support both old (linear) and new (quadratic) formulas for backward compatibility
  - Use the geometric mean approach for the inverse formula to handle ceil() rounding
  
New formula for shares to weight:
```
  exponent = (l² + 125l) / 612 - 7/34; 
  weight = ceil(10^exponent)
```
We are doing inverse formula to get shares from weight in this PR:
See **cpuWeightToSharesNonLinear**(cpuWeight float64) function, same logic can be found in other reference: https://github.com/cloudfoundry/guardian/blob/main/rundmc/utils.go

### Motivation
 On cgroups v2 systems running runc >= 1.3.2 (or crun >= 1.23), the docker.cpu.shares metric reported incorrect values. For example, containers configured with the default --cpu-shares 1024 would show
  2597 instead of 1024.

  Root cause: The container runtime changed its CPU shares-to-weight conversion formula from linear to quadratic (opencontainers/runc#4785), but the Agent was still using the old linear inverse formula.

  Old formula (linear):
  - weight = 1 + ((shares - 2) * 9999) / 262142
  - 1024 shares → weight 39 (incorrect default mapping)

  New formula (quadratic):
  - Uses logarithmic function to ensure min, max, and default values align
  - 1024 shares → weight 100 (correct default mapping)

  The Datadog Agent was using the old formula's inverse to convert weight back to shares, causing:
  - With new runc: weight 100 → 2597 shares (should be 1024)

### Describe how you validated your changes
unit test

### Additional Notes


Co-authored-by: minyi.zhu <minyi.zhu@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/1.3-done A PR in main branch which has been backported to release-1.3

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Conversion of cgroup v1 CPU shares to v2 CPU weight causes workloads to have low CPU priority

3 participants