fix(sysinfo): use RSS instead of usage_in_bytes for cgroup memory stats #1739

Merged
gaius-qi merged 7 commits into main from feature/bbr
Mar 20, 2026

@gaius-qi
Member

Description

This pull request introduces several improvements and refactors to the system resource monitoring components, primarily focusing on enhanced accuracy, async handling, and richer diagnostics. The most notable changes are the switch to async CPU stat collection, improved cgroup support for container environments, and the addition of detailed debug logging throughout the sysinfo modules.

Resource Monitoring Enhancements

  • Made CPU stat collection methods (get_stats and get_process_stats in CPU) asynchronous, allowing for more accurate and up-to-date readings by refreshing and waiting between samples.
  • Refactored cgroup stats handling for both CPU and memory, including improved calculation for used percent, and added support for reporting cgroup stats when running in a container.
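
The sample-wait-sample idea behind these two bullets can be sketched as follows. This is a minimal illustration, not the code from this PR: `CpuSample` and `cpu_used_percent` are hypothetical names, the cumulative counter is assumed to be in microseconds (as with `usage_usec` in a cgroup v2 `cpu.stat` file), and the wait between samples is left to the caller rather than done asynchronously.

```rust
use std::time::Duration;

/// One reading of a cumulative CPU time counter (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct CpuSample {
    /// Total CPU time consumed so far, in microseconds.
    pub usage_usec: u64,
}

/// Percentage of the available CPU capacity used between two samples taken
/// `elapsed` apart, normalized by the logical core count and clamped to 0..=100.
pub fn cpu_used_percent(
    prev: CpuSample,
    curr: CpuSample,
    elapsed: Duration,
    logical_cores: usize,
) -> f64 {
    let elapsed_usec = elapsed.as_micros() as f64;
    if elapsed_usec <= 0.0 || logical_cores == 0 {
        // Without a time window or a core count there is nothing to normalize against.
        return 0.0;
    }

    // saturating_sub guards against a counter that was reset between samples.
    let delta_usec = curr.usage_usec.saturating_sub(prev.usage_usec) as f64;

    // Capacity of the window is elapsed time multiplied by the number of cores.
    let percent = delta_usec / (elapsed_usec * logical_cores as f64) * 100.0;
    percent.clamp(0.0, 100.0)
}
```

For instance, 500 ms of CPU time consumed over a 1 s window on 2 logical cores comes out at 25%. The core count could be taken from `std::thread::available_parallelism()`, although whether the PR does so is not stated here.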

Container Environment Support

  • Added detection for container environments and conditional reporting of cgroup CPU and memory stats in the SchedulerAnnouncer.
  • Updated overload detection logic to use the new async CPU stat methods and improved memory percent calculation.
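
How the PR detects a container is not described above, so the following is only one common heuristic, shown as a sketch under that assumption: scan the text of `/proc/self/cgroup` for runtime-specific path fragments, then attach cgroup stats only when the check succeeds. The function names and the marker list are hypothetical. The heuristic can miss containers that run in a private cgroup namespace on cgroup v2, where the file may contain only `0::/`.

```rust
/// Returns true if the contents of a `/proc/<pid>/cgroup` file suggest that the
/// process runs under a container runtime. The marker list is an assumption
/// made for this sketch, not an exhaustive or authoritative set.
pub fn cgroup_suggests_container(proc_cgroup: &str) -> bool {
    const MARKERS: [&str; 4] = ["docker", "kubepods", "containerd", "lxc"];

    proc_cgroup.lines().any(|line| {
        // Each line has the form `hierarchy-ID:controller-list:cgroup-path`.
        let path = line.splitn(3, ':').nth(2).unwrap_or("");
        MARKERS.iter().any(|marker| path.contains(marker))
    })
}

/// Conditional reporting: the cgroup value is attached only inside a container,
/// so a consumer can tell "not applicable" (None) apart from "0% used".
pub fn cgroup_stat_if_container(in_container: bool, used_percent: f64) -> Option<f64> {
    in_container.then_some(used_percent)
}
```

A caller might read the file with `std::fs::read_to_string("/proc/self/cgroup")` and treat a read error as "not in a container".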

Debug Logging Improvements

  • Introduced detailed debug logging in sysinfo modules for CPU, memory, disk, and network, providing visibility into resource usage calculations and aiding troubleshooting.

Version Updates

  • Bumped the workspace and crate versions from 1.2.17 to 1.2.18 in Cargo.toml, and updated all workspace dependency versions accordingly.

Code Structure and Accuracy Improvements

  • Improved calculation of used percent for CPU and memory, clamping results to valid ranges, and using logical core count consistently.
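
For the memory side, the PR title points at using RSS rather than `usage_in_bytes`. In cgroup v1, `memory.usage_in_bytes` generally includes page cache, while the `rss` line of `memory.stat` does not, so an RSS-based figure tends to track what the process actually holds. The sketch below assumes the stats are parsed from `memory.stat` text directly; the PR may obtain them differently, and both function names are hypothetical.

```rust
/// Extracts the `rss` counter (in bytes) from the text of a cgroup v1
/// `memory.stat` file. Returns None if the line is missing or malformed.
pub fn parse_rss_bytes(memory_stat: &str) -> Option<u64> {
    memory_stat.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next()) {
            // Match the key exactly so that `total_rss` is not picked up by mistake.
            (Some("rss"), Some(value)) => value.parse().ok(),
            _ => None,
        }
    })
}

/// Used-memory percentage relative to the cgroup limit, clamped to 0..=100.
pub fn memory_used_percent(rss_bytes: u64, limit_bytes: u64) -> f64 {
    if limit_bytes == 0 {
        // A zero limit would divide by zero; report 0 instead.
        return 0.0;
    }
    (rss_bytes as f64 / limit_bytes as f64 * 100.0).clamp(0.0, 100.0)
}
```

With an `rss` of 1 MiB against a 4 MiB limit this gives 25%. Clamping keeps the result in range when the counter briefly exceeds the limit.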

These changes collectively increase the reliability and transparency of system resource monitoring, especially in containerized environments, and facilitate easier debugging and maintenance.

Related Issue

Motivation and Context

Screenshots (if appropriate)

@gaius-qi gaius-qi added this to the v2.5.0 milestone Mar 20, 2026
@gaius-qi gaius-qi self-assigned this Mar 20, 2026
@gaius-qi gaius-qi added the enhancement New feature or request label Mar 20, 2026
@gaius-qi gaius-qi changed the title from "Feature/bbr" to "fix(sysinfo): use RSS instead of usage_in_bytes for cgroup memory stats" Mar 20, 2026
Member

@chlins chlins left a comment

lgtm

@codecov

codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 16.66667% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.82%. Comparing base (f27be76) to head (4073bac).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dragonfly-client-util/src/sysinfo/memory.rs | 0.00% | 3 Missing ⚠️ |
| dragonfly-client-util/src/ratelimiter/bbr.rs | 33.33% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1739      +/-   ##
==========================================
+ Coverage   46.79%   46.82%   +0.03%     
==========================================
  Files          87       87              
  Lines       24883    24881       -2     
==========================================
+ Hits        11643    11650       +7     
+ Misses      13240    13231       -9     

| Files with missing lines | Coverage Δ |
|---|---|
| dragonfly-client-util/src/ratelimiter/bbr.rs | 79.20% <33.33%> (ø) |
| dragonfly-client-util/src/sysinfo/memory.rs | 26.31% <0.00%> (+0.89%) ⬆️ |

... and 1 file with indirect coverage changes

@gaius-qi gaius-qi merged commit 0360d12 into main Mar 20, 2026
7 checks passed
@gaius-qi gaius-qi deleted the feature/bbr branch March 20, 2026 08:18

Labels

enhancement New feature or request

3 participants