Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard against -1 Returned from sysconf for the Cache Sizes Causing Large Gen0 Sizes and Budgets for Certain Linux Distributions. #100502

Merged
merged 14 commits into from
Apr 3, 2024

Conversation

mrsharm
Copy link
Member

@mrsharm mrsharm commented Apr 1, 2024

Issue Summary

  • GetLogicalProcessorCacheSizeFromOS retrieves a bogus value of SIZE_MAX (-1 but type casted to a size_t) from the sysconf call for the L4 Cache Size that's now recognized has the last cache size (as it's not 0) causing a significantly larger gen0 size and budget for Debian 12 vs. Debian 11 for the same hardware i.e., same cache sizes, architecture etc. This behavior also seems to show up for Ubuntu 22.04.
  • This is confirmed by running getconf -a | grep LEVEL4_CACHE on a Debian 12 (and an Ubuntu 22.04) machine results in an empty string for the L4 cache size - this is different from the behavior of Debian 11 that returns a 0 for the L4 cache size.
  • SIZE_MAX from GetLogicalProcessorCacheSizeFromOS is also returned from GetCacheSizePerLogicalCpu and causes an extremely large gen0 size thereby affecting the gen0 budget for Server GC.

Further Details

Running getconf -a | grep LEVEL4_CACHE Outputs:

Debian 11

LEVEL4_CACHE_SIZE 0
LEVEL4_CACHE_ASSOC 0
LEVEL4_CACHE_LINESIZE 0

Debian 12

LEVEL4_CACHE_SIZE
LEVEL4_CACHE_ASSOC
LEVEL4_CACHE_LINESIZE


Ubuntu 20.04

LEVEL4_CACHE_SIZE 0
LEVEL4_CACHE_ASSOC 0
LEVEL4_CACHE_LINESIZE 0

Ubuntu 22.04

LEVEL4_CACHE_SIZE
LEVEL4_CACHE_ASSOC
LEVEL4_CACHE_LINESIZE

How To Detect This Issue

Run getconf -a | grep LEVEL4_CACHE and observe if the value for the LEVEL4_CACHE is empty or 0. If it is empty, it indicates that sysconf will return -1 and therefore, we will be susceptible to a large gen0 size and not if it's 0.

Solution

  • Guard against -1 from being returned from sysconf that we cast to a size_t and therefore becomes SIZE_MAX to prevent much larger gen0 sizes and budgets than if -1 wasn't returned from sysconf.

Testing

First, we confirmed that running a standalone C program on Debian 11 vs. Debian 12 with code:

#include <unistd.h>
#include <stdio.h>

int main (int argc, char **argv) {
    printf( " Raw LEVEL4 Cache Size: %lu \n",  sysconf( _SC_LEVEL4_CACHE_SIZE ));
    return 0;
}

Results in:

Debian 11:
Raw LEVEL4 Cache Size: 0

Debian 12:
Raw LEVEL4 Cache Size: 18446744073709551615 or SIZE_MAX

Ubuntu 22.04:
Raw L4 Cache Value: 18446744073709551615 or SIZE_MAX

Additionally, with this fix, the customer confirmed that this fix resulted in their memory going back to previous levels.

@mrsharm mrsharm changed the title Cache Values Guard Guard against Spurious Values from sysconf for Debian 12 Apr 2, 2024
@Maoni0
Copy link
Member

Maoni0 commented Apr 2, 2024

                cacheSize = std::max(cacheSize, size);

what if size is SIZE_MAX here?


Refers to: src/coreclr/gc/unix/gcenv.unix.cpp:925 in 5868885. [](commit_id = 5868885, deletion_comment = False)

@mrsharm mrsharm marked this pull request as ready for review April 2, 2024 07:50
@mrsharm mrsharm changed the title Guard against Spurious Values from sysconf for Debian 12 Guard against SIZE_MAX Returned from sysconf for the Cache Sizes Apr 2, 2024
@mrsharm mrsharm changed the title Guard against SIZE_MAX Returned from sysconf for the Cache Sizes Guard against -1 Returned from sysconf for the Cache Sizes Apr 2, 2024
@mrsharm
Copy link
Member Author

mrsharm commented Apr 2, 2024

                cacheSize = std::max(cacheSize, size);

what if size is SIZE_MAX here?

Refers to: src/coreclr/gc/unix/gcenv.unix.cpp:925 in 5868885. [](commit_id = 5868885, deletion_comment = False)

Guarding against this now by rejecting any 0 or SIZE_MAX values here.

@mrsharm mrsharm changed the title Guard against -1 Returned from sysconf for the Cache Sizes Guard against -1 Returned from sysconf for the Cache Sizes Causing Large Gen0 Sizes and Budgets for Certain Linux Distributions. Apr 2, 2024
@mrsharm mrsharm requested a review from Maoni0 April 3, 2024 00:05
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as a minimal delta fix for servicing backport.

(I do not particularly like how some of the local variables are used in the macro, but others are not. It makes the code hard to follow. Cleaning it up is not necessary for the servicing fix.)

@mrsharm mrsharm merged commit 0c0b8b2 into dotnet:main Apr 3, 2024
92 checks passed
@mrsharm
Copy link
Member Author

mrsharm commented Apr 3, 2024

/backport to release/8.0

Copy link
Contributor

github-actions bot commented Apr 3, 2024

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/8533880062

@mrsharm
Copy link
Member Author

mrsharm commented Apr 3, 2024

/backport to release/8.0-staging

Copy link
Contributor

github-actions bot commented Apr 3, 2024

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/8533996443

@janvorli
Copy link
Member

janvorli commented Apr 3, 2024

I do not particularly like how some of the local variables are used in the macro, but others are not. It makes the code hard to follow.

I completely agree that it would be good to eventually clean this up. Macros using ambient stuff are always difficult to follow, debug and reason about.

@mrsharm
Copy link
Member Author

mrsharm commented Apr 3, 2024

Looks good to me as a minimal delta fix for servicing backport.

(I do not particularly like how some of the local variables are used in the macro, but others are not. It makes the code hard to follow. Cleaning it up is not necessary for the servicing fix.)

Addressed: #100596

matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
…rge Gen0 Sizes and Budgets for Certain Linux Distributions. (dotnet#100502)

* Logging.

* Fixed comparison check

* Fix logical operations

* Completely guard against the cacheSize as UINTMAX_MAX

* Fix for right macro

* Ensure we are guarded against all cases where cacheSize == SIZE_MAX

* Added an extra guard and removed redundant case

* Comment clean

* Added some additional asserts

* Removed unnecessary checks for cacheSize == SIZE_MAX

* Cleaned up logic

* Fix type casting comparison

* Removed redundant comment

* Removed one more unneccesary guard
@debracey
Copy link

debracey commented May 2, 2024

Possibly cause of

dotnet/aspnetcore#54405

Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
…rge Gen0 Sizes and Budgets for Certain Linux Distributions. (dotnet#100502)

* Logging.

* Fixed comparison check

* Fix logical operations

* Completely guard against the cacheSize as UINTMAX_MAX

* Fix for right macro

* Ensure we are guarded against all cases where cacheSize == SIZE_MAX

* Added an extra guard and removed redundant case

* Comment clean

* Added some additional asserts

* Removed unnecessary checks for cacheSize == SIZE_MAX

* Cleaned up logic

* Fix type casting comparison

* Removed redundant comment

* Removed one more unneccesary guard
@github-actions github-actions bot locked and limited conversation to collaborators Jun 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants