
GC picks wrong L3 cache size on Linux #76290

@smoogipoo

Description


In #48937 it was found that my gen0 budget was 32MiB. Investigating this further, I believe it may even be as high as 64MiB, which causes the erratic Gen0 collection times I'm seeing.

System configuration:

  • AMD Ryzen 3950x

I wrote a simple app following the implementations that the GC uses on x86 (as far as I can tell) for both Linux and Windows:

  • Linux:
    size_t cacheLevel = 0;
    size_t cacheSize = 0;
    size_t size;
    #ifdef _SC_LEVEL1_DCACHE_SIZE
    size = (size_t)sysconf(_SC_LEVEL1_DCACHE_SIZE);
    UPDATE_CACHE_SIZE_AND_LEVEL(size, 1)
    #endif
    #ifdef _SC_LEVEL2_CACHE_SIZE
    size = (size_t)sysconf(_SC_LEVEL2_CACHE_SIZE);
    UPDATE_CACHE_SIZE_AND_LEVEL(size, 2)
    #endif
    #ifdef _SC_LEVEL3_CACHE_SIZE
    size = (size_t)sysconf(_SC_LEVEL3_CACHE_SIZE);
    UPDATE_CACHE_SIZE_AND_LEVEL(size, 3)
    #endif
    #ifdef _SC_LEVEL4_CACHE_SIZE
    size = (size_t)sysconf(_SC_LEVEL4_CACHE_SIZE);
    UPDATE_CACHE_SIZE_AND_LEVEL(size, 4)
    #endif
  • Windows:
    DWORD nEntries = 0;

    // Try to use GetLogicalProcessorInformation API and get a valid pointer to the SLPI array if successful. Returns NULL
    // if API not present or on failure.
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION *pslpi = GetLPI(&nEntries);

    if (pslpi == NULL)
    {
        // GetLogicalProcessorInformation not supported or failed.
        goto Exit;
    }

    // Crack the information. Iterate through all the SLPI array entries for all processors in system.
    // Will return the greatest of all the processor cache sizes or zero.
    {
        size_t last_cache_size = 0;

        for (DWORD i = 0; i < nEntries; i++)
        {
            if (pslpi[i].Relationship == RelationCache)
            {
                if (last_cache_size < pslpi[i].Cache.Size)
                {
                    last_cache_size = pslpi[i].Cache.Size;
                    cache_level = pslpi[i].Cache.Level;
                }
            }
        }

        cache_size = last_cache_size;
    }

... And found that it reports a cache size of 16MiB on Windows and 64MiB on Linux.
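
For anyone who wants to check locally, this is a minimal standalone sketch (my own, not code from the runtime) that mirrors the sysconf path above. It assumes a glibc system, since the _SC_LEVEL*_CACHE_SIZE constants are glibc extensions; on the Ryzen system described here the L3 line should report the same 64MiB total.

#include <cstdio>
#include <unistd.h>

int main()
{
    // Query each cache level the same way the sysconf-based path above does.
    // A level that isn't reported comes back as -1 or 0.
#ifdef _SC_LEVEL1_DCACHE_SIZE
    std::printf("L1d: %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
#endif
#ifdef _SC_LEVEL2_CACHE_SIZE
    std::printf("L2:  %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
#endif
#ifdef _SC_LEVEL3_CACHE_SIZE
    std::printf("L3:  %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
#endif
#ifdef _SC_LEVEL4_CACHE_SIZE
    std::printf("L4:  %ld bytes\n", sysconf(_SC_LEVEL4_CACHE_SIZE));
#endif
}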

I believe the 64MiB value to be incorrectly chosen for this system given the CPU topology:
[image: CPU topology showing 16MiB of L3 per CCX]

This makes sense, since _SC_LEVEL3_CACHE_SIZE returns the total L3 size across all CCXs rather than the 16MiB L3 available to a single CCX.

On other platforms, the GC queries /sys/devices/system/cpu/cpu0/cache/index*/size to determine the cache size: https://github.com/filipnavara/runtime/blob/1955928833e178392f3a40ac1509f0d4a6ca7632/src/coreclr/gc/unix/gcenv.unix.cpp#L901-L935

... Which results in more reasonable values:

$ cat /sys/devices/system/cpu/cpu0/cache/index0/size
32K
$ cat /sys/devices/system/cpu/cpu0/cache/index1/size
32K
$ cat /sys/devices/system/cpu/cpu0/cache/index2/size
512K
$ cat /sys/devices/system/cpu/cpu0/cache/index3/size
16384K
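
For comparison, here is a rough sketch (again my own, not the runtime's implementation) that reads those sysfs files directly and keeps the largest value, similar in spirit to the gcenv.unix.cpp code linked above. It assumes the sizes are reported with a "K" suffix, as in the output above.

#include <cstdio>
#include <cstdlib>
#include <cstring>

int main()
{
    size_t maxSize = 0;

    // Walk cpu0's cache indices until one is missing.
    for (int index = 0; ; index++)
    {
        char path[128];
        std::snprintf(path, sizeof(path),
                      "/sys/devices/system/cpu/cpu0/cache/index%d/size", index);

        FILE* f = std::fopen(path, "r");
        if (f == nullptr)
            break;

        char buf[32] = {};
        if (std::fgets(buf, sizeof(buf), f) != nullptr)
        {
            // Entries look like "32K" or "16384K".
            size_t size = std::strtoul(buf, nullptr, 10);
            if (std::strchr(buf, 'K') != nullptr)
                size *= 1024;

            std::printf("index%d: %zu bytes\n", index, size);
            if (size > maxSize)
                maxSize = size;
        }

        std::fclose(f);
    }

    std::printf("largest cache visible to cpu0: %zu bytes\n", maxSize);
}

On the 3950x the largest value is the 16384K (16MiB) L3, matching what Windows reports.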

Reproduction Steps

I'm not sure how to extract the Gen0 budget from the GC, so I wrote an app that uses the same method as the GC to determine cache size: https://github.com/smoogipoo/CacheSizeTest

It can be run on Windows and Linux, but must be run on a multi-CCX CPU such as the Ryzen 3950x.

Expected behavior

The cache size on Linux should be 16MiB.

Actual behavior

The cache size on Linux is 64MiB.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response
