Improve total L2 size calculation logic on osx-arm64 #75881
Conversation
I am not sure we're interested in total size. The L3 size that we use to calculate the gen0 budget is expected to be that of a single chip, i.e. per core group or a unified one. It's likely that we already return the total on other platforms, but we'd better fix those IMO. The idea, as I understand it, is to be able to fit the whole gen0 into a single piece of cache memory in order to walk it efficiently (especially on Workstation GC, since macOS is unlikely to use Server GC).
Core groups within the performance cluster on M1 chips (sorry, there's no official terminology, so it's confusing) work mostly, as far as I understand, as power/clock domains. Where they don't share L2, regular x86 parts don't do that either; within the same cluster they are similar to a single CCX of an AMD CPU rather than two different ones. This is mostly to account for the fact that no L3 data is available on this platform. My understanding is that other code paths already calculate total L2 size without even accounting for whether it's shared or not if the L3 number is unavailable, so I think this will report numbers closer to the x86_64 counterparts. If you have any benchmarks on hand, or other links to look into to gauge the difference pre and post this change, please let me know! As for which size of L2 is observable to CPU cores with low latency, it is unclear and reverse-engineered data varies. See: https://www.realworldtech.com/forum/?threadid=205277&curpostid=205283 and https://www.anandtech.com/show/17024/apple-m1-max-performance-review/2 p.s.: fun cursed fact: performance cores on M1 have variable sizes of their respective L2 slices; M1 is 5-1-3-3 and M1 Pro/Max is 1-5-5-1 and 3-3-3-3 megabytes of L2$.
You can try these: #64576. And could you also measure the working-set size difference? Presumably it will be higher with this change.
Hi @neon-sunset, is this an experimental PR? We could mark it as a draft. Looks like you are still running some perf tests to check what the impact is here?
Yes, I haven't been able to work on it recently so if you could mark it as draft - please do and I will change it back to ready for review once there is data available to back up (or not) the suggested change. Thanks! |
Draft Pull Request was automatically closed after 30 days of inactivity. Please let us know if you'd like to reopen it.
While looking at #75854 and double-checking `sysctl` behavior, I've noticed that on M1 Pro the actual value reported for `hw.perflevel0.l2cachesize` is `12582912` instead of roughly 24 MB, which is its actual L2 cache size. However, `sysctl` has another key, `hw.perflevel0.cpusperl2`, which allows us to calculate the total size of L2 across all performance cores.