Skip to content

Conversation

@Flamefire
Copy link
Contributor

@Flamefire Flamefire commented May 6, 2025

Only consider processors with an APIC_ID as valid.

For processors with Hyperthreads but disabled SMT the APIC_ID will never be set by CPUInfo and stays zero for the "disabled" threads. Among other inconsistencies this causes the first active thread/processor to be considered "the same" as all the disabled ones as they share APIC_ID=0

It also doesn't make sense to make and decisions based on the APIC_ID if we don't have any information on it.

So this patch adds a check for CPUINFO_LINUX_FLAG_APIC_ID.
The flag CPUINFO_LINUX_FLAG_APIC_ID will be set if and only if the APIC_ID has been set. If it has not then CPUInfo doesn't has any information about the APIC_ID, hence the default value of zero. See

processor->apic_id = apic_id;
processor->flags |= CPUINFO_LINUX_FLAG_APIC_ID;

On a dual socket AMD EPYC 9334 system (2x32 cores, 128 threads total (2/core)) the current output of cache-info is

Max cache size (upper bound): 33554432 bytes
L1 instruction cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 65 processors
L1 data cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 65 processors
L2 unified cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 65 processors
L3 unified cache: 8 x 32 MB (exclusive), 16-way set associative (32768 sets), 64 byte lines, shared by 72 processors

Note the "65 processors" being one-off.

With this patch:

Max cache size (upper bound): 33554432 bytes
L1 instruction cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 1 processors
L1 data cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 1 processors
L2 unified cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 1 processors
L3 unified cache: 8 x 32 MB (exclusive), 16-way set associative (32768 sets), 64 byte lines, shared by 8 processor

This matches the Zen4 architecture having 32KB Data and 32KB instruction L1 cache per core, 1MB L2 cache per core and 256MB L3 cache total.

Fixes #238

@malfet
Copy link
Contributor

malfet commented May 12, 2025

@Flamefire can you please fix lint? (namely run clang-format on your PR)
Also, do you mind adding a few more references in the doc to the PR description that will help with reviewing this change

@Flamefire
Copy link
Contributor Author

@Flamefire can you please fix lint? (namely run clang-format on your PR) Also, do you mind adding a few more references in the doc to the PR description that will help with reviewing this change

Fixed the clang format issue and squashed it into the initial commit

I'm not sure what to put into the description in addition to what I have. I added a paragraph linking to the logic of CPUInfo itself. I.e. the issue is that while CPUInfo "knows" that it doesn't has information on the APIC ID it still uses it as-if it was valid.
As for WHY the information is not available I wasn't able to find any reliable information. But the change at least makes sense from the perspective of the existing logic and observations.

@malfet
Copy link
Contributor

malfet commented May 22, 2025

@pytorchbot rebase

Only consider processors with an APIC_ID as valid.

For processors with Hyperthreads but disabled SMT the APIC_ID will never be set and stays zero for the "disabled" threads.
Among other inconsistencies this causes the first active thread/processor to be considered "the same" as all the disabled ones as they share `APIC_ID=0`

It also doesn't make sense to make and decisions based on the APIC_ID if we don't have any information on it.
@Flamefire
Copy link
Contributor Author

Rebased manually. Not sure why the bot didn't do it.

@fbarchard
Copy link
Collaborator

#238 is for nnpack? Did you mean xnnpack?
The change looks okay.

I've seen similar issues on arm where cores are unavailable, but for different reasons
on low end devices such as nexus 5, overheating would turn off cpus, starting with the big cores.
on high end devices, but running on 32 bit, only medium cores are available.. the little and big/prime cores are disabled.
the cores are there, but not available, so we would prefer count the number of cores actually available, when deciding how many threads to use, and make achitectual choices based on the best available core

@Flamefire
Copy link
Contributor Author

Flamefire commented Jun 8, 2025

#238 is for nnpack? Did you mean xnnpack? The change looks okay.

NNPACK is correct, that's where I ran into a crash due to this. See the issue there: Maratyszcza/NNPACK#218

@GregoryComer GregoryComer merged commit 0d5985d into pytorch:main Nov 5, 2025
21 of 22 checks passed
@Flamefire Flamefire deleted the patch-1 branch November 5, 2025 18:13
@fbarchard
Copy link
Collaborator

This PR caused crashes on Android and Linux platforms
#339

So I've prepared a PR to rollback to the previous version
#340

@Flamefire
Copy link
Contributor Author

Flamefire commented Nov 12, 2025

Are you sure this is caused by this PR? I can't see how.

Stacktrace:

#00 pc 0910812b cpuinfo_compute_max_cache_size+37
#01 pc 091069a7 cpuinfo_x86_linux_init+4504

This leads to this call: https://github.com/Flamefire/cpuinfo/blob/b9558e923a0f915ddc2357116abc0d6aaaaea4e8/src/x86/linux/init.c#L638 and then this function:

uint32_t cpuinfo_compute_max_cache_size(const struct cpuinfo_processor* processor) {

There can only be a segfault if processor is invalid, which it is clearly not in this case. Or if one of the cache pointers is pointing to a wrong address.

Those pointers are set to members of an array whose sizes are calculated by cpuinfo_x86_count_objects using the same logic as that code that sets the pointers.
Additionally those pointers are assigned to in cpuinfo_x86_linux_init already so if writing to those doesn't crash, how could reading them?

Any idea what actually causes the crash on Android?

Because this code works for me on multiple (non-Android) systems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect cache-info output with nosmt linux kernel command line

5 participants